In VA, everything that is not static is considered part of a dynamic scene. All sound sources, sound receivers, the underlying geometry, and source/receiver directivities are potentially dynamic and are therefore stored and accessed using a history concept. They can be modified at any time during their lifetime. Renderers pick up modifications and react to the new state, for example, when a sound source is moved or a sound receiver is rotated. Updates are triggered asynchronously by the user or by another application and can also be synchronized, ensuring that all signals are started or stopped within one audio frame.
Sound sources and receivers¶
This section explains the interfaces that are valid for both sources and receivers. The respective function calls are virtually identical. In the examples below, these functions are called for a sound source. The respective receiver functions follow the same syntax - just substitute `source` with `receiver`. Functions that are specific to sound sources or receivers are discussed further down.
Creation and deletion¶
Sound sources can be created (optionally assigning a name) using
`S` will contain a unique numerical identifier which is required to modify or delete the sound source.
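A minimal sketch of creation and deletion, assuming a connected VAMatlab instance `va` and the binding's snake_case naming convention:

```matlab
% Create a named sound source; the return value is a numerical ID
S = va.create_sound_source( 'MyVirtualSource' );
% ... use the source ...
va.delete_sound_source( S );  % remove it again
```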
A list of all available sound sources is returned by
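Assuming a getter that follows the binding's naming convention, the list of IDs can be queried like this:

```matlab
source_ids = va.get_sound_source_ids();  % vector of numerical IDs
```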
IMPORTANT: A sound source/receiver can only be auralized if it has been placed somewhere in 3D space. Otherwise it remains in an invalid state.
Position and orientation¶
Thus, it is required to specify at least a position as a three-dimensional vector. Orientations can be given either as a quaternion or as a view/up vector pair:

- `p = [x y z]'` (position),
- `q = [a b c d]'` (orientation quaternion),
- `v = [vx vy vz]'` (view vector), and
- `u = [ux uy uz]'` (up vector),

where `'` symbolizes the vector transpose.
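A sketch of setting the pose of a sound source `S`, assuming the binding's snake_case setters and a quaternion component order per the VA convention:

```matlab
va.set_sound_source_position( S, [ 0 1.7 0 ] );     % p = [x y z]' in meters
va.set_sound_source_orientation( S, [ 0 0 0 1 ] );  % quaternion q = [a b c d]'
```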
Generally, both sound sources and receivers can have a directivity. For sound sources, VA expects energetic directivities with a one-third octave band resolution. For sound receivers, VA expects an HRTF data set. This is relevant if the receiver represents a human listener and a binaural synthesis is included in the renderer / reproduction processing chain.
Sound sources can be assigned a directivity with a numerical identifier (called `D` here) by
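Assuming `D` was obtained by loading a directivity file (see the directivity section below), the assignment could look like:

```matlab
va.set_sound_source_directivity( S, D );
```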
To mute (`true`) and unmute (`false`) a source, type
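A sketch, assuming the binding's snake_case setter:

```matlab
va.set_sound_source_muted( S, true );   % mute
va.set_sound_source_muted( S, false );  % unmute
```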
To control the level of a sound source, assign the sound power in watts
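For example, assuming the binding's sound power setter, a source emitting one milliwatt:

```matlab
va.set_sound_source_sound_power( S, 1e-3 );  % sound power in watts
```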
In contrast to all other sound objects, sound sources can be assigned a signal source. It feeds the sound pressure time series for that source and is referred to as the signal (speech, music, sounds). See below for more information on signal sources. In combination with the sound power and the directivity (if assigned), the signal source determines the time-dependent sound emitted from the source. For a calibrated auralization, the combination of the three components has to match physically.
Some renderers allow using anthropometric data to individualize the HRTFs utilized for the binaural filtering, more specifically by adjusting the respective ITD (interaural time difference).
The anthropometric parameters of a receiver can be adjusted using a specific key/value layout combined under the key `anthroparams`. All parameters are provided in units of meters.
The current parameter values can be displayed using
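A hedged sketch of both setter and getter for a receiver ID `R`; the struct layout and the key names (`headwidth`, `headheight`, `headdepth`) are assumptions:

```matlab
anthro_in = struct();
anthro_in.anthroparams = struct();
anthro_in.anthroparams.headwidth = 0.15;   % meters (assumed key name)
anthro_in.anthroparams.headheight = 0.24;  % meters (assumed key name)
anthro_in.anthroparams.headdepth = 0.20;   % meters (assumed key name)
va.set_sound_receiver_parameters( R, anthro_in );
params = va.get_sound_receiver_parameters( R, struct() )  % display current values
```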
The VA interface provides some special features for receivers that are meaningful only in binaural technology. The head-above-torso orientation (HATO) of a human listener can be set and received as a quaternion by the methods
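Assuming long-form method names in the binding, setting and querying the HATO quaternion for a receiver `R` could look like:

```matlab
va.set_sound_receiver_head_above_torso_orientation( R, [ 0 0 0 1 ] );  % quaternion
hato = va.get_sound_receiver_head_above_torso_orientation( R );
```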
Real-world position and orientation¶
In Virtual Reality applications with loudspeaker-based setups, user motion is typically tracked inside a specific area. Some reproduction systems require knowledge of the exact position of the user's head and torso to apply adaptive sweet spot handling (like crosstalk cancellation). The VA interface therefore includes some receiver-oriented methods that extend the virtual pose with a so-called real-world pose. Relative to the hardware setup in the lab, the user's absolute position and orientation (pose) should be set using one of the following setters
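A sketch of the real-world setters; the method names are assumptions made by analogy with the virtual pose setters:

```matlab
% names assumed by analogy with the virtual pose setters
va.set_sound_receiver_real_world_position( R, [ 0 1.7 0 ] );     % meters
va.set_sound_receiver_real_world_orientation( R, [ 0 0 0 1 ] );  % quaternion
```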
Virtual vs. real-world pose¶
The aim of this section is to give a better understanding of the differences between virtual and real-world pose of the receiver:
The virtual pose is used to represent the receiver within the virtual world. Generally, the virtual source and receiver poses are used by the rendering modules to create the audio, most importantly by applying the respective sound propagation effects. In this context, it is important to understand that the virtual receiver does not necessarily represent a listener. In fact, this is only the case if the utilized renderer has a binaural output. For Ambisonics or VBAP encoded signals, it represents the center of the loudspeaker array.
Additionally, the real-world pose of the listener is required for certain reproduction modules. Typically, this represents the position and orientation within a loudspeaker array. The most common example is the reproduction of binaural signals via a loudspeaker array using Crosstalk-Cancellation (CTC). The second example are binaural mixdown reproduction modules (BinauralMixdown and BinauralAmbisonicsMixdown). These allow rendering a binaural signal based on a signal created for a virtual loudspeaker array, which can then be played back via headphones.
Here is a summary of the examples above:
- Lv: Listener in virtual world
- Cv: Center of loudspeaker array in virtual world
- LRW: Listener in real world (e.g. standing within loudspeaker array)
| Renderer output | Reproduction type | Virtual pose | Real-world pose |
| --- | --- | --- | --- |
| Binaural | CTC => Loudspeakers | Lv | LRW |
| Ambisonics | HOA decoding => Loudspeakers | Cv | - |
| Ambisonics/VBAP | Binaural mixdown => Headphones | Cv | LRW |
Signal sources¶
Sound signals or signal sources represent the sound pressure time series that are emitted by a source. Some are unmanaged and directly available; others have to be created. To get a list with detailed information on currently available signal sources (including those created at runtime), type
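Assuming an info getter that follows the binding's naming convention:

```matlab
infos = va.get_signal_source_infos()  % detailed list of available signal sources
```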
Audio files that can be attached to sound sources are usually single channel anechoic WAV files. In VA, an audio clip can be loaded as a buffer signal source with special control mechanisms. It supports macros and uses the search paths to locate a file. Using relative paths is highly recommended. Two examples are provided in the following:
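Two hedged examples, one using the `DemoSound` macro from the default configuration and one using a hypothetical relative path:

```matlab
X1 = va.create_signal_source_buffer_from_file( '$(DemoSound)' );
X2 = va.create_signal_source_buffer_from_file( 'ambience/birds.wav' );  % hypothetical relative path
```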
The `DemoSound` macro points to the 'Welcome to Virtual Acoustics' anechoically recorded file in WAV format, which resides in the common `data` folder. Make sure that the VA application can find the common `data` folder, which is also added as a search path in the default configurations.
Now, the signal source can be attached to a sound source using
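Assuming a sound source ID `S` and a buffer signal source ID `X` (both created earlier):

```matlab
va.set_sound_source_signal_source( S, X );
```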
Any buffer signal source can be started, stopped and paused. Also, it can be set to looping or non-looping mode (default).
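A sketch of the playback controls; the looping setter name is an assumption:

```matlab
va.set_signal_source_buffer_playback_action( X, 'play' );
va.set_signal_source_buffer_playback_action( X, 'pause' );
va.set_signal_source_buffer_playback_action( X, 'stop' );
va.set_signal_source_buffer_looping( X, true );  % loop until stopped (name assumed)
```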
To receive the current state of the buffer signal source, use
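Assuming a getter mirroring the playback action setter:

```matlab
state = va.get_signal_source_buffer_playback_state( X )  % returns a state string
```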
Audio input channels¶
Input channels from the sound card can be directly used as signal sources (microphones, electrical instruments, etc.) and are unmanaged (they can not be created or deleted). All channels are made available individually on startup and are integrated as a list of signal sources. The respective IDs are `audioinput1` for the first channel, `audioinput2` for the second, and so on.
VA also provides a parameter-controlled signal source representing a jet engine. Its implementation is based on the book Designing Sound by Andy Farnell from 2010. It can be created using
It is possible to set the rotational speed by handing over a numerical value in revolutions per minute. In the following example, the corresponding variable is called
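A hedged sketch using the generic parameter setter shown elsewhere in this document; the signal source handle `jet_signal_source` and the key name `rpm` are assumptions:

```matlab
rpm = 1000;  % revolutions per minute
jet_in = struct();
jet_in.rpm = rpm;  % key name assumed
va.set_signal_source_parameters( jet_signal_source, jet_in );
```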
The TTS signal source allows generating speech from text input. Because it uses the commercial CereVoice third-party library by CereProc, it is not included in the VA package for public download. However, if you have access to the CereVoice library and can build VA with TTS support, this is how it works in Matlab:
```matlab
tts_signal_source = va.create_signal_source_text_to_speech( 'Heathers beautiful voice' )
tts_in = struct();
tts_in.voice = 'Heather';
tts_in.id = 'id_welcome_to_va';
tts_in.prepare_text = 'welcome to virtual acoustics';
tts_in.direct_playback = true;
va.set_signal_source_parameters( tts_signal_source, tts_in )
```
VA also provides specialized signal sources which can not be covered in detail in this introduction. Please refer to the source code for proper usage.
Directivities (including HRTFs)¶
Sound source and receiver directivities are usually made available as a file resource including multiple directions on a sphere for far-field usage. VA currently supports the OpenDAFF format with time domain and magnitude spectrum content type. The magnitude spectra are used for source directivities with a one-third octave band resolution. The time domain format is used for receiver directivities, i.e. HRIR / HRTF data sets.
Directivities can be loaded with
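Assuming a far-field directivity in OpenDAFF format, loading could look like this (the file name is hypothetical):

```matlab
D = va.create_directivity_from_file( 'my_directivity.daff' );  % returns a numerical ID
```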
VA ships with the ITA artificial head HRTF data set (actually, the DAFF file exports this data set as HRIRs in time domain), which is available under a Creative Commons license for academic use. The default configuration files include this HRTF data set as the `DefaultHRIR` macro. Make sure that the VA application can find the common `data` folder, which is also added as a search path in the default configurations. Then, the directivity object can be created using
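A sketch using the `DefaultHRIR` macro and assigning the result to a receiver `R`:

```matlab
H = va.create_directivity_from_file( '$(DefaultHRIR)' );
va.set_sound_receiver_directivity( R, H );  % R: receiver ID
```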
Additionally, VA provides example source directivities for a trumpet and a singer. The respective macros are
As introduced in the configuration section, most rendering and reproduction modules work with a homogeneous medium. The respective parameters can also be adjusted during runtime. The respective setter and getter functions are introduced here.
- Speed of sound in m/s
- Temperature in degrees Celsius
- Static pressure in Pascal
- Relative humidity in percent (ranging from 0.0 to 100.0 or above)
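A sketch of the homogeneous medium setters and getters; the names follow the binding's snake_case convention but are assumptions:

```matlab
va.set_homogeneous_medium_sound_speed( 343.0 );         % m/s
c = va.get_homogeneous_medium_sound_speed();
va.set_homogeneous_medium_temperature( 20.0 );          % degrees Celsius
va.set_homogeneous_medium_static_pressure( 101325.0 );  % Pascal
va.set_homogeneous_medium_relative_humidity( 50.0 );    % percent
```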
Solving synchronization issues¶
Scripting languages like Matlab are problematic by nature when it comes to timing: evaluation duration scatters unpredictably and timers are not precise enough. This becomes a major issue when, for example, a continuous motion of a sound source should be performed with a clean Doppler shift. A simple loop with a timeout will result in audible motion jitter, as the timing for each loop body execution diverges significantly. Also, if a music band should start playing at the same time and the start is triggered by subsequent scripting lines, it is very likely that the instruments end up out of sync.
To avoid timing problems, the `VAMatlab` binding provides a high-performance timer that is implemented in C++. It should be used wherever a synchronous update is required, mostly for moving sound sources or sound receivers. An example of a properly synchronized update loop at 60 Hertz that incrementally drives a source from the origin in the positive X direction until it is 100 meters away:
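A sketch of such a loop; the timer method names (`set_timer`, `wait_for_timer`) are assumptions about the binding:

```matlab
va.set_timer( 1 / 60 );  % timer period in seconds (name assumed)
x = 0;
while x < 100
    va.wait_for_timer;  % blocks until the next timer event fires (name assumed)
    va.set_sound_source_position( S, [ x 0 0 ] );
    x = x + 0.1;  % ~6 m/s at 60 Hz
end
```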
Synchronizing multiple updates¶
VA can execute updates synchronously at the granularity of the block rate of the audio stream process. Every scene update will be withheld until the update is unlocked. This feature is mainly used for simultaneous playback starts.
```matlab
va.lock_update
va.set_signal_source_buffer_playback_action( drums, 'play' )
va.set_signal_source_buffer_playback_action( keys, 'play' )
va.set_signal_source_buffer_playback_action( base, 'play' )
va.set_signal_source_buffer_playback_action( sax, 'play' )
va.set_signal_source_buffer_playback_action( vocals, 'play' )
va.unlock_update
```
It is also useful for uniform movements of sound sources that are spatially static relative to each other (like the four wheels of a vehicle). However, locking updates will inevitably lock out other clients (like trackers), so the lock should be released as soon as possible.
VA does not support tracking internally but facilitates the integration of tracking devices to update VA entities. For external tracking, the `VAMatlab` project currently supports NaturalPoint's OptiTrack and Advanced Realtime Tracking (AR-Tracking) devices to be connected to a server instance. It can automatically forward rigid body poses (head and torso, separately) to one sound receiver and one sound source. Another possibility is to use an HMD such as an Oculus Rift or HTC Vive and update VA through Unity or Unreal.
OptiTrack or AR-Tracking via VAMatlab¶
Virtual receiver pose¶
To connect a rigid body to a VA sound receiver (here, the receiver ID is 1), use
If the rigid body index should be changed (e.g., to index 3 for head and 4 for torso), use
The head rigid body (rb) can also be locally transformed using a translation and (quaternion) rotation method, e.g., if the rigid body barycenter is not between the ears or is rotated against the default orientation:
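Assuming names analogous to the real-world tracking setters listed further below, the virtual-receiver variants could look like:

```matlab
va.set_tracked_sound_receiver( 1 )  % connect rigid body to receiver with ID 1
va.set_tracked_sound_receiver_head_rigid_body_index( 3 )
va.set_tracked_sound_receiver_torso_rigid_body_index( 4 )
va.set_tracked_sound_receiver_head_rb_trans( [ x y z ] )      % local translation
va.set_tracked_sound_receiver_head_rb_rotation( [ a b c d ] ) % local quaternion rotation
```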
Real-world receiver pose¶
As explained here, some reproduction modules require specifying the real-world pose of the receiver. It can be controlled by the tracking system using the following methods:
```matlab
va.set_tracked_real_world_sound_receiver( 1 )
va.set_tracked_real_world_sound_receiver_head_rigid_body_index( 3 )
va.set_tracked_real_world_sound_receiver_torso_rigid_body_index( 4 )
va.set_tracked_real_world_sound_receiver_head_rb_trans( [ x y z ] )
va.set_tracked_real_world_sound_receiver_head_rb_rotation( [ a b c d ] )
```
For tracking a sound source, similar functions are available. Note that there is no real-world position for sound sources.
After all tracking settings are adjusted, connect a tracking system to VA. Without further arguments, the tracker is expected on the `localhost` network loopback device.
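The connection call name is an assumption; without arguments it would default to the loopback device:

```matlab
va.connect_tracker  % name assumed; connects via localhost by default
```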
In case the tracker is running on another machine, OptiTrack requires setting both the remote IP (in this example `192.168.1.2`) and the client machine IP (in this example `192.168.1.143`), like this:
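Assuming the connection method takes the server and client IPs as first and second argument (the method name is an assumption):

```matlab
va.connect_tracker( '192.168.1.2', '192.168.1.143' )
```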
If using an AR-Tracking system, the client IP is not required. However, the tracking type has to be specified as a third input argument:
HMD via VAUnity¶
To connect an HMD, set up a Unity scene and connect the tracked GameObject (usually the MainCamera) with a `VAUSoundReceiver` instance. For further details, please read the README files of VAUnity.