Audio I/O In VR
B582 VR Hardware Presentation
Ying Feng, Feb.
2000

2. Sound in VR
2.1 Sound Perception
2.2 Traditional Sound Processing
2.3 3D Sound Reproduction
2.4 Sound Authoring and Sonification
2.5 Pros and Cons

2.1 Sound Perception
Eight auditory localization cues to help locate
the position of a sound source in space:
-
Interaural time difference: the
time delay between sounds arriving at the left and right ears.
-
Head shadow: sound having
to go through or around the head in order to reach an ear.
-
Pinna response: the
effect that the external ear, or pinna, has on sound.
-
Shoulder echo: sounds
in the range of 1-3 kHz are reflected from the upper torso of the human
body.
-
Head motion: movement
of the head helps in determining a location of a sound source.
-
Early echo response:
occurs in the first 50-100 ms of a sound's life.
-
Reverberation: reflections
from surrounding surfaces.
-
Vision: helps us quickly
locate and confirm the location and direction of a sound.
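The interaural time difference listed above can be estimated with a simple formula. The following is a minimal sketch using Woodworth's spherical-head approximation, which is not part of the original slides; the head radius and speed of sound are assumed typical values, and the function name is hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 C (assumed)
HEAD_RADIUS = 0.0875    # m, an assumed average head radius


def interaural_time_difference(azimuth_deg, head_radius=HEAD_RADIUS,
                               c=SPEED_OF_SOUND):
    """Estimate the ITD in seconds for a distant source.

    azimuth_deg is the source angle from straight ahead, in [-90, 90].
    Uses Woodworth's approximation: ITD = (r / c) * (theta + sin(theta)).
    """
    theta = math.radians(azimuth_deg)
    return (head_radius / c) * (theta + math.sin(theta))


if __name__ == "__main__":
    for az in (0, 30, 60, 90):
        itd_us = interaural_time_difference(az) * 1e6
        print(f"azimuth {az:3d} deg -> ITD {itd_us:6.1f} microseconds")
```

Under these assumptions, a source directly to one side gives a delay of roughly two thirds of a millisecond, and a source straight ahead gives none.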

2.2 Sound Processing
Virtual reality's immersive quality can be enhanced
greatly through the use of properly cued, realistic sounds, so we need
to reproduce sounds that imitate those in the real world.
Sound processing includes:
-
Encoding of directional localization cues on several
audio channels:
-
A/D (analog-to-digital) converter.
-
Digital recorder, either hardware (multi-track digital
recorder) or software tools (Pro Tools etc.).
-
Sound synthesizer (MIDI used for music).
-
Transmission or storage of sound in a certain format:
-
Sound file formats (Mu-law, NeXT, wav, aiff, snd
etc.)
-
Compression techniques.
-
Playback of sound:
-
D/A (digital to analog) converter.
-
Soundcard and sample player software (such as PC
SoundBlaster).
-
Headphones.
-
Loudspeaker system.
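Mu-law, one of the formats listed above, stores samples after logarithmic companding so that quiet sounds keep more resolution. The sketch below shows only the continuous companding curve, assuming samples normalized to [-1, 1] and mu = 255; the function names are hypothetical, and real codecs add quantization details that are omitted here.

```python
import math

MU = 255  # assumed companding parameter, as used for 8-bit mu-law audio


def mulaw_compress(x, mu=MU):
    """Compress a sample in [-1, 1] with the mu-law curve."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mulaw_expand(y, mu=MU):
    """Invert mulaw_compress, recovering the original sample."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


if __name__ == "__main__":
    for sample in (0.001, 0.01, 0.1, 1.0):
        compressed = mulaw_compress(sample)
        restored = mulaw_expand(compressed)
        print(f"{sample:6.3f} -> {compressed:.4f} -> {restored:.6f}")
```

Small amplitudes are spread over a large part of the output range, which is why the format works well with only 8 bits per sample.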
The encoding of sound can be achieved in three
ways:
-
Recording (or sampling) of an existing sound scene.
-
Synthesis of a virtual sound scene.
-
Combination of the above two methods.
Different types of sounds:
-
Mono sound:
-
Recorded with one microphone; signals are the same
for both ears.
-
Sound only at one point (0-dimensional), no sense
of sound positioning.
-
Stereo sound:
-
Recorded with two microphones several feet apart,
separated by empty space; the signal from each microphone goes to
one ear.
-
Heard commonly through stereo headphones or speakers;
typical multimedia configuration of personal computers.
-
Gives a better sense of the sound's position as recorded
by the microphones, but only varies across one axis (1-dimensional), and
the sound sources appear to be at a position inside the listener's head.
-
Binaural Sound:
-
Recorded in a manner that more closely resembles
the human acoustic system, by microphones embedded in a dummy head.
-
Sounds more realistic (2-dimensional), and yields
sounds that seem external to the listener's head.
-
Binaural sound was the most common approach to spatialization;
the use of headphones takes advantage of the lack of crosstalk and a fixed
position between sound source (the speaker driver) and the ear.
-
Here are some samples
of binaural sound.
-
3D Sound:
-
Often termed spatial
sound, this is sound processed to give the listener
the impression of a sound source within a three-dimensional environment.
-
A new technology still under development; the best choice for
VR systems.
-
Here are some samples
of 3D sound.
The definition of virtual reality requires the person
to be submerged in the artificial world by sound as well as sight. Simple
stereo sound and reverb are not convincing enough, particularly for sounds
coming from the left, right, front, behind, over or under the person, that is,
a full 360 degrees in both azimuth and elevation. 3D sound emerged to fill this gap.
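To make the limitation of plain stereo concrete, here is a minimal sketch of constant-power panning, a common way to place a mono sample between two channels. It is an illustration rather than anything from the original slides, and the function name and pan convention are assumptions. The single pan parameter is the only positional control, which is why stereo varies along just one axis.

```python
import math


def constant_power_pan(sample, pan):
    """Split a mono sample into (left, right) gains.

    pan runs from -1.0 (hard left) to +1.0 (hard right). The gains follow
    a quarter circle, so left**2 + right**2 stays constant and perceived
    loudness does not dip in the middle.
    """
    angle = (pan + 1.0) * math.pi / 4.0  # maps pan to 0..pi/2
    return sample * math.cos(angle), sample * math.sin(angle)


if __name__ == "__main__":
    for pan in (-1.0, -0.5, 0.0, 0.5, 1.0):
        left, right = constant_power_pan(1.0, pan)
        print(f"pan {pan:+.1f}: left {left:.3f}, right {right:.3f}")
```

Nothing in this model encodes elevation or front versus back, so a source above or behind the listener cannot be distinguished from one in front.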

2.3 3D Sound Synthesis
3D sound synthesis is
a process in which a signal processing system reconstructs the localization of each sound
source and the room effect, starting from individual sound signals and
parameters describing the sound scene (position, orientation, directivity
of each source and acoustic characterization of the room or space).
Several techniques can be employed:
-
Sound rendering:
-
Creates a sound world by attaching a characteristic
sound to each object in the scene, using a four-stage pipelined process:
-
Generation of each object's characteristic sound
(recorded, synthesized, modal analysis-collisions).
-
Sound instantiation and attachment to moving objects
within the scene.
-
Calculation of the necessary convolutions to describe
the sound source interaction within the acoustic environment.
-
Convolutions are applied to the attached instantiated
sound sources.
-
Its similarity to ray-tracing and its unique approach
to handling reverberation are noteworthy aspects, but it is suited to the
simpler case of an animated world, which is not necessarily real-time.
-
Modeling human acoustic system with head-related
transfer function (HRTF):
-
The HRTF is a linear function that is based on the
sound source's position and takes into account many of the cues humans
use to localize sounds.
-
Process steps:
-
Record sounds with tiny probe microphones in the
ears of a real person.
-
Compare the recorded sound with the original sounds
to compute the person's HRTF.
-
Use HRTF to develop pairs of finite impulse response
(FIR) filters for specific sound positions.
-
When a sound is placed at a certain position in virtual
space, the set of FIR filters that correspond to the position is applied
to the incoming sound, yielding spatial sound.
-
The computations are so demanding that they currently
cannot be performed in real-time without special hardware.
-
E.g. the Convolvotron,
a DSP-based hardware system built for this purpose.
-
3D
sound imaging:
-
Approximate binaural spatial audio through the interaction
of a 3D environment simulation:
-
Compute line-of-sight information between the virtual
user and the sound sources.
-
The sounds emitted by these sources will then be
processed based on their location, using some software DSP algorithms or
simple audio effects modules with delay, filter, pan and reverb capabilities.
-
The final stereo sound sample will then be played
into a headphone set through a typical user-end sample player, according
to the user's position.
-
Suitable for simple VR systems where a sense of space
is desired rather than an absolute ability to aurally locate sound sources.
-
Utilization of speaker locations:
-
Use strategically placed speakers to form a cube
of any size to simulate spatial sound:
-
Two speakers are located in each corner of the cube,
one up high and one down low.
-
The pitch and volume of the sampled sounds are distributed
across the speakers so as to give the perception of a sound source's
spatial location.
-
Less accurate than sound produced by convolution,
but much faster to process, allowing for much less expensive
real-time spatial sound.
-
E.g. Audio
Image Sound Cube.
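The HRTF technique described above boils down to convolving a mono signal with a pair of FIR filters chosen by source position. The following is a toy sketch of that step only; the filter table is a hypothetical stand-in with made-up coefficients, not measured HRTF data, and the nearest-position lookup is an assumed simplification.

```python
def convolve(signal, kernel):
    """Direct-form FIR convolution of two lists of floats."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


# Hypothetical filter table: azimuth in degrees -> (left FIR, right FIR).
# Real systems use measured impulse responses with many more taps.
TOY_FILTERS = {
    0: ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),
    90: ([0.0, 0.0, 0.4], [1.0, 0.0, 0.0]),   # source on the right:
    -90: ([1.0, 0.0, 0.0], [0.0, 0.0, 0.4]),  # far ear is later and quieter
}


def spatialize(mono, azimuth_deg):
    """Apply the filter pair stored for the nearest tabulated azimuth."""
    nearest = min(TOY_FILTERS, key=lambda a: abs(a - azimuth_deg))
    left_fir, right_fir = TOY_FILTERS[nearest]
    return convolve(mono, left_fir), convolve(mono, right_fir)


if __name__ == "__main__":
    left, right = spatialize([1.0, 0.5], azimuth_deg=80)
    print("left :", left)
    print("right:", right)
```

Each output sample costs one multiply-add per filter tap per ear, which gives a sense of why long measured filters and many simultaneous sources made real-time processing demanding.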
Spatial synthesis parameters can be provided by:
-
Analysis of an existing scene: through
position trackers, cameras, adaptive acoustic arrays...
-
The user's actions:
man-to-machine interface - mixing desk, graphic or gestural interfaces...
-
A stand-alone process: videogames,
simulators.
-
E.g. the position coordinates provided by a head-tracking
system can be exploited simultaneously for updating the synthetic sound
scene reproduced over headphones.
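As a rough illustration of the head-tracking example, the sketch below recomputes a source's direction relative to the listener whenever the tracker reports a new head pose. The 2D coordinate convention (yaw measured counterclockwise from the +x axis, positive result meaning the source lies to the listener's left) and the function name are assumptions made for this example.

```python
import math


def relative_azimuth(source_xy, head_xy, head_yaw_deg):
    """Return the source direction relative to the head, in [-180, 180).

    source_xy and head_xy are (x, y) positions in the same world frame.
    head_yaw_deg is the direction the head faces, counterclockwise from +x.
    """
    dx = source_xy[0] - head_xy[0]
    dy = source_xy[1] - head_xy[1]
    bearing = math.degrees(math.atan2(dy, dx))
    # Wrap the difference so that turning the head past 180 stays continuous.
    return (bearing - head_yaw_deg + 180.0) % 360.0 - 180.0


if __name__ == "__main__":
    source = (0.0, 1.0)
    for yaw in (90.0, 45.0, 0.0, -90.0):
        az = relative_azimuth(source, (0.0, 0.0), yaw)
        print(f"head yaw {yaw:+6.1f} -> source at {az:+7.1f} deg")
```

The resulting angle is what a spatializer would use to pick its filters or pan settings, so the virtual source appears to stay fixed in the world while the head turns.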
For more details, refer to Cindy Tonnesen and Joe
Steinmetz's article "3D Sound Synthesis" and Jean-Marc Jot's paper
"Synthesizing Three-Dimensional Sound Scenes in Audio or Multimedia
Production and Interactive Human-Computer Interfaces".

2.4 Sound Authoring and Sonification
Sound Authoring:
-
Definition:
-
A process of creating or designating an audio component
in a multi-modal computing environment, such as virtual reality, web browsing
or multimedia.
-
More specifically, the process of establishing automated
relationships between objects or events in a silent computing application,
and algorithms for sound production which operate in parallel to the silent
application.
-
Functionality:
-
Sound Computation.
-
Programmable Data-driven Sound.
-
Parallel Processing of sounds, graphics and simulations.
-
Synchronization in real-time.
-
Automated real-time mixing of multiple sounds.
-
Automated interpretation of silent events to control
sound production.
Sonification:
-
Definition:
-
The transformation of numerical data into sound for
purposes of observing that data.
-
More specifically, the projection of relations in
a numerical domain onto relations in an auditory range.
-
Task:
-
Identify and construct an intuitive perceptual space
for the auditory display of data. This includes the assimilation of technological,
creative and scientific advances in sound synthesis and signal processing,
and in human perception and cognition.
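A very simple instance of sonification is mapping each data value to a pitch. The sketch below is an illustration only, not the VSS API or any method from the slides; the function name and the 220-880 Hz range are assumptions. It uses an exponential mapping so that equal steps in the data become equal musical intervals, preserving relations in the data as relations in the auditory range.

```python
def data_to_frequencies(values, low_hz=220.0, high_hz=880.0):
    """Map numeric data onto frequencies between low_hz and high_hz.

    The smallest value maps to low_hz and the largest to high_hz, with
    exponential interpolation in between.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [low_hz] * len(values)  # constant data: one steady pitch
    ratio = high_hz / low_hz
    return [low_hz * ratio ** ((v - lo) / (hi - lo)) for v in values]


if __name__ == "__main__":
    readings = [0, 2.5, 5, 7.5, 10]
    for value, freq in zip(readings, data_to_frequencies(readings)):
        print(f"value {value:5.1f} -> {freq:6.1f} Hz")
```

The frequencies could then be fed to any tone generator; choosing pitch, loudness, or timbre as the carrier is a design decision that depends on the data and the listener.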
VSS,
the NCSA Sound Server developed by the NCSA
Audio Development Group (ADG), supports both sound authoring and
sonification by providing an API that helps software developers
adopt and develop sound for virtual environments and interactive
displays.

2.5 Pros and Cons
Advantages of spatial sound over non-spatial sound:
-
Spatial sound restores the capacity of our perception
to exploit spatial auditory cues in order to segregate sounds emanating
from different directions.
-
Spatial sound increases the coherence of auditory
cues with those conveyed by cognition and other modes of perception (visual,
haptic...).
-
Spatial sound processing is a key factor for improving
the legibility and naturalness of a virtual scene. It enriches the immersive
experience and creates more "sensualized" interfaces.
-
A 3D audio display can enhance multi-channel communication
systems: a 3D audio display spatially separates messages from one
another, thereby making it easier for the operator to attend to any selected
message.
Problems to be overcome with spatial audio:
-
Cost:
-
Until very recently, cost was the biggest barrier
to the widespread use of spatial audio.
-
Environmental Modeling:
-
The cost of exact environmental modeling for different
auditory cues is extraordinarily high.
-
Common problems in spatial sound generation that
tend to lessen its immersiveness: front-to-back reversals, intracranially
heard sounds, and HRTF measurement problems.
-
Headphones and speakers:
-
Spatial audio systems designed for use via headphones
have certain limitations, e.g. the
inconvenience of wearing some sort of headset.
-
With speakers, the spatial audio system must have
knowledge of the listener's position and orientation with respect to the
speakers.
-
Effectiveness:
-
Auditory localization is still not fully understood,
and thus developers cannot make effective price/performance decisions in
the design of spatial audio systems.
-
Software Engineering:
-
In large software systems, most spatial audio systems
provide little in the way of environmental modeling, synchronization, or
network support.
If you have comments or suggestions,
email me at yfeng@cs.indiana.edu