Audio I/O In VR
B582 VR Hardware Presentation
Ying Feng, Feb. 2000

1. Overview

Why do we need audio in VR:
Sounds are a constant presence in our everyday world
and offer rich cues about our environment. As an auditory cue, audio
works together with other sensory cues (visual, haptic, tactile and olfactory)
to increase the sense of presence in a virtual environment.
-
Enhancement to display: Sound
enhances the display of spatial information, particularly space beyond
the field of view.
-
Simulated properties: Data-driven
sound can convey simulated properties of the objects in the environment,
e.g. mass, force of impact, surface characteristics such as softness or hardness,
and hidden features such as hollowness.
-
Alternative feedback: Sound
feedback can make up for the deficiencies of visual feedback, e.g. an
instant response to a wand button press, or assistance for visually impaired users.
-
Higher resolution: Audio
signals provide much finer temporal resolution than visual displays,
e.g. a CD-quality audio signal has 44,100 samples (time quanta) per second,
compared to 60 per second for video image half-frames ("fields").
-
Voice input: Voice is
a desirable and convenient input method for computer-human interaction
in VR systems.
-
Voice output: Voice
generated by computer can imitate the human form of communication.
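
To make the temporal resolution comparison in the list above concrete, here is a minimal sketch in Python. It is not part of the original presentation. It assumes the standard CD sampling rate of 44,100 Hz (roughly the 44,000 per second figure) and a nominal 60 video fields per second; the constant and function names are illustrative only.

```python
# Compare the time quantum of CD-quality audio with that of video fields.
# Assumed values: 44,100 Hz is the standard CD sampling rate, and 60 Hz is
# the nominal field rate quoted in the text.
CD_SAMPLE_RATE_HZ = 44_100
VIDEO_FIELD_RATE_HZ = 60


def quantum_seconds(rate_hz: float) -> float:
    """Return the duration of one time quantum (sample or field) in seconds."""
    return 1.0 / rate_hz


audio_quantum = quantum_seconds(CD_SAMPLE_RATE_HZ)
video_quantum = quantum_seconds(VIDEO_FIELD_RATE_HZ)
samples_per_field = CD_SAMPLE_RATE_HZ / VIDEO_FIELD_RATE_HZ

print(f"audio quantum: {audio_quantum * 1e6:.1f} microseconds")
print(f"video quantum: {video_quantum * 1e3:.1f} milliseconds")
print(f"audio samples per video field: {samples_per_field:.0f}")
```

Under these assumptions one audio sample lasts about 22.7 microseconds while one video field lasts about 16.7 milliseconds, so 735 audio samples pass during a single field.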
Brief history:
-
1930's: The first spatial audio system, a bulky mechanical
structure as big as a house, was built.
-
1970's: Voice recognition researchers made computers
able to understand human speech spoken with pauses between words.
-
1987: Research on 3D sound started at the NASA/Ames
Research Center (this work later led to the formation of Crystal River Engineering).
-
1988: Small, fast electronic spatial audio systems
that could operate in real time (Wenzel et al.) appeared; the first such
commercial system was the Convolvotron, about the size of a desktop.
-
Early 1990's: The demands of the telecommunications
industry led to the development of Digital Signal Processors (DSPs), special
computer chips optimized for digital filter applications, which
were fast enough to produce spatial audio in real time.
-
1990's: Voice-activated software, a.k.a. speech recognition,
emerged at the vanguard of word processing technology, but for lack
of sufficiently powerful hardware the pioneering products were poorly received.
-
1991: Many companies, including VPL and Sense8, used
3D sound synthesis technology in their multisensory displays.
-
1992-1993: The manufacturing cost of a spatial audio
system could potentially fall to the cost of a DSP chip, which was
as low as $10.
-
1994: With the advent of the Pentium processor and
the falling cost of memory, hardware became powerful enough to run voice recognition
software.
-
1996: 3D sound positioning technology (such as WORF)
became available on PCs; Windows 95 drivers started to support spatializer
3D stereo sound hardware.
-
Recent: Continuous speech recognition for digits and other small-vocabulary
situations is available, and speaker-independent systems have been developed.
-
Present: A mask-programmed DSP chip for embedded
3-D sound applications is under development by the Multimedia Computing Group
at Georgia Tech.
More information can be found at "Audio Recording History and Development"
and "History and Development of Voice Activated Software".
If you have comments or suggestions,
email me at yfeng@cs.indiana.edu