Oral final exam at scheduled times during exam week. The exam will include short answers to questions about topics in the course, a few production tasks, and a phonetic transcription of a brief taped passage. Expect about 10 questions, plus some questions about a sound spectrogram of speech, plus 5-8 minutes for transcription. It will probably last about 20 minutes.
Text Materials:
Ladefoged, Chapts 1-11.
Port's handout on English allophones.
Some of the materials on ToBI from OSU.
Port's handouts on acoustics.
Sinewave Speech web page demo.
J. Miller's `Speech perception' (1990).
______________________________________________________
Performance Skills
Be able to produce a set of vowels; voiced and voiceless versions of any obstruent (that is, stop or fricative); a click, an ejective, an implosive and a plosive; trills, fricatives and approximants, etc.
Be able to transcribe dialects of English using the basic symbol set of IPA.
Places of Articulation - at two `levels of detail'
Labial, Apical, Palatal, Velar, Glottal, or
bilabial, labiodental, dental, alveolar, retroflex, palato-alveolar,
palatal, velar, uvular, pharyngeal, glottal (Ladef, Table 7.3)
Vowels
vowel dimensions: tongue height, tongue backness, rounding
Only 2 for Eng? Or 3? Or more?
monophthongs vs diphthongs
Secondary vowel articulations: nasalization, rhoticism or retroflexion
stressed, unstressed; reduced vowels, full vowels
tense Vs, lax Vs; closed vs. open syllables (Which Eng Vs only occur
in closed syllables?)
Cardinal vowels (D. Jones) - what purpose? how defined and problems?
Consonants
stops, fricatives (obstruents); What are essential component gestures?
variants: lateral release, glottal stop, flap/tap
homorganic relationships, voicing pairs
coarticulation
palatography (palatograms)
affricates vs. fricatives
IPA phonetic alphabet
a. in contrast to `orthography' (conventional spelling)
b. IPA is good for writing down approximate actual pronunciations
c. it can be a model for cognitive form of words,
but not necessarily a good model of cognitive structures
d. it's EASY to cover MOST of the sounds of languages of the
world, but IMPOSSIBLE to get ALL of them
e. Reasons why the Place x Manner x Voicing model is inadequate:
1) `secondary articulations', clicks? (not enough dimensions)
2) holes: `lateral velars' (dimensions not indep't)
3) what about timing patterns? (nonsegmental properties ignored)
Phonology: the use of phonetic sounds for "spelling" morphemes in lgs.
People seem to have awareness of gross sound categories, like phonemes, but
not to be aware of phonetic details. Though some details are easier to
hear than others.
Phonemes: hypothesized, abstract, cognitive sound units that resemble the
letters of an alphabet.
Prosody: Languages have conventional patterns of pitch, loudness and
timing at the level of words and phrases. But these are typically
difficult to describe for many languages - given current knowledge.
Allophonic rules cause `alternations' (eg, Ladef, p. 39). Some rules
are language specific, some nearly universal (eg, nasaliztn of Vs/_N)
Allophonic rules for English. Be familiar with the major rules
on my handout (and the similar set in Ladef Ch 4).
Prosody of English
ToBI basics: pitch accents (H*, L*, H*+L, L+H*, etc), boundary tones
(H%, L%) and phrasal tones (L- and H-).
Max of one pitch accent per intonation phrase.
Break indices: 0 (no break), 2 (word-word bndy), 4 (phrase end).
Articulation Stricture Types.
vowel (nonturbulent air flow)
approximant (turbulent flow when voiceless, but not when voiced)
fricative (always turbulent air flow) (So what's an AFFRICATE again?)
stop (oral and nasal closure)
nasal (velum lowered to connect nose with oral cavity)
trill - oscillating articulator like apex, uvula or lips
tap - to and fro gesture of articulator (eg, /d/ in `ladder')
flap - gesture of brushing past an articulator (eg, retroflexed tap)
semivowel - momentary approximant
lateral gesture - eg, /l/
Phonation
voice: normal voice, breathy, falsetto, creaky voice (laryngealized)
vocal folds, glottis. What makes vocal folds oscillate?
[voice] feature in Eng obstruents: phonetic cues in initial
(aspiration) and final (vowel and consonant durations)
syllable positions (Ladef, p. 51)
Voice-onset time: prevoiced (fully voiced), short lag
(unaspirated), long lag (aspirated). Compare English with Spanish,
French, Thai and Hindi (also with `murmured' or breathy-voice stops)
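The three voice-onset-time categories above can be sketched as a simple classifier. This is only an illustration: the function name vot_category and the 30 ms long-lag threshold are assumptions made for the sketch, not values from Ladefoged or the handouts, and real category boundaries vary by language and place of articulation.

```python
def vot_category(vot_ms: float, long_lag_threshold_ms: float = 30.0) -> str:
    """Classify a stop by voice-onset time (ms relative to the release).

    Negative VOT means voicing starts before the release (prevoiced).
    The 30 ms default threshold is an assumed, illustrative value.
    """
    if vot_ms < 0:
        return "prevoiced"
    if vot_ms < long_lag_threshold_ms:
        return "short lag (unaspirated)"
    return "long lag (aspirated)"


# Made-up example values, not measurements from any language
for vot in (-90, 10, 70):
    print(vot, "ms:", vot_category(vot))
```

Languages differ in which of these categories they use for their stop contrasts, which is the point of the English/Spanish/French/Thai/Hindi comparison.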
Air Stream Mechanisms (Ladef Ch. 6)
pulmonic initiation: egressive, `plosives'
vs. ingressive (very rare)
glottalic initiation: egressive, `ejectives'
vs. ingressive, `implosives'
velaric initiation: egressive (very rare)
vs. ingressive, `clicks'
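The airstream outline above is a 3 x 2 table (initiator by airflow direction), which can be written as a small lookup. The names AIRSTREAM and stop_type are hypothetical, chosen only for this sketch; the entries simply restate the outline.

```python
# (initiator, direction) -> name of the resulting stop type, as in the outline
AIRSTREAM = {
    ("pulmonic", "egressive"): "plosive",
    ("pulmonic", "ingressive"): "(very rare)",
    ("glottalic", "egressive"): "ejective",
    ("glottalic", "ingressive"): "implosive",
    ("velaric", "egressive"): "(very rare)",
    ("velaric", "ingressive"): "click",
}


def stop_type(initiator: str, direction: str) -> str:
    """Look up the stop type for an initiator/direction pair."""
    return AIRSTREAM[(initiator.lower(), direction.lower())]


print(stop_type("glottalic", "egressive"))
print(stop_type("velaric", "ingressive"))
```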
Place of Articulation Issues:
labiovelars (double articulation)
epenthesis: eg, in nasal-fricative clusters: "prince/prints, Chom(p)sky,
comfy, false/faults"
fricativization of stops: in "liquor, sticky, buggy, tasks, posts, lisps"
Manners of Articulation
nasals (voiced, voiceless), fricatives (sibilants, etc)
nasal plosion (eg, `sudden')
trill, tap (or flap)
R-like sounds
Laterals: approximant vs fricative, +/- voice, dark/light
lateral plosion (eg, `pickle')
Acoustics of Speech
acoustic medium, wave motion, transverse vs. longitudinal wave
period, amplitude, wave velocity
additivity (superposition) property. Implications:
1) spectral representation - shows amplitude of sinusoidal
components in a complex wave
2) independence of sound sources in environment
3) filtering of selected frequencies
acoustic filter - multiplication of some amplitudes by number
less than 1. The Transfer Function of a filter displays that
multiplication for each frequency.
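A minimal sketch of the additivity idea and of a filter as per-frequency multiplication. Two sinusoids are added to make a complex wave, the amplitude of each component is recovered from the sum, and a transfer function then scales each amplitude. The sample rate, frequencies, amplitudes and gains are all assumed values chosen for illustration; the helper names (sine, amplitude_at) are hypothetical.

```python
import math

RATE = 8000   # samples per second (assumed)
N = 800       # 0.1 s, so 100, 200 and 300 Hz each fit whole cycles


def sine(freq_hz, amp, n=N, rate=RATE):
    """One sinusoidal component as a list of samples."""
    return [amp * math.sin(2 * math.pi * freq_hz * t / rate) for t in range(n)]


def amplitude_at(signal, freq_hz, rate=RATE):
    """Amplitude of the component at freq_hz (a single-frequency DFT)."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq_hz * t / rate)
             for t, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * t / rate)
             for t, x in enumerate(signal))
    return 2 * math.hypot(re, im) / n


# Superposition: the complex wave is the sample-by-sample sum of its parts
complex_wave = [a + b for a, b in zip(sine(100, 1.0), sine(300, 0.5))]

# Spectral representation: close to 1.0 at 100 Hz, 0.0 at 200 Hz, 0.5 at 300 Hz
spectrum = {f: amplitude_at(complex_wave, f) for f in (100, 200, 300)}

# A filter multiplies each amplitude by a number less than 1;
# this table of gains plays the role of the transfer function
transfer_function = {100: 0.9, 200: 0.5, 300: 0.1}
filtered = {f: spectrum[f] * transfer_function[f] for f in spectrum}

print({f: round(a, 3) for f, a in spectrum.items()})
print({f: round(a, 3) for f, a in filtered.items()})
```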
Acoustic Theory of Speech Production. It asserts that observed
acoustic signals of speech result from a sound Source (from the glottal
buzz or frication) filtered by the vocal Cavities (in front of the source).
OUTPUT = SOURCE function x FILTER function
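The OUTPUT = SOURCE x FILTER relation can be sketched frequency by frequency. Everything numeric here is an assumption for illustration: the source is a set of harmonics whose amplitude falls off as 1/k, and the filter is a toy transfer function with peaks at 500, 1500 and 2500 Hz. Neither is a measured glottal spectrum or a real vocal tract response, and the function names are hypothetical.

```python
import math


def source_spectrum(f0_hz, max_hz=4000):
    """Harmonics of a glottal buzz: {frequency: amplitude}.

    The 1/k amplitude fall-off is an assumed, simplified source shape.
    """
    return {k * f0_hz: 1.0 / k for k in range(1, int(max_hz // f0_hz) + 1)}


def transfer(freq_hz, formants=(500, 1500, 2500), bandwidth_hz=100):
    """Toy transfer function: gain 1.0 at each formant, smaller elsewhere."""
    return max(
        1.0 / math.sqrt(1.0 + ((freq_hz - f) / (bandwidth_hz / 2)) ** 2)
        for f in formants
    )


def output_spectrum(f0_hz):
    """OUTPUT = SOURCE x FILTER, applied to each harmonic."""
    return {f: amp * transfer(f) for f, amp in source_spectrum(f0_hz).items()}


out = output_spectrum(100)
# The harmonic sitting on the 500 Hz peak passes; the one at 1000 Hz is damped
print(round(out[500], 4), round(out[1000], 4))
```

Calling output_spectrum with a different f0 moves the harmonics but leaves transfer() untouched, so pitch and the resonance pattern vary independently in this sketch.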
Implications of the Acous.Th.Sp.Prod.:
a) large articulatory movements will change the transfer function
(resulting in audible changes in output)
b) but differences between vocal tract size will also cause
changes in transfer function, eg, children have higher resonant
frequencies than adults. (But RELATIVE articulatory motions
are still differentiable.)
c) changes in SOURCE (eg, voice quality or pitch) yield characteristic
changes in acoustic output without changing transfer function
effects
d) certain gesture combinations will reinforce each other (since
they have the same effect), while others will have no effect on outputs
(since they cancel each other's effect) --eg, raising tongue relative to
jaw while lowering the jaw, or spreading lips while lowering the larynx.
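Implication (b) above can be illustrated with the standard idealization of a neutral vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances are F_n = (2n - 1) c / 4L. The tube model itself, the speed of sound (35000 cm/s) and the two tract lengths are assumptions for this sketch; real vocal tracts are not uniform tubes.

```python
SPEED_OF_SOUND_CM_S = 35000.0   # approximate value in warm, moist air (assumed)


def tube_resonances(length_cm, count=3):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other."""
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)
            for n in range(1, count + 1)]


adult = tube_resonances(17.5)   # assumed adult tract length
child = tube_resonances(12.0)   # assumed shorter tract, for comparison only

print([round(f) for f in adult])
print([round(f) for f in child])
```

In this idealized model the shorter tube has higher resonances throughout, but the pattern is scaled uniformly: the ratio of the second to the first resonance is the same for both lengths.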
Reading Spectrograms: formants, how to recognize vowels (front, back, high, low);
stops, nasals, voicing, place of articulation, etc.
Speech Perception
Examples of context sensitivity of cues on `category boundaries'
Sinewave Speech (sine waves rather than formants):
demonstrates role of trajectories over static spectra
to `specify' speech.
Avoidance conditioning procedure for chinchillas
High-amplitude sucking procedure for infants
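The sinewave speech item above can be sketched as synthesis: each formant is replaced by one sinusoid whose frequency follows that formant's trajectory over time. The trajectories below are made-up linear glides, not measurements from any utterance, and this simplified version ignores formant amplitudes. The names glide and sinewave_speech are hypothetical.

```python
import math


def glide(start_hz, end_hz, n):
    """Linear frequency trajectory with n samples (n >= 2)."""
    return [start_hz + (end_hz - start_hz) * t / (n - 1) for t in range(n)]


def sinewave_speech(tracks, rate=8000):
    """Sum one sinusoid per frequency track (tracks of equal length, in Hz).

    The phase of each sinusoid is accumulated sample by sample so that
    the frequency can change smoothly over time.
    """
    n = len(tracks[0])
    phases = [0.0] * len(tracks)
    samples = []
    for t in range(n):
        total = 0.0
        for i, track in enumerate(tracks):
            phases[i] += 2 * math.pi * track[t] / rate
            total += math.sin(phases[i])
        samples.append(total / len(tracks))   # keep the sum within [-1, 1]
    return samples


n = 400   # 50 ms at 8000 samples per second
tracks = [glide(300, 700, n), glide(2200, 1200, n), glide(2500, 2500, n)]
samples = sinewave_speech(tracks)
print(len(samples))
```

The signal contains no harmonics and no broad formant peaks, only the moving sinusoids, which is why intelligibility of such signals points to the trajectories rather than static spectra.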
Phonetic Change in Dialect
Members of communities differ in their `prestige' and differ
in pronunciation detail. Often, a particular pronunciation gets
associated with high (or low) prestige individuals within a community.
These features may tend to be imitated (or avoided)
accordingly. Along the way, they may be generalized along
phonologically predictable lines. Eventually this results in
historical language change.
What makes a group of people a speech community?
Motor Control for Speech.
Port's informal coarticulatory model: lips, tongue tip, tongue body
(H/L, F/B), velum, glottis (open/clos, pitch, quality)
Evidence for abstractness of speech gestures.
Problem of phasing: what is it?
Speech Perception
Motor Theory of Speech Perception
This is the first theory of speech perception based on appreciation of some
of the main difficulties to be explained: eg, the massive context
sensitivity of speech `cues' and the consistent tendency for listeners
to `hear' what the speaker articulated, not what the spectra look like.
1. perception based on production. ``intimate link between production
and perception'', ``perception by virtue of knowledge of production''
(including coarticulation and context sensitivity).
2. Innate. Not learned. Knowledge is apparently built-in.
3. Thus, this knowledge should be species-specific.
- - - - - - - - - - - - - - - - - - - - - - - -
Comments:
A. This explains why we hear the s/sh boundary in different places
depending on the vowel, and why we call some falling F2 transitions [d]
while others are [g]. Other effects are the `pi/ka/pu effect' (for
noise bursts before various vowels).
B. The theory predicts infant performance should resemble adult performance
(true after 12 mo), and predicts animals should fail with similar tasks (mostly true).
C. Compatible with interaction of visual and auditory information in
perception - since both derive from articulatory gestures. Kuhl and
Meltzoff (1982) found infant preference to look at faces that match speech
rather than conflict with speech.
Port's Simple Auditory Theory (like Touch/Tone phone)
The psychoacoustician's theory -- possibly resembling J. Pastore, 1981.
The simplest model says that speech sounds are each distinctive and
are simply identified by a static template. There is no mysterious link
to the gestures of speech production.
1. Static auditory cues will do the job.
2. Innate, since, of course, nothing much needs to be learned.
Though listeners presumably get very good at listening to speech.
3. Should work across species with similar auditory systems.
- - - - - - - - - - - - - - - - - - - - - - - -
Comments:
A: Some animal experiments support this view. Eg, chinchillas
classify voice-onset time similarly to humans (Kuhl and Miller, 1986).
B: Predicts little interaction with visual information since audition
and vision are quite distinct modalities, plus only static cues are
relevant anyway.
__________________________________________________________________
April 26, 2001
RFP