Study Guide for Final Exam
L306, April 26, 2001

Oral final exam at scheduled times during exam week. The exam will include short answers to questions about topics in the course, a few production tasks, and a phonetic transcription of a brief taped passage. It should include about 10 questions, plus some questions about a sound spectrogram of speech, plus 5-8 minutes for transcription. It will probably last about 20 minutes.

Text Materials:
  Ladefoged,  Chapts 1-11.
  Port's handout on English allophones
  Some of the materials on ToBI from OSU.
  Port's handouts on acoustics
  Sinewave Speech web page demo.
  J. Miller's `Speech perception' (1990)
  ______________________________________________________

REVIEW

Performance Skills

Be able to produce a set of vowels; voiced and voiceless versions of 
any obstruent (that is, stop or fric), a click, an ejective, an 
implosive and a plosive, trills, fricatives and approximants, etc.
 

Be able to transcribe dialects of English using the basic symbol set of IPA.

Places of Articulation - at two `levels of detail'
       Labial, Apical, Palatal, Velar, Glottal,  or
       bilabial, labiodental, dental, alveolar, retroflex, palato-alveolar,
         palatal, velar, uvular, pharyngeal, glottal (Ladef, Table 7.3)

Vowels
       vowel dimensions: tongue height, tongue backness, rounding
        Only 2 for Eng? Or 3? Or more?
       monophthongs vs diphthongs 
       Secondary vowel articulations: nasalization, rhoticism or retroflexion
       stressed, unstressed;  reduced vowels, full vowel
       tense Vs, lax Vs;  closed vs. open syllables (Which Eng Vs only occur
               in closed syllables?)
       Cardinal vowels (D. Jones) - what purpose? how defined and problems?
Consonants
       stops, fricatives (obstruents);  What are essential component gestures?
       variants: lateral release, glottal stop, flap/tap
       homorganic relationships, voicing pairs
       coarticulation
       palatography (palatograms)
       affricates vs. fricatives

IPA phonetic alphabet
       a. in contrast to `orthography' (conventional spelling)
       b. IPA is good for writing down approximate actual pronunciations 
       c. it can be a model for cognitive form of words, 
         but not necessarily a good model of cognitive structures
       d. it's EASY to cover MOST of the sounds of languages of the
            world, but IMPOSSIBLE to get ALL of them
       e. Reasons why the Place x Manner x Voicing model is inadequate:
           1) `secondary articulations', clicks? (not enough dimensions)
           2) holes:  `lateral velars' (dimensions not indep't)
           3) what about timing patterns?  (nonsegmental properties ignored)

Phonology: the use of phonetic sounds for "spelling" morphemes in lgs.

       People seem to be aware of gross sound categories, like phonemes, but
         not of phonetic details, though some details are easier to
         hear than others.
       Phonemes: hypothesized, abstract, cognitive sound units that resemble the 
         letters of an alphabet.
       Prosody: Languages have conventional patterns of pitch, loudness and
         timing at the level of words and phrases. But these are typically
         difficult to describe for many languages, given current knowledge.
       Allophonic rules cause `alternations' (eg, Ladef, p. 39).  Some rules
         are language specific, some nearly universal (eg, nasalization of Vs/_N)
       Allophonic rules for English. Be familiar with the major rules
         on my handout (and the similar set in Ladef Ch 4).
       
Prosody of English
         ToBI basics: pitch accents (H*, L*, H*+L, L+H*, etc), boundary tones
         (H%, L%) and phrasal tones (L- and H-).
         Max of one pitch accent per intonation phrase.
         Break indices: 0 (no break), 2 (word-word bndy), 4 (phrase end).

Articulation Stricture Types.
       vowel (nonturbulent air flow)
       approximant (turbulent flow when voiceless, but not when voiced)
       fricative (always turbulent air flow) (So what's an AFFRICATE again?)
       stop  (oral and nasal closure)
       nasal (velum lowered to connect nose with oral cavity)
       trill  - oscillating articulator like apex, uvula or lips
       tap - to and fro gesture of articulator (eg, /d/ in `ladder')
       flap - gesture of brushing past an articulator (eg, retroflexed tap)
       semivowel - momentary approximant
       lateral gesture  - eg, /l/

Phonation      
       voice: normal voice, breathy, falsetto, creaky voice (laryngealized)
       vocal folds, glottis. What makes vocal folds oscillate?
       [voice] feature in Eng obstruents: phonetic cues in initial 
         (aspiration) and final (vowel and consonant durations)
         syllable positions (Ladef, p. 51)
       Voice-onset time: prevoiced (fully voiced), short-lag
         (unaspirated), long-lag (aspirated).  Compare English with Spanish,
         French, Thai and Hindi (also with `murmured' or breathy-voice stops)
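
       The three voice-onset-time categories above can be sketched as a
       simple classifier. This is only an illustration: the function name
       and the 30 ms cutoff between short lag and long lag are assumptions
       made for the example, not values from the course or from Ladefoged,
       and real category boundaries differ by language and place of
       articulation.

```python
def vot_category(vot_ms, short_lag_max_ms=30.0):
    """Classify a stop by voice-onset time, in ms relative to the release.

    Negative VOT means voicing starts before the release (prevoiced).
    The 30 ms default cutoff is an illustrative assumption only.
    """
    if vot_ms < 0:
        return "prevoiced"
    if vot_ms <= short_lag_max_ms:
        return "short lag (unaspirated)"
    return "long lag (aspirated)"


# Hypothetical measurements, chosen only to hit each category once.
for vot in (-90, 10, 70):
    print(vot, vot_category(vot))
```

       Moving the cutoff (short_lag_max_ms) is one way to think about how
       languages can divide the same VOT continuum differently.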

Air Stream Mechanisms (Ladef Ch. 6)
       pulmonic initiation: egressive, `plosives'
               vs. ingressive (very rare)
       glottalic initiation: egressive, `ejectives' 
               vs. ingressive, `implosives' 
       velaric initiation: egressive  (very rare)
               vs. ingressive, `clicks' 
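
       As a memory aid, the airstream outline above can be written as a
       lookup table keyed by initiator and direction. The names AIRSTREAM
       and stop_type are made up for this sketch; the entries simply
       restate the outline.

```python
# (initiator, direction) -> kind of stop, restating the outline above
AIRSTREAM = {
    ("pulmonic", "egressive"): "plosive",
    ("pulmonic", "ingressive"): "(very rare)",
    ("glottalic", "egressive"): "ejective",
    ("glottalic", "ingressive"): "implosive",
    ("velaric", "egressive"): "(very rare)",
    ("velaric", "ingressive"): "click",
}


def stop_type(initiator, direction):
    """Return the name of the stop made with this airstream mechanism."""
    return AIRSTREAM[(initiator, direction)]


print(stop_type("glottalic", "egressive"))   # ejective
print(stop_type("velaric", "ingressive"))    # click
```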


Place of Articulation Issues: 
       labiovelars (double articulation)
       epenthesis: eg, in nasal-fricative clusters:  "prince/prints, Chom(p)sky, 
          comfy, false/faults"
       fricativization of stops: in "liquor, sticky, buggy, tasks, posts, lisps"

Manners of Articulation
       nasals (voiced, voiceless), fricatives (sibilants, etc)
               nasal plosion (eg, `sudden')
       trill, tap (or flap)
       R-like sounds 
       Laterals: approximant vs fricative, +/- voice, dark/light
               lateral plosion (eg, `pickle')

Acoustics of Speech
       acoustic medium, wave motion, transverse vs. longitudinal wave
       period, amplitude, wave velocity
       additivity (superposition) property. Implications:
          1) spectral representation - shows amplitude of sinusoidal
               components in a complex wave
          2) independence of sound sources in environment
          3) filtering of selected frequencies
       acoustic filter - multiplication of some amplitudes by a number
          less than 1.  The Transfer Function of a filter displays that
          multiplication factor for each frequency.
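
       A minimal sketch of superposition and filtering, assuming a
       spectrum can be stored as a table of frequency -> amplitude. The
       function names, the sample rate and the toy low-pass transfer
       function are all invented for illustration.

```python
import math


def synthesize(components, n_samples=64, sample_rate=8000):
    """Sum sinusoids. `components` maps frequency (Hz) -> amplitude."""
    return [
        sum(a * math.sin(2 * math.pi * f * t / sample_rate)
            for f, a in components.items())
        for t in range(n_samples)
    ]


def apply_filter(components, transfer):
    """Multiply each component's amplitude by the filter gain at its frequency."""
    return {f: a * transfer(f) for f, a in components.items()}


def lowpass(f, cutoff_hz=1000.0):
    """Toy transfer function (hypothetical): gain 1 up to the cutoff, 0.5 above."""
    return 1.0 if f <= cutoff_hz else 0.5


a = {500: 1.0}          # one sound source
b = {1500: 0.4}         # a second, independent source
both = {**a, **b}       # spectral representation of the mixture

# Superposition: the wave of the mixture is the sum of the separate waves.
wave_sum = [x + y for x, y in zip(synthesize(a), synthesize(b))]
assert all(abs(p - q) < 1e-9 for p, q in zip(synthesize(both), wave_sum))

# Filtering: only the component above the cutoff is attenuated.
print(apply_filter(both, lowpass))   # {500: 1.0, 1500: 0.2}
```

       The filter never looks at the waveform itself; it only rescales
       the amplitude of each sinusoidal component.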

Acoustic Theory of Speech Production.  It asserts that observed
   acoustic signals of speech result from a sound Source (the glottal
   buzz or frication) filtered by the vocal Cavities (in front of the source).
       OUTPUT = SOURCE function x FILTER function
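
   The product above can be sketched numerically, one harmonic at a
   time. Everything specific here is an assumption for illustration:
   the 1/n**2 rolloff of the glottal source, the formant values, the
   100 Hz bandwidth and the shape of the resonance peaks. Only the
   multiplication itself comes from the theory.

```python
def source_spectrum(f0_hz, n_harmonics=10):
    """Glottal-buzz sketch: harmonics of f0, amplitude falling as 1/n**2 (assumed)."""
    return {f0_hz * n: 1.0 / n ** 2 for n in range(1, n_harmonics + 1)}


def transfer(f_hz, formants=(500.0, 1500.0, 2500.0), bandwidth_hz=100.0):
    """Hypothetical transfer function: gain peaks at 1.0 on each formant."""
    return max(1.0 / (1.0 + ((f_hz - fc) / bandwidth_hz) ** 2) for fc in formants)


def output_spectrum(f0_hz):
    """OUTPUT = SOURCE x FILTER, evaluated at each harmonic of the source."""
    return {f: amp * transfer(f) for f, amp in source_spectrum(f0_hz).items()}


# Changing pitch moves the harmonics, but the transfer function is untouched.
for f0 in (100, 125):
    spectrum = output_spectrum(f0)
    print(f0, {f: round(amp, 4) for f, amp in spectrum.items()})
```

   In this sketch a change of pitch alters only source_spectrum, while a
   change of articulation would alter only the formants passed to
   transfer.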

   Implications of the Acoustic Theory of Speech Production:
          a) large articulatory movements will change the transfer function
               (resulting in audible changes in output)
          b) but differences in vocal tract size will also cause
               changes in transfer function, eg, children have higher resonant
               frequencies than adults. (But RELATIVE articulatory motions
               are still differentiable.)
          c) changes in SOURCE (eg, voice quality or pitch) yield characteristic
               changes in acoustic output without changing transfer function
               effects
          d) certain gesture combinations will reinforce each other (since
               they have the same effect), while others will have no effect on
               output (since they cancel each other's effect), eg, raising the
               tongue relative to the jaw while lowering the jaw, or spreading
               the lips while lowering the larynx.
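
   Implication (b) can be illustrated with the standard idealization of
   a neutral vocal tract as a uniform tube closed at the glottis and
   open at the lips, whose resonances are F_n = (2n - 1) * c / (4 * L).
   The tube lengths and the speed of sound below are round numbers
   assumed for the sketch, not measurements, and a real vocal tract is
   not a uniform tube.

```python
def tube_formants(length_cm, n_formants=3, speed_of_sound_cm_s=35000.0):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other.

    F_n = (2n - 1) * c / (4 * L): the quarter-wavelength resonances.
    """
    return [
        (2 * n - 1) * speed_of_sound_cm_s / (4.0 * length_cm)
        for n in range(1, n_formants + 1)
    ]


adult = tube_formants(17.5)   # [500.0, 1500.0, 2500.0]
child = tube_formants(12.0)   # hypothetical shorter tract: every resonance is higher

print("adult:", adult)
print("child:", [round(f) for f in child])

# The pattern relative to F1 is the same for both tube lengths.
print([f / adult[0] for f in adult])
print([round(f / child[0], 6) for f in child])
```

   The shorter tube raises every resonance by the same factor, so the
   relative spacing of the resonances is preserved even though the
   absolute frequencies differ.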

Reading Spectrograms: formants, how to recognize vowels (front, back, high, low); 
       stops, nasals, voicing, place of articulation, etc.

Speech Perception
       Examples of context sensitivity of cues on `category boundaries'
       Sinewave Speech (sine waves rather than formants):
               demonstrates role of trajectories over static spectra
               to `specify' speech.
       Avoidance conditioning procedure for chinchillas
       High-amplitude sucking procedure for infants

Phonetic Change in Dialect 
       Members of communities differ in their `prestige' and differ
in pronunciation detail.  Often, a particular pronunciation gets
associated with high (or low) prestige individuals within a community.
       These features may tend to be imitated (or avoided)
accordingly.  Along the way, they may be generalized along
phonologically predictable lines. Eventually this results in
historical language change.
      What makes a group of people a speech community?

Motor Control for Speech.
       Port's informal coarticulatory model: lips, tongue tip, tongue body 
       (H/L, F/B), velum, glottis (open/closed, pitch, quality)
       Evidence for abstractness of speech gestures.
       Problem of phasing: what is it?

       
Speech Perception

 Motor Theory of Speech Perception               

This is the first theory of speech perception based on an appreciation of some
of the main difficulties to be explained: eg, the massive context
sensitivity of speech `cues' and the consistent tendency for listeners
to `hear' what the speaker articulated, not what the spectra look like.

    1.  Perception based on production. ``intimate link between production
and perception'', ``perception by virtue of knowledge of production''
(including coarticulation and context sensitivity).

    2.   Innate. Not learned. Knowledge is apparently built-in.

    3.   Thus, this knowledge should be species-specific.

- - - - - - - - - - - - - - - - - - - - - - - - 
Comments:
A.  This explains why we hear the s/sh boundary in different places
depending on the vowel, and why we call some falling F2 transitions [d]
while others are [g]. Other effects are the `pi/ka/pu effect' (for
noise bursts before various vowels).

B. The theory predicts infant performance should resemble that of adults (true
after 12 mo), and predicts animals should fail at similar tasks (mostly true).

C. Compatible with interaction of visual and auditory information in
perception  - since both derive from articulatory gestures.  Kuhl and 
Meltzoff (1982) found infant preference to look at faces that match speech 
rather than conflict with speech.


Port's Simple Auditory Theory (like Touch/Tone phone)  

The psychoacoustician's theory -- possibly resembling J. Pastore, 1981.
The simplest model says that speech sounds are each distinctive and
are simply identified by a static template.  There is no mysterious link 
to the gestures of speech production.


1.  Static auditory cues will do the job.

2.  Innate, since, of course, nothing much needs to be learned.
Though listeners presumably get very good at listening to speech.

3.  Should work across species with similar auditory systems.

- - - - - - - - - - - - - - - - - - - - - - - - 
Comments:
A: Some animal experiments support this prediction. Eg, chinchillas
classify voice-onset time similarly to humans (Kuhl and Miller, 1986).

B: Predicts little interaction with visual information since audition
and vision are quite distinct modalities, plus only static cues are
relevant anyway.


__________________________________________________________________

April 26, 2001
RFP