hearing

So far, we have discussed the way sound is transformed when it is converted to neural patterns and `sent' (as they say) to the brain. Naturally, what counts is pattern recognition, and the patterns that matter for speech are primarily those due to moving articulatory gestures over time. There are neural systems that are specialized for recognition of time-varying patterns of a very general nature, and gradually, there develop neural pattern recognizers for the patterns appropriate for the environmental auditory events to which one is repeatedly exposed. This kind of learning seems to be at the level of auditory cerebral cortex in the temporal lobe. This learning is rote and concrete - specified in a space of specific frequencies and temporal intervals.

The image “http://bowland-files.lancs.ac.uk/chimp/langac/LECTURE3/points.jpg” cannot be displayed, because it contains errors.

PRESPEECH AUDITION. There is now a great deal of evidence that children during their first year learn to recognize the important ``sounds'' of their ambient language (Werker and Tees, 1984). The allophone-like spectro-temporal trajectories that occur frequently are recognized in fragments of various sizes. And any sound patterns that occur very infrequently or are novel tend to be ignored, ``missed'', or grouped with near neighbors that do occur frequently (that is, the phenomena justifying `categorical perception,' the `magnet effect' etc.). This clustering process or partial categorization of speech sounds takes place at a time when infants produce no words and recognize only a small number. (And they certainly know nothing about alphabets yet.)

COMPLEX PATTERN AND CATEGORY LEARNING. Speech perception is acquired like other complex auditory (or visual) patterns -- by simply storing a great many rather concrete (that is, close-to-sensory) representations of instances or tokens (or exemplars).

If a very complex but completely unfamiliar and novel auditory pattern is played to an adult, they will be able to recognize

very little about it. But if the pattern is repeated many times, eventually, subjects construct a very detailed, description of the pattern. Charles S. Watson did many experiments using a sequence of 7-10 brief pure tones chosen randomly from a very large set of possible tones. (The example below has 4 tones.) A complete 10-tone pattern was completed in less than a second. Such patterns are completely novel and extremely difficult to recognize. In his many experiments, Watson would typically play some random tone pattern 3 times to a subject and change the frequency of one of the tones between the A presentation and B. Then the subject has to say if X, the third pattern instance, is the same as the first or second (an ABX task). However, if a subject is given the same pattern for many trials, eventually, they learn the details of that pattern and can detect small changes (delta f) in one of the tones. That is, they learn a detailed representation for any pattern given enough opportunities to listen to it.

When we become skilled at speaking our first language, we become effective categorizers of the speech sounds, the words in our language, etc. So, abstract categories of speech sound (like letters, phonemes, syllable-types, etc.) can eventually be learned. A category is a kind of grouping of specific phenomena into classes based on some set of criteria. (Of course, a category is only remotely similar to the kind of symbol token usually assumed to provide the output of the speech perception process.

Conclusion.
My proposal is that speech perception is based on nonspeech, auditory perception. Each child learns a descriptive system of auditory features that probably differs in detail from child to child. This feature system includes both spectral and temporal dimensions. Still, this high-dimensional representation makes it possible to store large numbers of concrete, detailed records of specific utterances. This body of detailed auditory records provide (I think), the database on which linguistic categorization decisions are made.