Is There a Universal Phonetic Space?:
Why Apriori Phonetic Transcription is Not Possible

Robert F. Port
Department of Linguistics
Indiana University
draft November 14, 2002

I would appreciate any comments or suggestions on this incomplete argument sketch -  especially appreciated would be examples of data that I have not considered here and any counterarguments that might be raised against anything here.  I plan to expand these notes into a full paper sometime soon.  R. Port

A. One assumption of generative linguistics is that it is possible, in principle, to produce a correct phonetic transcription of any language, known or unknown.  This follows from the existence of a Universal Phonetic Alphabet of segments or segmental features from which the sound systems of all languages are constructed. This feature system is available innately to every human infant for the representation of heard speech sounds. Every property that cannot be captured by a segmental description in terms of phonetic features reflects implementation rules that are universal and unalterable.  This claim, due to Chomsky and Halle (1968), is extremely strong but seems to be accepted by nearly all working phonologists and linguistic theorists, at least in the United States (Kenstowicz and Kisseberth, 198n; Sloat, Taylor and Hoard; Pinker; etc.).   All phonological work that compares directly across languages in stating ``universals of phonology'', including that of current optimality theory, bases its comparisons on this assumption. But it seems very unlikely to be true, for many reasons that have been observed in the phonetics and psychology of language literature over recent decades.

Before beginning the discussion of the phonetic space, it is important to make clear that none of the criticisms below are objections to claims of either discreteness or featural structure in the phonologies of languages. First, it seems quite clear to me that all languages employ a rather small set of contrastive speech sounds (for English, about 10 vowels and twenty-some consonants) from which at least most of the vocabulary could be said to be constructed. (Although it also seems to me that most languages have some areas where the phonological description is indeterminate - where it isn't clear what the `spelling' should be. Examples in English include the second consonant in words like spit. Is it /p/ or /b/?  And what is the vowel in beer? Is it the same as in beet or in bit? Etc.)  Second, it appears that in many cases the sounds of languages can be appropriately described as combinations of feature-like properties that characterize a number of segmental contrasts. (Good examples are the place feature shared by /b, p, m/ and the voicing contrast shared by b/p, d/t, z/s, v/f, etc.)  Whether all the segments of a language (and all vocabulary items) can be exhaustively described in terms of distinctive features is a more problematic issue, however (and, to me, doubtful).  Third, there is no reason to reject the possibility of abstract sound units, that is, units whose phonetic implementation may vary from context to context.  Languages employ abstract, `invisible' structures in many places, and such things may exist in phonology as well.  One example is a `tensity' feature for obstruents in a language like English - a `unit' that is implemented with various combinations of durational and glottal gestures in different contexts.  A priori limitations on what phonological units can be like should be avoided.

So the issue addressed in this essay is whether the discreteness discussed above can be accounted for by the discreteness of phonetic features, as claimed by generative phonology.  Does phonology inherit its discreteness from a discrete universal phonetics, as claimed by Chomsky and Halle? Or does phonology need to come up with its own account of phonological discreteness?   I endorse the second alternative: it is a theoretical problem for phonology (and for theoretical psychology) to account for how discreteness comes to exist in the production and perception of adult speakers of human languages.  However it happens, it does not come, I claim, from an innate discrete phonetics.

B.  First we should review the main arguments that have been put forth to support the claim of a fixed, a priori inventory of possible types of speech sounds.

  • Rapid language acquisition.  The first argument is based on the rapid acquisition of vocabulary by children. Basically a version of the `poverty of the stimulus' argument, it is that children seem to be able to listen to productions of their caretakers and recognize when, say, the word cookie is repeated and, eventually, to say cookie themselves. This implies the ability to represent each production - before they seem to have had time to develop an appropriate phonetic code - so that the identity across repetitions can be observed, despite perhaps hearing the word produced at different speaking rates by their mother, their father, and even an older sister.   This remarkable feat of invariant perception could be accounted for if the child had an a priori method for phonetic representation that was invariant across speakers, speaking rates and other possible sources of variation (Chomsky and Halle, 1968, pp. 4-6, 298 ff.).
  • Universals of phonetic inventories.  The second argument concerns the widespread occurrence of certain sound types across languages. A majority of languages exhibit the vowels [i, e, a, o, u], often schwa, and consonants like [b, d, t, k, n, m, s, z], etc. (Ladefoged and Maddieson, 198n; Greenberg, et al).  If there is nothing universal about phonetic sounds, why is it that languages don't come up with radically different phonological systems with little to no identity between them?
  • Auditory limitations.  It is clear that the auditory system has limits on its ability to resolve frequency and duration.  It seems logical that these should ultimately  impose, in effect, a grid on, say, the space of vowels and consonants.  So sensory limitations and even limits on motor control should somehow impose a limit on the size of the set of potential phonetic distinctions. It seems the phonetic space could not possibly be infinite in size.
  • Quantal properties of speech acoustics.  Stevens (1972) pointed out that the acoustic theory of speech production predicts that some salient properties of speech acoustics should be less sensitive to articulatory variation in certain regions of the articulatory space than in others. He reasoned that such regions of stability (e.g., alveolar and velar consonant place and certain vowels) support the claim that these sounds are defined by universal phonetic features.
  • Intuitions of discreteness in speech sounds.  Finally, although it is difficult to find an explicit statement of this argument, it seems likely that the intuition of linguists (like that of lay observers) is that speech sounds simply sound discretely different.  Looking at a continuous color spectrum, we tend to see the series violet, blue, green, yellow, orange, red; similarly, when English speakers produce a continuous series of vowel colors, they seem to yield the discrete vowels in beat, bit, bet, bat, bottle, etc.  This impression of the discrete nature of distinctions like +/- nasality, +/- voice, +/- sibilant or fricative is very strong and provides further support for the notion that speech sounds simply come in discrete types.
Unfortunately, some of these theoretical arguments are not well thought through, while others, however plausible they might seem, are empirically incorrect.  Before attacking these points explicitly, however, it will perhaps be most useful simply to review the many kinds of empirical evidence that, notwithstanding the above arguments, speech sounds are NOT drawn from a discrete universal set, despite the widespread practice among phonologists of using the same inventory of symbols to represent them across languages.

    C.  Here are some of the empirical problems with the hypothesis of a segment-based Universal Phonetic Alphabet with references to some of the relevant phonetic literature:

    1.  Speech timing.  Segments are known to differ in duration by various amounts, yet phonetic alphabets (which supposedly embody everything about speech sounds that is relevant to language) represent time only in segment-sized units.  The Chomsky-Halle approach to dealing with these durational phenomena is to propose universal temporal implementation rules (not part of the universal phonetic inventory itself)  that make, for example, low vowels longer than high vowels and voiceless obstruents longer than voiced obstruents.  Other widespread observations, such as longer vowels preceding voiced obstruents are accounted for with (universal) context-sensitive temporal implementation rules.  Since the rules are universal, it is no problem that they deal with time in subsegmental ways. Presumably they are to be considered an aspect of linguistic performance rather than linguistic competence.

    However, many temporal differences are not universal. To give just a single example, Arabic, but no other language observed so far, has shorter vowels following voiceless stops but not preceding them (Port, Al-Ani and Maeda, 1979).  Thus many important language-specific differences are not representable at all in any alphabet-like transcription, nor can any universal `phonetic implementation rules' account for them. These temporal patterns are neither segmental features (since they are relational) nor implementational universals.  Such properties should not exist according to the Chomsky-Halle theory.

  • Mora timing in Japanese - a temporal sound unit that is specific to this language (so it cannot be a `temporal universal') (Port, O'Dell and Dalby, 198x; Homma, 1995).
  • Languages differ in their characteristic temporal patterns in vowel duration, voice onset time, etc. (Port, Al-Ani and Maeda, 1979; etc.).
    2.  There are no perfect phoneticians - among either adults or infants.   No adult listener can come close to correctly recognizing all phonetic sounds in all languages. Nor has anyone ever claimed that they could -- nor that anyone else could.  This includes such famous experts in impressionistic transcription as Daniel Jones and Kenneth Pike.  In fact, some sound contrasts of some languages seem to be practically impossible for speakers of certain other languages to perceive reliably without an amount of training greater than has yet been provided to any individual, as far as we know.  Phonetic ear training has no way to overcome these difficulties reliably. The notion, widespread in linguistics departments a few decades ago, that a one- or two-semester course in phonetics for linguistics graduate students can produce reliable practitioners of `universal phonetics' is completely naive.  (Even Peter Ladefoged admits that it was many years after his PhD before he was finally able to produce something as widespread as glottalized consonants!)

    Of course, Chomsky and Halle did not claim that all adults (or even any adults) are able to do this, but only that all prelinguistic infants can (and that they use this ability to represent the words they hear in their ambient language).  It is not impossible that after some sort of critical period, people lose the ability to recognize many speech sounds.  In fact, experimental research over the past 15-20 years has clarified many of the basic facts of the acquisition of language-specific speech sounds (see, e.g., Strange, 1995, for a review).  These results have sometimes been misinterpreted as support for the Chomsky-Halle view, so they should be looked at with some care.

    Briefly, prelinguistic infants (in the first year of life) can discriminate many sound differences in many languages. But (1) it is not known that they can discriminate all of them, and (2) discrimination (detecting a difference) is quite different from, and much easier than, identifying or categorizing a stimulus. The ability to provide a psychological phonetic transcription of the kind envisioned by Chomsky and Halle requires (a) division of the speech signal into segment-sized units and (b) assignment of each segment to a particular phone type or to several feature categories. There is no evidence that infants can do this before they are able to recognize words.

    By about a year (when they begin producing their first words), they have already lost the ability to discriminate differences between many sounds in languages that are not part of their environment.  This loss of discriminability may result  in the inability to correctly classify (or discriminate) some speech sounds in some foreign languages - an ability that can sometimes be extremely difficult to restore. (The best researched example of this is the difficulty with English /r/ and /l/ among Korean and Japanese speakers. Surely other examples will turn up in time.)

        • The English /r/-/l/ contrast has proven extremely resistant to learning by adult Japanese and Korean speakers (Logan, Lively and Pisoni, 1991; Yamada and Tohkura, 1992).  It has been shown that optimal discrimination of the /r/-/l/ contrast by native speakers depends on a complex combination of at least 3 acoustic parameters (Polka and Strange, 1985).
        • Some place-of-articulation distinctions not in one's native language have been shown to be very nearly indistinguishable for most adults (see Strange, 1995, for a review).
        • English speakers find it very difficult not to hear stress differences in languages that have none - like Japanese and Korean.   That is, linguists as well as lay people have a tendency to `hallucinate' distinctions from their own language in the speech of other languages.
        • Much evidence now shows that infants within the first year can discriminate many speech-sound differences that adults and children of a year or more in age cannot (Werker and Tees, 198x).  Apparently children learn not to pay attention to many aspects of speech sound as they acquire their L1 (cf. the `perceptual magnet' effect, Kuhl and Iverson).  But these discrimination abilities offer no support to the Chomsky-Halle claim of innate categorization of speech sounds into distinct a priori phonetic features and segments.
        • `Incomplete neutralization.' Some phonetic differences are consistently produced distinctly yet are completely imperceptible (or only marginally perceptible) even to native speakers. Examples are the ``neutralization'' of final voiced and voiceless stops in German, Polish, Catalan, etc. and flapped D and T in American English (Port and Crawford, 198n; see the Manaster-Ramer and Port letters to the editor in J. Phonetics 24 (1996) for review). It seems impossible to give a simple answer to the question whether these sounds are the `same' or `different': native speakers will say they sound the same, yet the sounds have differing distributions on acoustic variables, and in a forced-choice task the same listeners perform well above chance.  Such a situation should be quite inconceivable if there were a universal phonetic alphabet.  On that view, authoritative transcription should always be possible.
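The phrase `well above chance' can be made precise with an exact binomial test, a standard analysis for such two-alternative forced-choice data. The sketch below uses invented counts (118 correct of 200 trials) purely for illustration; they are not figures from the studies cited:

```python
from math import comb

def binomial_p_above_chance(correct, trials, p=0.5):
    """Exact one-tailed probability of scoring `correct` or more out of
    `trials` two-alternative judgments purely by guessing."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(correct, trials + 1))

# Hypothetical outcome: 118 correct of 200 forced-choice trials (59%) -
# far from perfect identification, yet reliably above chance.
p_value = binomial_p_above_chance(118, 200)
print(f"p = {p_value:.4f}")
```

The point the test makes explicit is that listeners can be far from perfect (59% correct) while the null hypothesis of pure guessing is still firmly rejected - exactly the intermediate situation that a universal phonetic alphabet leaves no room for.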
    All of these phenomena imply that linguists' judgments about phonetic transcription simply cannot be trusted. Phoneticians all have some native language, and their ability to classify sounds that are exotic for them should not be trusted. Only experimental methods can verify whether sounds in two different languages are the same or not (see the discussion between Manaster-Ramer and Port in J. Phonetics 24).

    3.  Huge set size (at best). Some of the difficulties raised above might be dealt with in the traditional view by simply enlarging the alphabet. Perhaps the incomplete neutralization phenomenon and small cross-language differences in, e.g., vowel duration or VOT just show that sound types can be much closer together than had been thought. Perhaps children are born with many more sound categories than had been imagined. Of course, it is hard to argue decisively against such a possibility.  But such a tack by defenders of the traditional view comes at great cost.  After all, if one can always just say ``Oh, well, I guess you have shown that we need to add another feature (or segment) to the universal inventory'', then there is an escape from any piece of evidence suggesting that the complete set has not yet been found.  But it is clear that the set of possible phonetic differences is, at the very least, extremely large.  If there is a finite number of them, then that number must be many thousands or millions.

    But at this point the theory of a Universal Phonetic Alphabet has abandoned making any substantive claim at all.  There is no longer any way to falsify it.  Worse than this, such a set is too large to be of any use to a language learner.  If different speech sounds can be so similar to each other, then, in general, repeated productions of the same word will tend to be ``transcribed'' in different ways by the infant.  The main theoretical problems solved by the idea of an innate sound inventory (that is, an account of rapid acquisition of vocabulary and an account of why different languages seem to employ the same sound types so often) disappear.  Rapid acquisition and the frequent occurrence of certain sounds, selected from an inventory of millions of sounds that are perceptually almost indistinguishable, must remain huge mysteries despite the postulated Universal Phonetic Alphabet!  Children cannot use it to represent different productions of the same word. And frequent speech sounds across languages, like [s] and [a], will tend not to be the same any more but to have different transcriptions!  So, yes, one can try to defend the universal-alphabet idea by greatly expanding the inventory, but doing so (a) makes the theory effectively unfalsifiable and (b) makes it useless for accounting for crosslinguistic commonalities and rapid language acquisition.
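The learnability problem here can be illustrated with a toy simulation. All the numbers below (the target value, the noise, the category spacings) are arbitrary assumptions chosen only to make the logic visible: when innate categories are packed more densely than the variability of production and perception, repeated tokens of one word scatter across many different "transcriptions."

```python
import random

random.seed(1)

def transcribe(value, spacing):
    """Assign a noisy token to the nearest category on a 1-D grid of
    categories spaced `spacing` apart (a stand-in for any phonetic scale)."""
    return round(value / spacing)

target = 500.0            # the talker's intended value, e.g. a formant in Hz
noise_sd = 30.0           # trial-to-trial production and perception noise
huge_inventory = 5.0      # categories packed far closer than the noise
small_inventory = 400.0   # categories spaced far wider than the noise

tokens = [random.gauss(target, noise_sd) for _ in range(20)]
labels_huge = {transcribe(t, huge_inventory) for t in tokens}
labels_small = {transcribe(t, small_inventory) for t in tokens}

# With the huge inventory, 20 repetitions of the same word receive many
# different labels; with the small inventory they receive one or two.
print(len(labels_huge), len(labels_small))
```

Nothing hangs on the particular values: the only assumption doing the work is that category spacing is small relative to the noise, which is exactly what an inventory of millions of near-indistinguishable sounds entails.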

    4.  Articulatory-acoustic anomalies. Another problem for the notion of a universal alphabet of  units specified in both articulatory and acoustic terms is that some speech sounds are unique only in articulatory terms but have complete overlap acoustically with another segment type. Such cases arise only where there is a prosodic speech gesture extending over several segments, for example, in cases of vowel harmony.  In this situation, two vowels may have identical acoustic properties yet be articulatorily completely distinct (and belong to different harmony classes). Only word-length (or multisyllabic) units will differentiate them. This situation should be impossible if careful phonetic transcription is all that is required to make correct phonological descriptions.  Two attested examples are in Luo and Azerbaijani.

        • ATR in Luo (Leon Jakobson, `Vowel Harmony in Dholuo', unpublished 1978 dissertation, UCLA Dept. of Linguistics).  Here the articulatory difference between two [u]-like vowels that are +ATR and -ATR was verified with x-ray images of a native speaker, but the formant values overlap almost perfectly.  Since most words contain more than one vowel and the ATR harmony property extends over whole words, listeners are rarely in doubt about which /u/-like vowel they are hearing.  Thus speakers maintain a consistent articulatory difference in tongue-root position despite complete acoustic overlap.
        • Lip rounding in Azerbaijani (Fred Householder, `Vowel overlap in Azerbaijani', in A. Valdman (ed.), Papers in Linguistics and Phonetics to the Memory of Pierre Delattre, Mouton, 1972). Two high mid vowels sound the same perceptually yet differ in lip rounding. Again, vowel harmony based on lip rounding avoids perceptual confusion for speakers (since other vowels in the same word differentiate the phonologically distinct vowels).
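The logic of these two cases can be made concrete with a schematic simulation. All formant values and labels below are invented for illustration, not measurements from Luo or Azerbaijani: two vowel classes whose acoustics overlap completely cannot be separated by any token-internal measure, while the word-level harmony class separates them perfectly.

```python
import random

random.seed(0)

def make_token(harmony):
    """One vowel token. Formants are drawn from the SAME distribution for
    both harmony classes (complete acoustic overlap), but the word also
    contains a neighboring vowel that reveals the harmony class."""
    return {
        "f1": random.gauss(350, 25),   # identical means for both classes
        "f2": random.gauss(900, 60),
        "neighbor": "e" if harmony == "+ATR" else "E",  # rest of the word
        "harmony": harmony,
    }

tokens = [make_token(random.choice(["+ATR", "-ATR"])) for _ in range(200)]

# Token-internal acoustic rule (any fixed formant threshold will do):
acoustic_guess = ["+ATR" if t["f1"] < 350 else "-ATR" for t in tokens]
acoustic_acc = sum(g == t["harmony"]
                   for g, t in zip(acoustic_guess, tokens)) / 200

# Word-level rule: read the harmony class off the neighboring vowel:
context_guess = ["+ATR" if t["neighbor"] == "e" else "-ATR" for t in tokens]
context_acc = sum(g == t["harmony"]
                  for g, t in zip(context_guess, tokens)) / 200

print(round(acoustic_acc, 2), context_acc)  # near 0.5 vs. exactly 1.0
```

The segment-by-segment transcriber is stuck at chance, while a listener (or learner) attending to word-sized units recovers the contrast without error - which is why only multisyllabic units differentiate these vowels.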
    5.  Much practice needed. Some phonetic features require a great deal of practice to produce or perceive - both for adult second-language learners and for first-language learners as well (see Strange, 1995).  If the phonetic alphabet were innate, acquisition of all sounds should be roughly simultaneous. However, some sounds take years for native speakers to produce reliably.

    6.  Gradual sound change. Another major problem with the notion of universal, a priori, discrete phonetic features is that, although some sound changes do seem to take place as relatively discrete jumps, many others appear to occur as gradual, continuous changes of, e.g., vowel target position (W. Labov).  This implication of the Chomsky-Halle position on universal phonetics was explored early on (refs?) with inconclusive results at the time. But continued detailed studies of sound change have verified that some sound changes take place in a continuous articulatory or acoustic space, not a discrete one -- unless, of course, one claims (unfalsifiably) that there are a great many tiny but discrete steps that are just too small to have been noticed.  Of course, another way to defend the Chomsky-Halle view is to assume that the segmental description changes in large discrete steps while the `phonetic implementation rules' are continuous. The obvious response is that such continuous implementation rules could no longer be universal, so a major problem remains.

    7.  Only linguists believe in a universal phonetic inventory.  Finally, linguists might care to take note of the fact that very few phoneticians, including Peter Ladefoged and the developers of the IPA alphabet, would endorse the Chomsky-Halle claim that there is a fixed, a priori, small set of segments (that is, under a thousand or so) or features (under a hundred or two) from which all human languages must select. (K. N. Stevens may be one of the rare exceptions.)  It is almost exclusively linguists (in particular, generative phonologists) who believe the set is fixed and reasonably small in size. Yet these are just the scientists who have an investment in the theory of cognition as uniformly symbolic and in the sharp distinction between competence (as discrete and mental) and performance (as continuous and physical).

    Clearly, if human linguistic competence is in fact a formal system, like an algebra or like logic, then it must, like algebra and arithmetic, begin with an a priori inventory of discrete symbol types.  In my view, it is actually this fundamental assumption about language that provides the primary motivation for the unquestioning commitment to a universal phonetic alphabet in generative phonology.

    D. None of these arguments means that a phonetic alphabet is not useful and practical for linguists for many purposes.  Nor are they reasons not to employ a discrete phonetic alphabet for academic communication about language and speech.  I support the goals of the International Phonetic Association in providing a standard alphabet for academic communication.

    E. Critique of the traditional arguments for innate discrete phonetics.

    It is time to return to the arguments offered to support the assumption of a universal phonetic alphabet and to criticize each of them explicitly.

  • As for argument 1, rapid language acquisition, it is really an argument of the form `since no other explanation exists, this must be true'.  This is never a terribly strong argument by itself, since it is nearly impossible to show that no alternative could be found.  In fact, other explanations can be found, supported by empirical evidence and mathematical modeling.  Of course, it is surely true that human children are born with cognitive mechanisms for learning and representation that are sufficient to allow them to acquire the phonetics/phonology of their native language rather quickly -- so the innate ability to acquire language is universal to that extent.  But an innate, discrete, universal phonetic alphabet containing all possible sounds of the world's languages is certainly not the only hypothesis supported by observations of rapid language acquisition.
  • Argument 2, about universals of phonetic inventories, has an alternative account. The common speech sounds may simply be those that are (a) relatively easy to articulate and (b) relatively easy to distinguish auditorily (as suggested by Martinet, Ohala, Lindblom and others).  This reflects the ways in which different languages have discovered similar solutions to the problem of differentiating a large vocabulary using the human apparatus for speech and hearing.  The vowels [i, a, u] are articulatorily extreme and thus maximally distinctive acoustically.  Since the human vocal tract happens to `support' the production of the highly distinctive sound [s], it is not surprising that many languages have discovered this possibility and found a way to incorporate it into vocabulary.
  • Argument 3, about auditory limitations, is not coherent.  Of course there are auditory limitations that prevent distinctions from lying too close together along any auditory dimension. But these do not yield discrete categories. At one time psychologists similarly believed that auditory limitations implied `units' like the `just-noticeable difference', but the fallacy here was uncovered long ago in psychophysics. JNDs are now understood as fictions that may be convenient for communicating with laymen; the observation of just-noticeable differences does not imply the existence of any units along a sensory dimension - even if such limitations do influence the spacing of whatever categories a language employs along that dimension.
  • Argument 4, about the quantal properties of speech, is very interesting, but what it shows is not that there must, for this reason, be universal distinctive features.  Instead, it shows that innate features are not necessary to account for why languages so often discover the same places of articulation and vowel categories.  In other words, any quantal properties of speech provide a direct explanation for sound preferences in languages.   Innate distinctive features are no longer required to account for them -- the opposite of Stevens' interpretation!
  • As for argument 5, about phonologists' intuitions of discreteness, these cannot be denied. People do have intuitions about the phonological categories of their language -- and they find it difficult or impossible not to employ these discrete categories when they listen to speech in other languages. But this is accounted for by our very early acquisition of our own phonologies and is, in fact, very weak and misleading evidence about phonetics as a whole.

    F.  Does it really matter? Why not just say `Oops, apparently Chomsky and Halle were wrong about that' and carry on?  The problem is that if there is no discrete, universal phonetic alphabet, then many things follow that have consequences for work on phonological theory.

  • There is no reliable basis for comparing phonetic or phonological properties between languages. There can be, for example, no universal phonological hierarchies, no universal markedness, no universal optimality constraints, etc. The entire enterprise of phonology will have to be rethought. We can no longer try to determine what is a priori and universal about phonologies by directly comparing our phonetic transcriptions of, say, Japanese or Spanish with our transcriptions of English. The feature we call [+/- voice] may be fundamentally different in the two languages - the features may be incommensurable even if they exhibit certain similarities. This means that one will have to be very careful about drawing cross-language generalizations of any kind.
  • There is no a priori, innate phonetic bootstrap to account for rapid language acquisition. How words are learned so quickly by children will have to await a less implausible theoretical approach.  Phonology must instead begin with a phonetics that is continuous and nondiscrete (that is, with a space that is essentially infinite in size) and largely learned. Thus the problem of rapid language acquisition remains to be solved by other theoretical approaches -- most of which lie outside linguistics proper.   For some useful contributions to this issue, see Guenther and X, 2001; Perkell, Guenther, et al, 2001; Grossberg and Myers, 2002; etc.  The facile assumption of universal phonetic categorization, no matter how convenient for phonologists, is completely unjustifiable.  In order to make use of the new approaches, some retraining will be necessary, since a variety of mathematical techniques will be called for that linguists rarely receive training in.
  • Some other explanation must be sought for the obvious perceptual discreteness of the phonologies of languages as spoken by native adults.  Traditionally, phonology did not have to worry about the origin of discreteness, since phonetics was taken to be already discrete.  If the argument of this essay is correct, phonology must face up to an account of where discreteness comes from.   In fact, there are many cases where the approximate discreteness of phonologies apparently breaks down (e.g., in cases of incomplete neutralization and during the process of phonological change).  Still, there is little reason to question that within any language or dialect, most sound types are discretely distinct from most others.  What appears to have happened is that the intuition of discreteness felt by linguists for sound types within a language has been projected (inappropriately) onto a universal set of sound types.
  • The approximate universals of sound systems (e.g., the wide distribution of certain consonant and vowel types, the symmetries of sound inventories, the frequency of certain phonological processes such as palatalization) require accounts in terms of communicative effectiveness and ease of articulation (along the lines of Martinet's Économie des changements phonétiques and Lindblom and Liljencrants, 198n).
  • Unfortunately, modern phonology has built itself entirely on a foundational myth - what can only be called the Myth of the Universal Phonetic Space.  This proposed space provides a vocabulary of theoretical terms that allows direct comparison between the sound systems of different languages.  Although the idea may have been inchoate in early descriptions of the International Phonetic Alphabet and vaguely believed by linguists in the first half of the 20th century, it was formalized and made fully explicit for the first time in Chapter 7 of The Sound Pattern of English (1968).  It was not an idea whose falsity was necessarily obvious at that time.  Over the past 35 years, however, few phonologists have been concerned to explore the degree to which rapidly accruing experimental evidence rendered the idea more or less plausible, and few phoneticians (who are generally familiar with the phenomena reported here) have apparently dared to challenge Chomskyan linguistic orthodoxy.

    It is time for phonology to abandon this myth and seek a new basis for theoretical progress.  It will be wrenching to begin this process since dramatically different theoretical skills will be required. The result of the process, however, will be that phonology will employ many of the same theoretical tools as other disciplines (calculus, dynamical systems theory, statistics, experimental methods, and so on). And linguistics will rejoin the other cognitive and behavioral sciences. No doubt linguistics will be welcomed by them.

    R. Port,