Notes on Science and Linguistics
Robert F. Port
Dep't Linguistics, Indiana University
November, 2001

A. What is Science and How Does it Work?


As scientists, we all need to keep our eyes on the big picture. Why do we conduct science as we do, with an emphasis on logic and mathematical mechanisms? Why is there so much talk in science about the importance of data, of empirical, verifiable facts?

Science consists of attempts to understand complex phenomena we don't understand in terms of simpler phenomena that we do understand (whatever `understanding' means exactly).  The reason mathematics is such an important tool for science and engineering is that it is something we can understand (if we take the time to study it and, eg, work through the proofs) and permits computations that simulate (or predict) phenomena.

To take an example, let's try to understand why, in the absence of predation, a population of flies might grow exponentially (that is, with a curve of population against time that rises increasingly fast). We can use arithmetic to model this. If a population of 10 flies produces a next generation of 12 (an increase of 20 percent) and the flies can create a new generation every 10 minutes, then we can figure: Minute 00, 10 flies; Minute 10, 12 flies; Minute 20, 14.4; Minute 30, 17.3; ...  As an equation, we can express this effect explicitly. By bringing a numerical method to this odd growth pattern, we can understand how it works.
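One way to write the fly arithmetic as an explicit equation is N(t) = N0 * (1 + r)^(t/T), where N0 is the starting population, r the growth per generation, and T the generation time. The short Python sketch below simply evaluates that formula; the function and parameter names are my own illustrative choices, and the model assumes idealized, continuous growth with no predation or deaths.

```python
def fly_population(minutes, initial=10.0, growth_per_generation=0.20,
                   generation_minutes=10.0):
    """Idealized exponential growth: N(t) = N0 * (1 + r) ** (t / T).

    All names and default values are illustrative assumptions taken from
    the worked example (10 flies, 20 percent growth, 10-minute generations).
    """
    generations = minutes / generation_minutes
    return initial * (1.0 + growth_per_generation) ** generations


# Reproduce the table from the text.
for minute in (0, 10, 20, 30):
    print(f"Minute {minute:02d}: {fly_population(minute):.1f} flies")
```

The model is not the flies, of course. It is only an account of their growth in terms we can compute with.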

Similarly, Newton's Laws of Motion, like force = mass * acceleration, are idealized  generalizations about objects (and are `true' only as long as you can ignore friction without causing too much error and as long as you can treat each object as having all its mass located at its center, etc etc).  The world is not the mathematical model. The model is just an idealized conceptualization of whatever the world is like - in terms that we can have DIRECT UNDERSTANDING of. To the degree that the model helps you understand the world - including making practical or experimental predictions about it - it is useful.

But are pieces of the model (eg, the variables employed in the model) necessarily actual `pieces' of the world?  Does the word `mass' describe something that is an actual `thing' in the world? It is not so clear. But it doesn't really matter. All that counts - and the reason we employ this `law' - is that we can compute things using it that are veridical in the real world. So, for example, we have methods that would allow us to estimate the approximate force applied to the bedrock under the WTC for the few seconds during which the building went down.  Any high school physics student could work out a reasonable estimate if you give him or her accurate measurements of building weight, height, time, etc.

Similarly, connectionist networks are clever models that can exhibit some of the behaviors of parts of nervous systems.  No one would claim that the nodes of the model are neurons -- only that the way the model works is sort of like the way neurons work and the global behavior of the model sort of resembles some actions by human nervous systems.  That is modelling: something obscure that we didn't understand has been provided with a simpler `account' in terms we DO understand. The model isn't TRUTH. It hasn't been PROVEN TRUE. It has only provided some insight to those people who know enough about the mathematics of neural networks to find the parallel convincing and satisfying.

So that's the way I see `scientific explanation'. Proof is impossible; all we have is common sense and an understanding of the complex in simpler terms.

But there is another critical issue -- about FACTS. Every scientist agrees that our data must be verifiable - that is, not subject (very much) to personal opinion.  Freud claimed that each of us has an Id, an Ego, etc, and that these compete for control of our daily activities.  But few others could see just the same Id and Ego as he did.  People gradually concluded that these are not facts - not certifiably real constructs.  So, now this idea seems quaint. The Id is a theoretical construct that eventually was dropped from science.

Because of arguments about what counts as real data, psychologists came up with some `rules of evidence'. The main thing is that there is a difference between a theoretical construct (Ego, Id, Competence, anger, force, ...) and data.  It is very tempting for theoreticians to see their own theory as necessarily true. But REAL data have physical form - something that can be measured in the world (galvanic skin response, heart rate, acoustic emissions, turning the head to the right, uttering `yes' or `same', or whatever). This is how we keep ourselves honest as we justify our personal theoretical claims.

From this perspective, the difficulty with phonology (and linguistics as a whole) is that it has not faced up to finding and committing to complete reliance on physically measurable effects.  Phonetic transcriptions simply are NOT physical effects.  Universal, bias-free phonetic transcription is only a convenient myth!  It is in principle impossible - as we have seen in readings on differences between languages, language acquisition, the problem of vowels, etc.  Every speaker (in any language) is a complex machine built over time for the special purpose of generating ONLY their own native speech patterns.  This means that phonetic transcription is always biassed by that language-specific machine and is untrustworthy except at a very broad scale -- way too broad to be basing phonological descriptions on. Speakers of different languages will transcribe the same utterance quite differently.

So, what I am saying is that from the point of view of linguistics, to build everything - all of the grammar - on top of phonetic features as your objective empirical data is to build on sand.  Phonetics does not come close to providing the kind of concrete physical basis that linguistics needs in order to justify itself to scientists in other fields.  In the case of syntax, the data that justify answers on theoretical issues are extremely rarefied. It seems that only the investigator him/herself (and sometimes their graduate students) can really evaluate whether example E illustrates property P. If other people in the field can't agree on what the physical data are, then we must distrust those data (and our own judgments too).  It's way too easy for us humans to hallucinate!  Or just to be wrong or biassed or stubborn.

Esoteric criteria for what counts as data cannot be allowed. Scientific respectability depends on some kind of PHYSICAL evidence. For a while it was thought that mental phenomena could not be studied at all because mental events aren't physical (B. F. Skinner, eg, seems to have believed this).  The big discovery of the last century, the one that made a science of psychology possible, was that, despite this methodological problem, you could still study cognitive or mental phenomena in animals and people IF you design a `behavioral experiment'.  You simply insist that one or more subjects answer a question, push a button, RESPOND in some way.  This creates a physical trace on paper or an audio tape.  Animals will do this for food and people will do it for money.  This basic trick - almost a gimmick - turns mental phenomena into physical ones.  Experimental psychology, phonetics, speech science, and cognitive neuroscience all base their main results on behavioral experiments.  Linguistics should as well but generally does not.

So why don't the linguist's intuitive judgments count as physical data as well? Because it's only one person's judgment - and each linguist has a vested interest in certain outcomes over others. But if you can ask your question of 10 speakers, say, and most of them make marks on paper or press a key showing they agree with the experimenter, then fine. That's real data too.  But linguists tend to insist this is quite unnecessary.  Sorry, I don't trust them.

What about phonology?  What are the physical phenomena of phonology? They are F0 measurements, utterance of the word `yes' or `no', even a recorded stop burst.  Their distribution is being explained by the theoretical constructs proposed -- distinctive features, segment types, syllables, perceived pitch accent, perceived pitch contour, etc.  But unfortunately it is usually a phonetic transcription that counts as the raw data whose features are explained by the theory of phonology.

So, we finally return to the main theme: Are phonetic segments physical units subject to objective measurement?  Absolutely not - as you have seen.  And there is a huge number of reasons to believe that no straightforward physical and universal measurements will ever define ANY of the basic features linguists rely on (+/- stressed, -voice, H tone, `terminal stress', etc etc).  So, linguistics (and particularly phonology), without really investigating phonetics very closely, has taken out a huge LOAN. Linguists assume phonetics will come through for them and make good their loan.  But phonetics cannot.  Phonetic units may be more concrete than some other possible units, but they are NOT physically definable. There is no objective way to verify any transcription.  This is why linguistics has built on a hill of sand.

Linguistics needs a completely new foundation - one based on:

B.   Is Phonetics Abstract or Concrete?

The traditional belief of linguists (and of Chomsky-Halle68 in particular) is that the phonological space of a language is an `abstract structure' since (a) it employs a fairly small number of degrees of freedom (the phonological features) and (b) each abstract unit (eg, phoneme) has a range of physically and perceptually distinguishable variants. (Similar criteria could be employed to say that the number `5' is abstracted away from 5 pebbles, 5 children, etc. and a geometric square is abstracted away from square figures drawn with chalk, or made from straight pieces of wood or carved from stone).

Similarly, linguists believe that phonetics is `concrete' -- because, in contrast with true abstract concepts, providing a precise specification for phonetic elements is (they believe) fairly trivial.  On this view, every phonological form, like the word `cat' /k ae t/, is made of abstract components (the phonological features or other structures). These are abstract because the phonological atoms of /k, ae, t/ have a variety of concrete phonetic implementations (eg, variant allophones).  The phonetic units (the set of allophones), however, can (they hope) each be given a precise concrete specification. They assume that a [t] or [th] or [ae] each has a simple physical specification. (If they don't claim this, then they have completely disconnected their theory from the real physical world. Without claiming this they would be abandoning responsibility to phonetic transcriptions.)  Linguistics takes responsibility for phonology down to the level of phonetic transcriptions - the last step before physical specification.

But this assumption by linguists that phonetics will be shown to have a straightforward specification in the physical world turns out to be completely wrong! The motor control problem, for example, shows us that every phonetic type (eg, a particular phonetic V or C or phonetic feature) has a rich and complex range of physiological variants that differ randomly from token to token in motor implementation. Of course, the range of variants is `constrained' in various ways.  But saying EXACTLY what the physical definition of the features is turns out to be several orders of magnitude more difficult than they imagined!  Each labial closure can be implemented by each speaker in a vast (nearly infinite) range of different ways.  In a physical sense, given the huge number of different physical degrees of freedom involved, very possibly every production is distinct from every other production even by a single speaker (if you look closely at individual muscle activity patterns)!

The study of other topics like place of articulation, the voicing contrast and the study of vowels, all show in different ways that the phonetic feature or phonetic type is AS CONCRETE AS IT IS GOING TO GET. Yet it is still a spectacularly abstract type both in its physiological implementation and in the wide range of acoustic variants that will be perceived as valid examples.

So where is there any concreteness here? Why is there no authoritative list of just what the a priori set of phonetic units is and how they are defined physically?  (Amazingly, phonologists still rely almost entirely on the feature system proposed hastily by Halle and Chomsky over 30 years ago (plus a couple of new features like +/- ATR and pitch accents). With the onset of OT, this same list is relied upon with almost no revision!)

But if physical definition of features is really nonproblematic, then why is speech recognition still not solved? Why is speech recognition subject to so much more error than the Touchtone coding system for phone numbers?  Or the reading and writing of bits in a computer chip?  Touchtone signals and computer bit values are objects with clear physical definitions. Machines that interpret these signals reliably (almost infallibly) are available.  Why ain't phonetics like that if the Chomsky-Halle phonetic features have concrete physical definitions?
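To see what a `clear physical definition' looks like in practice, here is a minimal Python sketch of the Touchtone case. Each key is defined as a pair of sine tones (the frequency table below is the standard Touch-Tone/DTMF assignment); the sample rate, tone duration, function names, and the simple correlation detector are my own illustrative assumptions, not a description of how real telephone equipment is built.

```python
import math

# Standard DTMF frequencies (Hz): each key = one row tone + one column tone.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477)
KEYS = ("123", "456", "789", "*0#")

SAMPLE_RATE = 8000  # samples per second (assumed, telephone-grade)
N_SAMPLES = 800     # 100 ms of signal (assumed)


def synthesize(key):
    """Return the two-tone waveform that physically defines `key`."""
    if len(key) != 1:
        raise ValueError("expected a single keypad symbol")
    for r, row in enumerate(KEYS):
        c = row.find(key)
        if c != -1:
            return [
                math.sin(2 * math.pi * ROW_HZ[r] * n / SAMPLE_RATE)
                + math.sin(2 * math.pi * COL_HZ[c] * n / SAMPLE_RATE)
                for n in range(N_SAMPLES)
            ]
    raise ValueError(f"not a keypad symbol: {key!r}")


def energy_at(signal, freq):
    """Energy of `signal` at `freq`, by correlating with a sine and a cosine."""
    s = sum(x * math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
            for n, x in enumerate(signal))
    c = sum(x * math.cos(2 * math.pi * freq * n / SAMPLE_RATE)
            for n, x in enumerate(signal))
    return s * s + c * c


def decode(signal):
    """Pick the strongest row tone and the strongest column tone."""
    r = max(range(len(ROW_HZ)), key=lambda i: energy_at(signal, ROW_HZ[i]))
    c = max(range(len(COL_HZ)), key=lambda i: energy_at(signal, COL_HZ[i]))
    return KEYS[r][c]
```

The whole `category system' fits in a seven-number table, and a few lines of arithmetic recover the key from the signal. Nothing remotely like this table exists for [t] or [ae].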

Since speech recognition is still very hard, we have to look further to see what kind of physical definitions are possible.  It appears to me that phonetic types (ie, particular Cs and Vs, etc in a particular language) DO exist as `categories of speech sound'.  It seems obvious that specific languages DO have different sound categories.  But physical definition of them in either the motor space or in acoustic space requires postulating some fairly `fancy' mechanisms.  Apparently speakers of various languages can learn abstract control patterns (eg, coordinative structures) for speech production and, as listeners, can develop abstract perceptual machines for recognition of these units.  These are custom-made sound categorization systems (eg, Kuhl's `perceptual magnets') that employ the statistical regularities in the speech they hear to carve out useful classifications of sound (into our beloved `phonetic features') FROM WITHIN A SPACE THAT HAS NO CATEGORIES. The space does not even have a fixed, uniform dimensionality (that is, listeners may employ or measure different properties of the physical stimulus, in effect listening to a different subphonetic space, and speakers may implement these categories with partly idiosyncratic articulatory and motor patterns).

I repeat: the standard allophone-based phonetic space is itself as concrete a phonological sound unit as we are going to find. Yet, even these allophone-type units remain very abstract -- if you look close.  And, if you look close, you see that they are incommensurate with each other.  That is, you cannot just add them together to get the `total phonetic space of humankind'. Each language constructs its phonological space in quite different ways using different parameters.

The wrong message was taken from the classical Lisker and Abramson experiments on VOT.  Yes, there are a few broad generalizations about 3 predominant categories.  But if you look closer, you will see that English aspirated stops are consistently different from, say, Korean or Hindi aspirated stops.  They sound different and have different typical ranges of VOT from each other. And the voiceless unaspirated [t] of Japanese is NOT the same as the French unaspirated [t].

So, the way it looks to me is that:

        (a) there is no universal space of phonetic categories (as C-H want),

        (b) nor should we even assume there is a universal set of acoustic phonetic parameters (eg, F1, F2, F0, burst shapes, VOT, etc) that comprises the maximum set that languages can employ in feature or segment definitions. Some languages may still find novel things to `measure' that we have never heard before!  Nevertheless,

        (c) each individual language or dialect does have a fairly clear set of sound categories (ie, the set of Cs, Vs, pitch accent types, etc.) that it employs to help keep words distinct during rapid language use.  Still, we expect there will be points of uncertainty where we cannot say whether 2 sounds are co-allophones or distinct phonemes or how many sound units there are in a stretch of speech or what the cues are for some particular features, etc., etc.

        (d) Since the physical definition of these sound types cannot be done simply and directly, we phoneticians have no choice but to try to discover what kind of mechanism IS able to reliably identify the sound units of any language.

This is why understanding coordinative structures and perceptual mechanisms is important for phoneticians.  This is what leads us toward neural networks, metrical oscillators, etc etc -- all those `fancy mechanisms' of perceptual categorization and category  implementation in production.