Notes on Science and Linguistics
Robert F. Port
Departments of Linguistics and Cognitive Science
Indiana University
September 8, 2007

A. What is Science and How Does it Work?

As scientists, we all need to keep our eyes on the big picture. Why do we conduct science as we do, with its emphasis on logic and mathematical mechanisms? Why is there so much talk in science about the importance of data? Of empirical, verifiable facts? My concern in this handout/essay is that the discipline of linguistics has made assumptions about speech perception -- about the infallibility of the conversion of physical sound into sound categories -- that no one has bothered to justify and which turn out to be incompatible with nearly all the data that bear on them.  This essay has evolved from discussions with graduate students defending traditional linguistics.  It is important to see that linguistic theories differ in fundamental ways from the modelling program of most other sciences.

The issue is `what are the pieces from which words are specified or spelled?'  The traditional assumption in linguistics is that there is some fairly small universal set of sound tokens that captures everything about speech that is relevant to language. It is assumed that if you get these 40 or so binary phonetic features right, then nothing further about the physical sound of speech will be relevant to linguistic matters.  But the evidence supports no such tokens or feature vectors. The perceptual clarity we feel about the letter-like description of speech is primarily a consequence of our letter-based literacy skills. The first topic to discuss is the making of scientific models and why observable data are so critical to the scientific enterprise. Then we will return to language and linguistics.

1.  EXPLAINING THINGS WITH THEORIES

Science consists of attempts to understand complex phenomena that we don't understand in terms of simpler phenomena that we do understand (whatever `understanding' means exactly).  The reason mathematics is such an important tool for science and engineering is that it is something we can understand (if we take the time to study it and, eg, work through the proofs) and permits computations that simulate (or predict) phenomena.

To take an example, let's try to understand why, say, in the absence of predation, a population of flies might grow exponentially (that is, with a curve of population against time that rises increasingly fast).  We can use arithmetic to model this. If a population of 100 flies increases by 20 percent every 2 days, then we can figure: Day 0, 100 flies; Day 2, 120 flies; Day 4, 144; Day 6, 173, ...  As an equation, we can express this effect explicitly: after t days the population is about 100 * 1.2^(t/2). By bringing a numerical method to this growth pattern, we can understand how it works. Eventually the model can be expanded to include a term for the effects of predation (due to a predator population following a similar equation that varies with the size of the fly population).  Of course, the model described makes some assumptions known to be false, such as a new generation all being born at once every 2 days, but the model still allows some approximate predictions.
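To make the arithmetic concrete, here is a minimal sketch in Python of the growth model just described (the function name and the cutoff at 10 days are mine, chosen only for illustration):

    # A sketch of the growth pattern above: a population that increases by
    # 20 percent every 2 days, starting from 100 flies.
    def fly_population(start=100, growth_rate=0.20, step_days=2, total_days=10):
        """Return (day, population) pairs under simple discrete growth."""
        population = float(start)
        history = [(0, round(population))]
        for day in range(step_days, total_days + 1, step_days):
            population *= (1 + growth_rate)   # 20 percent more each step
            history.append((day, round(population)))
        return history

    print(fly_population())
    # [(0, 100), (2, 120), (4, 144), (6, 173), (8, 207), (10, 249)]

A predation term could later be subtracted inside the loop, turning this into the expanded predator-prey model mentioned above.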

Similarly, Newton's Laws of Motion, like force = mass * acceleration, are idealized generalizations about objects (and are `true' only as long as you can ignore friction without causing too much error, treat each object as having all its mass located at its center, and so on).  The world is almost never exactly like the mathematical model. The model is just an idealized conceptualization of the world - in terms that we can understand directly. To the degree that the model helps you understand the phenomenon - including making practical or experimental predictions about it - the model is scientifically useful.

But are the components of the model (eg, the variables employed in the model) necessarily actual `pieces' of the world?  Does the word `mass' describe something that is an actual `thing' or true property in the world?  It is not so clear. But for many purposes, it doesn't really matter. What counts -- and the reason we employ this `law' -- is that we can compute things using it that are accurate about the real world. So, for example, using Newton's equations we have methods that would allow us to estimate the approximate force applied to the bedrock under the WTC during the few seconds in which the building went down.  Any college physics student could work out some kind of reasonable estimate (using a simplified model expressible with mathematics) if given accurate measurements of building weight, height and a constant for gravity.
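As a sketch of the kind of simplified estimate such a student might make (every number below is a placeholder chosen only for illustration, not a measurement of any real building), one can combine the structure's weight with the impulse needed to stop its fall:

    # Back-of-the-envelope force estimate: weight plus momentum change per
    # unit stopping time. All values are hypothetical placeholders.
    import math

    g = 9.8              # gravitational acceleration, m/s^2
    mass = 5.0e8         # hypothetical building mass, kg
    drop_height = 100.0  # hypothetical effective fall height, m
    stop_time = 3.0      # hypothetical time over which the motion stops, s

    impact_speed = math.sqrt(2 * g * drop_height)               # v = sqrt(2gh)
    average_force = mass * g + mass * impact_speed / stop_time  # in newtons

    print(f"impact speed ~ {impact_speed:.0f} m/s")
    print(f"average force on the bedrock ~ {average_force:.2e} N")

The point is not the particular numbers but that the model is expressed in terms anyone can check and compute with.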

Similarly, connectionist networks are clever models that can exhibit some of the behaviors of parts of nervous systems.  No one would claim that the nodes of the model are neurons -- only that the way the model works is sort of like the way neurons work and that the global behavior of the model resembles some aspects of the behavior of human nervous systems.  This is modelling.  Something obscure and complex that we didn't understand has been provided with a simpler `account' in terms we do understand -- most often using computational simulations. The model isn't truth. And it hasn't been proven true. It has only provided some insight to those people who know enough about the mathematics of neural networks to find the parallel convincing and satisfying.
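A minimal sketch of one such model `node' may help (nothing here is claimed to be a neuron; the point is only the loose family resemblance between a weighted sum with a soft threshold and neural integration):

    # One connectionist node: a weighted sum of inputs passed through a
    # logistic squashing function.
    import math

    def node(inputs, weights, bias):
        activation = sum(x * w for x, w in zip(inputs, weights)) + bias
        return 1.0 / (1.0 + math.exp(-activation))

    print(node([0.2, 0.9], weights=[1.5, -0.7], bias=0.1))   # about 0.44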

So that's the way I see `scientific explanation'. Proof is impossible; all we have is common sense and an understanding of the complex in simpler terms.

2. THE DATA
But there is another critical issue -- about facts. Every scientist agrees that our data must be verifiable - that is, not subject (very much) to personal opinion.  We need to be able to verify anyone's observations.  Freud claimed that each of us has an Id and an Ego, etc., and that these compete for control of our daily activities.  But few others could see just the same Id and Ego as he did.  People gradually concluded that these are not facts - not certifiably real constructs.  So now this idea seems quaint. The Id is a theoretical construct that was eventually dropped from science.

Because of arguments about what counts as real data, psychologists came up with various `rules of evidence'. The main thing is a distinction between a theoretical construct (Ego, Competence, Noun, +Voice, anger, force, mass, ...) and real data.  It is very tempting for theoreticians to see their own theoretical constructs as necessarily real.  But data have physical form - something that can be measured in the world (galvanic skin response, heart rate, acoustic emissions, turning the head to the right, uttering `yes' or `same', or whatever). The `behaviorists' are the group who sharpened these rules for constraining what counts as evidence. (Notice that Chomsky sees behaviorism as primarily a theory that constrains a theory of intelligence.)  But this is how we keep ourselves honest as we justify our specific theoretical claims.

From this perspective, the problem with most modern approaches to linguistics, and with generative phonology in particular, is that they have not committed to reliance on physically measurable data.  Put very simply, phonetic transcriptions do not have verifiable physical definitions.  The idea that phonetic transcription using a fixed universal alphabet is free of bias is nothing but a convenient myth!  It is, in principle, impossible - as we have seen in the literature on differences between languages, language acquisition, the problem of vowels, etc.  Every speaker (of any language) is a complex machine trained over time for the special purpose of generating only their own native speech patterns. And all professional linguists in the world have also been trained to read and write speech using an alphabet.  This means that any human decisions about a phonetic transcription consist of mere intuitions that are biased in unknown ways by the transcriber's speech perception machinery.  Transcriptions are completely untrustworthy except at a very broad scale (which is just where phonologists would like to leave it, I realize). But a broad, low-bitrate transcription cannot possibly account for the phonetics that speakers control in the languages of the world. A very broad transcription is useless, and a narrow transcription is impossible because speakers of different languages will transcribe the same utterance quite differently.

What I am saying is that linguistics has built all its theories of linguistic knowledge on a foundation of phonetic features treated as objective empirical data, and this is to build on sand.  Phonetics does not come close to providing the kind of concrete physical basis that linguistics needs in order to justify itself to scientists in other fields.  In the case of syntax, the data that are relevant to theoretical issues are extremely rarefied. It seems that only the investigators themselves (and sometimes their graduate students) can really evaluate whether example E illustrates property P. If other people in the field can't agree on what the physical data are, then we must distrust them (and our own judgments too).  It's way too easy for us humans to hallucinate or just to be wrong or biased or stubborn.

Esoteric criteria for what counts as data cannot be allowed. Scientific respectability depends on some kind of physical evidence that anyone can look at for themselves. For a while in the early 20th century it was thought that mental phenomena could not be studied at all because mental events aren't physical (B. F. Skinner, eg, seems to have believed this).  The big discovery of the last century, the one that made a science of psychology possible, was that, despite this methodological problem, you could still study cognitive or mental phenomena in animals and people IF you design a `behavioral experiment'.  You simply insist that one or more subjects answer a question, push a button or respond in some other way.  This creates a physical trace on paper or an audio tape.  Animals will do this for food and people will do it for money.  This basic trick - almost a gimmick - turns mental phenomena into physical ones.  Experimental psychology, phonetics, speech science, and cognitive neuroscience all base their main results on behavioral experiments.  Linguistics should as well but generally does not.
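Here is a toy sketch of that trick (the file name, subject code and prompt are just illustrative): a judgment is forced to leave a physical, timestamped record that anyone can inspect later.

    # Record one behavioral response, with its reaction time, to a CSV file.
    import csv, time

    def run_trial(subject_id, prompt, outfile="responses.csv"):
        start = time.time()
        answer = input(f"{prompt} (same/different): ").strip().lower()
        reaction_time = time.time() - start
        with open(outfile, "a", newline="") as f:
            csv.writer(f).writerow([subject_id, prompt, answer, f"{reaction_time:.3f}"])

    run_trial("S01", "Do 'cot' and 'caught' sound the same to you?")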

But can't the linguist's intuitive judgments count as physical data as well?  Yes, but it's only one person's judgment - and each linguist has biases and a vested interest in certain outcomes over others. But if you can ask your question of, say, 10 speakers, and most of them make marks on paper or press a key showing they agree with the experimenter, then fine. That's real data too.  But linguists tend to insist this is quite unnecessary.  Sorry, I don't trust them and you shouldn't either.
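A sketch of what can then be done with those marks on paper (the particular judgments listed are invented for illustration): count the agreements and ask how often that many would occur by coin-flipping alone.

    # Exact probability of k or more agreements out of n under chance (p = 0.5).
    from math import comb

    judgments = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]   # 1 = speaker agreed (illustrative)
    n, k = len(judgments), sum(judgments)
    p_chance = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

    print(f"{k}/{n} speakers agreed; chance probability of that many or more: {p_chance:.3f}")
    # 8/10 speakers agreed; chance probability of that many or more: 0.055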

B. How can Linguistics be Done with Concrete Data?

Finally, let us return to the main theme: Are the segments of phonetic transcriptions physical units subject to objective measurement?  Absolutely not - as you students have seen.  And there is a huge number of reasons to believe that no straightforward, universal physical measurements will ever define ANY of the basic features linguists rely on (+/- stressed, -voice, H tone, `terminal stress', etc., etc.).  So conventional linguistics (and particularly phonology), without really investigating phonetics very closely, has taken out a huge loan. Linguists assume phonetics will come through for them and make good on the loan by providing physical interpretations of discrete phonetic features and segments.  But after almost 50 years it is clear that phonetics cannot.  Phonetic segments may be more concrete than ``phonemes'' and some other possible units, but they are not physically definable.  This is why linguistics has built a huge formal and theoretical superstructure on a foundation of sand.

Linguistics needs a completely new foundation - one based on:

  1. responsibility to physically definable phenomena - measurements, subject responses, verifiable observations of one sort or another;
  2. employment of a wide range of mathematical tools in modelling. Look for places where dynamical systems might be applied (a minimal example appears just after this list). Do not restrict yourself to the abstract algebra of discrete symbols or any one branch of mathematics, as the Chomskyan movement did; and
  3. recognition that speakers do have some way of coding words in memory and use this code for recognition, but that this representational code is far richer and more detailed than any that can be provided by 40 or so phonetic features organized into serially ordered vectors.
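To show the flavor of mathematics beyond the algebra of discrete symbols, here is a minimal sketch of one very simple dynamical system, the logistic map (the specific parameter values are illustrative). A single continuous update rule yields qualitatively different regimes - settling to a point, oscillating, behaving chaotically - as one parameter is varied:

    # Iterate the logistic map x -> r*x*(1-x) and print the last few values.
    def logistic_trajectory(r, x0=0.2, steps=40):
        xs = [x0]
        for _ in range(steps):
            xs.append(r * xs[-1] * (1.0 - xs[-1]))
        return xs

    for r in (2.8, 3.2, 3.9):    # stable point, 2-cycle, chaotic regime
        tail = ", ".join(f"{x:.3f}" for x in logistic_trajectory(r)[-4:])
        print(f"r = {r}: ... {tail}")

Discrete symbols are one possible description of such a system's behavior, but they are not built into it; the categories, where they exist, emerge from the continuous dynamics.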

Think about these problems: Why is there no discussion of the authoritative list of a priori phonetic units and how they are defined physically?  Phonologists seem to think this is someone else's problem, not theirs (because they only have to worry about the ``formal'' properties of language, not where the formal tokens come from).  Why have phonologists suggested almost no changes to the feature system proposed by Halle and Chomsky 40 years ago (except for a couple of new features like the pitch accents of ToBI)?  Is that system perfect and true?  And how many phonological units does English actually have? No linguist seems to have made a claim about that.  With the onset of Optimality Theory around 1990, this same list from 1968 was relied upon for constraint definitions with no revision whatever.  But if the physical definition of the features is really unproblematic, then why is speech recognition by computer still not solved? Why is speech recognition subject to so much more error than, say, the Touchtone coding system for telephone numbers?  The Touchtone signals and computer bit values are units with clear physical definitions. Machines implementing formal programs interpret these signals reliably (almost infallibly).  If linguistics and phonology are really formal systems, then why don't phonetics and phonology work like that?
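To see the contrast concretely, here is a sketch (not production code) of Touchtone decoding. Each key is physically defined as a fixed pair of frequencies, and a standard single-frequency detector (the Goertzel algorithm) identifies the key almost infallibly:

    # Decode a DTMF (Touchtone) key from its audio samples.
    import math

    FS = 8000                               # sample rate, Hz
    ROWS = [697, 770, 852, 941]             # DTMF row frequencies
    COLS = [1209, 1336, 1477, 1633]         # DTMF column frequencies
    KEYS = ["123A", "456B", "789C", "*0#D"]

    def goertzel_power(samples, freq):
        """Signal power near one target frequency (Goertzel recurrence)."""
        coeff = 2.0 * math.cos(2.0 * math.pi * freq / FS)
        s1 = s2 = 0.0
        for x in samples:
            s1, s2 = x + coeff * s1 - s2, s1
        return s1 * s1 + s2 * s2 - coeff * s1 * s2

    def decode_key(samples):
        row = max(ROWS, key=lambda f: goertzel_power(samples, f))
        col = max(COLS, key=lambda f: goertzel_power(samples, f))
        return KEYS[ROWS.index(row)][COLS.index(col)]

    # Synthesize the key "5" (770 Hz + 1336 Hz) and decode it.
    tone = [math.sin(2*math.pi*770*t/FS) + math.sin(2*math.pi*1336*t/FS)
            for t in range(400)]
    print(decode_key(tone))                 # prints: 5

No trained perceiver is needed anywhere in this loop; the categories are defined directly over the physical signal, which is exactly what phonetic features lack.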

So, the way it looks to me is that:

      (a) there simply is no (finite) universal space of discrete phonetic categories (as Chomsky and Halle require if their linguistics is to be a real empirical science),

      (b) nor can we assume there is a universal set of acoustic phonetic parameters (eg, F1, F2, F0, burst shapes, VOT, etc) that comprise the maximum set that languages can employ in feature or segment definitions. Languages will continue to discover novel things to control that have never been heard before!  Nevertheless,

      (c) each individual language or dialect does have a fairly clear set of sound categories (ie, the set of Cs, Vs, pitch accent types, etc.) that it employs to help keep words distinct during rapid language use and to provide some components that can be recombined to make new vocabulary.  Still, there are many points of uncertainty where we cannot say whether two sounds are co-allophones or distinct phonemes, or say how many sound units there are in a stretch of speech, or specify what the cues are for some particular feature, etc.  This is because the ``phonological units'' are only context-free to a very limited degree.  Indeed, they can only be called ``units'' to a limited degree.

      (d) Since the physical definition of these sound types cannot be done simply and directly, and because individual speakers each have their own set of auditory features for the specification of words, we phoneticians have no choice but to try to discover what kind of mechanism is able to learn how to identify the sound units of any language.


This is why understanding formal models for coordinative structures, memory and perceptual mechanisms is important for phoneticians.  This is what leads us toward models like neural networks, metrical oscillators, etc. -- all those fancy mechanisms of perceptual categorization and category implementation in production. Such models can help us understand the complexities of language. If linguistics is to be a real science, it must rely upon real physical data and it must be incorporated into the other areas of cognitive science.
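As one small illustration of the kind of mechanism meant here, the following is a toy sketch of an entraining phase oscillator - in the general spirit of metrical-oscillator models, though not any particular published one, and with parameter values chosen only for illustration. A free-running cycle nudges its phase toward each incoming beat until its own cycle is locked to the input rhythm:

    # A phase oscillator that entrains to a periodic beat.
    import math

    def entrain(beat_interval=0.50, natural_period=0.52, coupling=0.5,
                n_beats=12, dt=0.001):
        """Return the oscillator's phase (in cycles, mod 1) at each beat."""
        phase, t, phases_at_beats = 0.0, 0.0, []
        next_beat = beat_interval
        while len(phases_at_beats) < n_beats:
            phase += dt / natural_period                  # free-running advance
            t += dt
            if t >= next_beat:                            # a beat arrives:
                phase -= coupling * math.sin(2*math.pi*phase) / (2*math.pi)
                phases_at_beats.append(phase % 1.0)
                next_beat += beat_interval
        return phases_at_beats

    print(["%.2f" % p for p in entrain()])
    # the phases at successive beats converge to a fixed value: entrainment

Categorical-looking behavior (a stable phase relation) emerges here from continuous dynamics rather than from a list of discrete symbols.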