Note on Linguistic Symbols: from Deacon to Seidenberg
Robert Port
-not for quotation or forwarding please
Sat, April 3, 1999

I wanted to write down and expand on the comment I tried to make at 3M on Tues and in my L645 class Thurs morning - about how Deacon's notion of linguistic symbols seems on the right track but is still incomplete. In particular, he does not provide us with an explanation of what he means by `symbol-symbol relationships' in general. On the other hand, there are some language scientists, developing ideas that seem quite compatible with Deacon's, who do have an account of what these symbol-symbol relationships would be like and how they might be learnable. My purpose is to pull these ideas together and suggest that they give us (or at least give me) a new take on what you have learned when you learn words. This began as an email message, but got too long to stuff into everyone's mailbox.

Terrence Deacon.

In Chapter 3 of The Symbolic Species: Coevolution of Language and Brain (1997, Norton), Deacon tries to explain what it is about Symbols that makes them so different from Indices and Icons (p. 79 ff). The archetypal symbol is, of course, a word in any human language. To make his point, he describes some results from a chimpanzee language-teaching experiment by Savage-Rumbaugh and Rumbaugh - where chimps learn to display arbitrary plastic tokens for `words' (that they call lexigrams) and construct `sentences'. Though his explanation is not the clearest, it seems that the experimenters taught the animals a pair of verbs and a set of nouns for foods. The idea was that when using the language to ask for solid foods (eg, a piece of bread or a slice of banana), their artificial grammar used a certain lexigram (let's call it `DROP'), but for liquid foods (like orange juice or apple juice) it used another token (call it `SQUIRT'). In English, we might gloss the sentences as DROP BANANA and SQUIRT ORANGE-JUICE (meaning ``make the machine give me this''). It took lots and lots of trials for them to learn these. It turns out that if you just train them with sentences like SQUIRT ORANGE-JUICE and DROP BANANA-SLICE, they have no trouble, but they really are treating both SQUIRT and DROP as essentially synonymous terms meaning `release'. Thus they clearly do NOT appreciate that these verbs have specific meanings and are appropriate only for certain nouns. In fact, one essential part of the training is that they tricked the chimps into producing ungrammatical productions like `SQUIRT BANANA' and `DROP ORANGE-JUICE', which were then not reinforced. Eventually, the chimps appreciated that SQUIRT is only for liquids and DROP only for solids. Then when they introduced new nouns (eg, COOKIE or MILK), the trained chimps were able to use them IMMEDIATELY with the appropriate verb. On the other hand, chimps that had not been through this elaborate training program, employing both negative and positive examples, could not get this idea at all. They continued to use SQUIRT and DROP as simply equivalent to `GIMME', and fixing this would require extensive new training on the new noun with the two verbs.

What Deacon draws from this example is that these two chimps were able to learn the ``system of logical relationships between the lexigrams - relationships of exclusion and inclusion'' (p. 86) - although only with massive amounts of training (whereas for human brains, such relationships are easily and naturally acquired).

``These lexigram-lexigram relationships formed a complete system in which each allowable or forbidden cooccurrence of lexigrams in the same string (and therefore each allowable or forbidden substitution of one lexigram for another) was defined. They had discovered that the relationship that a lexigram has to an object is A FUNCTION OF the relationship it has to OTHER LEXIGRAMS, not just a function of the correlated appearance of both lexigram and object. This is the essence of a symbolic relationship.'' (p. 86)
``No individual lexigram determines its own reference. Reference emerges from the hierarchic relationship between these two levels of indexicality, and by virtue of recognizing an abstract correspondence between the system of relationships between objects and the system of relationships between the lexigrams. In a sense it is the recognition of an iconic relationship between the two systems of indices. . . . . This makes a new kind of generalization possible: logical or categorical generalization, as opposed to stimulus generalization or learning set generalization. ... The system of lexigram-lexigram interrelationships is a source of implicit knowledge about how novel lexigrams must be incorporated into the system. Adding a new food lexigram, then, does not require the chimp to learn the correlative association of lexigram to object from scratch each time. The referential relationship is no longer solely (or mainly) a function of lexigram-food cooccurrence, but has become a function of the relationship that this new lexigram shares with the existing system of other lexigrams, and these offer a quite limited set of ways to integrate new items...... (Thus) lexigrams need no longer be treated as indices of food availability.''

The words have content derived from common patterning with other words. Thus they can break free of describing only the world itself. We can use words (eventually) to model the world, tell lies or generate wonderful imaginings.

More basically, if these trained animals were presented with a new lexigram for, say, `porridge' - something they had never experienced - they would still, as soon as they observed SQUIRT or DROP used with the new word in context, obtain some degree of knowledge about the content of the word. Thus, merely by detecting or choosing which group of other nouns this one groups with (bread, banana vs. juice, water), the symbol itself embodies some abstract content, that is, a part of its meaning. This way words get some abstract meaning just from their statistics - the statistics of cooccurrence of their tokened forms (which often means their phonological specification or visual spelling specification). The statistics are not just binary, but include complex collocational patterns (derived from the actual spatiotemporal distribution of real-world events) - whatever can be learned. By `embodying the content' I mean that if the phonological code for a word can excite the phonological code of related (that is, associated) words (so banana should very slightly excite apple and cookie), then all the common semantic properties of the members of these token classes can also be slightly activated (since the phonological codes excite content in other areas like the visual space). So linguistic symbols work by using the statistics of word collocation to parcel out categories like Things, Events, People, etc., from the world itself. That, at least, is what I think is being proposed here.
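Just to make this concrete for myself, here is a little toy sketch in Python (mine, obviously, not Deacon's or the Rumbaughs'; the mini-corpus and the similarity measure are invented purely for illustration) of how bare verb-noun cooccurrence counts could let a brand-new noun inherit the liquid/solid grouping:

    # Toy illustration: nouns come to carry a bit of `liquid vs. solid'
    # content purely from which verbs they have cooccurred with.
    from collections import defaultdict
    from math import sqrt

    # Invented training `corpus' of two-lexigram strings.
    corpus = [
        ("SQUIRT", "JUICE"), ("SQUIRT", "WATER"), ("SQUIRT", "MILK"),
        ("DROP", "BANANA"), ("DROP", "BREAD"), ("DROP", "COOKIE"),
        ("SQUIRT", "JUICE"), ("DROP", "BANANA"),
    ]

    # Count how often each noun occurs with each verb.
    counts = defaultdict(lambda: defaultdict(int))
    for verb, noun in corpus:
        counts[noun][verb] += 1

    verbs = ["SQUIRT", "DROP"]

    def vector(noun):
        # Represent a noun by its verb-cooccurrence counts.
        return [counts[noun][v] for v in verbs]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b))
        return dot / norm if norm else 0.0

    # A brand-new lexigram, observed just once in the frame SQUIRT PORRIDGE:
    counts["PORRIDGE"]["SQUIRT"] += 1

    # Which old nouns does it pattern with?  That overlap just IS its
    # (very crude) abstract content: it groups with the liquids.
    for noun in ["JUICE", "WATER", "BANANA", "BREAD"]:
        print(noun, round(cosine(vector("PORRIDGE"), vector(noun)), 2))
    # JUICE and WATER come out at 1.0; BANANA and BREAD at 0.0.

The only point of the toy is that the new lexigram picks up some content - it patterns with the liquids - before any lexigram-food pairing has been learned for it.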

This seems like a really important idea. But Deacon doesn't seem to come back to it later, after he discusses all those other topics: neural architecture, human brain specializations, the physical evolution of our species, etc. In Chapters 8 and 9 he discusses how human neural specialization affects language, but not in terms of the sign-sign relationships. Surely, though, his claim must be that our hugely enlarged prefrontal cortex and its specialized connections with other parts of cortex and brainstem somehow (among other things) support the learning of all those word-word relationships that specify groups and subgroups of related concepts. But he never gets around to developing any other specific examples of such knowledge, or to showing how it might support learning the grammar of a language. Nor does he come back to show how the anatomical specializations of human brains he discusses later in the book (novel cortical-cortical connections, etc) relate to the acquisition of the symbol-symbol associations pointed to in Chapter 3.

But if you think about what this knowledge must be like, it would have to come from learning statistical relationships of cooccurrence between words (or lexigrams in the chimp experiments). But what could be the basis of the learning? What did these chimps actually learn that was special? It seems like the chimps start by learning that SHAPE-A goes with SHAPE-B, SHAPE-C, etc. And children would have to learn, eg, that the word DRINK goes with WATER, MILK, BEER, etc (the various specific liquid-representing nouns). But that knowledge would be about the word `drink', that is, about the sequence of sounds /d-r-i-n-k/ (not about the concept or the semantics or something). What is learned about word-word relationships must be anchored in a substantive space of TOKENS of some kind, it seems to me. Quite simply, there must be something whose statistics could be learned. But what?

I can't imagine what those tokens could be if they don't include, above all, SPEECH SOUNDS - phonemes or distinctive features or something drawn from a (probably discrete) closed space of the speech sounds of language L - something concrete enough that we could learn their statistics (even though it is not the sounds themselves that are what's really important about the statistics, of course). The result of learning all these statistics might be a hierarchical grouping and subgrouping of words into classes (eg, parts of speech and word subcategories in elaborate detail), each exhibiting various `cooccurrence constraints' (as linguists like to put it). That would be the way to understand language according to Deacon, I think. But, alas, he does not return to this issue and make these ideas clear for us. That's why I would say that the book, even though very exciting and important, is flawed and too difficult to read.
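Again just for concreteness, here is another invented sketch (my own construction, not anything from Deacon or from the linguists; the word list and contexts are made up) of how a hierarchy of word classes could fall out of nothing but distributional statistics over tokened forms:

    # Toy sketch: each word is described only by the words it has been seen
    # next to, and the most similarly distributed items get grouped first.
    from itertools import combinations

    # Invented contexts: word -> set of words it has cooccurred with.
    contexts = {
        "milk":   {"drink", "spill", "pour"},
        "water":  {"drink", "spill", "pour", "boil"},
        "beer":   {"drink", "spill", "pour"},
        "bread":  {"eat", "slice", "bake"},
        "cookie": {"eat", "bake"},
        "banana": {"eat", "slice", "peel"},
    }

    def similarity(a, b):
        # Overlap (Jaccard) of the two clusters' pooled contexts.
        return len(a & b) / len(a | b)

    # Agglomerative clustering: start with one cluster per word and keep
    # merging the most similar pair, printing the hierarchy as it forms.
    clusters = {(w,): ctx for w, ctx in contexts.items()}
    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2),
                     key=lambda pair: similarity(clusters[pair[0]],
                                                 clusters[pair[1]]))
        clusters[c1 + c2] = clusters.pop(c1) | clusters.pop(c2)
        print("merge:", c1 + c2)

The liquid words group with each other before they ever join the solid-food words, so the nested classes - the `cooccurrence constraints' - are induced by the token statistics alone.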

Mark Seidenberg.

Given these concerns, I was pleased to come across the recent Science article by Mark Seidenberg (thanks to Winston Goh). The article is ``Language acquisition and use: Learning and applying probabilistic constraints,'' Science 275 (1997), 1599-1603. Seidenberg argues that children's acquisition of language should not be looked at in the usual Chomskyan framework involving a priori abstract categories like Proper-Noun, Intransitive-Verb, Preposition, etc (and maybe even Subject, Predicate). Instead, he claims, children learn statistical relationships between words. They learn that certain words tend to cooccur with certain frequencies and to appear in particular constructions with certain frequencies. This statistical knowledge supports very rapid interpretation of novel utterances (so that, eg, unlikely interpretations of lexical items are suppressed and only the most probable interpretations are considered). In this way, interpretation of novel sentences seems effortless for us.

For example, during the hearing or reading of a sentence like `The plane left for the East Coast', a possible interpretation of the word PLANE from geometry or from woodworking is suppressed as soon as the word LEFT is encountered (a word whose own various noun and adjective interpretations are suppressed as well). It is suppressed because the most common use of PLANE together with LEFT is some variant of the interpretation `AIRPLANE LEAVE-PAST'. All the other possible crazy interpretations (like one about `the flat socialists') fail to survive the mutual competition. What makes the right interpretation `come to mind' or emerge is (just) the statistics of the cooccurrence of PLANE, LEFT, FOR, EAST, COAST, etc, which are exploited in a constraint satisfaction network. All those false leads are excited too, but only the most `plausible' (that is, `statistically likely given the context') pop up.
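Here is a crude sketch of how I understand the constraint-satisfaction story (the wiring and the compatibility numbers are simply invented stand-ins for learned cooccurrence statistics; this is not Seidenberg's actual model):

    # Toy constraint-satisfaction sketch: candidate senses of PLANE and LEFT
    # excite the senses they are statistically compatible with, and the
    # combination with the strongest mutual support wins the competition.

    # Invented compatibility scores standing in for cooccurrence statistics.
    compat = {
        ("plane/aircraft", "left/departed"): 0.9,
        ("plane/aircraft", "left/direction"): 0.2,
        ("plane/geometry", "left/departed"): 0.1,
        ("plane/geometry", "left/direction"): 0.3,
        ("plane/tool",     "left/departed"): 0.2,
        ("plane/tool",     "left/direction"): 0.1,
    }
    plane_senses = ["plane/aircraft", "plane/geometry", "plane/tool"]
    left_senses = ["left/departed", "left/direction"]

    # Start every sense at the same activation, then let compatible senses
    # pump each other up and renormalize (a crude winner-take-most dynamic).
    act = {s: 1.0 for s in plane_senses + left_senses}
    for _ in range(20):
        new = {}
        for p in plane_senses:
            new[p] = sum(compat[(p, l)] * act[l] for l in left_senses)
        for l in left_senses:
            new[l] = sum(compat[(p, l)] * act[p] for p in plane_senses)
        total = sum(new.values())
        act = {s: v / total for s, v in new.items()}

    print(max(plane_senses, key=act.get), max(left_senses, key=act.get))
    # -> plane/aircraft left/departed: the statistically likely reading pops up.

Nothing symbolic about the word senses is stipulated in advance; the `right' reading is just the one with the strongest mutual statistical support.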

Now, first, Seidenberg claims that children can be shown to be sensitive to many of these kinds of statistics. (This seems to need better research support at this point, as Seidenberg acknowledges.) Second, he claims that at least most of the subclasses of English vocabulary and their cooccurrence restrictions can be captured with just such statistics when extracted by an appropriate system. And third, he suggests that modern neural network designs can in fact learn and exploit just such statistical relationships. Thus this research group can work toward explicit neural net models to account for such acquisition. Finally, the neural network can be exploited either for interpretation or for utterance composition.
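For the third point, here is the barest sketch of the sort of thing I have in mind (a toy of my own, not one of the models Seidenberg cites): a one-layer network that absorbs a verb-noun cooccurrence restriction just from exposure to pairs.

    # Toy one-layer softmax network that learns which nouns follow which verb.
    import numpy as np

    verbs = ["SQUIRT", "DROP"]
    nouns = ["juice", "milk", "water", "bread", "banana", "cookie"]

    # Invented training pairs (verb index, noun index) reflecting the
    # liquid/solid cooccurrence restriction.
    pairs = [(0, 0), (0, 1), (0, 2), (1, 3), (1, 4), (1, 5)] * 50

    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(len(verbs), len(nouns)))  # verb -> noun scores

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    # Plain stochastic gradient descent on the cross-entropy loss.
    for v, n in pairs:
        p = softmax(W[v])
        grad = p.copy()
        grad[n] -= 1.0           # gradient of the loss w.r.t. the scores
        W[v] -= 0.1 * grad

    for i, verb in enumerate(verbs):
        probs = softmax(W[i])
        print(verb, {noun: round(float(pr), 2) for noun, pr in zip(nouns, probs)})

After training, SQUIRT spreads its probability over the liquid nouns and DROP over the solid ones, so the cooccurrence restriction now lives in the network's weights rather than in any stipulated category.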

``The newer approach attempts to explain language in terms of how it is acquired and used rather than an idealized competence grammar. The idea is not merely that a competence grammar needs to incorporate statistical and probabilistic information; rather it is that the nature of language is determined by how it is acquired and used, and therefore needs to be explained in terms of these functions and the brain mechanisms that support them. Such performance theories are not merely the competence theory plus some additional assumptions about acquisition and processing. These approaches begin with different goals and end up with different explanations for why languages have the properties that they have.'' (p. 1601)

Conclusion.

So, the connectionist developmentalists, like Seidenberg and the people he cites, seem to be taking a stand that is quite compatible with Deacon's - though they do not seem to be aware of each other's work. The similarity is that in both cases it is assumed that mechanisms exist for learning the ways that individual words copredict each other. These dumb, token-based mechanisms are important because they are able to bootstrap the system up to abstract, world-relevant, intentional cognition: the words have semantic effects while retaining some independence of the world itself.


I would appreciate any responses to these comments. Thanks.

Bob