Comments on Grossberg and Dynamical Models

Here is the note on Grossberg and the next paper, van Gelder-Port, that I wanted to send you yesterday but didn't finish. These are just quick notes to get some ideas down.

R. Port April 15, 1999

TVG-PORT AND GROSSBERG

What Tim and I tried to do in our paper and in the book is to get people to stop treating formal, symbolic descriptions as the necessary a priori assumption as soon as you see something that looks like intelligence or that looks discrete. Continuous dynamics is NOT an a priori implausible or impossible framework - one to be ruled out after the first observations about language and reason (the way Chomsky would argue, and maybe Fodor too). Grossberg shows us how a dynamical, continuous-time system can learn to chop the speech signal into discrete parts. Self-organization processes, or unsupervised learning, can discover ways to chop sound up into discrete pieces - to break it up into the units that exhibit the greatest degree of statistical independence. Thus the speech stream can begin as (be presented to the infant as) just a continuous stream and yet, by operations of the young brain of an appropriate listener (ie, a humanoid), eventually be chopped into discrete pieces at several hierarchical levels at once - in such a way that each level physically overlaps the others in time (at the sensory periphery) and yet is discrete and serially ordered at its own level. Such a system needs methods to delay decisions, since prediction is taking place at several time scales at once (from semantics at the longest scale all the way down to acoustic spectral slices at the shortest). For Grossberg (and for me), the discreteness property and the simultaneous hierarchical structure are taken as something that requires an explanation - that is, roughly, something that needs a MECHANISM -- not a mere assumption. So G. proposes models of specific processes and demonstrates that the observed qualitative properties can be simulated.
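Just to make the flavor of that claim concrete, here is a minimal toy sketch (my own simplification in Python - NOT Grossberg's actual equations) of ART-style unsupervised categorization. Binary input patterns either resonate with a stored category prototype or, when the match fails a vigilance test, recruit a new category - so discrete categories emerge from exposure rather than being assumed in advance:

```python
import numpy as np

def art1_categorize(patterns, vigilance=0.6):
    """Assign each binary pattern to a category, creating categories as needed."""
    prototypes = []      # one binary prototype per learned category
    assignments = []
    for p in patterns:
        x = np.asarray(p, dtype=float)
        # Choice stage: rank existing categories by overlap with the input.
        order = sorted(range(len(prototypes)),
                       key=lambda j: -np.minimum(x, prototypes[j]).sum())
        chosen = None
        for j in order:
            match = np.minimum(x, prototypes[j]).sum() / max(x.sum(), 1e-9)
            if match >= vigilance:                            # resonance
                prototypes[j] = np.minimum(x, prototypes[j])  # fast learning
                chosen = j
                break                                         # else: reset, try next
        if chosen is None:                # no existing category fits:
            prototypes.append(x.copy())   # recruit a new one
            chosen = len(prototypes) - 1
        assignments.append(chosen)
    return assignments, prototypes

# Toy run: two clusters of binary "spectral slices" end up as two categories.
data = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
print(art1_categorize(data)[0])    # -> [0, 0, 1, 1]
```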

LINGUISTIC COGNITION

So, as came up in today's discussion, there is a whole series of models that are similar in function though very different in mechanism - models for the linguistic structure of speech. Here is the list again, with further comments.

1. The Structural Linguistic Model. Linguists assume an autosegmental symbolic hierarchy (Phon'l units - Morphemes - Words - Phrases - Sentences). Pick your favorite theory of phonological structure; they are all formal, symbolic, static, etc.
2. The Blackboard Model (from Hearsay-II) of cross-level integration of information basically implements the linguistic hierarchy in AI terms! You have Cues - Phonemes - Morphemes - Words - Phrases - Intonation Contours. The idea was ``Messages (ie, tentative hypotheses) get posted on a blackboard for all to see.'' (If it came along today, we might call it the `Post-it' theory of cognition.) But the problem with messages is that they still have to be INTERPRETED - by somebody or something. What vocabulary is the useful one? That is a tough question. And how does one layer make an intelligent interpretation of the output of all its neighboring layers?
3. The Rumelhart-McClelland (1981) Word Recognition Model was designed to account for the ``word-superiority effect''. The model is a continuous-time (conceptually, at least), layered neural network of excitatory and inhibitory units, with parameters all tweaked by hand. Stimulus information excites upward to the appropriate letters, then to words, which compete to find themselves in the stimulus (so they inhibit each other in proportion to their own activation; there is a toy sketch of this dynamic just after this list). R-McC's answer to the vocabulary issue is just to say `Get more/less active'. That is the only information that is needed - excitation or inhibition - nothing more.
4. Grossberg's models of stacked layers of ART systems have very similar behavior to the first three systems above - what a surprise! A stack finds discrete units on at least several of these levels (and can be expected to do more, the way things are going). But this model can learn, over a long enough time scale, to interpret the stimulus in practical ways, that is, as discrete units on various levels (that is, time scales). So it does not require (at least not conceptually) the hand-design needed for all the symbol-like units (the phonemes, morphemes, words, etc) in the first three models above. In principle, it appears, his systems can build up units on all the required levels just from exposure. Of course, G hasn't actually built a system that learns English yet ;-), so his arguments are `in principle' ones at this point. But just to be able to suggest a plausible path by which a system with a very simple initial structure (that is, one that follows equations like the ones he presents, on various temporal and spatial scales, AND which can employ learning rules along the lines of those he suggests) could learn to really HEAR language - to analyze it perceptually on all those levels simultaneously - that's a really important accomplishment, I think. And one that is important for linguistics.
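Here is the toy sketch promised in item 3 - a bare-bones interactive-activation dynamic in the spirit of the R-McC model, with a hypothetical two-word lexicon and hand-set parameters of my own invention (not the published ones). The point is that `Get more/less active' really is the only message that gets passed around:

```python
import numpy as np

LEXICON = ["cat", "car"]                   # hypothetical two-word vocabulary
LETTERS = sorted(set("".join(LEXICON)))    # -> ['a', 'c', 'r', 't']

def run(stimulus, steps=60, dt=0.1, inhibition=0.8, decay=0.1):
    # Bottom-up evidence: 1.0 for each letter present in the stimulus.
    letters = {ch: (1.0 if ch in stimulus else 0.0) for ch in LETTERS}
    # Excitation: average letter evidence supporting each word.
    support = np.array([sum(letters[ch] for ch in w) / len(w) for w in LEXICON])
    words = np.zeros(len(LEXICON))
    for _ in range(steps):
        # Each word is inhibited in proportion to its rivals' activation.
        rivals = words.sum() - words
        words = np.clip(words + dt * (support - inhibition * rivals
                                      - decay * words), 0.0, 1.0)
    return dict(zip(LEXICON, words.round(3)))

print(run("cat"))   # 'cat' wins; 'car' is suppressed despite sharing c and a
```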

So, being a dynamicist doesn't mean that one ignores the facts, or even the intuitions, about the discrete units of language. It means, I think, that the discipline of linguistics cannot be conducted without a model along the lines of what Grossberg lays out. The older, Chomskyan view just ASSUMES symbols (the mantra: `Language is formal, Language is formal, Language is formal, ...'). But, it turns out, it also assumed mechanisms of symbol recognition (roughly, template matching) and generation (`just make it and put it there'). They thought these were not problematic assumptions, but they ARE. So, rather than have `symbols all the way down' (from brain to physical world to brain, in an event of linguistic communication), the dynamic approach has symbols only in brains - and at an abstract level too. Only there is there a discrete categorization of physical stimuli.

ART PHONOLOGY?

So let's assume Grossberg's ART model puts us in the ballpark for a theory of the neurological basis of language perception and production (rather than the feature-matching model that traditional linguistics assumes as the form of speech perception). This would then provide the theoretical framework for studying the phonology of particular languages. In that case, the issues of phonology might become questions like these:

(1) Are the "phonological masking fields" of all languages the same? Or do different languages construct such fields differently in terms of the number of competing or independent hierarchic levels, etc? How much commonality is there here across languages? For example, does language L have an intonation masking field? Is there some universal standard by which to chop pitch contours into `parts'? Or does intonation have a different kind of masking field? Some of the syllable-hierarchy work in recent phonology might be reinterpreted as claims about masking-field structure.
(2) What particular acoustic parameters can be measured from the signal to serve as the sensory features for various phoneme-like items? (No a priori discrete phonetic units need to be assumed; assume only `something that can be measured from the signal'. Not a +/- Feature, but a dV/dt, where V is some continuous variable. There is a sketch of this idea just below.) Some likely salient candidates are vowel onsets, stop-burst spectra, F1xF2 trajectories, openings and closings of the vocal tract, and the various intervals between these events.
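To make the dV/dt idea concrete, here is a hedged little sketch (the synthetic signal and all parameter values are my own invention, purely illustrative): track a continuous variable V - here a smoothed amplitude envelope - and treat the time where dV/dt peaks as a candidate vowel-onset landmark:

```python
import numpy as np

fs = 8000                                   # sample rate in Hz (assumed)
t = np.arange(0, 1.0, 1.0 / fs)
# Synthetic "syllable": silence, then a vowel-like burst of periodic energy.
signal = np.where(t > 0.3, np.sin(2 * np.pi * 120 * t), 0.0)
signal = signal * np.exp(-((t - 0.45) ** 2) / 0.01)

# V: a continuous variable measured from the signal -- here a smoothed
# amplitude envelope (10 ms moving average of the rectified signal).
win = int(0.010 * fs)
V = np.convolve(np.abs(signal), np.ones(win) / win, mode="same")

# The cue is not V itself but dV/dt: its peak marks a candidate vowel onset.
dVdt = np.gradient(V, 1.0 / fs)
print("candidate onset near t = %.3f s" % t[np.argmax(dVdt)])  # ~0.3 s
```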

This is the new analog of phonetics. The old phonetics has either turned into phonology (if it shows discreteness or is language-specific or structured) or else into some open-ended inventory of likely variables V for individual speakers to employ when listening to or producing speech in their native (or nonnative) language. So the new phonetics looks for the variables whose dV/dt provides the cues for perceptual mechanisms and the target specifications for speech production. (These may vary over time and between speakers/listeners.)

Well, this is radical stuff, I guess. We will see how it looks as time goes by. See you soon.

Bob