In recent studies we have verified what laymen already understand,
that human speech is easily and naturally spoken in a rhythmical way.
But hearing a rhythmic speaking style and demonstrating rhythm
objectively are quite different things. Our empirical verification of
rhythmic speech is based on the realization that vowel onsets (i.e.,
approximately P-centers) are the most important events determining
perceived speech rhythm (Allen, 1972; Morton et al., 1976). That is,
when speaking rhythmically, English speakers (and very likely speakers
of other languages as well) adjust overall timing so that vowel onsets
occur near certain privileged temporal locations. This is important
since it means that if we measure vowel onset locations, we do not need
to pay much attention to other aspects of phonetic events to observe
and characterize the rhythm of speech.
Although the term `rhythmic speech' can be used in many vague ways, we
can define it here as describing speech that exhibits a tendency to
locate prominent acoustic onsets at regular periodic intervals on one
or more time scales. This definition is far more flexible and
amenable to experimental evaluation than traditional descriptions in
terms of, for example, `isochrony' (Abercrombie, 1967; Pike, 1943).
Although we do not yet have developmental data, these effects should
surely be of interest to students of language development. It seems
that some kind of rhythm is found in children's speech from well
before first words. There is the (cyclical) reduplication of
syllables in babbling and the observation that children differentiate
the prosody of their mother's language from other languages shortly
after birth (Mehler et al., 1986; Jusczyk, 1997). The cognitive skills
to be explored in this paper are ones that children acquire very early
in life, so the adults we used as subjects can be assumed to already
have significant experience in this regard -- even if they may be
largely unaware of their own metrical skills.
Actually there is a third way to repeat this phrase that can be found
if one tries to leave a pause after the end of the phrase before
repeating. Thus, one might say `BUY the boy a CAKE [PAUSE], BUY the
boy a CAKE [PAUSE], BUY ...' Here again one discovers a 3-beat
pattern, although at a slower tempo. But `cake' falls on beat 2
(rather than on beat 3 as above) with a musical rest on the third
beat. Production in any one of these patterns is very stable and
consistent. If one tries to produce some other pattern, it becomes
quite difficult and keeps slipping toward one of the stable patterns,
like 1/3 or 1/2.
The results are shown separately for 8 speakers in Figure 1. About
half of the subjects had music training but the other half did not.
Although the target phase angles for the onset of the final stressed
syllable were distributed uniformly over the interval from 0.20 to 0.80
of the repetition cycle, the speakers actually located their onsets
near only 3 locations in the cycle: 1/3 for all the early phase angle
targets, 1/2 for targets near the middle of the cycle, and 2/3 for all
target phases later than about 0.55. Notice, however, that 2 of the 8
speakers could not seem to find the pattern that locates the final
syllable at 2/3 of the cycle. (Neither of these two was a
musician. Aside from this, the musicians and nonmusicians performed
about the same.)
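The measure behind these results, the phase of a vowel onset within the repetition cycle, can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function names and the sample times are hypothetical, and the candidate attractors are simply the three locations reported above.

```python
def relative_phase(onset_time, cycle_start, cycle_end):
    """Phase of an onset as a fraction (0 to 1) of the repetition cycle."""
    period = cycle_end - cycle_start
    if period <= 0:
        raise ValueError("cycle_end must be later than cycle_start")
    return ((onset_time - cycle_start) / period) % 1.0


def nearest_attractor(phase, attractors=(1/3, 1/2, 2/3)):
    """Return the candidate attractor closest to the observed phase."""
    return min(attractors, key=lambda a: abs(phase - a))


# Hypothetical measurements in seconds, invented for illustration only:
# phrase-initial onsets at 0.00 s and 1.50 s, final stressed vowel at 0.52 s.
phase = relative_phase(0.52, 0.00, 1.50)
print(round(phase, 3), nearest_attractor(phase))
```

Expressing onsets as phases rather than as absolute times is what allows trials with different cycle durations to be pooled in one distribution.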
In rather different terms, we have validated with appropriate time
measurements the perceptual experience we had above of the rhythmicity
of speech. Our subjects, and, indeed, all adults, are ready to
exhibit this kind of behavior without special training on a moment's
notice. Of course, rhythmic or periodic speech production can be
observed not just in this artificial task, but also when singing or
chanting or for brief periods during performance of familiar passages
of prose.
Notice that this result demonstrates not merely periodic regularity in
speech (which, after all, was supplied by our metronomic stimulus),
but that there are NESTED periodic patterns. That is, there are
regular periods on two time scales: one at the repetition cycle rate
and another either 2 or 3 times faster than the first but clearly
phase locked to it. What kind of cognitive mechanism could account for
these particular timing constraints?
When we find any unmistakably periodic behavior from an organism, one
sensible theory is that something is oscillating to control that
behavior. This normally implies that some interdependent parameters
are being recursively updated (as if by a differential equation) in a
way that leads one parameter to rise and fall complementarily with
another (McAuley and Kidd, 1998; Large and Jones, 1999). An oscillating
system can behave according to the equations without our knowing the
degree to which the relevant parameters are mechanical or neural or
cognitive. The mathematics of dynamical systems can still help us
understand and make testable predictions about its behavior.
We can imagine an oscillator cycling such that every time the function
reaches phase 1 (= 0), it emits a pulse, as in the top panel of Figure 2.
The location of a pulse and the period of the cycle it initiates will
adapt to the sequence of input pulses (McAuley and Kidd, 1998; Large and
Jones, 1999).
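The idea of an oscillator whose pulse location and period adapt to input pulses can be illustrated with a simple linear error-correction scheme. This sketch is an assumption made for illustration and is not the model of McAuley and Kidd or of Large and Jones: the function name and the two gain parameters are hypothetical, and the scheme only handles inputs that arrive no more than half a period before the expected pulse.

```python
def track_pulses(input_times, period, phase_gain=0.5, period_gain=0.2):
    """Adapt an oscillator's pulse time and period to a train of input pulses.

    Returns a list of (expected_pulse_time, period) pairs, one per input,
    recorded before each correction is applied.
    """
    expected = input_times[0]  # align the first pulse with the first input
    history = []
    for t in input_times:
        # Skip ahead until the expected pulse is the one nearest the input.
        while t - expected > period / 2:
            expected += period
        error = t - expected  # positive when the input arrives late
        history.append((expected, period))
        period += period_gain * error              # period adaptation
        expected += period + phase_gain * error    # phase adaptation
    return history


# Inputs every 0.5 s, oscillator starting with a period that is too long.
inputs = [0.5 * i for i in range(40)]
final_expected, final_period = track_pulses(inputs, period=0.6)[-1]
print(round(final_expected, 3), round(final_period, 3))
```

With these gains the asynchrony shrinks on each cycle, so the oscillator's period settles near the input period and its pulses near the input pulses.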
For the case of the nested periodicities, where the faster oscillator
couples its phase zeros with the pulse of the slower oscillator, a more
complex structure is required with at least 2 coupled oscillators
(Large and Jones, 1999, Appendix).
So, our first hypothesis is:
H 1: Musical meter and the harmonic timing effect are set up
in the nervous system by phase-coupled and frequency-coupled
pulse-generating oscillators.
These oscillators tend to oscillate at different, but integer-ratio
frequencies like 1:2 and 1:3. At these frequencies, if every second
or every third pulse of the faster oscillator coincides with the pulse
of the slower oscillator, then we have either a 2-beat or 3-beat meter
respectively.
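The beat locations implied by H1 can be listed directly. The following sketch assumes ideal phase locking at phase zero and uses a hypothetical helper name; it shows where the pulses of the faster oscillator fall within one cycle of the slower one.

```python
from fractions import Fraction


def pulse_phases(ratio):
    """Phases (in slow-cycle units) of the fast oscillator's pulses.

    `ratio` is the integer number of fast cycles per slow cycle, assuming
    the two oscillators are phase locked at phase zero.
    """
    return [Fraction(k, ratio) for k in range(ratio)]


for ratio in (2, 3):
    beats = [str(p) for p in pulse_phases(ratio)]
    print(f"1:{ratio} coupling, {ratio}-beat meter, pulses at phases {beats}")
```

Exact fractions are used so that the phases 1/3 and 2/3 are not blurred by floating-point rounding.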
Now we need the second hypothesis. What is the significance of the
pulse for motor control? One might imagine it to be the moment of
initiation of a movement. Actually, what is attracted to the pulse is
the most perceptually salient event -- like the tapping sound of a
finger (rather than, say, finger movement onset) or onset of a vowel
(not the onset of the mouth opening gesture).
H 2: Phase zero of any of these oscillations attracts
perceptually prominent events (like vowel onsets or taps of a
finger). The phase of the internal system is adjusted so that the
perceptually salient event is synchronous with the oscillator
pulse.
H2 accounts for why vowel onsets (especially stressed ones in the case
of English) or finger taps tend to occur near the pulse of one
oscillator or another, while H1 accounts for the underlying metrical
structure itself.
Given a system of oscillators like this and a rule for locating
attractors, we can propose to represent the multi-oscillator meter as
a potential function for vowel onsets using the phase of the slowest
oscillator (that is, the phrase repetition cycle) as a time scale.
When the system has two oscillators at frequencies 1 and 2, the
potential function should have attractors at both phi = 0 and phi =
0.5 (just like the potential function of Haken, Kelso and Bunz, 1985;
Kelso, 1995). A useful first hypothesis is that the potential function
is shaped like the sum of two inverted cosines, with one having a
minimum at phase 0 (= 1) and the other having a minimum at both 0 and
0.5, as shown in Figure 3A.
V(phi) = - A cos(2 pi phi) - B cos(4 pi phi)
where phi is phase expressed as a fraction (0 to 1) of the slow cycle.
The relative amplitude of A and B determines the degree to which the
harmonic at twice the base frequency creates a stable attractor.
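How the ratio of B to A governs stability can be checked from the curvature of the potential. The sketch below assumes phi is read as a fraction of the slow cycle, so that the potential is V(phi) = -A cos(2 pi phi) - B cos(4 pi phi); the function name is hypothetical. Under that reading, the second derivative at phi = 0.5 is (2 pi)^2 (4B - A), so a well exists there only when B exceeds A/4, and the well at phi = 0 is always the steeper of the two.

```python
import math


def curvature(phi, A, B):
    """Second derivative of V(phi) = -A cos(2 pi phi) - B cos(4 pi phi)."""
    return (2 * math.pi) ** 2 * (
        A * math.cos(2 * math.pi * phi) + 4 * B * math.cos(4 * math.pi * phi)
    )


A = 1.0
for B in (0.1, 0.5, 1.0):
    at_zero = curvature(0.0, A, B)
    at_half = curvature(0.5, A, B)
    kind = "attractor" if at_half > 0 else "no attractor"
    print(f"B/A = {B / A:.2f}: curvature at 0 = {at_zero:.1f}, "
          f"at 0.5 = {at_half:.1f} ({kind} at 0.5)")
```

Positive curvature marks a local minimum, and larger curvature means steeper sides, which is one way of quantifying how strongly an attractor resists perturbation.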
Since we also observe evidence of oscillators at the frequency ratio
1:3, we should similarly postulate a potential function with minima at
phi = 0, 0.33 and 0.67, that is, at each location where the harmonic
waveform rises through its phase 0, on the assumption that the two
oscillations are phase locked, as shown in Figure 3B. (Notice that
Haken, Kelso and Bunz found no evidence of attractors here in their
finger-wagging task.)
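A potential of this kind can be explored numerically. The sketch below assumes the 1:3 case has the same summed-cosine form as the 1:2 case, V(phi) = -A cos(2 pi phi) - B cos(2 pi n phi) with n = 3 and phi a fraction of the slow cycle; the function names and the choice A = B = 1 are illustrative assumptions. A grid search over one cycle then locates the local minima.

```python
import math


def potential(phi, A, B, n):
    """V(phi) = -A cos(2 pi phi) - B cos(2 pi n phi), phi in cycles."""
    return (-A * math.cos(2 * math.pi * phi)
            - B * math.cos(2 * math.pi * n * phi))


def local_minima(A, B, n, steps=1000):
    """Grid points in [0, 1) that lie below both circular neighbours."""
    phis = [i / steps for i in range(steps)]
    v = [potential(p, A, B, n) for p in phis]
    return [phis[i] for i in range(steps)
            if v[i] < v[i - 1] and v[i] < v[(i + 1) % steps]]


print(local_minima(A=1.0, B=1.0, n=3))
```

In this summed form the minima lie near, but not exactly at, 1/3 and 2/3, because the slope of the fundamental pulls them slightly toward phase 0; the displacement shrinks as B grows relative to A.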

Relevance for spontaneous speech? Of course, this display of strong
temporal constraints was found while speakers were doing a very
repetitive speech task. Still, these resonant behaviors probably
exert some influence on spontaneous speech as well, although we
would expect the effects to increase under conditions where the speech
text becomes more familiar, such as when it is memorized.
First, since the theory specifies attractors in terms of phase angle,
we expect that at least for moderate changes in rate (that is, changes
in the duration of the repetition cycle), the attractors should be
unaffected in terms of phase, while their absolute time of occurrence
varies in direct proportion with cycle duration. This was verified, for
example, in the Cummins and Port (1998) study (by varying the A-A tone
interval over a range of over 10%) and in previous experiments.
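The first prediction amounts to simple arithmetic: attractor phases stay fixed while the corresponding clock times scale with the cycle. A brief sketch, using a hypothetical function name and invented cycle durations:

```python
def attractor_times(cycle_duration, phases=(1/3, 1/2, 2/3)):
    """Clock times (same units as cycle_duration) of fixed-phase attractors."""
    return [p * cycle_duration for p in phases]


# Two hypothetical repetition cycles, the second 10% longer than the first.
for duration in (1.0, 1.1):
    times = [round(t, 3) for t in attractor_times(duration)]
    print(f"cycle = {duration} s, attractor times = {times}")
```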
Second, the attractors should vary in `strength' and their degree of
attractiveness should be observable in the effects of perturbation on
events near the attractor. That is, given a periodic perturbation of
the system, any effect should be less prominent when the attractor is
stronger (that is, when its potential well is deeper or has steeper
sides).
Third, if we represent the attractor structure as a potential function
along the phase positions over the range (0, 1) of the slowest
oscillator, then phase zero should have the strongest attractor and
the attractors created by harmonics of the repetition cycle (at
various integer fractions of the longest cycle) should get weaker as
their frequency increases, just as the harmonics of a plucked string
have amplitudes that decrease as the frequency rises. Thus, an
attractor at 1/2 should be less stable than the attractor at 0 or 1,
and an attractor at 1/4 should be weaker than an attractor at 1/2, and
so on. Such differences in attractor strength have been found in an
experiment that compares perturbation of target syllables occurring at
1/3 of the cycle with the same text materials occurring at 1/4 (Port
et al., 2002, submitted). As predicted, the attractor at 1/3 was
stronger than that at 1/4.
A number of questions arise regarding the acquisition of metrically
constrained speech, beyond the question of when the earliest evidence
is found. Presumably, a meter with 2 coupled oscillators (e.g., with
frequencies f and 2f) appears later than a meter with only one level
of periodicity. Is a 3-beat, waltz-like meter more difficult than a
2-beat meter? Another important issue is whether children may be MORE
bound by metrical constraints than more skilled speakers. It
seems entirely possible that children may lean on regular metrical
patterns as they learn to produce fluent, multiword utterances. Thus
we might observe more regular timing in children's spontaneous speech
than in adult speech.
1. The speech rhythms (tendencies to locate beats at periodic
locations) result from (or are constrained by, or timed to be in accord
with) cognitive oscillatory structures. We don't know what may be
oscillating at these rates, but we infer something must be to account
for the phenomena.
2. Some aspects of speech rhythm are universal, and probably arise
early in linguistic development, while others differ between
languages.
3. Even nonperiodic spontaneous speech must be somewhat influenced
by these dynamics, just as the mechanical resonance of, say, a limb
will PARTIALLY account for whatever behavior may be imposed on the
limb by the body it is attached to. That is, we would expect to see
some evidence of the limb's mechanical attractors in its overall
behavior. It may be these resonances that account for intuitions of
different rhythmic types between languages such as those proposed by
Abercrombie and Pike.
4. If languages differ in their characteristic rhythmic behavior at
the time scale of syllables and phrases, then a phonological grammar
should probably be built on TOP of this sloshy, dynamical timing
system -- one that can easily be set into periodic oscillations in a
partly language-specific style. This sloshing system creates
attractors for prominent events (such as syllable onsets and stressed
syllable onsets) that appear and disappear like waves. These temporal
constraints may provide a framework on which to `hang' individual
phonological syllables and segments.
5. It seems likely that this kind of global temporal patterning
could be acquired fairly early on in the process of language
development. I hope that investigation of the developmental phenomena
can get under way soon.
Allen, G. (1972). The location of rhythmic stress beats in English: An
experimental study I. Language and Speech, 15:72--100.
Cummins, F. and Port, R.F. (1998). Rhythmic constraints on stress
timing in English. Journal of Phonetics, 145--171.
Haken, H., Kelso, J., and Bunz, H. (1985). A theoretical model of
phase transitions in human hand movements. Biological Cybernetics,
51:347--356.
Jusczyk, P. (1997). The Discovery of Spoken Language. MIT Press,
Cambridge, MA.
Kelso, S. (1995). Dynamic Patterns: The Self-Organization of Brain
and Behavior. MIT Press, Cambridge, MA.
Large, E.W. and Jones, M.R. (1999). The dynamics of attending: How
people track time-varying events. Psychological Review, 106:119--159.
McAuley, D. and Kidd, G. (1998). Effects of deviations from temporal
expectations on tempo discrimination of isochronous tone sequences.
Journal of Experimental Psychology: Human Perception and Performance,
24:1786--1800.
Mehler, J., Lambertz, G., Jusczyk, P., and Amiel-Tison, C. (1986).
Discrimination de la langue maternelle par le nouveau-né. Comptes
Rendus de l'Académie des Sciences de Paris, 303:637--640.
Morton, J., Marcus, S., and Frankish, C. (1976). Perceptual centers
(p-centers). Psychological Review, 83:405--408.
Pike, K.L. (1943). Phonetics. University of Michigan Press, Ann Arbor.
Port, R. and Leary, A. (2002, in press). Speech Timing and Linguistic
Theory. de Boeck University Press.
Port, R.F., Cummins, F., and McAuley, J.D. (1995). Naive time,
temporal patterns and human audition. In Port, R.F. and van Gelder,
T., editors, Mind as Motion. MIT Press, Cambridge, MA.
Port, R., Collins, D., de Jong, K., Leary, A., and Burleson, D.
(2002). Temporal attractors in rhythmic speech. Submitted.
Tajima, K. and Port, R.F. (2002). Speech rhythm in English and
Japanese. In Local, J., editor, Papers in Laboratory Phonology VI.
Cambridge University Press.
van Gelder, T. and Port, R. (1995). It's about time: Overview of the
dynamical approach to cognition. In Port, R. and van Gelder, T.,
editors, Mind as Motion: Explorations in the Dynamics of Cognition,
pp. 1--43. Bradford Books/MIT Press.