Speech Cycling

We ran an experiment in which subjects heard a succession of high and low tones, like beep..boop... beep..boop..., played over headphones. Subjects were asked to repeat a phrase like Beg for a duck along with the tone pattern, so that beg was aligned with the high tone (the beep) and duck with the low tone (the boop), which marked the target phase. The first example here is an easy one, since the boop occurs exactly halfway between the beeps (a phase of 0.5). The second example has the boop at 0.4. Because we could control exactly where the low tone fell in the beep-to-beep cycle, we were effectively asking subjects to place a beat at an arbitrary phase of the phrase repetition cycle. You can try this task yourself by clicking on one of the buttons to the right. You will hear a sample stimulus. Try to align beg and duck with the high and low tones, respectively.
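To make the idea of "phase within the beep-to-beep cycle" concrete, here is a minimal sketch of how the tone onsets for such a stimulus could be laid out. The function name, the cycle period and the number of cycles are our own illustrative assumptions; the page does not state the actual stimulus parameters.

```python
def tone_onsets(period_s, target_phase, n_cycles):
    """Return (high_onsets, low_onsets) in seconds for a two-tone metronome.

    period_s     -- beep-to-beep interval in seconds (hypothetical value)
    target_phase -- where the low tone falls in the cycle, between 0 and 1
    n_cycles     -- how many beep..boop cycles to generate
    """
    if not 0.0 < target_phase < 1.0:
        raise ValueError("target_phase must lie strictly between 0 and 1")
    # High tones (beeps) mark the start of each cycle.
    highs = [k * period_s for k in range(n_cycles)]
    # Low tones (boops) fall a fixed fraction of the way through each cycle.
    lows = [t + target_phase * period_s for t in highs]
    return highs, lows


# Easy pattern: boop exactly halfway between beeps.
easy_highs, easy_lows = tone_onsets(1.0, 0.5, 3)
# Hard pattern: boop at phase 0.4.
hard_highs, hard_lows = tone_onsets(1.0, 0.4, 3)
```

With a one-second period, the easy pattern puts the boops at 0.5, 1.5 and 2.5 seconds, while the hard pattern shifts each of them one tenth of a second earlier.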

Metronome Pattern, Easy

easy-stimulus.au easy-stimulus.wav

Metronome Pattern, Hard

hard-stimulus.au hard-stimulus.wav
In the panel below you can see the result from one typical subject. This subject took part in 90 trials (where each trial is a whole series of productions of the phrase on a single exhaled breath). On each trial a target phase was randomly selected from the range 0.3 to 0.7, so that any number in that range was equally likely to occur. For each trial we measured the average phase produced, and the distribution of these 90 values is shown. If subjects could do just what we asked of them, then all the columns in the figure would be about the same height. Instead, you can see that certain phases are much more likely to be produced than others. The three peaks in the histogram correspond to three distinct patterns which are regularly produced. Any other pattern is unstable and dispreferred. These three patterns are illustrated below using musical notation (the sample phrase here is Beg for a dime).
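The analysis described above can be sketched roughly as follows. This is an illustration under stated assumptions, not our actual analysis code: we assume each trial yields a list of beat times for beg and for duck, and that the produced phase is the position of the duck beat within the surrounding beg-to-beg interval. All function names and the bin count are hypothetical.

```python
import random


def draw_targets(n_trials=90, lo=0.3, hi=0.7, seed=None):
    """Draw one target phase per trial, uniformly from [lo, hi]."""
    rng = random.Random(seed)
    return [rng.uniform(lo, hi) for _ in range(n_trials)]


def observed_phases(beg_onsets, duck_onsets):
    """Phase of each 'duck' beat within its enclosing beg-to-beg cycle.

    Assumes duck_onsets[i] lies between beg_onsets[i] and beg_onsets[i + 1].
    """
    phases = []
    for start, end, duck in zip(beg_onsets, beg_onsets[1:], duck_onsets):
        phases.append((duck - start) / (end - start))
    return phases


def mean_phase(beg_onsets, duck_onsets):
    """Average produced phase over one trial (one breath group)."""
    phases = observed_phases(beg_onsets, duck_onsets)
    return sum(phases) / len(phases)


def histogram(values, lo=0.3, hi=0.7, n_bins=8):
    """Count values falling into n_bins equal-width bins on [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi lands in the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    return counts


# One hypothetical trial in which 'duck' lands halfway through every cycle.
trial_mean = mean_phase([0.0, 1.0, 2.0, 3.0], [0.5, 1.5, 2.5])
```

Applying histogram to the 90 targets from draw_targets gives roughly flat columns, which is what the produced phases would look like if subjects simply tracked the targets. A plain arithmetic mean is adequate here only because all the phases sit well inside the cycle; phases near 0 or 1 would need circular averaging.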

Sample histogram of observed phases

sample histogram
Although the target phases were distributed evenly across the X axis (the phase axis), subjects produced beats only in three regions along the phase axis, near 1/3, 1/2 and 2/3. This is remarkable and implies a very severe constraint either on their ability to perceive these phase lags or on their ability to produce them.
This result, which we have replicated several times, suggests that the laws of harmonic timing (the same laws that make strings oscillate at a fundamental frequency and only at integral multiples of that frequency) are at work in speech as well. Thus, there may be three preferred (apparently GREATLY preferred) phase angles for the second stressed syllable of these phrases. Those three are easily expressed with musical notation as shown at right. You might have expected more of our subjects, several of whom were highly skilled musicians (from the IU School of Music), but no matter what target phase we gave them (between 0.3 and 0.7), they gave us only these 3 phases as apparent targets.

The Three Preferred Patterns

musical representations
So the beats that correspond to syllable onsets cannot fall just anywhere (at least if people repeat a phrase over and over), but only at temporal locations suggested by a simple harmonic model with preferences for simple fractions. We have demonstrated objectively that, given a few typical constraints, there is almost unavoidable rhythm in speech.
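As a toy illustration of what "preferences for simple fractions" means, the sketch below assigns any phase to the closest of the three simple fractions that divide the cycle into halves or thirds. This nearest-fraction rule is our simplification for illustration only; it is not the harmonic model itself, and the names are hypothetical.

```python
from fractions import Fraction

# The simple fractions of the cycle that fall inside the tested range.
ATTRACTORS = (Fraction(1, 3), Fraction(1, 2), Fraction(2, 3))


def nearest_attractor(phase):
    """Return the simple fraction closest to a phase between 0 and 1."""
    return min(ATTRACTORS, key=lambda a: abs(float(a) - phase))


# Classify a few example phases across the tested range.
examples = {p: nearest_attractor(p) for p in (0.40, 0.45, 0.62)}
```

Under this rule a phase of 0.40 is assigned to 1/3, 0.45 to 1/2, and 0.62 to 2/3.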

Copyright Status

Copyright information is specified for each item where relevant. The webpage as a whole is copyrighted by Indiana University.

Robert Port, port@indiana.edu