A familiar phenomenon for those who work with speech synthesis is that synthetic speech may sound highly intelligible when you know the text content. But if you do not know the words being spoken, it may sound completely UNintelligible. Then, as soon as you know the text, you cannot understand why you were confused before! Given this, the only reliable evaluations of intelligibility must come from listeners who do not know in advance the words being said.
From this perspective, notice that the first demonstration from Bell Labs in 1939 cleverly has the announcer say the words before you hear the synthetic speech. Other demos use texts that most listeners will know by heart -- e.g., a quote from Shakespeare or from `The Night Before Christmas'.
Below is the text content of all the synthetic speech samples in the archive -- leaving, however, a couple of small gaps where I was unable to interpret the sample myself. The most likely errors or gaps are in 15 and 27. If you can interpret them or know for sure what was intended, please let me know. (Thanks to Dr. John Logan for help with several of these.)
1. Good evening, radio audience. Good afternoon, radio audience.
2. These days a chicken leg is a rare dish. It's easy to tell the depth of a well. Four hours of steady work faced us.
3. What did you say before that? Tea or coffee? What have you done with it?
4. How are you? I love you.
5. Welcome to the Stockholm Speech Communication Seminar.
6. Welcome to the Stockholm Speech Communication Seminar.
7. I enjoy the simple life. He knows just what he wants.
8. I enjoy the simple life, as long as there's plenty of comfort.
9. I am the standard male voice, Perfect Paul. This is the result of trying to imitate a female voice by increasing the pitch, reducing the head size and lengthening the Open Quotient.
10. The leaves had been raked into piles.
11. A, B, C, D, E, F, G,... Oh, how happy we shall be, when we know our ABC.
12. She saw the house. This is a test.
13. Now spell ONE as in `one word'. O, N, E. Correct. Next spell EARTH.
14. Where is Dennis sitting? This field of beets is ripe and ready.
15. I painted this (...) without looking at a spectrogram. Can you understand it?
16. To be, or not to be, that is the question. Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune, or to take arms against a sea of troubles and, by opposing, end them.
17. A bird in the hand is worth two in the bush. It was the last thing I expected to find there. Did you come by motorcar? I'm going home now.
18. The number you dialed, ME-1-5280, has been changed. The new number is BA-6-1347. This is a recording.
19. This is a computer vocal tract speaking. You are listening to the voice of a machine. The number you have reached, 464-1078, has been changed.
20. You are listening to speech synthesized by rule. It was made at Haskins Laboratories with the computer. On the computer typewriter, the experimenter types a phonemic transcription. Control signals are calculated for the synthesizer and the synthetic speech can be heard immediately.
21. 'Twas the night before Christmas when all through the house/ Not a creature was stirring, not even a mouse./ The stockings were hung by the chimney with care,/ In hopes that Saint Nicholas soon would be there.
22. This paper describes a small realtime speech synthesizer. The synthesizer requires as its input a string of phonemes and the associated duration, pitch and amplitude parameters.
23. Hello. I am a language interpreter named LINGUA. I have been used to synthesize speech from demisyllables by rule. I start by breaking your sentences up into the chunks for demisyllables and then combining them.
24. Once upon a time, there lived a king and queen who had no children. Not a day passed that the queen did not say `If only we had a child!' One day, as the queen was walking beside the river, a little fish lifted its head out of the water.
25. I can read stories and speak them aloud. I do not understand what the words mean when I read them, but I can guess which words are important and which words are not by rules I have been given.
26. Animals talk to each other, of course, there can be no question about that. But I suppose there are very few people who can understand them. I never knew but one man who could. I knew he could, however, because he told me so himself.
27. Hello, I am the Kurzweil Reading Machine. Welcome to Mid-Manhattan Library. I have been placed here (?to help) any blind or visually handicapped person. Please enter a command.
28. The juice of lemons makes a fine punch. A box was thrown beside a parked truck. (Thanks to John Logan for the first sentence! RP)
29. The birch canoe slid on the loose planks. Glue the sheets to the dark blue background. It's easy to tell the depth of a well. These days a chicken leg is a rare dish.
30. Speech is so familiar a feature of daily life, that we rarely pause to define it. It seems as natural to man as walking and only less so than breathing. Yet it needs but a moment's reflection to convince us that this naturalness of speech is but an illusory feeling.
31. The SA-101 is built around a 16-bit microcomputer, the MC-68 thousand, and a signal processor, the MAC 77-20. The device can easily be connected to a normal computer terminal.
32. Four hours of steady work faced us. A `large' size in stockings is hard to sell. The boy was there when the sun rose. A rod is used to catch pink salmon.
33. Text-to-speech systems are beginning to be applied in many ways, including aids for the handicapped, medical aids and teaching devices. The first kind of aid to be considered is a talking aid for the vocally handicapped. According to the American Speech and Hearing Association there are over one million people in the United States who are unable to speak for one reason or another.
34. This paper will give a brief overview of recent text-to-speech work at Bell Laboratories. Starting about a year ago, we have completed a new set of computer programs that translate English text into sound. This system constructs speech sounds by concatenating elements from an inventory of about 900 units, stored in terms of multipulse LPC coding.
35A. I am Perfect Paul, the standard male voice.
35B. I am Beautiful Betty, the standard female voice. Some people think I sound a bit like a man.
35C. I am Huge Harry, a very large person with a deep voice. I can serve as an authority figure.
35D. My name is Kit the Kid and I am about 10 years old. Do I sound like a boy or a girl?
35E. I am Whispering Wendy and have a very breathy voice quality. Can you understand me even though I am whispering?
36. The following is a list of topics in today's news. In the sports world, the Red Sox lost to Detroit. First round matches were played in the Wimbledon Tennis Tournament. Arnold Palmer won the Seniors Golf Tournament in Latrobe, Pennsylvania. In local news, there was a five-alarm fire in Cambridge.