Now Hear This!
How the computer learned to speak
BY PATRICK BASS, ANTIC ST PROGRAM EDITOR
Most of us are already used to hearing a machine talk. However, 20 or 30 years ago the very notion that a machine could talk was met with disbelief by the general public. There were talking machines, but if you closely examined most of them you would probably find a wire running behind a curtain where someone would stand like the Wizard of Oz and speak into a microphone. Typical "mechanical men" in the 1939-40 New York World's Fair were built this way.
Linguists had long realized they could break human speech down into distinct, separate parts, each composed of a single sound. These individual parts of speech are called "phonemes" (FO-NEEMS). There are roughly 64 different phonemes in human speech, and by using just phonemes we may reproduce nearly any language spoken on Earth.
Before World War II, people had started building electronic circuits which could produce individual phonemes. These early voice synthesizers were operated by a keyboard much like one found on a piano. Each key would produce a different phoneme when pressed. Skilled keyboardists could actually make the phoneme machine "talk".
People didn't think of the machine as really talking, however, because they could see a live human being pressing keys to make the noise. The voice quality wasn't anything to write home about, either. For a machine to talk by itself, it needed to press its own speech keys correctly. And that required the one component which didn't exist at the time: a computer.
During the 1950s and early 1960s computers were big, clanking machines which were fed data by men wearing white coats. They were inaccessible to most people, and still held a certain mystique. People tended to believe that computers were all-powerful and, with the proper instructions, could do anything.
However, Hollywood had different ideas. A room-sized computer was impressive, but for dramatic purposes, impractical. In the mid-'50s, filmmakers dressed up a computer with arms, legs, a plastic bubble on top, and a voice. Robby the Robot co-starred in the movie "Forbidden Planet" and influenced movie and television robots for years to come.
But the public still thought of these robots as fancy suits with small people inside them. In the late 1960s, however, a movie came out which completely changed how people viewed talking computers.
It's hard now to remember the impact the film "2001: A Space Odyssey" had when first released. You can get a glimmer of the feeling when you realize "2001" was made before high-tech moviemaking came along, but is still considered the "standard" space-effects movie to beat. Before "2001" came along, talking machines, like robots, were thought of as good-natured friends with machinelike speech. HAL, the film's shipboard computer, changed all that.
You could only see small parts of HAL. It was integrated throughout the spacecraft, and the only visible part outside the computer room was its TV "eye." A human actor actually spoke HAL's words, but the public was fascinated by the idea of a disembodied computer conversing in a natural-sounding voice as it denied its plot to kill the crew. People left the theater believing that computers would soon be speaking as well as HAL, if not better.
In the early 1970s, Votrax, a division of the Federal Screw Works, was told to build and implement a computer that would talk electronically. Votrax built many different talking machines for the Government and eventually marketed a talking computer board that could be slipped into home computers. While Votrax laid most of the groundwork for computer speech, Texas Instruments was busy putting the 100-plus individual parts of the Votrax Speech Synthesizer on a single integrated circuit.
Toward the tail end of the 1970s, Texas Instruments introduced their talking chip set in a children's toy, the TI Speak & Spell. The computer would say a word and challenge the child to spell it, using the built-in keyboard. This was probably the first consumer product offering built-in computer-generated speech. About the same time, Votrax introduced a complete line of plug-in speech boxes for almost any computer, most notably the Tandy TRS-80.
Today, there are many different talking appliances. We have Coke machines and games and automobiles that talk. Soon, nearly every appliance will have a voice.
There are two different approaches to speech synthesis, each with its own advantages and disadvantages. The first type gives almost completely natural-sounding speech. In fact, almost any sound can be reproduced with it. To do it, the computer is literally turned into a digital tape recorder. Sound is digitally sampled in real time and stored in RAM.
To play the sound, the RAM is read back and stuffed out through a speaker as fast as it was sampled. While this method can produce remarkably high-fidelity sound, sampling is horribly memory-intensive. For example, the Covox Voice Master for the Atari 8-bit typically uses 32K to store about 15 seconds of sampled sound.
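To get a feel for why sampling eats memory so quickly, here is a rough Python sketch of the arithmetic. It only works backward from the two figures quoted above (32K and about 15 seconds); the function names, the 8-bit sample size, and the rounded sampling rate are assumptions made for illustration, not published specifications of the Voice Master.

```python
K = 1024  # one "K" of memory, in bytes


def sample_bytes(seconds, samples_per_second, bits_per_sample=8):
    """Bytes of RAM needed to hold raw, uncompressed sampled sound.

    bits_per_sample=8 is an assumption (one byte per sample).
    """
    total_bits = seconds * samples_per_second * bits_per_sample
    return (total_bits + 7) // 8  # round up to whole bytes


def implied_byte_rate(total_bytes, seconds):
    """Average bytes consumed per second of recorded sound."""
    return total_bytes / seconds


# The figures quoted in the article: 32K holds about 15 seconds.
rate = implied_byte_rate(32 * K, 15)
print(round(rate))             # bytes used per second of sound

# At that assumed rate, a full minute of sound needs far more RAM
# than an 8-bit Atari has to spare.
one_minute = sample_bytes(60, round(rate))
print(one_minute // K, "K")    # approximate K needed for 60 seconds
```

Run as written, the sketch shows that the quoted figures work out to a little over 2,000 bytes for every second of sound, so a single minute would need roughly 128K.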
The second way to reproduce speech is to use hardware to remember how to produce the different phonemes (remember them?) needed for human speech. With this method, the parts of speech are already programmed into the speech chip. All we have to do is tell the chip to play them back in the proper order.
Here, as little as five bytes of memory can produce more than one second of phoneme speech. And today's computerized phoneme speech usually sounds quite understandable. However, it's a somewhat mechanical sound, not nearly as natural as sampled speech.
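The difference in memory cost can be sketched in a few lines of Python. The phoneme names and one-byte codes below are invented for illustration and do not match any real speech chip's table; the sampled-sound figure simply reuses the 32K-for-15-seconds example quoted earlier.

```python
# Hypothetical phoneme table: one byte per phoneme. A real speech chip
# defines its own codes; these values are made up for this sketch.
PHONEME_CODES = {
    "PAUSE": 0x00,
    "H": 0x01,
    "EH": 0x02,
    "L": 0x03,
    "OH": 0x04,
}


def encode(phonemes):
    """Turn a list of phoneme names into the bytes sent to the chip."""
    return bytes(PHONEME_CODES[name] for name in phonemes)


# "Hello" spelled out as phonemes, played back in order.
hello = encode(["H", "EH", "L", "OH", "PAUSE"])
print(len(hello), "bytes of phoneme data")

# Compare with one second of sampled sound, using the article's
# example of 32K for about 15 seconds.
sampled_bytes_per_second = 32 * 1024 / 15
ratio = sampled_bytes_per_second / len(hello)
print("sampled sound needs roughly", round(ratio), "times more memory")
```

The program stores only the order of the phonemes; the sounds themselves live in the speech hardware, which is why the data stays so small.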