Saturday 22 October 2011

Giving a Voice to Technology


Before herds of animals were scrawled in charcoal on cave walls or cuneiform was pressed into clay with a stylus, humans relied on speech to communicate. As communication technologies have advanced, speech has become less involved. Text messages and e-mail bear the task of transmitting thoughts but haven’t always been sympathetic listeners. The incorporation of Siri into the Apple iPhone 4S might herald devices that don’t turn a deaf ear to their owner’s needs.

Siri certainly passes scientist Alan Turing’s dictum that true artificial intelligence must “do well in the imitation game,” meaning that any answers it gives should be as natural as a human’s. The personal assistant has been humoring those testing its capabilities, and its patience, by answering inevitable questions like “Will you marry me?” with “My End User Licensing Agreement does not cover marriage. My apologies.”

Many technologies have acted as stepping stones to get to this point of interaction. The turning point came when entire thoughts, not just words, could be understood by machines. It’s the reverse of how people changed their own understanding of the world through reading: Ancient civilizations wrote in scriptio continua, a style devoid of spacing and punctuation. Texts eventually adopted breaks and periods so that words could be parsed more easily.

Since then, writing and speech have become more closely linked. Now, machines and devices are joining the game, able to listen, comprehend, and react.

Radio Rex (1922)

Man’s best friend was man’s first application of speech recognition. A celluloid bulldog toy, Radio Rex jumped at the call of his name, at least some of the time. Rex leapt to attention at 500 Hz, roughly the same frequency that making the “eh” sound generates, providing the power needed to release him from the electromagnetically controlled spring that otherwise kept him kenneled.
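Rex's trigger was mechanical, but the same idea (react only when enough sound energy sits near 500 Hz) can be sketched in software. The following is a minimal illustration, not a model of the toy itself: it assumes digitized audio samples, uses the Goertzel algorithm to measure energy at one frequency, and the names `goertzel_power`, `rex_jumps`, and `tone`, the 8,000 Hz sample rate, and the 0.25 threshold are all illustrative assumptions.

```python
import math


def goertzel_power(samples, sample_rate, target_hz):
    """Squared magnitude of the DFT bin nearest target_hz (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)  # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def rex_jumps(samples, sample_rate=8000, target_hz=500.0, threshold=0.25):
    """True when the share of signal energy near target_hz exceeds threshold.

    The threshold is an arbitrary illustrative value, not a property of the toy.
    """
    n = len(samples)
    total = sum(x * x for x in samples)
    if n == 0 or total == 0:
        return False  # silence never wakes the dog
    # Scaled so that a pure tone exactly at target_hz yields a ratio near 1.
    ratio = 2.0 * goertzel_power(samples, sample_rate, target_hz) / (n * total)
    return ratio > threshold


def tone(freq_hz, sample_rate=8000, n=800):
    """Generate n samples of a sine wave, standing in for a spoken vowel."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    print(rex_jumps(tone(500)))   # energy concentrated at the trigger frequency
    print(rex_jumps(tone(2000)))  # energy far from the trigger frequency
```

Like the real toy, a detector this simple responds to any sound with enough energy near the target frequency, which is one way to see why Rex answered his name only some of the time.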



Voder (1939) and Vocoder (1940)

The musical world owes Homer Dudley a solid. His creations share vocal credit on countless songs, like Daft Punk’s “Harder, Better, Faster, Stronger,” but don’t get a cent in royalties. Working at Bell Labs, Dudley created the vocoder and the voder, both of which synthesize human speech. The voder provides the means to create a voice, while the vocoder analyzes speech and breaks it down into components that can be recombined. Bell Labs produced a play-at-home version, the Speech Synthesis kit, that let people tinker with reproducing vowel sounds.



Speech Understanding Research (1970s)

Grasping one or two words is the natural first step for babies learning to speak, and it was the same with machines. DARPA put in the time, and the money, to grow the machines’ ability. Its Speech Understanding Research project taught a thousand words to machines and brought them up to understand 90 percent of continuously spoken speech.



Speak & Spell (1978)

Speak & Spell did more good than just helping children of the late 70s and early 80s master the spelling of 200 or so words. The Texas Instruments team that built the toy contributed greatly to making digitally synthesized sound more realistically human. Richard Wiggins, who worked on the single-chip voice synthesizer (the first of its kind), said the speech had to be good enough that the user could understand the word out of context—harder than using a word in a sentence. The toy is most popular now for what it doesn’t say. The familiar primary-colored plastic boxes get wired up and sometimes repainted in what’s known as circuit bending, repurposing the Speak & Spell to make musical instruments that produce the eerie digital effect known as ghosting. Beck used one on some songs on his pixel-covered album “The Information.”



Dragon Systems (1982)

The founding couple of Dragon Systems, James and Janet Baker, worked on DARPA projects in their time at Carnegie Mellon and set out to popularize speech recognition. They founded Dragon Systems in 1982 with the goal of creating software that could recognize and transcribe continuous speech. But it came at a price; namely, $9,000 for their first consumer product, DragonDictate, released in 1990. By 1997, the price had come down enough for Dragon NaturallySpeaking to bring speech recognition to the masses. The company was sold twice over and hit some troubled times, but Dragon survived the fire and is now owned by Nuance.



Apricot Computers (1984)

Apple wasn’t the only fruit-named computer company in the 1980s. Apricot Computers was founded as Applied Computer Techniques in the 60s but switched to the friendlier, fruitier Apricot in the 80s. Its Apricot Portable came with Dragon-powered voice recognition that just required a little training. Users could create a file of just over 4,000 words and repeat them into the microphone next to the screen to get the system used to their voice.



PlainTalk (1993)

The name Siri might mean “beautiful victory,” but the product is not the first time Apple has tried to compete in speech recognition. In 1990, it hired a slew of researchers to come up with a command-oriented speech interface. Three years later, PlainTalk was introduced as a custom install on PowerPC Macintoshes and 68k AV models. Alerted by a hot key, a computer would listen for a command and could even respond with speech synthesis. PlainTalk didn’t just talk; it could sing, too, making appearances on dozens of songs, including Radiohead’s “Fitter Happier” from its album “OK Computer” (incidentally, one of the commands PlainTalk recognizes).



Augmentative and Alternative Communication Apps (2011)

Autism-related and other language disorders can rob many of not just a voice, but the ability to communicate altogether. Augmentative and alternative communication (AAC) devices exist, but they cost thousands and are often beyond the reach of those who need them. The iPad and other tablets have made these tools drastically more accessible and affordable. Apps like Proloquo2Go, TapToTalk, and Voice4u don’t instantly transform communications, but with work from families and professionals, some users are finally able to express themselves.



Siri (2011)

Apple is no novice when it comes to taking a product that existed and creating an entirely new market out of it. That might be what happens with its personal assistant Siri, which existed in the App Store before it was unveiled as part of the iPhone 4S. It also has plenty of lookalikes (like Vlingo) in the Android Market. Siri’s interface is rather plain, but what it lacks in looks, it makes up for in personality. Given that Siri operates thanks to the A5 processor that’s present in the iPad, it might next make an appearance on the device. Though, if you go back to the future, it already has.



Vocre (2011)

While most speech recognition revolves around speaking to machines, Vocre focuses on speaking through them. The app, technology’s equivalent of “the oddest thing in the universe,” the Babel fish of The Hitchhiker’s Guide to the Galaxy, provides instant translation. And the way it works is almost as magical: Select the language and gender of each speaker, talk to the app while the phone is vertical and then flip the phone horizontally. The phone’s accelerometer cues the app to translate your speech. Then flip the device toward the other person to continue the other half of the conversation. The speech-to-text conversion is handled by Nuance, the hybrid translation is from the makers of the app, MyLanguage, and iSpeech gives voice to both parties. The app has launched in nine languages with more to follow.
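The flip-to-translate interaction described above can be pictured as a small state machine. This is a rough sketch only, not Vocre's actual code: the class `FlipConversation`, its methods, the orientation strings, and the stand-in `fake_translate` function are all hypothetical, and the real app relies on speech recognition, translation, and speech synthesis services that are not modelled here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FlipConversation:
    """Collects speech while the phone is vertical; translates when it is flipped."""

    translate: Callable[[str, str, str], str]  # (text, source, target) -> text
    languages: List[str]                        # one language code per speaker
    speaker: int = 0                            # index of whoever is talking now
    orientation: str = "vertical"
    heard: List[str] = field(default_factory=list)

    def hear(self, text: str) -> None:
        """Record recognized speech, but only while the phone is held vertically."""
        if self.orientation == "vertical":
            self.heard.append(text)

    def rotate(self, orientation: str) -> Optional[str]:
        """Handle an orientation change reported by the accelerometer."""
        previous, self.orientation = self.orientation, orientation
        if previous == "vertical" and orientation == "horizontal" and self.heard:
            source = self.languages[self.speaker]
            target = self.languages[1 - self.speaker]
            result = self.translate(" ".join(self.heard), source, target)
            self.heard.clear()
            self.speaker = 1 - self.speaker  # the other person talks next
            return result
        return None


def fake_translate(text: str, source: str, target: str) -> str:
    """Stand-in for a real translation service; it only labels the text."""
    return f"[{source}->{target}] {text}"


if __name__ == "__main__":
    chat = FlipConversation(translate=fake_translate, languages=["en", "es"])
    chat.hear("hello")
    print(chat.rotate("horizontal"))  # first speaker's words, labelled en->es
    chat.rotate("vertical")           # phone raised again for the reply
    chat.hear("hola")
    print(chat.rotate("horizontal"))  # reply, labelled es->en
```

Passing the translator in as a function keeps the orientation logic separate from whichever recognition and translation services sit behind it.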



Thought Helmets (Soon)

Thought recognition is the next frontier. Recognizing that speech can be a danger on the battlefield, DARPA is working on thought helmets that will pick up and translate brain waves from soldiers and transmit them as audible radio messages. While it sounds like a technology that might be a long way off, the program has already had some success. Deeplocal has deployed the technology in a concept bike helmet for Toyota that lets riders control gear shifting by thought.

