
Word Up

As Speech-recognition Technology Improves, More Applications Emerge

Speech-recognition technology, which instantly translates human speech into a digital document or command, has been around in some form for about two decades. But constant improvements in performance — as well as a broadening of its applications — have users excited about the future.

That performance is typically measured in accuracy and speed, but various factors complicate the former: the software’s vocabulary size, the rate of speech, accented or disjointed speech, and background noise.

Dragon NaturallySpeaking, produced by software developer Nuance, has long been considered the gold standard in minimizing such issues.

“At first, speech-recognition packages were more like frustrating toys with maddening limitations, but they have steadily improved over time,” writes Lamont Wood in Computerworld, in a discussion about NaturallySpeaking 12, the newest Dragon product. He said the utility of speech recognition didn’t outweigh its limitations until about a decade ago, but even then, speech recognition was more reliable with long words than with short ones, misinterpreted words were often rendered as commands, and the software occasionally got confused to the point that it stopped listening.

With version 12, he notes, “these factors have faded into the background (although they haven’t entirely disappeared). For example, you can dictate effectively at about half the speed of an auctioneer — should you prove able to do so. Assuming that you stay focused while dictating, the error rate is now trivial.”

That’s important for people who use speech recognition in a variety of fields, including:

Healthcare. The technology speeds up the transcription process by allowing a medical professional to dictate into a speech-recognition engine, with the resulting text cleaned up by an editor on the back end.
Military. Speech recognition has been tested successfully in fighter aircraft, with applications including setting radio frequencies, commanding an autopilot system, setting steer-point coordinates and weapons-release parameters, and controlling flight displays.
Air-traffic control. Many air-traffic-control training systems require a person to act as a pilot and carry on a dialogue with the trainee. Speech recognition could potentially eliminate the need for that pseudo-pilot, thus reducing training and support personnel.
Aerospace. NASA’s Mars Polar Lander used speech recognition in some applications.

Other uses are common as well, including court reporting; assistive devices for automobiles, such as OnStar and Ford Sync; hands-free computing; robotics; video captioning for television; and interactive video games — just to name a few.

Taming the Dragon?

Dragon isn’t the only player in the field, however. “Simpler or less expensive (if not quite as powerful) options are carving out little fiefdoms,” writes Mark O’Neill in PC World. “The more choices, the better, too, given that using voice commands can stave off or reduce repetitive strain injuries. The spoken word also suits some projects better than typing.”

Among the lesser-known options are:

Windows Speech Recognition, which arrives preinstalled with newer versions of Windows. “Performance could stand some improvement,” O’Neill notes. “I found the accuracy level dipped when I dictated long texts into a MS Office doc. Nor did it respond well to my German accent, so other accents may stymie it as well.”
Google Voice Search, which works in the Google Chrome browser and is “fairly good at recognizing what you said.” (A developer-side sketch of browser-based speech recognition appears after this list.)
TalkTyper, an online app with far fewer features than Dragon. “Even when I spoke clearly, it tripped up on some of the words, and I wasn’t exactly dictating rocket science. TalkTyper should be used only for simpler stuff, shorter spoken content — maybe an email or a tweet here and there.”
Tazti, an app that goes beyond simple transcription. “Rather than taking dictation, Tazti takes orders. It helps you control games, open apps, and even use the command line,” O’Neill notes. “However, Tazti’s one big drawback is it won’t let you dictate text to a document. It’s not that kind of voice recognition.”
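For readers curious what browser-based recognition like the Chrome feature above looks like from a developer’s side, here is a minimal, illustrative TypeScript sketch using the Web Speech API that Chrome exposes. The object names and settings are assumptions chosen for demonstration, not a prescription from any of the vendors discussed here.

```typescript
// Minimal dictation sketch using the Web Speech API available in Chrome.
// Assumes it runs in a browser page served over HTTPS; names are illustrative.

const SpeechRecognitionImpl =
  (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;

if (!SpeechRecognitionImpl) {
  console.log("This browser does not expose the Web Speech API.");
} else {
  const recognizer = new SpeechRecognitionImpl();
  recognizer.lang = "en-US";          // language model to use
  recognizer.continuous = true;       // keep listening across pauses
  recognizer.interimResults = false;  // report only finalized phrases

  recognizer.onresult = (event: any) => {
    // Each result holds one or more alternatives; take the top transcript
    // of the most recent result.
    const phrase = event.results[event.results.length - 1][0].transcript;
    console.log("Heard:", phrase.trim());
  };

  recognizer.onerror = (event: any) => {
    console.error("Recognition error:", event.error);
  };

  recognizer.start(); // the browser prompts for microphone access
}
```

Even this bare-bones example mirrors the trade-offs the reviewers describe: accuracy depends on the language setting, background noise, and how cleanly each phrase is spoken.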

Using voice recognition for commands is increasingly common in automobiles. Although these systems are largely user-friendly, drivers still have to rely on set commands when summoning a phone number or searching through music. But Nuance says systems that recognize true natural language with 95% accuracy are probably no more than three years away.

“I believe the biggest gains to be made are going to be in conversational speech and understanding the intent of what the user is trying to accomplish,” Brian Radloff, the company’s director of Automotive Solution Architecture, told Satellite Radio Playground. “We’re starting to see that in telephony in the mobile space.”

He said strides will come when car makers treat their infotainment systems more holistically, with screen graphics properly tying into speech control. “The bulk of the focus over the next five years in the automotive space, and in voice in general, is going to be, how do we take this experience that is very good for a certain group, and make it very good for a large swath of the car-buying public?”

Meanwhile, Wells Fargo recently began testing voice-recognition technology that banking customers can use to check their spending habits and account level. In addition, U.S. Bank has been testing the technology among its employees, and some insurance companies, including Geico and USAA, have incorporated voice recognition in their applications, according to the Charlotte Observer.

Shirley Inscoe, a senior analyst with Aite Group, a national research and advisory firm, said such advances are closely tied to the rise in mobile devices and consumers demanding to do more with them. “There’s a big desire to improve customer service. They know we as consumers don’t go anywhere without our mobile phones. It really is a way to tie a customer more closely to the financial institution.”

Other advances in voice recognition go well beyond finance and leisure activities. For instance, two MIT students recently spent their winter break in New Jersey developing a device that could give paralyzed people the ability to call for help with the sound of their voice or change the settings on their wheelchair when no one is around. They were inspired by retired physics professor Michael Ogg, who has multiple sclerosis.

“My real limitation now is because of MS. I’m completely quadriplegic. I’m just not able to move my arms and legs at all,” Ogg told the Asbury Park Press.

He relies on home health aides for daily assistance, but when he is alone, he cannot reach an alarm by his bed to summon aid. “In the case of … being able to call for help,” he said, “this is potentially life-saving technology.”

Speak Clearly

Whichever voice-recognition software one uses, Wood offers a few tips to make the technology easier to use and more effective, including enunciating carefully and speaking slowly enough that each word gets its due; watching the results on the screen as you go along, which can enhance accuracy; and taking heed of background sounds.

“Background silence is best, but droning ventilators hurt recognition more than office chatter,” he writes. “Meanwhile, if you don’t mind being overheard on the phone, then you won’t mind being overheard while dictating. You can use about the same volume for the phone and for speech recognition.”

Put that way, the ever-improving realm of speech recognition can be thought of as just another office function, as it’s increasingly assimilated into many corners of the world, from gaming to aviation to healthcare — a life enhancer for some, but for others, potentially a life-saving development.


— Joseph Bednar