Have you ever called a company to get some help or pay your bill, only to be greeted by a pleasant recorded voice that wants to have a conversation with you – but can’t understand half of what you’re saying? Or maybe you own an iPhone, and while Siri first seemed like a good ally, you’ve come to realize that sometimes (OK, let’s be honest, often) she just doesn’t get it? Voice recognition technology (VRT), also known as speech-to-text, falls into a common trap: it has the potential to be incredibly cool (and boy, are we rooting for it), but more often, it’s a teeth-grinding exercise in frustration.
Once an idea that belonged in the realm of science fiction, voice recognition has grown from its infancy in the 1950s, when Bell Laboratories’ Audrey system was designed to recognize digits spoken in a single voice, to the modern network of conversational electronics we now interact with on a daily basis – with mixed results.
To Speak to a Human, Please Press 0
Many of today’s businesses now use systems called interactive voice response (IVR) to handle customer service calls. The most common use is for voice-navigated menus, but some companies use IVR systems that can access customer account information and answer minor questions. Menu IVR software usually has a limited vocabulary, which may be restricted to "yes," "no" and numbers. More complicated systems can recognize company-specific words and phrases.
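To make the idea of a limited menu vocabulary concrete, here is a minimal Python sketch. It assumes the speech has already been converted to text, and the function name `interpret` and the action labels are hypothetical, chosen only for illustration. It is not how any particular IVR product works.

```python
# Hypothetical menu vocabulary: "yes", "no" and spoken digits, as described above.
DIGIT_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}

def interpret(transcript):
    """Map an already-transcribed utterance to a menu action.

    Returns a (kind, value) tuple, or None when the utterance is
    outside the small vocabulary this sketch understands.
    """
    # Normalize: trim whitespace, lowercase, drop trailing punctuation.
    word = transcript.strip().lower().strip(".,!?")
    if word in ("yes", "no"):
        return ("confirm", word == "yes")
    if word in DIGIT_WORDS:
        return ("digit", DIGIT_WORDS[word])
    if len(word) == 1 and word in "0123456789":
        return ("digit", int(word))
    return None  # Out of vocabulary: a real system might re-prompt or transfer.

print(interpret("Yes"))
print(interpret("seven."))
print(interpret("I want to pay my bill"))
```

Anything outside the word list comes back as `None`, which is the point at which a caller would hear "Sorry, I didn’t get that."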
These systems are becoming more popular – at least for businesses – for a simple reason: they’re cost-effective. According to a 2010 report by the Wall Street Journal, a typical customer call that reaches an agent costs between $3 and $9, while a call handled through an automated system only costs five to seven cents. And, of course, computer programs don’t get tired, call in sick, or become frustrated with customers (although customers certainly become frustrated with them!).
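For a rough sense of scale, the per-call figures quoted above can be turned into a savings range. This is only back-of-the-envelope arithmetic on the numbers in the text. The function name is made up for illustration, and amounts are kept in cents to avoid rounding surprises.

```python
# Figures quoted in the text, expressed in cents.
AGENT_COST_CENTS = (300, 900)     # $3 to $9 per call that reaches an agent
AUTOMATED_COST_CENTS = (5, 7)     # five to seven cents per automated call

def savings_range_cents(agent, automated):
    """Return the (smallest, largest) per-call savings in cents."""
    smallest = agent[0] - automated[1]  # cheapest agent call vs. priciest automated call
    largest = agent[1] - automated[0]   # priciest agent call vs. cheapest automated call
    return smallest, largest

low, high = savings_range_cents(AGENT_COST_CENTS, AUTOMATED_COST_CENTS)
print(f"Savings per call: ${low / 100:.2f} to ${high / 100:.2f}")
# prints "Savings per call: $2.93 to $8.95"
```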
Fortunately, this doesn’t always mean IVR takes jobs away from people – or at least it doesn’t mean that all people are disappearing from call centers. These voice-activated helpers allow human customer service reps to be more productive by directing calls and answering simple questions.
Of course, for the human users who interact with these technologies, it’s not always smooth sailing. Ongoing development is helping to reduce common IVR problems, such as trouble with accents, but mocking automated systems is still a common theme online. Check out this comedy skit about an elevator equipped with voice recognition, which highlights the frustration that malfunctions in IVR systems can produce.
Personal Phone Apps: Siri, Google Now
Most people are familiar with voice recognition for smartphones. While the majority of the latest phone models come with VR, its popularity – and notoriety – swelled in 2011, when Apple introduced Siri, the mildly sarcastic, voice-activated "personal assistant" for the iPhone 4S. Google soon created a direct competitor: Google Now for the Android Jelly Bean OS. Both systems feature female voices and sophisticated recognition features that let users "talk" to their phones using casual language.
But while these systems are considerably more sophisticated and functional than their predecessors, they also show that the technology still has a long way to go. Jokes about Siri’s failures have become a popular Internet meme. One man even sued Apple for false advertising regarding Siri’s capabilities.
Maybe that’s why, although Apple created Siri to be advanced and informative, the VR software is also a little on the sassy side. For example, if you speak one of the most infamous artificial intelligence lines in cinema history, from the 1968 movie "2001: A Space Odyssey" – "open the pod bay doors" – Siri will respond with either the answering line from the movie, "I’m sorry (your name), I’m afraid I can’t do that," or the more sarcastic, "we intelligence agents will never live that down, apparently."
Calling you by name is just one of the functions that tries to make Siri easier to love, and a little more human. The VR assistant can follow voice commands to make calls, take dictation and send texts, perform Internet searches for information, find nearby stores, give driving directions and more, all without the need to touch anything. Answers are simultaneously spoken by the phone and displayed on the screen.
Google Now, the VR portion of the Android Jelly Bean platform, is very similar to Siri. The system offers the same extensive recognition capabilities by translating casual speech into commands that let users make calls, send texts, run searches, perform calculations and conversions, grab word definitions, set alarms, play songs, and get maps and directions.
With personal voice assistants like Siri and Google Now, the benefits are obvious. Everything from calling and texting to searching and entertainment is faster and easier. Just say what you want, and (most of the time) the VR app grabs it for you. The hands-free nature of VR is especially helpful while driving. And while many people have decried Siri’s flaws, and writers have argued that Google Now’s ability to essentially run users’ lives is both spooky and a little insulting, most people still feel these futuristic technologies are pretty cool.
Of course, personal phone apps like Siri and Google Now are far from perfect – although they do show where this technology could be headed in the future. That means that even when Siri turns up a wrong answer, we’re likely to laugh and forgive her, knowing that the next version will be much better.
Where VR Falls Flat
If you’ve ever encountered an IVR when you’ve called a business, you may have noticed certain barriers to communication. Some programs use a robotic text-to-speech voice that mispronounces words and makes things difficult to understand. Others have sensitivity problems that result in the software being unable to process what you’re saying if you’re too loud, too soft, or not enunciating carefully.
In addition, many people still just don’t feel comfortable talking to a machine. If you run a few searches on IVR, you’ll encounter lists people have put together of ways to bypass IVR systems and get to a "real person." These solutions range from "keep pressing 0 for an operator" to "swear at the machine until it fetches a human being." As a result, much of the recent development in IVR systems has revolved around making them more palatable for humans: making the voices more sympathetic and less robotic, making the system easier to navigate, and letting callers know how long the whole thing will take from beginning to end. That suggests that better technology is only half the battle here; the other half is getting users on board with speaking to a machine.
What the Future Holds
Despite these challenges, voice recognition technology is improving all the time. Applications like Siri and Google Now – flaws and all – are still extraordinarily impressive in their performance, and several companies are expanding VR capabilities to other applications.
For example, Nuance, the creator of the Dragon NaturallySpeaking speech-to-text software, has already developed voice controls for televisions and automobiles, and versions of this technology are incorporated into some Samsung TVs and the SYNC entertainment systems used in certain Ford vehicles.
And as Google and Apple continue to find new uses for their voice recognition technologies, it’s likely that we’ll increasingly be talking to all kinds of everyday machines, from our televisions to our toasters. And, once again, it looks like science fiction was right. We’ll just have to hope those clever writers were wrong about one thing. If these machines are taking over, you could be in a lot of trouble next time you ask Siri to "open the pod bay doors."