How do chatbots deal with accents?
In the past, spoken input delivered with an accent, lisp or stutter often produced the same output: “I’m sorry, I don’t understand.” That’s because early automatic speech recognition (ASR) was rules-based and trained on a limited set of phoneme patterns.
In linguistics, a phoneme is the smallest unit of sound in a particular language. Early ASR systems used phonemes to break spoken language down into individual sounds and match phoneme patterns to specific words. When someone spoke with an accent or speech impediment, the system often could not match the user’s verbal input to the specific words or phrases it had been taught.
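As a rough illustration of why exact phoneme matching was so brittle, consider the toy sketch below. The ARPAbet-style phoneme strings and the tiny lexicon are invented for illustration, not taken from any real ASR system, but the failure mode is the same: if the incoming pattern deviates from the stored one, nothing matches.

```python
# Toy illustration of rules-based phoneme matching (hypothetical lexicon,
# ARPAbet-style symbols). Real systems were more sophisticated, but the
# brittleness is the same: no exact pattern, no match.

LEXICON = {
    ("T", "AH", "M", "EY", "T", "OW"): "tomato",  # General American vowels
    ("W", "AO", "T", "ER"): "water",
}

def match_word(phonemes: tuple) -> str:
    """Return the word for an exact phoneme pattern, else a fallback."""
    return LEXICON.get(phonemes, "I'm sorry, I don't understand.")

print(match_word(("T", "AH", "M", "EY", "T", "OW")))  # -> tomato
# A British-accented pronunciation ("tomahto") uses a different vowel
# and falls straight through to the fallback:
print(match_word(("T", "AH", "M", "AA", "T", "OW")))  # -> fallback
```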
Today, there are several ways developers are successfully addressing this known issue. Popular approaches include:
- Providing users with an option to choose an accent at the beginning of an interaction. For example, the user might be given a choice between American, British and Australian English.
- Training the speech recognition technology with large, labeled data sets that include examples of different accents for a particular language.
- Using unsupervised or semi-supervised machine learning (ML) models trained on diverse, multilingual datasets to learn accent-robust, language-independent representations (see the sketch after this list).
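As a concrete, simplified example of the last two approaches, the sketch below transcribes audio with a pretrained multilingual speech recognition model through the Hugging Face transformers library. The model choice and the audio file name are assumptions for illustration; the general point is that models trained on large, diverse audio datasets tolerate accents far better than phoneme-rule systems.

```python
# Minimal sketch: accent-robust speech-to-text with a pretrained
# multilingual model. Requires: pip install transformers torch
# (plus ffmpeg for audio decoding). Model name and audio path are
# illustrative assumptions.
from transformers import pipeline

# Whisper was trained on large, diverse multilingual audio, which makes
# it far more tolerant of accents than rules-based phoneme matching.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

result = asr("caller_audio.wav")  # hypothetical recording of a caller
print(result["text"])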
Why Have Voice-Based Chatbots and IVR Systems Gotten Better at Handling Accents?
To understand why there has been so much improvement in a relatively short period of time, it helps to understand how the technology works.
When a virtual assistant or IVR system receives spoken input, the first thing it does is use speech-to-text algorithms to convert the spoken words into text.
Next, the system uses natural language processing (NLP) to analyze the structure of the text and determine each word’s meaning. Once that analysis is complete, it uses intent recognition to decide which action to take.
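To make the NLP step concrete, here is a minimal sketch using the spaCy library (one of several NLP toolkits that could fill this role; the transcript string is invented). It surfaces the structural information, such as each word’s part of speech, lemma and grammatical role, that intent recognition builds on.

```python
# Minimal sketch of the NLP analysis step using spaCy.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("I want to check my account balance")  # hypothetical transcript

# Structural analysis: each word's part of speech, dictionary form
# (lemma), and grammatical relation to the rest of the sentence.
for token in doc:
    print(f"{token.text:<8} {token.pos_:<6} {token.lemma_:<8} {token.dep_}")
```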
Intent recognition can be based on either statistics or rules. Rules-based models are programmed with if/then statements that map specific inputs to specific outputs. Many old-school IVRs and chatbots were rules-based. If you asked the bot something that wasn’t mapped out in its dialogue flowchart, you probably got a response like “I’m sorry. I don’t understand.”
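A rules-based intent recognizer boils down to a keyword-to-action mapping, as in the toy sketch below (the intents and trigger phrases are invented for illustration). Anything outside the mapped phrases falls through to the familiar fallback.

```python
# Toy rules-based intent recognition: specific inputs map to specific
# outputs via if/then-style rules. Intents here are hypothetical.

RULES = {
    "check my balance": "intent_check_balance",
    "pay my bill": "intent_pay_bill",
    "talk to an agent": "intent_transfer_agent",
}

def recognize_intent(utterance: str) -> str:
    text = utterance.lower()
    for phrase, intent in RULES.items():
        if phrase in text:            # rigid keyword match
            return intent
    return "fallback"                 # "I'm sorry. I don't understand."

print(recognize_intent("I'd like to check my balance"))  # intent_check_balance
print(recognize_intent("What's my available credit?"))   # fallback
```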
Increasingly, data scientists and machine learning engineers are using large language models (LLMs) such as OpenAI’s GPT-3 and Google’s BERT to classify intent and optimize outputs for voice-based chatbots and IVRs. This type of complex AI model is often trained for general use with unsupervised learning and then fine-tuned with labeled data for specific languages, dialects and accents.
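As a simplified stand-in for that approach, the sketch below uses a pretrained transformer for zero-shot intent classification via the Hugging Face transformers library. The model choice and intent labels are assumptions for illustration; a production system would typically fine-tune on labeled, domain-specific data, as described above.

```python
# Sketch: intent classification with a pretrained transformer.
# Requires: pip install transformers torch
# Model name and intent labels are illustrative assumptions.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

transcript = "What's my available credit?"  # hypothetical ASR output
intents = ["check balance", "pay bill", "speak to an agent"]

result = classifier(transcript, candidate_labels=intents)
print(result["labels"][0], result["scores"][0])  # top-ranked intent
```

Unlike the rigid keyword matcher above, this classifier can rank an unseen phrasing against the candidate intents instead of immediately falling back to “I don’t understand.”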
The result is that today’s conversational AI systems are far more likely to accurately understand intent even when someone speaks with an accent or has a speech impediment.