Cloud migration, omni-channel usage, greater digitization, increased adoption of voice bots and voice biometrics, and rapid developments in artificial intelligence (AI) are just a few of the ever-growing advancements that are reshaping how organizations like yours interact with customers.
The combination of these factors, along with the push for greater automation and self-service, has reduced the reliance on human agents for call handling, resulting in a more seamless and efficient customer experience. However, it has also further highlighted the importance of reliable connections and high-quality audio.
Critically, a breakdown or even a reduced level of audio quality in communication services can have detrimental consequences, impacting not only customer experience, but also your brand reputation and bottom line. All businesses rely on multiple forms of communication to connect with customers, partners and employees – including voice calls. However, if the underlying network is unreliable, it compromises the quality of your interactions.
In this article, we’ll explore how poor audio quality can significantly impact your speech-enabled interactive voice response (IVR) systems and speech bots.
Speech recognition accuracy
At Cyara, we recently conducted a study using the three most widely used automatic speech recognition (ASR) engines. We used recordings of varying audio quality to analyze the impact that poor speech quality had on the accuracy of each ASR engine.
To ensure the validity of the study, we established several parameters, including the following:
- Audio samples were extracted from calls conducted over standard telephone and mobile networks into a contact center.
- We employed the industry-standard Perceptual Evaluation of Speech Quality (PESQ) scoring algorithm, which assesses audio characteristics such as audio sharpness, background noise, audio clipping, and interference, to categorize audio quality bands.
- Our analysis involved a total of 12,000 recordings – with 2,000 recordings in each of the quality bands.
- Speech accuracy was measured using the industry-standard Word Error Rate (WER) method.
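For readers unfamiliar with WER, here is a minimal sketch of the standard definition: the number of word substitutions, deletions, and insertions needed to turn the ASR transcript into the reference transcript, divided by the number of words in the reference. The function name `word_error_rate` and the simple lowercase-and-split normalization are our assumptions for illustration; the study's actual scoring pipeline is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Dynamic-programming edit distance, computed over words, not characters.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # word missing from the ASR output
                current[j - 1] + 1,      # extra word in the ASR output
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


# One substituted word out of six gives a WER of 1/6.
print(word_error_rate("can i check my balance please",
                      "can i check my ballast please"))
```

Accuracy is often reported as 1 minus WER, although whether the figures in this article were derived exactly that way is an assumption on our part.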
The speech recognition accuracy for each quality band is presented in the table below.
| Speech quality | Speech recognition accuracy |
| --- | --- |
| Excellent | 87% |
| Very good | 83% |
| Good | 79% |
| Fair | 76% |
| Poor | 71% |
| Very poor | 59% |
How does this impact you?
Although speech recognition accuracy measured with WER does not map exactly onto the outcome of your customer calls, the data above makes it evident that the likelihood of a customer having a poor calling experience increases significantly as audio quality declines.
For instance, consider a scenario in which a customer calls in and asks, “Can I check my balance, please?” If the ASR engine recognizes only the words “check” and “balance”, there is a good chance that the call will still be handled correctly. Conversely, if it accurately recognizes every word except “balance”, it is unlikely to provide a correct response.
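The scenario above can be sketched in a few lines of code. This is an illustration only: the keyword-based `detect_intent` router, the `INTENT_KEYWORDS` table, and the `words_recognized` measure are hypothetical and far simpler than a production IVR or speech bot, but they show why the number of words recognized and the outcome of the call can diverge.

```python
import re

# Hypothetical intent definition: every keyword must be heard for a match.
INTENT_KEYWORDS = {"check_balance": {"check", "balance"}}


def words(text):
    """Lowercase the text and return its words, ignoring punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def detect_intent(transcript):
    """Return the first intent whose keywords all appear in the transcript."""
    heard = set(words(transcript))
    for intent, keywords in INTENT_KEYWORDS.items():
        if keywords <= heard:
            return intent
    return None


def words_recognized(spoken, transcript):
    """Crude proxy (not WER): fraction of spoken words present in the transcript."""
    spoken_words = words(spoken)
    heard = set(words(transcript))
    return sum(word in heard for word in spoken_words) / len(spoken_words)


spoken = "Can I check my balance, please?"
for transcript in ("check balance", "can I check my please"):
    share = words_recognized(spoken, transcript)
    print(f"{transcript!r}: {share:.0%} of words heard, intent = {detect_intent(transcript)}")
```

In this sketch, the first transcript contains only two of the six spoken words yet still resolves to the right intent, while the second contains five of the six and resolves to nothing, because the one missing word is the one that carries the meaning.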
Real world audio quality performance
While the benchmark performance for audio quality in many countries falls within the ‘excellent’ range, some countries, such as Brazil, Italy, and Mexico, do not meet this standard and are categorized as having ‘good’ or even ‘fair’ audio quality.
Although nearly all countries in North America and Europe meet the ‘excellent’ audio quality benchmark, individual carriers within these regions (which organizations like yours may be using) are failing to meet this standard.
Why is this important?
Today, companies allocate considerable time and resources to automation and self-service initiatives. When properly developed and rigorously tested, these efforts can significantly enhance both customer experience and operational efficiency, leading to cost savings and revenue growth.
However, it’s essential to recognize that even the most advanced and well-developed speech-enabled IVRs or speech bots will not be able to comprehend requests or provide accurate responses if the quality of the speech they receive is significantly impaired.
Therefore, before deploying such systems, it is imperative that you understand and benchmark speech quality, both in your market and within your own organization. From there, identifying and addressing any issues in your systems through rigorous testing and ongoing monitoring is crucial to ensure they operate effectively and deliver the best possible outcome for your customers' experience and your organization’s bottom line.