Today is the future! Or it certainly seems that way, with all the buzz and advancements in machine learning, neural networks, natural language processing, and other disciplines within artificial intelligence. We humans are no longer the only “beings” on planet Earth to use and understand language.
We don’t have to look very hard to find examples ranging from funny to outright disturbing of applications built on Large Language Models (LLMs), like ChatGPT and Google’s Bard, making embarrassing mistakes. A lot of that has to do with the fact that there’s a big difference between producing a word that plausibly and grammatically follows the previous one, and actually accounting for context, understanding, and empathy – key elements of meaningful communication.
As a leader in the chatbot testing and assurance space, Cyara is highly motivated by new advancements in the world of conversational applications, and our primary focus is to help our customers train, test, and monitor their chatbots and conversational AI. In this article, we will share our perspective on what LLMs actually are, the real and potential effects they could have on customer experience, the technical challenges they present, and where we see room for improvement.
What Are LLMs?
While the sophistication of chatbot technology has been increasing over the last decade, the emergence of large, pre-trained language models, such as GPT3, Google’s LaMDA, or Bloom, is causing never-before-seen hype in the Conversational AI industry.
ChatGPT and Bard are conversational applications of LLMs, trained on a vast body of text – usually terabytes of data from various sources – so they can generate human-like conversations. These models use deep learning algorithms to recognize, summarize, translate, predict, and generate text and other content, such as written code, math equations, and song lyrics; they can even suggest great first date ideas.
Large Language Models in CX
The buzz around LLMs has been a sight to behold. Nowhere has the talk been as animated as in the world of customer service, inspiring renewed enthusiasm for applying AI throughout the customer journey. When they’re working, customer self-service applications that utilize conversational AI based on LLMs can alleviate common frustrations and offer a swifter, more optimized path for customers to get what they need, helping organizations keep up with soaring customer expectations.
While speech recognition has a long history in customer management and call center automation, the new LLM-driven chatbots make significant advancements towards providing a more human-like, personalized service.
Organizations see opportunities to leverage LLMs to gain a strategic advantage. There is real potential to drive unprecedented value by curating superior user experiences to win more business and automating mundane tasks to keep costs low. But success is not guaranteed.
The Biggest Challenges of Adopting LLMs
Google’s experimental conversational AI, Bard, has shown us that while versatile, LLMs still have some kinks to be worked out.
A major problem for LLMs is their tendency to confidently and convincingly state incorrect information as fact. LLMs frequently make up information, since they have no hard-coded database of facts – just the ability to write plausible-sounding statements without any guarantee of being correct.
While traditional Natural Language Processing (NLP) models are trained on manually labeled data, LLMs learn from vast amounts of unlabeled text and generate responses from that self-supervised training. Consequently, there’s a high potential for them to produce nonsense: responses that are completely irrelevant, biased, or inaccurate.
Another limitation is that LLM-based bots are usually very resource-intensive. The enormous scale of these models can slow responses down considerably, so it can be challenging to use them in real-time applications like customer service, where customers expect quick answers.
After Bard’s recent struggle, the market immediately reacted by wiping 7%, or $144 billion, off Alphabet’s valuation, signaling lost trust in Google’s AI technology.
What Is the Missing Piece in the Success of LLMs?
While such complex and unpredictable language models may not be ready for unchecked interactions directly with customers, Google’s spokesperson highlights the missing element in the success of LLMs: “rigorous testing.”
As with any type of testing, the purpose of testing LLMs is to evaluate their performance, capabilities, limitations, and potential risks in order to prevent them from spreading misinformation and to ensure they are secure and reliable.
Unfortunately, testing an LLM is comparable to testing the global information universe or a search engine, which poses a huge challenge. Human evaluation alone is time-consuming, subjective, inconsistent, and ultimately ineffective at this scale, so the task should be left to automation: a testing strategy that actually scales and provides quality assurance along the way for these enormous models.
Of course, testing alone does not solve every problem.
How to Leverage LLMs Successfully & Avoid Current Pitfalls
LLMs have demonstrated impressive capabilities in generating human-like text that is difficult to distinguish from text written by actual humans, but the inherent dangers of using them to augment customer service still remain.
Cyara’s goal in this article is to outline possible strategies that could increase organizational confidence in LLMs by creating a structure with more predictable outcomes.
Below are three ways we think you can leverage this advanced technology and realize the benefits LLMs offer without compromising the quality of your conversational AI.
- Decrease the scope: One approach companies can take is creating a smaller, generative AI model based on their own supervised, internal data. This allows the company to implement guardrails for their LLMs. The technique relies on fine-tuning (feeding the model a small amount of data tailored to the task at hand), which laser-focuses the model on a domain-specific use case, thereby reducing its scope and increasing its accuracy; a minimal sketch of what this can look like follows this list. Such customization can improve performance on a specific task. Basically, this is like asking a bot to declare a major in college and become a subject matter expert!
- Go hybrid: Another possible approach to tap into LLM potential without taking on all of the drawbacks and risks is to limit their use to certain scenarios. This might look like using a conventional Conversational AI to power ‘business conversations’ while leaning on LLMs for small talk or other ‘more human’ interactions.
- Supercharge your training and test data: The power of LLMs is not restricted to client-facing conversational applications; they can also help accelerate the development of training and test data. Botium, Cyara’s quality assurance platform for Conversational AI, enables users to rapidly develop training and test data through an integration with GPT3, developed by OpenAI. GPT3 is an AI system that produces natural language and only requires a small amount of input text to generate large volumes of relevant and sophisticated user examples for your organization’s chatbot; a generic sketch of this kind of data generation appears further below.
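To make the first approach a bit more concrete: reducing scope typically means fine-tuning a smaller base model on your own curated, domain-specific text. The sketch below is purely illustrative. It assumes the Hugging Face Transformers and Datasets libraries, a placeholder base model, a hypothetical file of support dialogues, and arbitrary hyperparameters, so treat it as a starting point rather than a production recipe.

```python
# Illustrative sketch only: fine-tuning a small causal language model on
# domain-specific support conversations with Hugging Face Transformers.
# The base model, file path, and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE_MODEL = "distilgpt2"                  # small base model to specialize
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2-family models have no pad token

# One support dialogue (or turn) per line in a plain-text file (hypothetical path).
dataset = load_dataset("text", data_files={"train": "support_dialogues.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=AutoModelForCausalLM.from_pretrained(BASE_MODEL),
    args=TrainingArguments(
        output_dir="support-bot-model",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        learning_rate=5e-5,
    ),
    train_dataset=tokenized["train"],
    # Causal LM objective: the collator builds the labels from the inputs.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("support-bot-model")    # the bot's newly "declared major"
```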
In the hybrid approach above, that means putting NLP-based chatbots in charge of precision-intensive questions and letting LLMs handle easier, more general conversations. Combining technologies this way balances customer needs such as personalization and factual accuracy, and it also opens the door for automated testing, which helps you overcome the primary QA hurdles before confidently declaring “Success!”
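And to illustrate the data-generation idea behind the third approach (a generic sketch, not Botium’s actual GPT3 integration): a handful of seed utterances for a single intent can be expanded into many candidate phrasings with one completion call. The intent, seed utterances, prompt, and model name below are all hypothetical, and the example assumes the openai Python package (pre-1.0 interface) with an API key set in the OPENAI_API_KEY environment variable.

```python
# Illustrative sketch only: expanding a few seed utterances for one chatbot
# intent into many candidate phrasings with a GPT-3 completion model.
# The intent, seed utterances, prompt, and model name are hypothetical.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]  # assumes the key is set in your shell

INTENT = "check_order_status"
SEED_UTTERANCES = [
    "Where is my order?",
    "Can you tell me when my package will arrive?",
    "I want to track my delivery.",
]

# Ask the model to continue a bulleted list of paraphrases.
prompt = (
    f"Generate 10 different ways a customer might express the intent "
    f"'{INTENT}'. Examples:\n"
    + "\n".join(f"- {u}" for u in SEED_UTTERANCES)
    + "\n- "
)

response = openai.Completion.create(
    model="text-davinci-003",  # GPT-3-era model name; substitute whatever you use
    prompt=prompt,
    max_tokens=300,
    temperature=0.8,           # enough variety to get distinct phrasings
)

# Each non-empty line becomes a candidate training or test utterance.
candidates = [
    line.strip("- ").strip()
    for line in response.choices[0].text.splitlines()
    if line.strip()
]
print(f"{len(candidates)} candidate utterances for '{INTENT}':")
for utterance in candidates:
    print(" -", utterance)
```

Generated utterances like these still deserve a quick human sanity check before they land in a training set or a test suite, but they can cut the manual authoring effort considerably.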
Running a digital, self-service contact center is already a challenging task without the additional complications that LLMs and Conversational AI pose. Finding a balance between the benefits and possible risks of using LLMs is a dilemma that will take time to solve, but you don’t have to rely solely on yourself and your team to navigate these difficulties.
With a testing and monitoring solution like Cyara Botium, you can automate quality assurance and expand your testing capacity far beyond what you can do manually. This ensures you can continuously test and monitor your whole system and feel confident that you’re delivering flawless CX.
Let Cyara Help You Do More with Less
In nearly every crucial area of your contact center, automated testing and monitoring will help you do more with less. It’s the key to embracing today’s digital service solutions without drowning in the costs of maintenance and upkeep. For many contact centers, it’s the missing piece that will help them achieve true CX assurance.
Cyara’s award-winning CX assurance platform has the tools you need for every digital CX solution. Whether you’re embarking on cloud migration, updating your IVR, or deploying new chatbots, we can help you ensure each one is dialed in to deliver excellent customer service at any scale. And we do it all while saving you money and freeing your staff for more important work.
Ready to get started? Reach out today to see how Cyara helps you do more with less.