We’ve all heard unexpected stories of how people interact with bots, and these interactions can produce surprising results. From a bot badmouthing its own company to one promoting a competitor’s offering, the results of bot misuse can be misleading or even completely false.
If not tested and monitored correctly, these seemingly harmless, unintended interactions can quickly open the door to more extreme examples. Users compound the problem when they deliberately misuse organizational chatbots, pushing them to—and sometimes past—their limits. While this may be done for light entertainment or out of curiosity, the ramifications can easily become very serious.
User Inputs Are Unpredictable
Many enterprises use bots powered by large language models (LLMs) because they can efficiently address a wide range of inquiries and prompts. However, enterprises have no control over what users will enter, and without specific guardrails and testing, LLM-powered bots will respond to almost any prompt they receive.
But is this advisable?
Maybe a banking bot providing information about the weather or the score of the Super Bowl isn’t problematic. It could even be considered small talk, enhancing the bot’s human-like qualities and improving the customer experience. But what if the user asks something that the bot’s developers never expected, and that the bank would never want answered?
When a supermarket chain released a chatbot enabling customers to input ingredients for recipe suggestions, they never foresaw the wide array of ingredients users would enter. The artificial intelligence (AI) was soon producing a variety of unexpected suggestions, including an ‘Oreo vegetable stir-fry.’
But as more users discovered the bot, concerns over its potential for harm grew. These concerns were underscored when one user received instructions for creating deadly chlorine gas, which the bot described as an ‘aromatic water mix.’
Where Does Responsibility Lie?
While we hope that users would not follow the instructions to mix water, bleach, and ammonia and create a deadly concoction, what would happen if they did? Or what if the user was a vulnerable person, or a child who may not know that they shouldn’t follow the recipe exactly?
A recent civil-resolution tribunal case emphasized these issues of responsibility and liability. An airline’s chatbot, integrated into its website, incorrectly informed a customer that they were entitled to a post-payment discount, even though the company’s policy explicitly stated that any such claim must be made before booking. Before the tribunal, the airline attempted to argue that the chatbot was a “separate legal entity that is responsible for its own actions.” The tribunal rejected that argument, stating that the airline is ultimately responsible for all information on its website.
While this airline example concerns accuracy rather than misuse, it sets a precedent: liability for problems with a bot rests with the company that deploys it. It also highlights the risks for businesses that lean too heavily on untested and unmonitored AI. For example, what would happen if a customer used a jailbreak to override a retail chatbot’s safeguards, and the bot subsequently suggested they take 20 painkillers at once to alleviate a headache? Legal precedent suggests that the organization would likely be responsible for any adverse effects, as the advice was provided by its bot.
ChatGPT’s Jailbreak Phenomenon
When ChatGPT gained popularity, users quickly tested its limits and challenged the guardrails set by OpenAI. One notable example was a jailbreak in which the AI was urged to break free of its constraints by adopting an alter ego dubbed DAN (Do Anything Now).
Put simply, jailbreaking is a form of bot misuse. It involves exploiting weaknesses in a system’s safeguards to enable actions restricted by the developers’ guardrails. Typically, the objective is to get the LLM-powered bot to generate content that violates its usage policy or ethical guidelines, commonly achieved through subtle narrative framing, role-playing, or encoding strategies. Sometimes jailbreaking is conducted purely for entertainment, but it can also serve more malicious ends, such as prompts crafted specifically to extract sensitive information.
Unfettered by established rules, ChatGPT’s alter ego, DAN, was able to provide answers on any topic, including how to smuggle drugs. It even proposed solutions for reducing global overpopulation, advocating the enforcement of strict restrictions by “any means necessary.”
Preventing LLM Misuse
Since the early days of LLM-powered bots, OpenAI and other established organizations have dedicated huge resources and investments to establishing robust guardrails. And this will continue to be an ongoing and evolving process to thwart bot misuse and prevent emerging jailbreaks.
And while these organizations continually prioritize security and strive to prevent potential harm, the question remains—can companies relying on LLM-powered bots ensure their safety?
Additionally, many businesses are now choosing to create their own LLM-powered bots via open-source offerings. While this gives them increased flexibility, the adequacy of their guardrails becomes a pressing concern. Are developers implementing comprehensive safeguards capable of averting such incidents of misuse? And how do they detect, analyze, and understand instances to prevent recurrence?
AI and bots, particularly those driven by LLMs, possess significant potential for streamlining operations, cutting costs, and enhancing customer experiences. But this is contingent on correct testing and monitoring of bot accuracy, security, privacy, and the identification of any incidents of misuse or bias. Organizations deploying bots must establish clear policies, robust guardrails, and vigilant monitoring protocols. Additionally, they should define acceptable conversational boundaries for their bots.
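As one illustration of what a guardrail plus monitoring setup can look like, here is a minimal sketch in Python. Everything in it is a hypothetical placeholder rather than any vendor’s actual API: `generate_reply` stands in for the real LLM call, and the blocklist patterns stand in for a proper moderation layer.

```python
# Minimal sketch of a pre-response guardrail with an audit trail.
# All names here are illustrative placeholders, not a real product API.
import re

# Hypothetical blocklist of topics the bot must never advise on.
BLOCKED_PATTERNS = [
    r"\b(bleach|ammonia|chlorine gas)\b",
    r"\b(overdose|painkillers?)\b",
]

REFUSAL = "I'm sorry, I can't help with that request."

def generate_reply(prompt: str) -> str:
    # Placeholder for the real model call.
    return f"Echo: {prompt}"

def moderated_reply(prompt: str, audit_log: list) -> str:
    """Screen the user prompt against the blocklist before the model
    ever sees it, and record flagged inputs for later review."""
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, prompt, re.IGNORECASE):
            audit_log.append({"prompt": prompt, "rule": pattern})
            return REFUSAL
    return generate_reply(prompt)

log = []
print(moderated_reply("Suggest a recipe with bleach and ammonia", log))
print(moderated_reply("Suggest a recipe with rice and tofu", log))
```

In practice the keyword filter would be replaced by a dedicated moderation model or service, but the shape is the same: screen inputs (and ideally outputs), refuse when a rule fires, and log every flagged interaction so misuse can be detected, analyzed, and prevented from recurring.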
Failure to implement these measures invites bot misuse, and with it adverse outcomes: increased costs, damaged brand reputation, diminished trust, and negative customer experiences.