Crafting Training Data: Art and Science vs. Human

This article was originally published on QBox’s blog, prior to Cyara’s acquisition of QBox. Learn more about Cyara + QBox.

How do you get good chatbot performance? Realistically, it comes down to your training data.

Training data is the only leverage you have, especially if you’re using popular NLP providers like Watson Assistant, LUIS, and Dialogflow.

If you use Rasa, you can perhaps change model parameters, but ultimately it all comes down to the quality of the training data in your model.

And you can’t think of it as an NLP algorithm problem—the algorithms that the NLP providers use are just a black box to us, and it’s a box we’ll never get to open! So, our training data really is the only control we have.

The best way to understand the principles of chatbot performance is to think about it from an NLP point of view. After all, it is the NLP engine, not a human, that will interpret the training data.

From a science point of view, there is a systematic way to test your model, through k-fold cross-validation testing, and a systematic way to build your model, through intent and entity mapping.
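As a rough illustration of the k-fold idea, the sketch below splits a set of labelled utterances into k folds so that each utterance is held out for testing exactly once. The function name, the sample data, and the intent labels are hypothetical, and the training and scoring step is left out because it depends on the NLP provider you use.

```python
import random

def k_fold_splits(examples, k=5, seed=0):
    """Yield (train, test) pairs; each example lands in exactly one test fold."""
    if not 2 <= k <= len(examples):
        raise ValueError("k must be between 2 and the number of examples")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps runs repeatable
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, test

# Hypothetical labelled utterances: (text, intent)
examples = [
    ("I need a replacement credit card.", "replace_card"),
    ("My credit card was stolen and need a new one ordered.", "replace_card"),
    ("Please order me a new credit card.", "replace_card"),
    ("Can I have your telephone number?", "contact"),
    ("I need the bank email address.", "contact"),
    ("Give me the website address.", "contact"),
]

for train, test in k_fold_splits(examples, k=3):
    # Train on `train` and score on `test` with your provider (omitted here).
    print(len(train), "training utterances,", len(test), "held out")
```

This simple split does not balance intents across folds. With small intents, a stratified split that keeps the proportion of each intent similar in every fold is usually a safer choice.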

But there is also a bit of a dark art. For example, you may work exclusively in LUIS, and you get to know what works well in your training data and what doesn’t work so well. And sometimes it’s difficult to explain that to someone who isn’t familiar with LUIS or Watson.

In other instances, I’ve found that with a certain NLP provider you have to be careful not to overdo the small, insignificant words within an utterance, like ‘the,’ ‘and,’ and ‘is,’ as that provider tends to put almost as much weight on those words as on the more significant ones.

With some other NLP providers, you don’t have to be quite so careful about details such as the balance of insignificant words.

But overall, there are some basic guidelines for crafting your training data that apply whichever NLP provider you use. If you bear these in mind when building your own chatbots, they’ll help you create a well-performing bot.

Using Real Customer Logs

The first guideline is around the use of real customer logs or questions.

Ensure the utterances you take from them are not long-winded or too chatty, and do not contain lots of irrelevant information. Extract just the vital information needed to turn each one into a brief, clearly expressed piece of training data that covers only the subject of that intent.

For example, if you’re building a banking chatbot and one of the intents covers requesting a new credit card, you wouldn’t want to include utterances such as:

I was at my friend’s house when her dog chewed my credit card and it’s no longer working so I need a replacement one.
My purse got stolen whilst I was at the supermarket buying a loaf of bread and some milk, and so I need a new credit card ordered to replace it.

The concept behind these utterances is asking for a replacement credit card or ordering a new credit card. From an NLP point of view, details about a dog chewing the card or shopping for bread and milk are not needed; this information is not important.

The important part is the concept, and this is what the NLP engine needs to learn, so some more suitable utterances would be:

I need a replacement credit card.
My credit card was stolen and need a new one ordered.

If you then add a few more utterance variations covering a replacement credit card and ordering a new credit card, this should help to cover the different ways people ask about these two concepts, no matter how long-winded or off the point their questions are.
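One way to find log entries that need trimming is to flag anything over a word limit for manual review. The sketch below is only a heuristic: the function name and the 12-word threshold are assumptions for illustration, not a recommendation from any NLP provider, and the rewriting itself still needs a person.

```python
import re

def flag_long_utterances(utterances, max_words=12):
    """Return the utterances whose word count exceeds max_words."""
    flagged = []
    for text in utterances:
        # Count words, treating straight and curly apostrophes as part of a word.
        word_count = len(re.findall(r"[\w'’]+", text))
        if word_count > max_words:
            flagged.append(text)
    return flagged

log_lines = [
    "I was at my friend's house when her dog chewed my credit card "
    "and it's no longer working so I need a replacement one.",
    "My purse got stolen whilst I was at the supermarket buying a loaf of bread "
    "and some milk, and so I need a new credit card ordered to replace it.",
    "I need a replacement credit card.",
    "My credit card was stolen and need a new one ordered.",
]

for text in flag_long_utterances(log_lines):
    print("Review and shorten:", text)
```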

Avoid Creating Patterns

The second guideline is about avoiding creating patterns within an intent.

For example, here’s an extract of some utterances in an intent about how to contact the bank for our banking model:

Can I have your telephone number?
Can I have your email address?
Can I have your website address?
Can I have your mailing address?

From an NLP point of view, these utterances would mislead the engine to think the “Can I have your” part of each phrase is the most important part of this intent, because it’s been repeated so many times. In this case, the danger is that it could artificially skew that intent over another.

Instead, you can try to make the utterances as varied as possible, such as the following:

Can I have your telephone number? (Note: It’s ok to leave one utterance like this)
I need the bank email address.
Give me the website address.
I’d like your mailing address please.
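A quick way to spot this kind of pattern is to count how often the same opening words recur within one intent. The sketch below is a minimal check under assumed settings: the function name, the three-word opener, and the 50% threshold are all illustrative choices rather than anything an NLP provider prescribes.

```python
import re
from collections import Counter

def repeated_openers(utterances, n=3, max_share=0.5):
    """Return openers (first n words) used by more than max_share of utterances."""
    openers = Counter()
    for text in utterances:
        words = re.findall(r"[\w'’]+", text.lower())
        if len(words) >= n:
            openers[" ".join(words[:n])] += 1
    return {o: c for o, c in openers.items() if c / len(utterances) > max_share}

patterned = [
    "Can I have your telephone number?",
    "Can I have your email address?",
    "Can I have your website address?",
    "Can I have your mailing address?",
]
varied = [
    "Can I have your telephone number?",
    "I need the bank email address.",
    "Give me the website address.",
    "I'd like your mailing address please.",
]

print(repeated_openers(patterned))  # {'can i have': 4}
print(repeated_openers(varied))     # {}
```

The check only looks at the start of each utterance, so a phrase repeated in the middle or at the end would need a similar count over all n-word sequences.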

Entity Placement

The placement of your entities within your utterances needs to be varied so your bot understands the context.

Try to ensure some entities fall at the beginning of the utterance, some in the middle, and some towards the end, as in the examples below from the mortgage intent in our banking bot. The entity here is the mortgage type, ‘repayment mortgage’:

Tell me about the application process for a repayment mortgage
Is the application process of your repayment mortgages quick and simple?
Repayment mortgage application process information please.
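To check that entity placement really is varied, you can classify where the entity sits in each utterance. The sketch below is a rough approach that assumes the entity appears as a literal phrase: it splits each utterance into thirds by character position and reports which third contains the midpoint of the entity. The function name and the thirds rule are illustrative assumptions.

```python
from collections import Counter

def entity_position(utterance, entity):
    """Return 'start', 'middle' or 'end' for the entity, or None if it is absent."""
    index = utterance.lower().find(entity.lower())
    if index == -1:
        return None
    # Relative position of the entity's midpoint within the utterance.
    midpoint = (index + len(entity) / 2) / len(utterance)
    if midpoint < 1 / 3:
        return "start"
    if midpoint < 2 / 3:
        return "middle"
    return "end"

utterances = [
    "Tell me about the application process for a repayment mortgage",
    "Is the application process of your repayment mortgages quick and simple?",
    "Repayment mortgage application process information please.",
]

positions = [entity_position(u, "repayment mortgage") for u in utterances]
print(positions)           # ['end', 'middle', 'start']
print(Counter(positions))  # one utterance in each position
```

If the counts lean heavily towards one position, that is a prompt to reword a few utterances so the entity moves around.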

Spelling Errors

Ensure there are no unintentional typos in your utterances. We’ve seen a lot of client models where there are many spelling errors, and they didn’t even realise.

Check your training data and do a spell check if necessary.

It might be good practice to include a few of the more commonly misspelt words. However, some NLP providers have an autocorrect feature that can be activated, in which case including these misspellings isn’t needed.

Ideal Utterance Amount

The next guideline is a golden question in the chatbot building world: How many utterances do I need?

Well, most NLP providers recommend at least five utterances per intent, but that is the very minimum. Aiming for around 20-40 utterances works well in our experience.

For each concept in the intent, aim for around three varied utterances that cover that concept, so the learning value to the NLP engine is as strong as it can be.
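These numbers are easy to check automatically. The sketch below reports, for each intent, how many distinct utterances it has and where that sits against the minimum of five and the 20-40 range mentioned above. The function name, the status labels, and the decision to ignore exact duplicates are assumptions made for illustration.

```python
def utterance_count_report(intents, minimum=5, target_low=20, target_high=40):
    """Map each intent to (distinct utterance count, status label)."""
    report = {}
    for intent, utterances in intents.items():
        # Assumption: exact duplicates (ignoring case and padding) add no value.
        count = len({u.strip().lower() for u in utterances})
        if count < minimum:
            status = "below minimum"
        elif count < target_low:
            status = "below target"
        elif count <= target_high:
            status = "within target"
        else:
            status = "above target"
        report[intent] = (count, status)
    return report

# Hypothetical model with placeholder utterances.
model = {
    "replace_card": [f"placeholder utterance {i}" for i in range(25)],
    "contact_bank": ["Can I have your telephone number?", "Give me the website address."],
}

for intent, (count, status) in utterance_count_report(model).items():
    print(f"{intent}: {count} utterances ({status})")
```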

Use a Thesaurus

Finally, a thesaurus can be an invaluable tool: it will help you include a variety of synonyms for the key concepts within your intents.

For instance, referring back to our banking bot, we have an intent to cover applying for a loan.

One key concept would be asking to lend some money. Looking up the word lend in a thesaurus comes back with advance, give, and loan, among other examples. And for money you could get back cash, funds, and capital.

All these different synonyms will help you build a variety of utterances based on the concept of asking to lend some money.
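To see how quickly synonyms multiply, the sketch below combines the words listed above into candidate utterances. The template sentence and the function name are hypothetical. The output is meant as raw material to reword by hand, since pasting in many utterances that share one template would create exactly the kind of pattern this article warns against.

```python
from itertools import product

def synonym_variations(template, verbs, nouns):
    """Fill the template with every verb and noun combination."""
    return [template.format(verb=v, noun=n) for v, n in product(verbs, nouns)]

# Synonyms taken from the thesaurus example above.
verbs = ["lend", "advance", "give", "loan"]
nouns = ["money", "cash", "funds", "capital"]

candidates = synonym_variations("Can you {verb} me some {noun}?", verbs, nouns)
print(len(candidates))  # 16
for text in candidates[:3]:
    print(text)
```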

We hope you found this useful. As ever, if you’re looking for help and guidance, whether you’ve got an existing bot or are building one from scratch, Cyara can offer both the technology and the support to have it functioning well and delivering more value.
