February 15, 2022

Crafting Training Data: Art and Science vs. Human

Alison Houston, Data Model Analyst

This article was originally published on QBox’s blog, prior to Cyara’s acquisition of QBox. Learn more about Cyara + QBox.


Where does good chatbot performance come from? Really, it’s only possible with good training data.

Training data is the only leverage you have, especially if you’re using popular NLP providers like Watson Assistant, LUIS, and Dialogflow.

If you use Rasa, you can perhaps change model parameters, but ultimately it all comes down to the quality of the training data within your model.

And you can’t think of it as an NLP algorithm problem—the algorithms that the NLP providers use are just a black box to us, and it’s a box we’ll never get to open! So, our training data really is the only control we have.

To understand the principles of chatbot performance, the best way is to think of it from an NLP point of view. After all, it’s not a human that will interpret the training data, it’s the NLP engine.

From a science point of view, there is a systematic way to test your model, through k-fold cross-validation testing, and a systematic way of building your model, through intent and entity mapping.
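To make the testing side a little more concrete, here is a minimal sketch of how a k-fold split of training data could look. The function name `k_fold_splits`, the intent names, and the sample utterances are assumptions for illustration only; the training and scoring step depends on your NLP provider and is left out.

```python
import random
from typing import Iterator, List, Tuple

Example = Tuple[str, str]  # (utterance, intent)

def k_fold_splits(examples: List[Example], k: int, seed: int = 0) -> Iterator[Tuple[List[Example], List[Example]]]:
    """Yield (train, test) pairs so that each example is tested exactly once."""
    if not 2 <= k <= len(examples):
        raise ValueError("k must be between 2 and the number of examples")
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled examples into k folds, like dealing cards.
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, test

# Hypothetical labelled utterances for a banking bot.
data = [
    ("I need a replacement credit card", "order_card"),
    ("My credit card was stolen", "order_card"),
    ("Please order me a new credit card", "order_card"),
    ("Can I have your telephone number?", "contact_bank"),
    ("I need the bank email address", "contact_bank"),
    ("Give me the website address", "contact_bank"),
]

for train, test in k_fold_splits(data, k=3):
    # Here you would train on `train` and score predictions on `test`.
    print(len(train), "training utterances,", len(test), "held out")
```

Each utterance is held out once, so every piece of training data gets checked against a model that has not seen it.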

But there is also a bit of a dark art. For example, you may exclusively work in LUIS NLP, and you get to know what works well in your training data and what doesn’t work so well. And sometimes it’s difficult to explain it to someone not familiar with LUIS or Watson.

In other instances, I’ve found when working with a certain NLP provider that you have to be a little careful not to overdo the small, insignificant words within an utterance, like ‘the,’ ‘and,’ and ‘is,’ as that provider tends to put almost as much weight on those words as on the more significant ones.

With some other NLP providers, you don’t have to be quite so careful about details such as the balance of insignificant words.

But overall, there are some basic guidelines for crafting your training data that apply whichever NLP provider you use. If you bear these in mind when building your own chatbots, it’ll help you to create a great-performing bot.

Using Real Customer Logs

The first guideline is around the use of real customer logs or questions.

Ensure the utterances you take from them are not long-winded or too chatty, and don’t contain lots of irrelevant information. Just extract the vital information needed to turn each one into a brief, clearly expressed piece of training data that covers just the subject of that intent.

For example, if you’re building a banking chatbot and one of the intents covers requesting a new credit card, you wouldn’t want to include utterances such as:

  • I was at my friend’s house when her dog chewed my credit card and it’s no longer working so I need a replacement one.
  • My purse got stolen whilst I was at the supermarket buying a loaf of bread and some milk, and so I need a new credit card ordered to replace it.

The concept behind these utterances is asking for a replacement credit card or ordering a new credit card. If you think of it from an NLP point of view, details about a dog chewing the card or going shopping for bread and milk are not needed; this information is not important.

The important part is the concept, and this is what the NLP engine needs to learn, so some more suitable utterances would be:

  • I need a replacement credit card.
  • My credit card was stolen and I need a new one ordered.

And then if you add a few more utterance variations covering replacement credit cards and ordering a new credit card, this should help to cover all the different ways people might ask about these two concepts, no matter how long-winded or off the point their questions are!
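As a rough aid when going through customer logs, you could flag utterances that are likely to need trimming. The sketch below uses a simple word count; the function name `flag_long_utterances` and the threshold of 12 words are assumptions for illustration, not a recommended figure.

```python
def flag_long_utterances(utterances, max_words=12):
    """Return the utterances whose word count exceeds max_words."""
    return [u for u in utterances if len(u.split()) > max_words]

candidates = [
    "I was at my friend's house when her dog chewed my credit card "
    "and it's no longer working so I need a replacement one.",
    "I need a replacement credit card.",
    "My credit card was stolen and I need a new one ordered.",
]

for utterance in flag_long_utterances(candidates):
    print("Review and shorten:", utterance)
```

A word count only tells you where to look. Deciding which details are irrelevant to the intent still needs a human eye.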

Avoid Creating Patterns

The second guideline is about avoiding creating patterns within an intent.

For example, here’s an extract of some utterances in an intent about how to contact the bank for our banking model:

  • Can I have your telephone number?
  • Can I have your email address?
  • Can I have your website address?
  • Can I have your mailing address?

From an NLP point of view, these utterances would mislead the engine into thinking the “Can I have your” part of each phrase is the most important part of this intent, because it’s been repeated so many times. In this case, the danger is that it could artificially skew predictions towards that intent over another.

Instead, you can try to make the utterances as varied as possible, such as the following:

  • Can I have your telephone number? (Note: It’s OK to leave one utterance like this.)
  • I need the bank email address.
  • Give me the website address.
  • I’d like your mailing address please.
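One way to spot this kind of pattern is to count how often the same opening words appear within an intent. This is only a sketch: the function name `repeated_openings`, the four-word window, and the 50% threshold are assumptions chosen for illustration.

```python
from collections import Counter

def repeated_openings(utterances, n_words=4, max_share=0.5):
    """Return opening phrases that start more than max_share of the utterances."""
    openings = Counter(
        " ".join(u.lower().split()[:n_words]) for u in utterances
    )
    limit = max_share * len(utterances)
    return {phrase: count for phrase, count in openings.items() if count > limit}

patterned = [
    "Can I have your telephone number?",
    "Can I have your email address?",
    "Can I have your website address?",
    "Can I have your mailing address?",
]
varied = [
    "Can I have your telephone number?",
    "I need the bank email address.",
    "Give me the website address.",
    "I'd like your mailing address please.",
]

print(repeated_openings(patterned))  # {'can i have your': 4}
print(repeated_openings(varied))     # {}
```

This only catches patterns at the start of an utterance. Repeated phrasing in the middle or at the end would need a similar check over all word sequences.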

Entity Placement

The placement of your entities within your utterances needs to be varied so your bot understands the context.

Try to ensure some entities fall at the beginning of the utterance, some in the middle, and some towards the end, as in the examples below from the mortgage intent in our banking bot (the entity here is the mortgage type, ‘repayment mortgage’):

  • Tell me about the application process for a repayment mortgage
  • Is the application process of your repayment mortgages quick and simple?
  • Repayment mortgage application process information please.
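To check how varied the placement is across an intent, you could classify where the entity falls in each utterance. The sketch below is a simple word-matching approach under stated assumptions: the function name `entity_position` is hypothetical, and prefix matching is used so that plurals such as "mortgages" still count.

```python
from collections import Counter

def entity_position(utterance, entity):
    """Return 'start', 'middle' or 'end' for the entity, or None if it is absent."""
    words = utterance.lower().rstrip(".?!").split()
    target = entity.lower().split()
    for i in range(len(words) - len(target) + 1):
        # Prefix match so "mortgages" is accepted for "mortgage".
        if all(w.startswith(t) for w, t in zip(words[i:], target)):
            if i == 0:
                return "start"
            if i + len(target) == len(words):
                return "end"
            return "middle"
    return None

utterances = [
    "Tell me about the application process for a repayment mortgage",
    "Is the application process of your repayment mortgages quick and simple?",
    "Repayment mortgage application process information please.",
]

positions = Counter(entity_position(u, "repayment mortgage") for u in utterances)
print(positions)
```

If one position dominates the count, that is a hint to rewrite a few utterances so the entity moves around.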

Spelling Errors

Ensure there are no unintentional typos in your utterances. We’ve seen a lot of client models containing many spelling errors that the clients didn’t even realise were there.

Check your training data and do a spell check if necessary.

It might be good practice to include a few of the more commonly misspelt words, although some NLP providers have an autocorrect feature that can be activated, in which case including these misspellings wouldn’t be needed.
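If you don’t have a spell checker to hand, a rough check can be sketched with Python’s standard `difflib` module, which suggests the closest known word for anything not in a vocabulary. The function name `possible_typos`, the tiny vocabulary, and the 0.8 cutoff are all assumptions for illustration; a real check would use a proper dictionary.

```python
import difflib
import re

def possible_typos(utterances, vocabulary, cutoff=0.8):
    """Map each unknown word to its closest vocabulary word, if one is similar enough."""
    known = sorted({w.lower() for w in vocabulary})
    suggestions = {}
    for utterance in utterances:
        for word in re.findall(r"[a-z']+", utterance.lower()):
            if word in known:
                continue
            match = difflib.get_close_matches(word, known, n=1, cutoff=cutoff)
            if match:
                suggestions[word] = match[0]
    return suggestions

# Hypothetical vocabulary; in practice this would be a full word list.
vocabulary = ["i", "need", "a", "replacement", "credit", "card"]
training = ["I need a replacment credit card"]

print(possible_typos(training, vocabulary))  # {'replacment': 'replacement'}
```

Words with no close match are left alone, so this flags likely typos rather than every unfamiliar word. Intentional misspellings would need to be added to the vocabulary.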

Ideal Utterance Amount

The next guideline is a golden question in the chatbot building world: How many utterances do I need?

Well, most NLP providers recommend at least five utterances per intent, but that is the very minimum. Aiming for around 20-40 utterances works well in our experience.

For each concept in the intent, aim for around three varied utterances that cover that concept, to ensure the learning value to the NLP engine is as strong as it can be.
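These figures are easy to check across a whole model. The sketch below labels each intent against the minimum of five and the 20-40 range mentioned above; the function name `review_intent_sizes`, the intent names, and the placeholder utterances are assumptions for illustration.

```python
def review_intent_sizes(intents, minimum=5, target=(20, 40)):
    """Label each intent by how its utterance count compares with the guideline figures."""
    report = {}
    for name, utterances in intents.items():
        count = len(utterances)
        if count < minimum:
            report[name] = "below minimum"
        elif count < target[0]:
            report[name] = "below target"
        elif count <= target[1]:
            report[name] = "within target"
        else:
            report[name] = "above target"
    return report

# Placeholder utterances stand in for real training data.
intents = {
    "order_card": [f"utterance {i}" for i in range(3)],
    "contact_bank": [f"utterance {i}" for i in range(12)],
    "mortgage_info": [f"utterance {i}" for i in range(25)],
}

for name, status in review_intent_sizes(intents).items():
    print(name, "->", status)
```

A count alone says nothing about variety, so an intent that is within target can still need work if its utterances all follow the same pattern.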

Use a Thesaurus

Finally, a thesaurus can be an invaluable tool, and it will help to include a variety of synonyms for the key concepts within the intents.

For instance, referring back to our banking bot, we have an intent to cover applying for a loan.

One key concept would be asking to lend some money. Looking up the word lend in a thesaurus comes back with advance, give, and loan, among other examples. And for money you could get back cash, funds, and capital.

All these different synonyms will help you to build a variation of utterances based on the concept of asking to lend some money.
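As a sketch of how the synonyms can be combined, the snippet below fills two sentence templates with every pairing of the words above. The templates and the function name `expand` are assumptions for illustration.

```python
from itertools import product

# Synonyms taken from the thesaurus lookups described above.
lend_words = ["lend", "advance", "give", "loan"]
money_words = ["money", "cash", "funds", "capital"]

# Hypothetical templates; real utterances should vary far more than this.
templates = [
    "Can you {verb} me some {noun}?",
    "I'd like you to {verb} me {noun}.",
]

def expand(templates, verbs, nouns):
    """Fill every template with every verb and noun combination."""
    return [t.format(verb=v, noun=n) for t, v, n in product(templates, verbs, nouns)]

variations = expand(templates, lend_words, money_words)
print(len(variations))  # 32
print(variations[0])    # Can you lend me some money?
```

Treat the output as a list of candidates to pick from and reword, rather than adding all of them. Using every generated line would create exactly the kind of repeated pattern that the earlier guideline warns against.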

We hope you found this useful. As ever, if you’re looking for help and guidance, whether you’ve got an existing bot or are building one from scratch, Cyara can offer both the technology and the support to get it functioning well and delivering more value.

Copyright © 2006–2025 Cyara® Inc. The Cyara logo, names and marks associated with Cyara’s products and services are trademarks of Cyara. All rights reserved. Privacy Statement