Cyara


March 8, 2021

4 DOs and 3 DON’Ts for Training a Chatbot NLP Model

Florian Treml, Senior Director, Engineering

This article was originally published on Botium’s blog on March 8, 2021, prior to Cyara’s acquisition of Botium. Learn more about Cyara + Botium

A quick summary of 7 important DOs and DON’Ts when training an NLP model for a chatbot. They are best applied before starting a project, but can also help to build a mindset for quality training data in all chatbot project phases.


DOs and DON’Ts

✅ DO: think in problem space, not in solution space

Users typically think in problem space, not in solution space, and so should you. As a quick example, consider the case of a user who ordered a shirt in an online shop and wants to know when it is expected to arrive. Consider this question:

  • when will my shirt arrive

This is a question from problem space: it describes the problem the user wants solved. These, in contrast, are from solution space:

  • what is the estimated shipping time
  • show me the order status

They describe how your business will respond to the problem.

Benefit: the chatbot and the users speak the same language


❌ DON’T: overload your intents with too many problems

As a rule of thumb, each intent should handle at most 3–6 user problems as described above. For each problem, you should provide at least 3 user examples. Focus on the essence of the intent: the solution your chatbot can provide for your users.

Benefit 1: content stays maintainable and focused

Benefit 2: separation of concerns makes dialog building straightforward
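
The rule of thumb above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical training-data layout (intent name, then user problem, then example utterances); the intent names, problem names, and most utterances are made up for illustration and do not come from any particular NLP engine.

```python
from typing import Dict, List

# Hypothetical layout (an assumption): intent -> user problem -> example utterances.
TrainingData = Dict[str, Dict[str, List[str]]]

MAX_PROBLEMS_PER_INTENT = 6   # upper end of the 3-6 rule of thumb
MIN_EXAMPLES_PER_PROBLEM = 3  # at least 3 user examples per problem


def lint_intents(data: TrainingData) -> List[str]:
    """Return readable warnings for intents that break the rule of thumb."""
    warnings = []
    for intent, problems in data.items():
        if len(problems) > MAX_PROBLEMS_PER_INTENT:
            warnings.append(
                f"{intent}: {len(problems)} problems (max {MAX_PROBLEMS_PER_INTENT})"
            )
        for problem, examples in problems.items():
            if len(examples) < MIN_EXAMPLES_PER_PROBLEM:
                warnings.append(
                    f"{intent}/{problem}: {len(examples)} examples "
                    f"(min {MIN_EXAMPLES_PER_PROBLEM})"
                )
    return warnings


# Illustrative data only; names and utterances are assumptions.
training_data: TrainingData = {
    "order_status": {
        "delivery_date": [
            "when will my shirt arrive",
            "how long until my order gets here",
            "is my package on its way",
        ],
        "missing_parcel": ["my order never showed up"],
    }
}

for warning in lint_intents(training_data):
    print(warning)
```

A check like this could run whenever the training data changes, so an intent that slowly accumulates unrelated problems is flagged before it becomes hard to maintain.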


✅ DO: clear separation of intents vs entities

To our surprise, it is still very common to mix up the concepts of intents and entities, and we strongly suggest you stop doing it. Consider a real-life example of a fashion store that has trained an NLP model with these 3 intents:

  • order_tshirt
  • order_pants
  • order_socks

In this case, there is room for exactly 1 intent (order) and 3 entities (shirt, pants, socks). Data scientists training the NLP model may not notice a real difference, but your developers will be grateful when coding dialog flow and fulfillment based on the NLP model output.

Benefit: maintainable and clearly defined NLP model output
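
To illustrate why developers benefit, here is a minimal sketch of fulfillment code written against a single order intent with a product entity. The NluResult class, the entity name product, and the response texts are assumptions made for this example, not the output format of any specific NLP engine.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NluResult:
    """Hypothetical, engine-agnostic shape of an NLP model's output."""
    intent: str
    entities: Dict[str, str] = field(default_factory=dict)


def fulfill(result: NluResult) -> str:
    """One handler per intent; the entity value parameterizes the response."""
    if result.intent == "order":
        product = result.entities.get("product")
        if product is None:
            # Intent recognized, entity missing: ask a follow-up question.
            return "What would you like to order?"
        return f"Adding {product} to your cart."
    return "Sorry, I did not understand that."


print(fulfill(NluResult(intent="order", entities={"product": "socks"})))
print(fulfill(NluResult(intent="order")))
```

With separate order_tshirt, order_pants, and order_socks intents, the same logic would need one branch per product, and every new product would require retraining the intent classifier instead of extending an entity list.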


❌ DON’T: repeat sentence patterns in training data

When asking how much training data is sufficient, you have to resist the general answer “the more the better”. Consider training examples that all follow the same pattern:

  • order me a shirt
  • order me some shirts
  • order me shirts

In the best case, such examples don’t help your NLP model with classification; in the worst case, they have a negative effect by overfitting the model (although, to be honest, a state-of-the-art pre-trained NLP model usually prevents this out of the box).

Benefit: keeps your training data small and focused
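
Repeated patterns can be spotted with a rough heuristic before training. The sketch below groups utterances whose normalized token sets are identical; the filler-word list and the crude plural stripping are assumptions chosen to keep the example short, and a real project would more likely use a proper tokenizer and stemmer.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List

# Assumption: a tiny illustrative filler-word list, not a complete stop list.
FILLER_WORDS = {"a", "an", "the", "some"}


def signature(utterance: str) -> FrozenSet[str]:
    """Reduce an utterance to a crude pattern signature."""
    tokens = []
    for token in utterance.lower().split():
        if token in FILLER_WORDS:
            continue
        # Very rough plural stripping: "shirts" -> "shirt".
        if len(token) > 3 and token.endswith("s"):
            token = token[:-1]
        tokens.append(token)
    return frozenset(tokens)


def repeated_patterns(utterances: List[str]) -> List[List[str]]:
    """Return groups of utterances that share the same signature."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for utterance in utterances:
        groups[signature(utterance)].append(utterance)
    return [group for group in groups.values() if len(group) > 1]


examples = [
    "order me a shirt",
    "order me some shirts",
    "order me shirts",
    "need a new shirt",
]
print(repeated_patterns(examples))
```

Each reported group is a candidate for trimming down to a single example.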


✅ DO: vary sentence structure and key terms

Instead of repeating the same patterns, you should vary the sentence structure to teach the NLP model different ways a user can express the problem. Here are some good training examples:

  • order me a shirt
  • need a new shirt
  • dress me up with a fancy new shirt

Depending on the domain, it may even make sense to use a thesaurus, but (IMPORTANT) only at the entity and key-term level: a state-of-the-art NLP model will learn everything else itself. Country-specific variations deserve special consideration here.

Benefit: makes classification robust for variations
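
One way to keep the thesaurus at the entity level is to attach synonyms to the entity definition instead of multiplying training sentences. The following is a minimal sketch under that assumption; the synonym lists (including the country-specific pair pants and trousers) and the function names are illustrative only.

```python
from typing import Dict, List, Optional

# Assumption: canonical entity value -> accepted variants (illustrative lists).
ENTITY_SYNONYMS: Dict[str, List[str]] = {
    "shirt": ["shirt", "tee", "t-shirt"],
    "pants": ["pants", "trousers"],  # country-specific variation
}


def build_lookup(synonyms: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert the thesaurus so every variant maps to its canonical value."""
    return {
        variant: canonical
        for canonical, variants in synonyms.items()
        for variant in variants
    }


def resolve_product(utterance: str, lookup: Dict[str, str]) -> Optional[str]:
    """Return the canonical entity value for the first known variant found."""
    for token in utterance.lower().split():
        if token in lookup:
            return lookup[token]
    return None


lookup = build_lookup(ENTITY_SYNONYMS)
print(resolve_product("need some new trousers", lookup))
print(resolve_product("order me a tee", lookup))
```

The sentence-level training examples stay few and varied, while the vocabulary around key terms grows in one clearly defined place.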


❌ DON’T: train the model with misspelled data (but prepare for it)

This one is obvious: especially for entity resolution, some kind of spellchecking is a must, not only in training but also at live inference. For intent classification, too, a model trained on misspelled data will in the worst case learn rubbish.

Benefit: makes classification robust for real user input
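
As a rough illustration of spellchecking at inference time, the sketch below corrects tokens against a small entity vocabulary using fuzzy matching from Python's standard difflib module. The vocabulary, the cutoff of 0.75, and the function names are assumptions; a production chatbot would more likely rely on a dedicated spellchecker or the NLP engine's own fuzzy entity matching.

```python
import difflib
from typing import List

# Assumption: the known entity values of the fashion store example.
ENTITY_VOCABULARY = ["shirt", "pants", "socks"]


def correct_token(token: str, vocabulary: List[str], cutoff: float = 0.75) -> str:
    """Replace a token with its closest vocabulary entry, if one is close enough."""
    matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def normalize(utterance: str, vocabulary: List[str]) -> str:
    """Apply the same correction to training data and to live user input."""
    return " ".join(
        correct_token(token, vocabulary) for token in utterance.lower().split()
    )


print(normalize("order me a shrit", ENTITY_VOCABULARY))
```

Running the same normalization over training data and live input keeps both sides consistent, so the model never has to learn the misspellings themselves.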


✅ DO: edit and use real user input as training data

While you shouldn’t blindly copy and paste real user input into your training data, it is without any doubt the most valuable source of training data and of future improvements to your chatbot’s understanding. As long as unsupervised learning for NLP tasks is still in its infancy, some kind of manual review and editing process is a must to establish continuous learning.

Benefit: improves the quality of your NLP model with each interaction
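
A manual editing process needs a queue of candidates to look at. The following is a minimal sketch of how such a queue might be built from conversation logs; the LoggedTurn record, the confidence threshold of 0.7, and the sample data are assumptions for illustration and not part of any Botium or NLP engine API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class LoggedTurn:
    """Hypothetical log record: what the user typed and how the model scored it."""
    text: str
    predicted_intent: str
    confidence: float


def review_queue(
    turns: Iterable[LoggedTurn],
    known_utterances: Iterable[str],
    threshold: float = 0.7,  # assumption: below this, a human should take a look
) -> List[LoggedTurn]:
    """Collect low-confidence, not-yet-seen user inputs for manual editing."""
    seen = {utterance.strip().lower() for utterance in known_utterances}
    queue = []
    for turn in turns:
        key = turn.text.strip().lower()
        if key in seen or turn.confidence >= threshold:
            continue
        seen.add(key)  # avoid queueing the same input twice
        queue.append(turn)
    return queue


# Illustrative log data only.
logged = [
    LoggedTurn("when will my shirt arrive", "order_status", 0.95),
    LoggedTurn("where's my stuff", "order_status", 0.41),
    LoggedTurn("Where's my stuff", "smalltalk", 0.38),
    LoggedTurn("order me a shirt", "order", 0.55),
]
for turn in review_queue(logged, known_utterances=["order me a shirt"]):
    print(turn.text, turn.predicted_intent, turn.confidence)
```

The queue only proposes candidates. A person still decides whether each input is cleaned up, assigned to an intent, or discarded before it enters the training data.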


Action Plan

You can find information about how Botium can help on our Wiki and in our Blog:

  • NLP Quality Metrics
  • Test Set Insights
  • Confusion Matrix and Embeddings
  • Articles about NLU/NLP in our Wiki



Copyright © 2006–2025 Cyara® Inc. The Cyara logo, names and marks associated with Cyara’s products and services are trademarks of Cyara. All rights reserved. Privacy Statement  Cookie Settings