
May 11, 2023

Cross-Validation Testing Top Tips


Alison Houston, Data Model Analyst

This article was originally published on QBox’s blog, prior to Cyara’s acquisition of QBox. Learn more about Cyara + QBox.


Once your chatbot has been built and trained, you may not yet have needed to test the model with data outside of its training set, particularly if you have used advanced tooling. But there will come a time when you will, in effect, need to simulate real-world interactions to get a more accurate measure of how well your chatbot might perform once it's live. This is called cross-validation testing.



This test data could consist of:

  • A set of real user utterances that were set aside before the chatbot was built;
  • A set of in-house utterances devised before or during the chatbot build;
  • A collection of real user utterances gathered once the chatbot has been launched.

Incidentally, for those of you devising your cross-validation dataset in-house, a word of warning: to ensure no model bias is present in the cross-validation data, it is recommended that this dataset is not created by anyone directly associated with the chatbot build. A top tip is to get other colleagues involved (or family and friends!): simply give them a brief explanation of each intent (but not too much detail) and ask them to list as many different ways as they can think of to ask about each one.

This cross-validation data is then tested against your chatbot to evaluate its performance. It will help to identify any blind spots in your training data, perhaps new concepts (key words or phrases) that have been missed, or new ways to express the existing concepts within the intents. It can also identify whether your chatbot is overfitting, meaning the model is so fine-tuned to its existing training data that it negatively impacts the model's performance on new data.

Whichever way the cross-validation data is created, it’s vital that the data covers every intent in your chatbot model, to ensure all intents are thoroughly tested.
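
To make this concrete, here is a minimal sketch in Python of what such a per-intent evaluation might look like. It is an illustration only: the classify() function stands in for whatever your NLU engine exposes and is assumed to return a predicted intent with a confidence score, and the cross-validation set is assumed to be a list of (utterance, expected intent) pairs.

    from collections import defaultdict

    def evaluate(cross_validation_set, classify, all_intents):
        """Score a cross-validation set per intent and flag intents with no coverage."""
        totals, correct = defaultdict(int), defaultdict(int)
        for utterance, expected in cross_validation_set:
            predicted, _confidence = classify(utterance)  # hypothetical NLU call
            totals[expected] += 1
            if predicted == expected:
                correct[expected] += 1
        for intent in sorted(all_intents):
            if totals[intent] == 0:
                print(f"{intent}: no cross-validation utterances yet, add some!")
            else:
                print(f"{intent}: {correct[intent]}/{totals[intent]} correct "
                      f"({correct[intent] / totals[intent]:.0%})")

Intents that score poorly point to the blind spots described above, while intents with zero cross-validation utterances show where coverage is still missing.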

But How Much Data is Needed?  

We would recommend aiming for at least the same amount (1x) of cross-validation data as you have training data in each intent. For example, if you have an intent with 30 training utterances, you should have at least 30 cross-validation utterances for that intent. For your short-tail intents (the intents you anticipate being returned most frequently), or the more complex intents, try to increase the number of cross-validation utterances to 2 or even 3 times the amount of training data, or even more: the more the better! But this probably won't be an overnight process; the dataset should be expanded over time, collected in conjunction with audits and reports from your live user logs. When collecting utterances from your live user logs, always try to pick a selection that features very diverse language, while still being valid in its subject matter, to ensure your chatbot is tested to its limits.
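
As a rough illustration of those sizing targets (assuming you can export per-intent training counts from your tooling; the intent names below are made up), the calculation is simple:

    # At least 1x the training data per intent; 2-3x for short-tail or complex intents.
    training_counts = {"check_balance": 30, "reset_password": 45, "opening_hours": 20}
    short_tail = {"check_balance"}  # hypothetical high-traffic intents

    for intent, n_training in training_counts.items():
        multiplier = 3 if intent in short_tail else 1
        print(f"{intent}: aim for at least {n_training * multiplier} cross-validation utterances")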

In addition to evaluating chatbot performance, cross-validation testing has other uses too.  A key one is to identify regressions when you make major changes to your chatbot. For example, you might want to scale up the chatbot at some point. Once you’ve added lots of new intents, you’ll need to make sure cross-validation utterances that were returning the correct intents before are still performing just as well after the updates. So, it’s recommended you test your model with the same cross-validation data before and after making such updates. In fact, you should get into the habit of regular cross-validation testing, even if you’re just making minor tweaks in your model to improve performance. This will help to give you peace of mind that any changes you’re making won’t be detrimental to the rest of the model.
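
One way to automate that before-and-after check, again assuming hypothetical classify functions for the old and new versions of the model, is to run the same cross-validation set through both and list anything that has regressed:

    def find_regressions(cross_validation_set, classify_before, classify_after):
        """Report utterances the old model got right but the updated model gets wrong."""
        regressions = []
        for utterance, expected in cross_validation_set:
            before_intent, _ = classify_before(utterance)
            after_intent, _ = classify_after(utterance)
            if before_intent == expected and after_intent != expected:
                regressions.append((utterance, expected, after_intent))
        for utterance, expected, got in regressions:
            print(f"Regression: '{utterance}' expected '{expected}', now returns '{got}'")
        return regressions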

Another key use of cross-validation testing is to help determine a suitable confidence threshold for your chatbot. This involves producing an ROC curve (and its AUC) by plotting the results of the cross-validation test at various confidence thresholds. You can then determine the optimum confidence threshold for your particular needs. For example, if you want a very accurate chatbot you'll probably want to increase the confidence threshold to minimize the risk of giving incorrect answers to your customers. And from the ROC curve you'll be able to understand the trade-off of having that higher threshold. This is a very short explanation of the ROC curve, and you can read more here.
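
As a sketch of how that curve could be produced (using scikit-learn, which is an assumption rather than anything prescribed here), each cross-validation utterance is labelled 1 if the top intent returned was correct and 0 otherwise, and the model's confidence is used as the score:

    from sklearn.metrics import roc_curve, auc

    def threshold_tradeoff(cross_validation_set, classify):
        """Build an ROC curve from confidence scores on the cross-validation set."""
        labels, scores = [], []
        for utterance, expected in cross_validation_set:
            predicted, confidence = classify(utterance)  # hypothetical NLU call
            labels.append(1 if predicted == expected else 0)
            scores.append(confidence)
        fpr, tpr, thresholds = roc_curve(labels, scores)
        print(f"AUC: {auc(fpr, tpr):.2f}")
        for f, t, threshold in zip(fpr, tpr, thresholds):
            print(f"threshold {threshold:.2f}: accepts {t:.0%} of correct answers "
                  f"and {f:.0%} of incorrect ones")

Raising the threshold moves you towards fewer confidently wrong answers, at the cost of deflecting more correct ones to a fallback; the curve makes that trade-off visible.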

In summary, cross-validation testing is a very useful way of assessing the effectiveness of your chatbot model, but it is essential to have a good-quality dataset that tests each intent with as many diverse utterances as you can possibly gather.

