June 10, 2020

Tutorial: Analyze and Improve IBM Watson Assistant Skill Performance

Florian Treml, Senior Director, Engineering

This article was originally published on Botium’s blog on June 10, 2020, prior to Cyara’s acquisition of Botium.

Audience: This article is for you if you have an existing Watson Assistant skill and want to analyze its training data for consistency and performance. It also shows ways to improve the performance of your skill. We will use Botium to:

  • download the training data from your Watson Assistant skill to Botium
  • run static and dynamic analytics on it
  • present the results in the Botium Coach dashboard
  • augment the training data to improve the performance
  • upload the training data from Botium to your Watson Assistant skill
  • and finally, validate the improvements

Attention: In machine learning, it typically makes no sense to use training data for testing, but with the approach outlined here you will detect any serious flaws in the training data itself. It cannot tell you how your skill will perform in production later. For that, you have to invest some more effort to prepare good test data (or use the data sets included in Botium).

Step 1: Open a Channel to your Watson Assistant Skill

Open the Chatbots menu in Botium, click the Register New Chatbot button and select IBM Watson Assistant API in the technology selection field. You can find everything you need in your IBM Cloud Console – in most cases you will authenticate with an IAM API Key – see the IBM docs on how to get yours. It is important to select Assistant V1 as SDK Version.

Watson - Register Chatbot
Register Chatbot in Botium Box

When everything is in place, click the Say Hello button to verify your credentials.

Car Dashboard Training Data - Say Hello
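If you want to see what such a "hello" check looks like at the API level, here is a minimal sketch that only builds a Watson Assistant V1 message request without sending it. The service URL, workspace ID, API key and version date are placeholders (assumptions, not values from this tutorial), and the function name is made up for illustration. It relies on the fact that an IAM API key can be passed via HTTP basic auth with the user name "apikey".

```python
import base64
import json
import urllib.request


def build_hello_request(service_url, workspace_id, api_key,
                        text="hello", version="2020-04-01"):
    """Build (but do not send) a Watson Assistant V1 message request."""
    url = (f"{service_url.rstrip('/')}/v1/workspaces/"
           f"{workspace_id}/message?version={version}")
    # IAM API keys are accepted as basic auth with the literal user "apikey".
    token = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")
    # alternate_intents asks the service to return the full ranked intent list.
    body = {"input": {"text": text}, "alternate_intents": True}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Basic {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # All three values below are placeholders.
    req = build_hello_request("https://api.example.invalid/instances/demo",
                              "my-workspace-id", "my-api-key")
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req)  (a billable API call)
```

Setting alternate_intents to true matters later on: the alternate intents list in the response is what the mismatch analysis in Step 4 is based on.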

Step 2: Use the Test Case Wizard to Download Training Data to Botium Box

Open the Test Case Wizard, expand the Conversation Model Downloader section, enter a name for the new Test Set and click on Download from IBM Watson Assistant. Make sure the Create new Test project with this Test Set option is enabled to save you a few extra clicks later.

Car Dashboard Training Data - Test Case Wizard - Conversation Model Uploader

At this point, Botium will download the intents and user examples from your skill and build a Test Set in Botium from them. It will do a static analysis of the training data as well. If it identifies obvious problems (such as duplicate utterances or an empty user examples list), it will immediately warn you about them. You can now see the Test Set statistics.

Car Dashboard Training Data - Test Set Dashboard
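To make the idea of a static analysis more concrete, here is a small sketch of two such checks: user examples that appear under more than one intent, and intents without any user examples. This is not Botium's implementation. The function name and the input format (a dict mapping intent names to lists of user examples) are assumptions for illustration.

```python
from collections import defaultdict


def static_check(training_data):
    """Return (cross_intent_duplicates, empty_intents).

    training_data: dict mapping intent name -> list of user examples.
    """
    seen = defaultdict(set)  # normalised utterance -> intents containing it
    empty = []
    for intent, examples in training_data.items():
        if not examples:
            empty.append(intent)
        for text in examples:
            # Ignore case and repeated whitespace when comparing utterances.
            seen[" ".join(text.lower().split())].add(intent)
    duplicates = {utt: sorted(names) for utt, names in seen.items()
                  if len(names) > 1}
    return duplicates, sorted(empty)


if __name__ == "__main__":
    data = {
        "greeting": ["Good day", "hello"],
        "goodbye": ["good  day", "bye"],
        "capabilities": [],
    }
    print(static_check(data))
```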

By switching to the Test Cases tab, you should now see something familiar – the intent list as you named it in your Watson Assistant skill, as well as the user examples for them!

Car Dashboard Training Data - Test Cases

Step 3: Run a First Test Session

In the Test Set Dashboard, click the Start Test Session button and watch the test session progress. Botium will send all of your training data, one user example after the other, to the Watson Assistant skill and flag any irregularities, for example if a user example resolves to a different intent than expected.

Attention: Depending on the Watson Assistant plan in place you will have to pay for each API call!

Car Dashboard Training Data - Test Session
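Conceptually, the check done for each user example is simple. The sketch below assumes you already have the session results as a list of dicts with the hypothetical keys "utterance", "expected" and "predicted", and it summarizes which user examples resolved to a different intent than expected. Each entry stands for one API call, so the total also tells you how many calls the session cost.

```python
def summarize_session(results):
    """Summarize a test session.

    results: list of dicts with keys 'utterance', 'expected', 'predicted'.
    """
    failed = [r for r in results if r["predicted"] != r["expected"]]
    return {
        "api_calls": len(results),   # one call per user example
        "failed": len(failed),
        "mismatches": [(r["utterance"], r["expected"], r["predicted"])
                       for r in failed],
    }


if __name__ == "__main__":
    session = [
        {"utterance": "hello", "expected": "greeting", "predicted": "greeting"},
        {"utterance": "good day", "expected": "greeting", "predicted": "goodbye"},
    ]
    print(summarize_session(session))
```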

Step 4: Open the results in Botium Coach Dashboard

In Botium Coach Dashboard you will now receive some hints about what might be wrong with your training data. You won’t get much out of the confidence score evaluations, as we are testing with training data, so anything but a very high average confidence score would be really surprising (or alarming).

Car Dashboard Training Data - Coach Dashboard

You can have a look at the confusion matrix, but for the same reason as above, you won’t really get valuable hints there.

Botium Coach - Confusion Matrix
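For reference, a confusion matrix is just a count of (expected intent, predicted intent) pairs. Here is a minimal sketch with a made-up function name and input format. When testing with training data you would expect nearly all counts to sit on the diagonal, which is why the matrix tells you little here.

```python
from collections import Counter


def confusion_matrix(pairs):
    """Build a confusion matrix from (expected, predicted) intent pairs.

    Returns (labels, matrix) where matrix[i][j] counts how often
    labels[i] was expected and labels[j] was predicted.
    """
    counts = Counter(pairs)
    labels = sorted({label for pair in counts for label in pair})
    matrix = [[counts[(expected, predicted)] for predicted in labels]
              for expected in labels]
    return labels, matrix


if __name__ == "__main__":
    labels, matrix = confusion_matrix([
        ("greeting", "greeting"),
        ("greeting", "goodbye"),
        ("goodbye", "goodbye"),
    ])
    print(labels)
    for row in matrix:
        print(row)
```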

You definitely should have a look at the Mismatch Probability Risks section. It lists user examples with a high risk of matching an incorrect intent, which usually would trigger disambiguation in Watson Assistant (if this feature is enabled). This shouldn’t happen for training data. Botium Coach evaluates the alternate intents lists to show you any intents and user examples that are very likely to cause a mismatch due to similar confidence scores. In the example below you can see that the user examples “good day”, “can I teach you” and some more have a mismatch probability score of 1, which means that Watson Assistant is not able to distinguish between two intents.

Car Dashboard Training Data - Mismatch Probability Risks

Clicking inside the chart will show the alternate intents list Watson Assistant returned for this user example. There is definitely something odd with your training data here. Most likely, the user example “good day” is included in the user examples of both intents.

Predicted intents of utterance 'good day'
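The exact formula Botium Coach uses for the mismatch probability is not given in this article. As an illustration only, the sketch below uses a simple stand-in: the ratio of the second-best to the best confidence score in the alternate intents list. It shares the one property described above, namely that the score is 1 when two intents have the same confidence. The input follows the shape of the intents list in a Watson Assistant V1 response (dicts with "intent" and "confidence").

```python
def mismatch_risk(intents):
    """Illustrative stand-in score, NOT Botium Coach's actual formula.

    intents: list of {'intent': str, 'confidence': float}.
    Returns second-best confidence divided by best confidence (0.0 to 1.0).
    """
    ranked = sorted(intents, key=lambda i: i["confidence"], reverse=True)
    if len(ranked) < 2 or ranked[0]["confidence"] <= 0:
        return 0.0
    return ranked[1]["confidence"] / ranked[0]["confidence"]


if __name__ == "__main__":
    tied = [
        {"intent": "greeting", "confidence": 0.9},
        {"intent": "goodbye", "confidence": 0.9},
    ]
    print(mismatch_risk(tied))
```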

Another valuable insight can be found in the Intent Mismatch Probability Risks, Alternative Intents list. It highlights any intents that often appear in a rather prominent position in each other's alternate intents lists.

Top 100 Intent Mismatch Probability Risks, Alternative Intents
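One way to picture this aggregation: for every user example, look at the top positions of the alternate intents list and count which other intents show up there. The sketch below is an assumption about how such a ranking could be built, not a description of Botium Coach internals. The function name, the input format and the top_n cutoff are all made up for illustration.

```python
from collections import Counter


def confusable_pairs(session, top_n=2):
    """Count how often another intent ranks near the top for an expected intent.

    session: list of (expected_intent, ranked_intent_names) tuples, where
    ranked_intent_names is ordered by descending confidence.
    Returns [((expected, other), count), ...], most frequent first.
    """
    pairs = Counter()
    for expected, ranked in session:
        for other in ranked[:top_n]:
            if other != expected:
                pairs[(expected, other)] += 1
    return pairs.most_common()


if __name__ == "__main__":
    session = [
        ("greeting", ["greeting", "goodbye", "help"]),
        ("greeting", ["goodbye", "greeting"]),
        ("help", ["help", "greeting"]),
    ]
    for pair, count in confusable_pairs(session):
        print(pair, count)
```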

Step 5: Augment Training Data

You can now choose to:

  • augment your training data in your Watson Assistant workspace and then download the augmented training data to Botium Box again
  • or use the included Botium Box tools to augment the training data in Botium Box and upload it to your Watson Assistant with the Test Case Wizard

When doing it in the Watson Assistant workspace, use the Test Case wizard again to either overwrite the Test Set data or create a new one.

When doing it in Botium, you can use the Botium tools to augment the training data:

  • Use the Test Case Designer to manually add and remove user examples to the training data
  • Use the Paraphraser to generate additional user examples and add them to the training data
  • Merge training data from the included Botium Box data sets, available for various domains in various languages
  • Merge training data from other sources

For example, you can now identify the intents for which there is not enough training data available (fewer than 10 user examples), and use the Paraphraser to generate more:

Watson - Test Case Wizard - Paraphraser
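Finding the intents that need more training data is a one-liner once the training data is available as a dict. The sketch below assumes the same hypothetical format as before (intent name mapped to a list of user examples) and uses the threshold of 10 from above as the default.

```python
def sparse_intents(training_data, minimum=10):
    """Return {intent: example_count} for intents below the minimum."""
    return {intent: len(examples)
            for intent, examples in training_data.items()
            if len(examples) < minimum}


if __name__ == "__main__":
    data = {
        "greeting": [f"hello variant {n}" for n in range(12)],
        "goodbye": ["bye", "see you"],
    }
    print(sparse_intents(data))
```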

To upload the additional training data to your Watson Assistant workspace, open the Test Set Dashboard, click on Upload Conversation Model to Chatbot Provider and select your IBM Watson chatbot as registered in the first step of this tutorial. You can now choose to:

  • Create a new blank workspace
  • Copy and extend the existing workspace
  • Merge user examples into the existing workspace

All three of them are valid choices, but I would not recommend the third option for workspaces that are currently live, because it changes the training data of the running skill in place.

Watson - Test Case Wizard - Conversation Model Uploader
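The difference between the second and third option can be sketched on a workspace export. In the Watson Assistant V1 workspace JSON, intents are stored as a list of objects with an "intent" name and an "examples" list of {"text": ...} entries. The function below (a made-up helper, not part of Botium) follows the "copy and extend" idea: it works on a deep copy, so the original workspace stays untouched, and it skips user examples that already exist.

```python
import copy


def extend_workspace(workspace, new_examples):
    """Return an extended copy of a V1 workspace dict.

    new_examples: dict mapping intent name -> list of additional user examples.
    """
    result = copy.deepcopy(workspace)  # never modify the original in place
    by_name = {i["intent"]: i for i in result.setdefault("intents", [])}
    for intent, texts in new_examples.items():
        entry = by_name.get(intent)
        if entry is None:
            entry = {"intent": intent, "examples": []}
            result["intents"].append(entry)
            by_name[intent] = entry
        existing = {e["text"].lower() for e in entry.setdefault("examples", [])}
        for text in texts:
            if text.lower() not in existing:
                entry["examples"].append({"text": text})
                existing.add(text.lower())
    return result


if __name__ == "__main__":
    live = {"intents": [{"intent": "greeting", "examples": [{"text": "hello"}]}]}
    extended = extend_workspace(live, {"greeting": ["good morning"]})
    print(len(live["intents"][0]["examples"]),
          len(extended["intents"][0]["examples"]))
```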

Step 6: Validate Improvements

Again, run a test session with your augmented training data and open it in Botium Coach when ready. In Botium Coach, you can select a secondary test session for performance comparison. It will show your intents and user examples that now are working better or worse than before.

Rasa - Coach Analytics

As usual in Botium Coach, you can drill down to the single user example level to trace any improvements or deteriorations.

Intent Training Progress
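A comparison between two test sessions boils down to a per-utterance difference. As a rough sketch, assume each session is available as a dict mapping a user example to the confidence score of its expected intent. This format, the function name and the tolerance value are assumptions for illustration. User examples that exist in only one session (for example newly added ones) are skipped, because there is nothing to compare them with.

```python
def compare_sessions(before, after, tolerance=0.01):
    """Compare confidence scores of two test sessions.

    before/after: dicts mapping user example -> confidence of expected intent.
    Returns user examples whose confidence changed by more than the tolerance.
    """
    changes = {"improved": [], "worse": []}
    for utterance in sorted(before.keys() & after.keys()):
        delta = after[utterance] - before[utterance]
        if delta > tolerance:
            changes["improved"].append((utterance, round(delta, 4)))
        elif delta < -tolerance:
            changes["worse"].append((utterance, round(delta, 4)))
    return changes


if __name__ == "__main__":
    first = {"good day": 0.5, "hello": 0.9, "bye": 0.75}
    second = {"good day": 0.75, "hello": 0.9, "bye": 0.5, "see you": 1.0}
    print(compare_sessions(first, second))
```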

Conclusion

While it usually makes no sense to test a machine learning model with data it has been trained on, it can help to visualize issues with the training data itself, such as mismatch probabilities and duplicate user examples. Flaws in the training data will have a negative impact on the overall NLU performance of your chatbot for sure.
