Tutorial: Analyze and Improve IBM Watson Assistant Skill Performance

This article was originally published on Botium’s blog on June 10, 2020, prior to Cyara’s acquisition of Botium.

Audience: Read this article if you have an existing Watson Assistant skill and want to check its training data for consistency and performance. It also shows ways to improve the performance of your skill. We will use Botium to:

download the training data from your Watson Assistant skill to Botium
run static and dynamic analytics on it
present the results in the Botium Coach dashboard
augment the training data to improve the performance
upload the training data from Botium to your Watson Assistant skill
and finally, validate the improvements

Attention: In machine learning, it typically makes no sense to use training data for testing, but the approach outlined here will detect serious flaws in the training data itself. It cannot tell you how your skill will perform in production later. For that, you have to invest more effort in preparing good test data (or use the data sets included in Botium).

Step 1: Open a Channel to your Watson Assistant Skill

Open the Chatbots menu in Botium, click the Register New Chatbot button and select IBM Watson Assistant API in the technology selection field. You can find everything you need in your IBM Cloud Console – in most cases you will authenticate with an IAM API Key – see the IBM docs on how to get yours. It is important to select Assistant V1 as SDK Version.

Watson - Register Chatbot in Botium Box

When everything is in place, click the Say Hello button to verify your credentials.
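If you want to see what such a connectivity check might look like outside of Botium, here is a minimal sketch that builds (but does not send) a request to the Watson Assistant V1 message endpoint. It is not Botium's implementation. The function name, the placeholder service URL and the API version date are assumptions for illustration; use the values from your own IBM Cloud Console.

```python
import base64
import json
import urllib.request


def build_hello_request(service_url, workspace_id, api_key,
                        text="hello", api_version="2020-04-01"):
    """Build (but do not send) a Watson Assistant V1 message request.

    service_url, workspace_id and api_key come from your IBM Cloud Console.
    api_version is an assumed release date; pick the one you develop against.
    """
    url = (f"{service_url.rstrip('/')}/v1/workspaces/{workspace_id}"
           f"/message?version={api_version}")
    body = {
        "input": {"text": text},
        # Ask for the full ranked intent list, not only the top intent.
        "alternate_intents": True,
    }
    # Basic auth with the literal user name "apikey" and the IAM API key.
    token = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would perform the call, which may be billed depending on your plan.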

Step 2: Use the Test Case Wizard to Download Training Data to Botium Box

Open the Test Case Wizard, expand the Conversation Model Downloader section, enter a name for the new Test Set and click on Download from IBM Watson Assistant. Make sure the Create new Test project with this Test Set option is enabled to save you a few extra clicks later.

Car Dashboard Training Data - Test Case Wizard - Conversation Model Downloader

At this point, Botium will download the intents and user examples from your skill and build a Test Set in Botium out of it. It will do a static analysis of the training data as well – in case it identifies obvious problems (such as duplicate utterances, empty user examples list…) it will immediately warn you about it. You can now see the Test Set statistics.

Car Dashboard Training Data - Test Set Dashboard
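To make the idea of a static analysis more concrete, the following sketch checks a plain dictionary of intents and user examples for the problems mentioned above. It is an illustration only, not Botium's actual analysis. The function name and the data layout (intent name mapped to a list of user examples) are assumptions.

```python
from collections import defaultdict


def static_checks(training_data):
    """Run simple consistency checks on intent -> [user examples] data."""
    # Intents without any user example cannot be learned at all.
    empty_intents = sorted(i for i, ex in training_data.items() if not ex)

    intents_by_example = defaultdict(set)
    repeated_within = set()
    for intent, examples in training_data.items():
        seen_here = set()
        for example in examples:
            # Compare case-insensitively and ignore extra whitespace.
            key = " ".join(example.lower().split())
            if key in seen_here:
                repeated_within.add((intent, key))
            seen_here.add(key)
            intents_by_example[key].add(intent)

    # The same user example in several intents makes them indistinguishable.
    shared = {k: sorted(v) for k, v in intents_by_example.items() if len(v) > 1}
    return {
        "empty_intents": empty_intents,
        "repeated_within_intent": sorted(repeated_within),
        "shared_across_intents": shared,
    }
```

A user example that shows up under "shared_across_intents" is a likely candidate for the mismatch risks discussed in Step 4.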

By switching to the Test Cases tab, you should now see something familiar – the intent list as you named it in your Watson Assistant skill, as well as the user examples for them!

Car Dashboard Training Data - Test Cases

Step 3: Run a First Test Session

In the Test Set Dashboard, click the Start Test Session button and watch the test session progress. Botium will send all of your training data, one user example after the other, to the Watson Assistant skill and flag any irregularities, for example a user example that resolves to a different intent than expected.

Attention: Depending on the Watson Assistant plan in place you will have to pay for each API call!

Car Dashboard Training Data - Test Session
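The check performed for each user example boils down to comparing the intent it belongs to with the intent the skill returned. Here is a minimal sketch of that comparison on already collected results. The result layout with the keys "utterance", "expected" and "predicted" is an assumption for illustration, not Botium's data format.

```python
def find_mismatches(results):
    """Return the results whose predicted intent differs from the expected one.

    Each result is a dict with the hypothetical keys "utterance",
    "expected" (the intent the user example belongs to) and
    "predicted" (the top intent the skill returned).
    """
    return [r for r in results if r["predicted"] != r["expected"]]


def accuracy_per_intent(results):
    """Share of user examples per intent that resolved to their own intent."""
    totals, hits = {}, {}
    for r in results:
        intent = r["expected"]
        totals[intent] = totals.get(intent, 0) + 1
        hits[intent] = hits.get(intent, 0) + (r["predicted"] == intent)
    return {intent: hits[intent] / totals[intent] for intent in totals}
```

With training data as test data, every intent should reach an accuracy of 1.0 or very close to it.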

Step 4: Open the results in Botium Coach Dashboard

In Botium Coach Dashboard you will now receive some hints about what might be wrong with your training data. You won’t get much out of the confidence score evaluations, as we are testing with training data, so anything but a very high average confidence score would be really surprising (or alarming).

Car Dashboard Training Data - Coach Dashboard
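If you want to verify the confidence scores yourself, a small sketch like the following averages them per intent and lists the intents that fall below a limit. The result keys and the threshold of 0.8 are assumptions chosen for illustration, not values used by Botium Coach.

```python
from collections import defaultdict
from statistics import mean


def average_confidence(results):
    """Average the top-intent confidence per expected intent."""
    by_intent = defaultdict(list)
    for r in results:
        by_intent[r["expected"]].append(r["confidence"])
    return {intent: mean(scores) for intent, scores in by_intent.items()}


def low_confidence_intents(results, threshold=0.8):
    """Intents whose average confidence is below the (assumed) threshold."""
    averages = average_confidence(results)
    return sorted(i for i, score in averages.items() if score < threshold)
```

Any intent reported here deserves a closer look, because the skill has seen exactly these user examples during training.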

You can have a look at the confusion matrix, but for the same reason as above, you won’t really get valuable hints there.
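For reference, a confusion matrix simply counts how often each expected intent was resolved to each predicted intent. Here is a minimal sketch, again assuming results with the hypothetical keys "expected" and "predicted":

```python
from collections import Counter


def confusion_matrix(results):
    """Return (labels, matrix); rows are expected, columns predicted intents."""
    labels = sorted({r["expected"] for r in results}
                    | {r["predicted"] for r in results})
    counts = Counter((r["expected"], r["predicted"]) for r in results)
    matrix = [[counts[(e, p)] for p in labels] for e in labels]
    return labels, matrix
```

When testing with training data, almost all counts should sit on the diagonal, which is why the matrix is of limited use here.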

You definitely should have a look into the Mismatch Probability Risks section – this is about user examples with a high risk of matching an incorrect intent, which usually would trigger disambiguation in Watson Assistant (if this feature is enabled) – this shouldn’t happen for training data. Botium Coach evaluates the alternate intents lists to show you any intents and user examples that are very likely to cause a mismatch due to similar confidence scores. In the example below you can see that the user examples “good day”, “can I teach you” and some more have a mismatch probability score of 1, which means that Watson Assistant is not able to distinguish between two intents.

Car Dashboard Training Data - Mismatch Probability Risks

Clicking inside the chart will show the alternate intents list Watson Assistant returned for this user example. If two intents show up with nearly the same confidence, there is definitely something odd with your training data. Most likely, the user example “good day” is included in the user examples of both intents.

Predicted intents of utterance 'good day'
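The exact formula Botium Coach uses for the mismatch probability is not described here. As a rough illustration of the idea, the sketch below uses the ratio between the second-best and the best confidence as a stand-in score: it is 1 when two intents are tied and approaches 0 when the top intent clearly wins. The function names, the result layout and the threshold of 0.9 are assumptions.

```python
def mismatch_risk(intents):
    """Ratio of second-best to best confidence (a stand-in score, 0..1).

    intents is a list of {"intent": str, "confidence": float} entries,
    similar to what an alternate intents list contains.
    """
    ranked = sorted(intents, key=lambda i: i["confidence"], reverse=True)
    if len(ranked) < 2 or ranked[0]["confidence"] <= 0:
        return 0.0
    return ranked[1]["confidence"] / ranked[0]["confidence"]


def risky_examples(results, threshold=0.9):
    """User examples whose score reaches the (assumed) threshold, worst first."""
    scored = [(r["utterance"], mismatch_risk(r["intents"])) for r in results]
    risky = [s for s in scored if s[1] >= threshold]
    return sorted(risky, key=lambda s: s[1], reverse=True)
```

A user example that appears in two intents would typically end up with a score of 1 in this sketch, matching the “good day” case above.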

You can find other valuable insights in the Intent Mismatch Probability Risks, Alternative Intents list. It highlights intents that often appear in a rather prominent position in the alternate intents lists of other intents.

Top 100 Intent Mismatch Probability Risks, Alternative Intents
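The same idea can be sketched at intent level: count how often another intent shows up near the top of the ranked intent list for the user examples of a given intent. This is an illustration under assumed names and data layout ("expected" plus a "ranked" list of intent names, best first), not the calculation Botium Coach performs.

```python
from collections import Counter


def confusable_pairs(results, top_n=3):
    """Count (expected intent, other intent) pairs within the top_n ranks."""
    pairs = Counter()
    for r in results:
        for candidate in r["ranked"][:top_n]:
            if candidate != r["expected"]:
                pairs[(r["expected"], candidate)] += 1
    # Most frequent pairs first.
    return pairs.most_common()
```

Pairs with a high count point to intents whose user examples overlap and may need to be reworded or merged.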

Step 5: Augment Training Data

You can now choose to:

augment your training data in your Watson Assistant workspace and then download the augmented training data to Botium Box again
or use the included Botium Box tools to augment the training data in Botium Box and upload it to your Watson Assistant with the Test Case Wizard

When doing it in the Watson Assistant workspace, use the Test Case wizard again to either overwrite the Test Set data or create a new one.

When doing it in Botium, you can use the Botium tools to augment the training data:

Use the Test Case Designer to manually add and remove user examples to the training data
Use the Paraphraser to generate additional user examples and add them to the training data
Merge training data from the included Botium Box data sets, available for various domains in various languages
Merge training data from other sources

For example, you can now identify the intents for which there is not enough training data available (fewer than 10 user examples), and use the Paraphraser to generate more.
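Finding those intents is a simple count. Here is a minimal sketch, assuming the training data is available as a dictionary that maps each intent name to its list of user examples (the function name is hypothetical):

```python
def sparse_intents(training_data, minimum=10):
    """Return {intent: number of distinct examples} for intents below minimum."""
    counts = {
        # Count distinct user examples, ignoring case and extra whitespace.
        intent: len({" ".join(e.lower().split()) for e in examples})
        for intent, examples in training_data.items()
    }
    return {intent: n for intent, n in counts.items() if n < minimum}
```

The intents returned here are the ones to extend first, for example with paraphrased user examples.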

To upload the additional training data to your Watson Assistant workspace, open the Test Set Dashboard, click on Upload Conversation Model to Chatbot Provider and select your IBM Watson chatbot as registered in the first step of this tutorial. You can now choose to:

Create a new blank workspace
Copy and extend the existing workspace
Merge user examples into the existing workspace

All three are valid choices, but I would not recommend the third option for workspaces that are currently live, because it changes the training data of the workspace your users are talking to.

Watson - Test Case Wizard - Conversation Model Uploader
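To illustrate what merging user examples means, here is a local sketch that adds new user examples to existing training data without creating duplicates and without touching the original. It only shows the merge logic on dictionaries and is not Botium's upload implementation. All names are hypothetical.

```python
def _normalize(example):
    """Lower-case and collapse whitespace so near-identical examples match."""
    return " ".join(example.lower().split())


def merge_examples(existing, additions):
    """Return a new intent -> [user examples] dict with additions merged in."""
    # Copy first so the original training data stays untouched.
    merged = {intent: list(examples) for intent, examples in existing.items()}
    for intent, examples in additions.items():
        target = merged.setdefault(intent, [])
        known = {_normalize(e) for e in target}
        for example in examples:
            key = _normalize(example)
            if key not in known:
                target.append(example)
                known.add(key)
    return merged
```

Working on a copy mirrors the safer upload options: the existing data stays as it is until you decide to replace it.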

Step 6: Validate Improvements

Again, run a test session with your augmented training data and open it in Botium Coach when ready. In Botium Coach, you can select a secondary test session for performance comparison. It shows which intents and user examples now work better or worse than before.

As usual in Botium Coach, you can drill down to the single user example level to trace any improvements or deteriorations.
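The comparison of two test sessions can be sketched in a few lines. The example below assumes each session is reduced to a dictionary that maps a user example to True if it resolved to the expected intent. That layout and the function name are assumptions for illustration.

```python
def compare_sessions(before, after):
    """Classify user examples by how their result changed between sessions."""
    common = [u for u in after if u in before]
    return {
        # Failed before, passes now.
        "improved": sorted(u for u in common if after[u] and not before[u]),
        # Passed before, fails now.
        "worsened": sorted(u for u in common if before[u] and not after[u]),
        # Only present in the second session, e.g. added by augmentation.
        "new": sorted(u for u in after if u not in before),
    }
```

An empty "worsened" list is what you hope to see after augmenting the training data.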

Conclusion

While it usually makes no sense to test a machine learning model with data it has been trained on, it can help to visualize issues with the training data itself, such as mismatch probabilities and duplicate user examples. Flaws in the training data will have a negative impact on the overall NLU performance of your chatbot for sure.
