This article was originally published on QBox’s blog, prior to Cyara’s acquisition of QBox.
Yes, it really is that easy to analyze your chatbot in five simple steps!
Cyara helps businesses test, train, and monitor their chatbots through every stage of the development lifecycle.
But seriously, we all know the importance of doing endless rounds of training and testing before we release our chatbots into the real world.
But the problem with NLP training is that when we look at our training data, we’re looking at it from a human perspective. And we’re probably trying to make improvements, tweaks, and additions from a human perspective, too.
But the best way to understand the principles of chatbot performance is to think of it from an NLP point of view.
The job of fine-tuning our chatbot training data would be made so much easier if we knew the algorithms that the NLP providers use. But the problem is, we don’t: these algorithms are a black box to us chatbot builders, and we’ll never get access to them.
But you aren’t alone. Our solution provides insight into training data, enabling you to make informed decisions about how to develop the performance of your chatbots.
This is done by analyzing and benchmarking your chatbot’s training data: visualizing and understanding where it does and doesn’t perform well, and why, for your specific NLP provider.
This is achieved in five simple steps:
Step 1 – Test
You will first need to download your model file from your chosen provider, and then run your first test in our platform.
This will start the process of analyzing the performance of your training data for your chosen NLP provider.
But there’s more. A standard test will analyze the training data already in your model, but there is also the option of running a cross-validation test, where our platform analyzes how the model performs on data it has not been trained on.
Both types of test are equally important: standard testing for assessing the strength of the training data within your chatbot model, and cross-validation testing for spotting any gaps in the chatbot’s knowledge and getting an idea of how it will perform in the real world.
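If cross-validation is new to you, here’s a rough sketch of the idea in Python. The utterances, intents, and the simple stand-in classifier are all made up for illustration; this is not our platform’s implementation, just the general principle of scoring every utterance with a model that never saw it during training.

```python
# Generic illustration of cross-validation over intent training data.
# The data and model below are hypothetical stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

utterances = [
    "I want to check my balance", "show me my account balance",
    "what's left in my account", "how much money do I have",
    "I lost my card", "my card was stolen",
    "please block my card", "cancel my debit card",
]
intents = ["balance"] * 4 + ["lost_card"] * 4

# A simple stand-in classifier; real NLP providers use far more
# sophisticated (and undisclosed) models.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# 4-fold cross-validation: each utterance is scored by a model that
# never saw it during training, hinting at real-world performance.
scores = cross_val_score(model, utterances, intents, cv=4)
print(f"Mean cross-validated accuracy: {scores.mean():.2f}")
```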
Step 2 – Identify
Once the results of the test are revealed, our platform will give three scores for the model as a whole: one each for correctness, confidence, and clarity.
These three scores act as KPIs and will give a good idea of the strength of the model’s performance.
Each intent in the model also has the same three scores. This enables us to easily identify the most poorly performing ones in the model, i.e. the ones with the lowest scores for correctness (first and foremost), confidence, and clarity.
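To make that a little more concrete, here is one illustrative way per-intent scores could be computed from raw test results. The records, and the definitions of correctness and confidence used here, are simplified stand-ins, not the exact formulas our platform uses.

```python
# Illustrative only: "correctness" here is the share of an intent's test
# utterances predicted correctly, and "confidence" is the average model
# confidence on those correct predictions. Both are hypothetical stand-ins.
from collections import defaultdict

# Each record: (true intent, predicted intent, prediction confidence)
results = [
    ("balance",   "balance",   0.92),
    ("balance",   "lost_card", 0.48),
    ("lost_card", "lost_card", 0.85),
    ("lost_card", "lost_card", 0.61),
]

per_intent = defaultdict(lambda: {"total": 0, "correct": 0, "conf_sum": 0.0})
for true_intent, predicted, confidence in results:
    stats = per_intent[true_intent]
    stats["total"] += 1
    if predicted == true_intent:
        stats["correct"] += 1
        stats["conf_sum"] += confidence

for intent, s in per_intent.items():
    correctness = s["correct"] / s["total"]
    confidence = s["conf_sum"] / s["correct"] if s["correct"] else 0.0
    print(f"{intent}: correctness={correctness:.2f}, confidence={confidence:.2f}")
```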
Step 3 – Analyze
Once the poor-performing intents have been identified, you can start the deep analysis by selecting the one with the lowest score for correctness.
This will be the best place to start making improvements, because once you start improving the correctness score of an intent, the confidence and clarity scores will generally follow suit.
There are a variety of features available to deeply analyze each utterance within the poorly performing intent:
- Diagrams for correctness, confidence, and clarity that display the training data within the intent, in easy-to-understand formats so you can see which utterances are causing confusion and need to be fixed.
- Word Density feature to see a quick snapshot of any words in the utterance that may be over-represented or under-represented, both at intent level and global model level (a generic sketch of this idea appears at the end of this step).
- Training Data feature with color coding to see at a glance any overuse of certain words and phrases, and to compare the training data of confused intents side by side.
- Explain feature to give a deeper word-by-word analysis of how much influence each word has on the intent prediction.
All of our features will provide the information necessary to understand why the utterance is poorly performing so that the user has the knowledge needed to fix the issue.
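To give a flavor of this kind of word-level analysis, here is a generic sketch of the word-density idea: compare how often each word appears within one intent’s training data against how often it appears across the whole model. The training data below is invented, and this is not how our platform implements the feature.

```python
# Hypothetical word-density comparison: flag words whose share of one
# intent's training data is much higher than their share of the whole model.
from collections import Counter

training_data = {
    "balance":   ["check my balance please", "please show my balance",
                  "balance please", "what is my balance please"],
    "lost_card": ["I lost my card", "my card was stolen"],
}

def word_counts(utterances):
    return Counter(word for u in utterances for word in u.lower().split())

global_counts = word_counts([u for utts in training_data.values() for u in utts])
global_total = sum(global_counts.values())

for intent, utts in training_data.items():
    counts = word_counts(utts)
    total = sum(counts.values())
    # Rank words by how much their share of this intent exceeds their global share
    gap = {w: c / total - global_counts[w] / global_total for w, c in counts.items()}
    for word in sorted(gap, key=gap.get, reverse=True)[:3]:
        print(f"{intent}: '{word}' is {counts[word] / total:.0%} of this intent's words "
              f"vs {global_counts[word] / global_total:.0%} globally")
```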
Step 4 – Fix
Once you understand where the issues are in your training data, you can make the appropriate changes.
This will be done within your NLP provider.
The changes might include adding new training data to reinforce a concept expressed within an intent, removing training data that isn’t helping the model at all, moving training data to a better-placed intent, or making a variety of other fixes.
Step 5 – Validate
After you’ve made the fixes to your model, the last step is to run another test in our platform.
Download the fixed model from the NLP provider and then simply run your next test.
The second test will enable you to:
- Validate whether the fixes you made were successful.
- Detect what effect your fixes have had on the rest of the model.
Our platform will compare the second test to the first test so you can easily see if the three scores for the intent you’ve tried to fix have improved.
It will also alert you to any improvements or regressions that have affected other intents in the model.
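As a simple illustration of what that comparison looks like in principle, here is a sketch that diffs per-intent scores between two hypothetical test runs and flags improvements and regressions. The intents and scores are invented, and this is not our platform’s reporting logic.

```python
# Hypothetical before/after comparison between two test runs.
first_test  = {"balance": 0.72, "lost_card": 0.64, "opening_hours": 0.91}
second_test = {"balance": 0.88, "lost_card": 0.61, "opening_hours": 0.91}

for intent, before in first_test.items():
    after = second_test[intent]
    change = after - before
    if change > 0:
        status = "improved"
    elif change < 0:
        status = "regressed"
    else:
        status = "unchanged"
    print(f"{intent}: {before:.2f} -> {after:.2f} ({status}, {change:+.2f})")
```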
It’s as easy as that: just follow these five simple steps and you’ll soon be on your way to really understanding how your training data is viewed from the NLP provider’s point of view.
This will then give you the knowledge to put any necessary fixes in place to improve the performance of your chatbot!