February 13, 2023

Migrating from LUIS to CLU – Part 2: Migration and Comparison

Alison Houston, Data Model Analyst

This article was originally published on QBox’s blog, prior to Cyara’s acquisition of QBox. Learn more about Cyara + QBox.


Welcome to part 2 of our blog series, summarizing the changes you’ll experience as you upgrade your Microsoft chatbot from the old LUIS service to the shiny new Conversational Language Understanding (CLU) service in Azure Cognitive Services for Language.

Part 1 of this blog discussed which features are modified or even removed in CLU, and how the interface you build your bot in has changed. We also briefly touched on the two model training modes available for English, standard and advanced, which we’ll come back to in this second part.

In part 2 we’re first going to demonstrate the (very simple!) process of importing your LUIS chatbot into the new CLU service. Then we’ll take the same bot we just imported and use our tool to compare its performance in both LUIS and CLU, while also changing the volume of training data to assess whether CLU’s NLP does indeed require less, as Microsoft has suggested.

Importing a LUIS Chatbot into CLU

If you’ve already gone through the process of exporting your bot from LUIS as a .json file, then the first half of this process will be very familiar to you. 

Here in fig. 1, we have our demo LUIS chatbot “Bee,” who handles queries about an AI event, answering questions like “What does AI mean?” and “What companies are attending?”

Figure 1: Bee in its original format in LUIS. Note that intents and entities are in different sections of the side menu.

We want to convert Bee into a CLU bot, so to do that we need to export it from LUIS. As shown in fig. 2, when viewing all your apps (bots) on the luis.ai/applications page you can select a bot to export as a .json file with the “Export” drop-down menu.

Figure 2: Exporting the Bee bot as a .json file from LUIS

Now that Bee has been exported as a .json, we go to the language.cognitive.azure.com/clu/projects page (shown in fig. 3) where we can view all our current CLU projects (bots) and also import new ones.

Figure 3: Importing the Bee .json file into CLU

When you click on “Import” here you’ll be prompted to select a .json file, so just select the one you exported from LUIS and away you go. Here in fig. 4 we can now see Bee as it appears, having just been imported into CLU. Helpfully, the “Entities used with this intent” column shows all the annotated (machine-learned) entities that have been tagged in each intent, which LUIS did not show on its equivalent page. We’d encourage you to make much more use of these entities moving forward, as the improved NLP engine underneath CLU could make such context-sensitive entities more effective with less training data. But keep in mind that the entity structure is slightly different in CLU, so be sure to check any entities you tagged while in LUIS to make sure their structure is still appropriate for your use case.

Figure 4: Bee after being imported into CLU. Note that intents and entities are now displayed together.
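If you’d like to sanity-check the exported file before (or after) importing it, the LUIS .json is easy to inspect programmatically. Below is a minimal Python sketch, assuming a LUIS v7-style export with top-level “intents”, “entities”, and “utterances” keys (field names may vary by schema version, so treat this as illustrative). It reproduces a rough equivalent of the “Entities used with this intent” view from fig. 4:

```python
import json
from collections import Counter, defaultdict

# Load the file exported from luis.ai. Schema assumed: a LUIS v7-style export
# with top-level "intents", "entities", and "utterances" keys.
with open("bee.json", encoding="utf-8") as f:
    app = json.load(f)

# Count training utterances per intent.
utterances_per_intent = Counter(u["intent"] for u in app.get("utterances", []))

# Collect the machine-learned entities tagged inside each intent's utterances,
# similar to CLU's "Entities used with this intent" column.
entities_per_intent = defaultdict(set)
for u in app.get("utterances", []):
    for e in u.get("entities", []):
        entities_per_intent[u["intent"]].add(e["entity"])

for intent in app.get("intents", []):
    name = intent["name"]
    tagged = sorted(entities_per_intent[name]) or "none"
    print(f"{name}: {utterances_per_intent[name]} utterances, entities: {tagged}")
```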

Comparing the Impact of Downsizing Training Data in LUIS/CLU

Now that we have the ability to test CLU models and have just created a version of the same model in both LUIS and CLU, we can start testing the theory that CLU should require less training data to get similar model performance.

We decided to assess this by making a “downsized” version of Bee, removing 20% of the training data from each intent at random (a minimal sketch of this downsizing step follows the list), and then testing the performance of:

  • Both the original Bee and the downsized Bee
  • First in LUIS, then in CLU with standard training, and finally in CLU with advanced training
  • Under both automated tests (assessing model performance using only the training data) and cross-validation (CV) tests (assessing model performance on held-out test data)
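To make the downsizing step concrete, here’s a minimal sketch of how you might do it in Python. This assumes the same LUIS v7-style export schema as above, and is our illustration of the approach rather than the exact procedure behind our tooling:

```python
import json
import random
from collections import defaultdict

KEEP_FRACTION = 0.8  # remove 20% of the training data from each intent
random.seed(42)      # fixed seed so the downsized model is reproducible

with open("bee.json", encoding="utf-8") as f:
    app = json.load(f)

# Group training utterances by intent so every intent shrinks by the
# same proportion, rather than sampling 80% of the whole pool.
by_intent = defaultdict(list)
for u in app["utterances"]:
    by_intent[u["intent"]].append(u)

downsized = []
for intent, utterances in by_intent.items():
    keep = max(1, round(len(utterances) * KEEP_FRACTION))  # never empty an intent
    downsized.extend(random.sample(utterances, keep))

app["utterances"] = downsized
with open("bee_downsized.json", "w", encoding="utf-8") as f:
    json.dump(app, f, ensure_ascii=False, indent=2)
```

The downsized file can then be imported into LUIS and CLU exactly as the full-sized Bee was.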

We do not know exactly what is different under the hood of CLU, but if CLU is indeed using a more modern, advanced, and powerful NLP engine, then we would expect LUIS to exhibit the more extreme drop in performance after the training data is reduced. CLU, on the other hand, should be able to better recognize concepts with fewer training examples, and thus experience a smaller drop in performance when training data is reduced.

We also wanted to examine the difference in performance between standard and advanced training, so we’d have a better idea of what the strategy should be when carrying out tests on CLU models in the future.

Results

When the original Bee model and the downsized Bee model were tested using LUIS, CLU standard training and CLU advanced training, the scores came back as the following, summarized in table 1:

Table 1: The 3 scores from testing both the original and downsized Bee in LUIS and both training modes of CLU

The percentage loss from model downsizing was calculated from these scores and summarized in the graph in fig. 5:

Figure 5: Changes in Bee’s scores when training data volume was reduced.
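For reference, “percentage loss” here is simply the relative drop from the full model’s score to the downsized model’s score:

```python
def percentage_loss(full_score: float, downsized_score: float) -> float:
    """Relative drop in a score after downsizing, as a percentage of the full score."""
    return (full_score - downsized_score) / full_score * 100

# Hypothetical scores chosen to match the 7-point, ~8% LUIS example discussed
# below; the actual Bee scores are in table 1.
print(round(percentage_loss(87, 80), 1))  # -> 8.0
```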

Results – LUIS

The Bee model performs well under automated testing, since it was already optimized for LUIS, achieving all 3 scores in the 90s. However, its performance on CV testing is much lower, scoring in the 60s to low 80s; Bee was never going to be a production bot, so we didn’t push the fine-tuning. There are several “weak” concepts within this test data (i.e., concepts not explicitly covered within the training data), which makes them harder for LUIS to recognize.

When Bee is downsized in LUIS, its model score drops by an average of 3.6 points, with the largest drop being in automated test correctness, which fell by 7 points. This suggests that LUIS’ predictive accuracy takes a hit when the volume of training data is reduced.

Results – CLU (standard training)

When the next version of Bee was created using the standard training mode on CLU, confidence remained high; however, the correctness scores dropped in the automated tests. The most significant difference, though, is the extremely poor clarity scores that result from standard training, which required closer examination.

Fig. 6 shows an example of this from one of the CV tests on a standard-trained CLU version of Bee. If standard training is used, the most probable intents will still be returned as rankings by CLU, but they tend to be extremely close together in confidence. The clarity score represents model stability, sometimes described as “potential for confusion,” and is derived from how far apart the confidence values are for each ranked intent.

Figure 6: An example of very close confidence values in the top 3 intents returned for this utterance in our testing
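To make the idea behind the clarity score concrete, here’s a toy version of the metric in Python. The top-two confidence gap below is our simplified stand-in, not necessarily the exact formula used in our platform, but it shows why near-identical confidences like those in fig. 6 drive clarity down:

```python
def clarity(ranking: dict[str, float]) -> float:
    """Toy clarity metric: the gap between the top two intent confidences.

    A simplified stand-in for the idea described above, not the production
    formula. A small gap means the model could easily flip between intents,
    i.e., a high potential for confusion."""
    top, runner_up = sorted(ranking.values(), reverse=True)[:2]
    return top - runner_up

# Illustrative confidence rankings for one utterance (invented values that
# mimic the near-identical confidences seen with CLU standard training).
standard_ranking = {"Companies": 0.412, "Definitions": 0.405, "Schedule": 0.398}
advanced_ranking = {"Companies": 0.93, "Definitions": 0.31, "Schedule": 0.12}

print(f"standard training: {clarity(standard_ranking):.3f}")  # 0.007, very low
print(f"advanced training: {clarity(advanced_ranking):.3f}")  # 0.620, much clearer
```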

This tendency towards very similar confidence values appears to be unique to CLU’s standard training mode, and does not emerge when advanced training is used. The results of standard training can still be informative, giving the user a good indication of where errors could arise based on which utterances are misclassified, but the clarity value is unlikely to be of much use here.

When standard-trained CLU is compared with LUIS, its score totals are poorer. Nevertheless, it is worth noting that the standard-trained version of Bee sees a smaller drop in scores when the model is downsized than LUIS does. For example, automated correctness dropped by 7 points (an 8% loss) when the model was downsized in LUIS, but by only 3 points (a loss of less than 4%) in standard-trained CLU. This does indeed suggest that CLU is less dependent on large volumes of training data than LUIS is, even when only the basic standard training mode is used.

Results – CLU (advanced training)

When the advanced-trained CLU version of Bee is tested, not only does it consistently return the highest scores of the three model versions, but it also shows only a very small drop in scores when the model is downsized (an average of -0.8 points). The automated clarity score actually increased when the model was downsized, and correctness decreased by only 2 points (a loss of less than 2%). Even the downsized advanced-trained version of Bee in CLU scored better than the full-sized version of Bee in LUIS.

The advanced training in CLU also does not have the issue with low clarity scores that the standard training does, so we can get a clearer picture of which intents are most likely to get confused with one another.

Conclusions

From converting one of our own chatbots into a CLU model, we were able to confirm some details of how we should expect CLU to perform differently from LUIS, and how this will be reflected in our platform. In summary:

  • Importing your LUIS model into CLU is incredibly quick and easy. However, if you use some of the deprecated features like patterns, or make heavy use of overlapping entities, be sure to check Microsoft’s CLU documentation (and part 1 of this blog) to see how your model will be altered in the process.
  • Advanced training on CLU achieves the best scores of the 3 options considered here, even with 20% less training data.
  • Even when training data was reduced, the advanced-trained CLU model did score better than the full-sized LUIS version of the same model. This suggests that CLU can indeed perform better with less training data.
  • Reducing the volume of training data resulted in the biggest loss in performance for LUIS, while both the standard- and advanced-trained models in CLU demonstrated more stable scores even after the number of training utterances was decreased.
  • Standard training on CLU is quick and free, but correctness and especially clarity scores can drop even when compared with LUIS.

With regards to this final point, we’d still recommend using standard training for most of your CLU tests to help identify weak concepts, given the time and expense of advanced training and the fact that both modes surface the same misclassifications. A final test should then be carried out with advanced training to get the “true” model score, and that model would be the version you deploy to live interactions.
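If you automate this workflow, the training mode is just a parameter on the CLU authoring API’s train request. Here is a hedged Python sketch; the endpoint shape and field names follow Microsoft’s 2022-05-01 authoring API as we understand it, so double-check the current docs, and note that the key, endpoint, and project name below are placeholders:

```python
import requests

# Placeholders: substitute your own Language resource endpoint, key, and project.
ENDPOINT = "https://<your-language-resource>.cognitiveservices.azure.com"
KEY = "<your-key>"
PROJECT = "Bee"

def start_training(training_mode: str) -> str:
    """Kick off a CLU training job ("standard" or "advanced") and return the
    operation URL to poll for completion."""
    url = (f"{ENDPOINT}/language/authoring/analyze-conversations/projects/"
           f"{PROJECT}/:train?api-version=2022-05-01")
    body = {
        "modelLabel": f"bee-{training_mode}",
        "trainingMode": training_mode,
        # Hold out 20% of utterances for evaluation, train on the rest.
        "evaluationOptions": {
            "kind": "percentage",
            "testingSplitPercentage": 20,
            "trainingSplitPercentage": 80,
        },
    }
    resp = requests.post(url, json=body,
                         headers={"Ocp-Apim-Subscription-Key": KEY})
    resp.raise_for_status()
    return resp.headers["operation-location"]  # async job; poll this URL

# Iterate with cheap standard training, then confirm with advanced.
print(start_training("standard"))
```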
