This scenario may sound familiar: You’ve just upgraded your contact center, but before you can celebrate, there’s one last item on the Project Manager’s implementation plan — a performance load test.
Before you move the applications to production and schedule the actual cutover, you do need to be 110% sure that everything is working as expected. Your Customer Experience scores depend on it.
Often a Project Manager schedules just one test, but it’s actually best practice to schedule two performance load tests with enough time between the tests to correct any defects you find. Here’s why: you may have modified code in one part of the application, but your changes can cause issues in other parts of the application that you didn’t even touch. Also, a call traverses many components on its journey from the ingress point to an agent’s telephone with the associated screen-pop data, and that complexity hides defects. Left undetected, these defects will be exposed in the production system. You can’t predict in advance what these issues will be or where they will occur.
As a Solutions Engineer at Cyara, I’m frequently involved in performance load tests for our customers. Before sitting down to write this blog post, I made a list of the different kinds of issues our Professional Services team has uncovered in various performance load tests over the past 8 years. The list included 45 items, which is much too long to include in this post, so here are just some of the most common:
- Telco problems
- Call connect time delays and disconnects
- Busy tones
- IVR prompts incorrect
- Calls incorrectly routed
- IVR application issues
- Garbled audio
- Dead-end call flows
- Database related delays
A typical load test scenario
The first performance load test usually uncovers an initial “punch list” of defects and performance issues. Remember that call flows are complex and traverse many components where defects can hide. Once these issues are checked off as corrected and your QA testing team is confident that everything is in order, you can run the second test to verify that all systems are go! But sometimes surprises occur, and you may need to run a third performance load test.
Remember, it’s not about how many times you have to run the test; it’s about making sure that you find the issues that would wreck your customer experience before your customers find them. Defects are a lot cheaper to fix before they end up in the production system.
A customer example
Recently, a customer scheduled two four-hour performance load tests for 4000 concurrent calls with over 3000 virtual agents. As it turned out, unexpected issues required a third test to be executed. The first test showed that the Session Border Controller (SBC) and various Voice Gateways could not handle the call load once it exceeded 800 concurrent calls. The SBC and Voice Gateway manufacturers provided new software patches and suggested configuration files to correct these issues. With these defects now corrected, the Network team assumed calls would be able to route to the IVR and the ingress infrastructure would not collapse under load. In the second test, the SBC and Voice Gateways worked as expected, but the SIP trunking tables from the Telco were incorrect, and many of the expected calls weren’t able to reach the agents because of numerous fast busy signals. At this point we were still not able to test the customer call journey from end to end. The project team met with the Telco, which corrected the SIP trunk size tables and ran its own tests.
It was now time to run the third load test and confirm everything was working as designed. Third time’s the charm, as they say.
Now, with the concurrent calls at the required 4000 port scale and running a call rate of 7 CAPS (Call Attempts Per Second), the third test uncovered several call flows that were routing incorrectly. The ACD routing technician made several call routing changes on the fly, and as the four-hour test progressed, the project team was able to confirm through the real-time reporting that all the elements of the contact center were working as expected. Running ad hoc functional test scripts also provided confirmation.
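To get a feel for what 4000 concurrent calls at 7 CAPS implies, here is a rough Python sketch based on Little's Law (concurrency = arrival rate × average time in system). It assumes a constant call rate and a steady state for the whole test, and the function names are made up for illustration; it is not how the test parameters in this example were actually derived.

```python
# Back-of-the-envelope load arithmetic using Little's Law: L = lambda * W.
# The 4000-call target, 7 CAPS rate, and four-hour duration come from the post;
# the steady-state assumption and all names below are illustrative only.

TARGET_CONCURRENT_CALLS = 4000
CALL_RATE_CAPS = 7.0
TEST_DURATION_HOURS = 4


def required_hold_seconds(concurrent_calls: int, caps: float) -> float:
    """Average call duration needed to sustain concurrent_calls at a given call rate."""
    return concurrent_calls / caps


def steady_state_concurrency(caps: float, hold_seconds: float) -> float:
    """Concurrent calls in progress once arrivals and departures balance out."""
    return caps * hold_seconds


def total_call_attempts(caps: float, hours: float) -> int:
    """Call attempts generated if the rate is held for the full duration (ramp ignored)."""
    return round(caps * hours * 3600)


hold = required_hold_seconds(TARGET_CONCURRENT_CALLS, CALL_RATE_CAPS)
print(f"Average call duration needed: {hold:.0f} s (~{hold / 60:.1f} min)")
print(f"Call attempts over the test: {total_call_attempts(CALL_RATE_CAPS, TEST_DURATION_HOURS):,}")
```

Under these assumptions, each simulated call has to stay up for roughly nine and a half minutes to hold 4000 ports busy at 7 CAPS, and the same ratio is the quickest possible ramp time if no call ends during the ramp.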
Let’s recap:
1st Performance Load Test
Result: found that the SBC and Voice Gateways collapsed under load once traffic exceeded 800 concurrent calls, well short of the 4000-port target. If we hadn’t caught this defect, the production system would have collapsed within the first hour of the day, before even getting to the peak busy hour.
2nd Performance Load Test
Result: found that the Telco SIP tables were incorrect, so concurrent calls never exceeded 1200 even though the trunks were supposed to be sized for 4000. This would have left 2300 agents sitting idle, because calls would never reach them, while customers got fast busy signals.
3rd Performance Load Test
Result: found several call flows routing incorrectly to the wrong agents or arriving at a dead end. Customers and agents become frustrated when a call arrives at the wrong queue and an agent has to transfer it to the correct one, or when it reaches a dead end with no agent available to intervene.
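The numbers in this recap can be reproduced with a few lines of Python. This is only a sketch: the class and field names are hypothetical, the agent count is the 3500 concurrent agents mentioned in the checklist that follows, and the idle-agent figure assumes every call that gets through occupies exactly one agent. Only the second test's idle-agent figure is stated in the post; for the other runs it is the same subtraction applied for comparison.

```python
from dataclasses import dataclass

TARGET_CONCURRENT_CALLS = 4000  # target load from the test plan
CONCURRENT_AGENTS = 3500        # agent count from the post's final checklist


@dataclass(frozen=True)
class LoadTestRun:
    label: str
    calls_reached: int  # highest concurrent load the platform carried in that run

    def share_of_target(self) -> float:
        """Fraction of the 4000-call target that this run actually reached."""
        return self.calls_reached / TARGET_CONCURRENT_CALLS

    def idle_agents(self) -> int:
        # Assumption: each call that gets through occupies exactly one agent.
        return max(CONCURRENT_AGENTS - self.calls_reached, 0)


runs = [
    LoadTestRun("1st test (SBC / Voice Gateways)", 800),
    LoadTestRun("2nd test (Telco SIP trunk tables)", 1200),
    LoadTestRun("3rd test (call routing)", 4000),
]

for run in runs:
    print(f"{run.label}: {run.share_of_target():.0%} of target, "
          f"{run.idle_agents()} agents idle")
```

Laid out this way, the first two runs stalled at 20% and 30% of the target load, which is why neither could exercise the call journey from end to end.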
As the project team reviewed the load test reports, they were able to confirm:
- Telco SIP trunk sizing and QoS values were correct
- SBC and Voice Gateways were able to support the 4000 concurrent call load without failure
- The IVR provided the correct prompts and could handle the call load while routing to the correct ACD queues
- The ACD call flows routed the calls to the correct agents and agent groups and provided the correct screen-pop data
- The ACD telephony infrastructure was able to support the 3500 concurrent agents receiving calls
The next morning, the first email of the day from the Project Manager’s laptop contained a simple statement: “The Contact Center Platform and its applications are ready for Production – All Tests have Passed!” That’s the message you want to see, no matter how many load tests it takes to get there.