The Next Level in Performance Management Part 2: Getting Quality Done Right

By Cliff Hurst

To be effective, your Quality Monitoring (QM) must meet several prerequisites. It must be based on a statistically significant number of randomly selected calls. Your monitoring forms and practices must be both valid and reliable. And the results need to be approximately normally distributed before you can infer meaning from them. This article will address these issues. [see part one]

As an industry, we tend to demonstrate our commitment to QM practices by measuring, reporting, and sometimes bragging about how many calls per agent we monitor each month. Measuring performance in this way contributes very little towards your ability to discern overall call center quality.

A Random Sample: If you follow common industry practices, whether you monitor three calls per agent per month or ten calls doesn’t matter. It’s the wrong measure. You need to determine the number of randomly selected calls that should be monitored each month. In a random sample, some agents will be monitored more often than others will. That’s okay. Only by monitoring a random sample can you begin to answer the vital question: “How are we as an organization doing at representing our company to its customers?”

The easiest way to get a random sample is to record all calls and then monitor every nth call, starting from a randomly chosen call. (Statisticians call this a systematic sample; it behaves like a random one as long as call traffic has no repeating pattern that lines up with n.) That means you might sample every 400th call or perhaps every 22nd call. The size of the sample depends on how accurately and precisely you wish to measure quality performance. It is important to remember that monitoring “x number of calls per agent per month” does not, by itself, amount to a commitment to quality monitoring. A random sample is required.
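As a rough illustration of the "every nth call" idea, here is a minimal Python sketch. It assumes the recorded calls are available as a simple list of identifiers; the function name `systematic_sample` and the call IDs are hypothetical, not part of any recording platform.

```python
import random

def systematic_sample(call_ids, sample_size, seed=None):
    """Pick every nth call, starting from a random offset in the first interval."""
    if sample_size <= 0 or sample_size > len(call_ids):
        raise ValueError("sample_size must be between 1 and the number of calls")
    rng = random.Random(seed)
    step = len(call_ids) // sample_size   # the "n" in "every nth call"
    start = rng.randrange(step)           # random starting point: 0 .. step - 1
    return [call_ids[start + i * step] for i in range(sample_size)]

# Hypothetical month of 160,000 recorded calls, numbered 1 to 160,000.
calls = list(range(1, 160_001))
sample = systematic_sample(calls, 384, seed=1)
print(len(sample), sample[1] - sample[0])  # 384 calls, one every 416th call
```

Note that agents are not part of the selection at all, which is why some agents will end up monitored more often than others.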

How Many Calls to Monitor? The math behind this is a bit complex, but there are a number of sample size calculators available for free on the Web.

Note that statisticians talk about “populations” and “sample size.” For us, population refers to the total calls handled in a certain time frame. Sample size is the number of calls monitored. As an example, let’s imagine an inbound call center that handles 160,000 calls a month (which is the population size).

It is customary to set confidence levels at either 95 percent or 99 percent. Since 95 percent is the looser of the two and the less costly to achieve, let’s start there. Let’s also establish a fairly relaxed margin of error (which some calculators label the “confidence interval”) of 5 percent, plus or minus. (For “distribution,” let’s leave it at the default setting of 50 percent for now.)

Enter this into the sample size calculator, and voila! The answer is 384. You will need to monitor 384 randomly selected calls out of the 160,000 in order to be 95 percent confident, plus or minus 5 percent, that the scores attained from the sample actually represent the overall performance of the center during that time.
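For readers who would like to see what such a calculator is doing, the following Python sketch reproduces the result. It assumes the calculator uses the common Cochran formula with a finite population correction and rounds up to the next whole call; the article does not say which formula its calculator uses, and the function name `required_sample_size` is made up for this example.

```python
import math
from statistics import NormalDist

def required_sample_size(population, confidence=0.95, margin=0.05, proportion=0.5):
    """Sample size for estimating a proportion, with finite population correction."""
    # z-score for a two-sided confidence level (about 1.96 for 95 percent)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively unlimited population
    n0 = z ** 2 * proportion * (1 - proportion) / margin ** 2
    # Adjust for the actual number of calls handled
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)  # assumption: round up to a whole call

print(required_sample_size(160_000))  # 384
```

The `proportion` argument corresponds to the calculator's "distribution" setting; 0.5 is the most conservative choice because it yields the largest sample.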

You may already be monitoring that many calls. If so, the only change you may need to make is to monitor a random sample as opposed to x calls per agent per month.

How Good Is Good Enough? Is a 95 percent confidence level, plus or minus 5 percent, acceptable? For some organizations, it is; for others, it isn’t. To achieve a confidence level of 99 percent and a margin of error of plus or minus 3 percent, how many calls would you need to monitor? For the same 160,000-call population, the sample size calculator returns roughly 1,825 (the exact figure varies by a few calls depending on how the calculator rounds). This is nearly five times as many calls. Are these gains in accuracy and precision worth the extra cost? Only you can decide. Remember, this holds true only if you are selecting calls randomly. A random sampling is needed to answer the question, “How are we, as an organization, doing at representing our company to its customers?”

The Advantage of Large Call Centers: The laws of statistics decidedly favor larger call centers when it comes to attaining accuracy and precision in call monitoring. For example, let’s assume a small call center handles 32,000 calls a month. To achieve a 95 percent certainty, plus or minus a 5 percent margin of error, you will need to monitor 380 calls. By contrast, the center handling 160,000 calls requires that 384 calls be monitored for the same certainty. That’s only four more calls!
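Why the population size matters so little can be seen from the formula that sample size calculators commonly use. This is an assumption, since the article does not name a formula: it is Cochran's sample size with a finite population correction, where z is the z-score for the confidence level (about 1.96 for 95 percent), p is the distribution setting (0.5), e is the margin of error (0.05), and N is the number of calls handled.

```latex
\begin{align*}
n_0 &= \frac{z^2\,p\,(1-p)}{e^2}
     = \frac{1.96^2 \times 0.5 \times 0.5}{0.05^2} \approx 384.2 \\
n   &= \frac{n_0}{1 + \dfrac{n_0 - 1}{N}} \\
N = 32{,}000:\quad  n &\approx \frac{384.2}{1.0120} \approx 379.6 \;\Rightarrow\; 380 \\
N = 160{,}000:\quad n &\approx \frac{384.2}{1.0024} \approx 383.2 \;\Rightarrow\; 384
\end{align*}
```

Once N is large compared with n_0, the denominator is very close to 1, so the required sample stays near 384 however many calls the center handles.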

Therefore, if you have a small call center, you are going to have to dedicate more resources as a percent of your overall budget to QM to attain the same standards of accuracy and precision as larger centers. Alternatively, you may elect to measure quality over longer time periods, for instance, quarterly rather than monthly. This is simply a law of mathematical probability.

How to Monitor for Quality: You need to monitor things that actually matter to your clients, to your call center, and, depending on your industry, possibly to regulators. Plus, you must monitor consistently, in a way that is reliable over time and among evaluators.

To be more specific, your monitoring forms have to measure what you say they measure. Plus, they ought to measure what counts. This is called validity. (More on this in a future article.)

Next, you must monitor calls in a way that is reliable over time. In statistics texts, reliability over time is known as test-retest reliability. You want to make sure that if a call handled by an agent was monitored and scored on Monday, a very similar call handled in the same way by the same agent would receive a nearly identical score if it were monitored a week from Friday.

Finally, your quality analysts must do their job in such a way that an agent would receive a similar score no matter who did the rating. Statisticians call this inter-rater reliability; in call centers, we more often call this calibration.

Response Distribution: When your scores are normally distributed and the preceding prerequisites have been met, you can know that, within an established level of confidence and precision, the average, or “mean,” score is a good representation of, “How are we doing?” The revealed meaning of the average score is made even clearer if you also know several other statistical measures that relate to distribution. In a normally distributed sample, if you graph the scores from your sampled calls, they will form the familiar bell-shaped curve.

If your scores are not normally distributed, you still have meaningful data, but the picture is more complicated. You will have to dig a bit deeper to derive meaning from it all. In the absence of normal distribution, the average score doesn’t paint a meaningful picture. You’ll need to weigh some other factors with your analysis. You’ll need to look at the median and the range, and you’ll need to look at whether the distribution is bimodal or not. Also look at the degree and direction it’s skewed, the nature and number of outliers, the standard deviation, and perhaps the quartile or quintile ranges. These statistical measures will shed more light on your data.
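The measures listed above can be pulled together with Python's standard `statistics` module, as in the sketch below. The scores are invented for illustration, and the skewness figure uses Pearson's second coefficient, 3 × (mean − median) / standard deviation, which is one simple choice among several.

```python
import statistics

def describe_scores(scores):
    """Summarize monitored-call scores beyond the simple average."""
    mean = statistics.mean(scores)
    median = statistics.median(scores)
    stdev = statistics.stdev(scores)                   # sample standard deviation
    q1, _, q3 = statistics.quantiles(scores, n=4)      # quartile cut points
    # Pearson's second skewness coefficient: negative means a tail of low scores
    skew = 3 * (mean - median) / stdev if stdev else 0.0
    return {
        "mean": mean,
        "median": median,
        "range": max(scores) - min(scores),
        "stdev": stdev,
        "q1": q1,
        "q3": q3,
        "skew": skew,
    }

# Invented scores: most calls score in the 90s, a few score very low.
scores = [95, 92, 90, 94, 91, 93, 96, 60, 55, 92, 94, 58]
summary = describe_scores(scores)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

In this made-up data set the mean falls well below the median, which is the kind of gap that signals the average alone is not telling the whole story.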

In Conclusion: Once you have met all four of the prerequisites described above, you can have a high degree of confidence that you can infer meaning about the whole from the sample. You can answer the question, with a known confidence level and acceptable margin of error, “How well are we, as an organization, doing at representing our company to its customers?” Wouldn’t that be nice to know?

Read part 1 and part 3 in this series.

Cliff Hurst is president of Career Impact, Inc., which he started in 1988. Contact Cliff at 207-499-0141 or 800-813-8105.

[From Connection Magazine May 2008]
