Speech Recognition Comes of Age

By Chuck Raudonis

Touchtone, also known as DTMF or dual tone multi-frequency, has been the standard of interactive voice response (IVR) for years. Traditional touchtone IVR has its limitations, however. On average, 80 to 85 percent of callers prefer live representatives, which leaves DTMF delivering average self-service rates of only 15 to 20 percent. Plus, for complex customer interactions, DTMF isn’t even an option.

Changing patterns of telephone usage are also playing a role in keeping DTMF acceptance rates low. For example, listening to instructions on a cell phone while pressing buttons and maneuvering the phone back to your ear in time to listen to the next instruction gets old in a hurry. Fortunately, just as DTMF is showing its age, speech recognition has come of age.

Finally, Technology That Listens to You: Over the past three years, speech recognition IVR has become increasingly effective. Accuracy rates have improved and the processing power available on servers has increased, making complex applications possible. Caller acceptance is strong too, making speech recognition the accepted standard of quality customer care.

The average self-service rate for speech recognition programs is 50 percent and above, which is impressive compared to the 15 to 20 percent typical of DTMF. One reason for this difference is that speech recognition can be written in a much more conversational tone than DTMF. In fact, most speech recognition applications written today use what is known as “directed dialogue”: normal conversation with a goal. Questions are asked in a conversational tone, predictable answers are programmed into the system, and the IVR leads the caller through the process much like a live agent would. However, instead of forcing callers to fit their response into one of the “Press 1” options, a well-designed speech application anticipates all the expected responses to a prompt and then gathers the data the way the customer understands it.
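To make the idea of directed dialogue concrete, here is a minimal sketch in Python. It assumes the speech recognizer has already turned the caller's words into text; the prompt, the phrase lists, and the names BALANCE_OR_PAYMENT, normalize, and interpret are all hypothetical and chosen only for illustration. A production system would rely on the recognizer's own grammar tools rather than simple phrase matching.

```python
import re

# Hypothetical dialogue step: one conversational question plus the
# answers the designer expects callers to give, in their own words.
BALANCE_OR_PAYMENT = {
    "prompt": "Would you like to check your balance or make a payment?",
    "expected": {
        "balance": ["balance", "check my balance", "how much do i owe"],
        "payment": ["payment", "make a payment", "pay my bill"],
    },
}


def normalize(utterance):
    """Lowercase the text and drop punctuation so phrasing variations compare equal."""
    return re.sub(r"[^a-z0-9 ]", "", utterance.lower()).strip()


def interpret(step, utterance):
    """Return the canonical answer whose phrase appears in the utterance, or None."""
    text = normalize(utterance)
    for answer, phrases in step["expected"].items():
        if any(phrase in text for phrase in phrases):
            return answer
    # Nothing matched: the application would re-prompt or offer an agent.
    return None
```

With this sketch, a caller who says “I'd like to pay my bill, please” and one who simply says “payment” both land on the same answer, while an unexpected reply returns nothing and can trigger a re-prompt.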

For recurring interactions, phrasing of questions can be randomized to make each call seem more personal. A unique “persona” can even be created, or auditory cues added, helping to make the voice on the phone “real” without adding time to the call. In fact, speech recognition can seem so real that many customers actually say “goodbye” before hanging up!
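Randomized phrasing is simple to picture in code. The sketch below is only an illustration: the greeting variants and the pick_prompt function are hypothetical, and it assumes the application remembers which wording the caller heard last time.

```python
import random

# Hypothetical variants of the same opening question.
GREETING_VARIANTS = [
    "Welcome back. What can I help you with today?",
    "Hello again. How can I help?",
    "Good to hear from you. What would you like to do?",
]


def pick_prompt(variants, last_used=None, rng=random):
    """Pick a variant at random, avoiding the wording used on the previous call."""
    # Fall back to the full list if excluding last_used leaves nothing to choose.
    candidates = [v for v in variants if v != last_used] or list(variants)
    return rng.choice(candidates)
```

Skipping the wording the caller heard most recently keeps a frequent caller from getting the identical greeting twice in a row.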

Is speech recognition right for your application? Four types of calls lend themselves to speech recognition applications:

  • Information calls. These repetitive inquiries include: store locator calls, travelers checking flight times and delays, retail order status calls, and bank customers checking their balances.
  • Transactional calls. Regardless of how complex your transactions are, they can be adapted for speech recognition. Think about booking flights or hotel rooms, entering market or limit orders on stocks, renewing prepaid accounts, or paying a bill with a credit card. Since speech recognition can upsell and cross-sell, you may even want to test it for your direct response campaigns, especially to handle volume spikes.
  • Calls that don’t lend themselves to a simple menu. Speech recognition makes possible a whole range of services that couldn’t be automated with DTMF. Imagine a caller who wants to know if a particular test or treatment is covered by their health plan. With a speech recognition program capable of understanding words, phrases, and alphanumeric information, you’re not restricted to the 12-button interface any longer.
  • Calls with an alphanumeric input. Despite the widespread use of text messaging, for many people it’s just not easy to input letters using a phone pad. Examples include non-numeric passwords, college course designations, part numbers, reservation confirmation codes and many others. A simple speech recognition application quickly solves this problem.

Choosing your development partner: Speech recognition has such unique challenges that it’s prudent to choose a development partner for your first application, especially considering the many different technologies available and the potential obstacles you may encounter while developing the initial system. Speech technology is more complex than DTMF and there is an up-front investment, which leads some to consider a hosted solution. Hosting allows you to cost-effectively offer customers a speech solution without the need to build up your own speech technology team. A hosted provider maintains the hardware and software, ensures the latest speech software releases are being used, and continually monitors and supports your application. Plus, you only pay for what you use.

You should reasonably expect your development and hosting partner to have documented expertise in the following three areas:

  • Speech recognition IVR,
  • Integrating speech recognition applications with back-end systems and a live contact center,
  • Specific customer service needs of your product or service.

Designing your speech recognition application: To provide a quality experience for your callers, a successful speech recognition application should be built from the ground up, not simply as an existing DTMF application with the response mechanism converted from touch-tone to speech recognition! One way to achieve this is to listen to live representatives interacting with your customers. Because they deal with these calls every day, their input can be invaluable.

Using this information, you can then design the flow of your speech recognition program. Before actually writing it, usability testing must be conducted using simulators or “man behind the curtain” style testing to simulate the application’s interaction with your callers. Then the application flow can be tested, enlisting the support of actual customers, to uncover any glitches and solve application problems before moving on to programming and development.

Once the application is developed, building the test plan is critical – including testing all of the expected and unexpected responses, possible misinterpretations by the system, unrecognizable responses, background noise interference, etc. An experienced speech recognition development partner will work with you to build the appropriate test plan, ensuring that the new application works well, not just in a controlled environment but with live consumers.

Delivering the promised ROI: It’s true that speech recognition is more expensive to deploy than DTMF, in terms of both development costs and IT infrastructure. However, remember that like DTMF, speech recognition is a call-shifting technology. Calls that would otherwise be handled by agents are shifted to automated technology. The sweet spot that makes the ROI of speech recognition so impressive is the reduction in the number of callers who insist on talking with an agent. A well-designed speech application will lead to a greater percentage of callers self-serving than a traditional DTMF application, generating substantial overall cost savings in the long run. In fact, the larger up-front investment can often be recovered in a matter of months!
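The call-shifting arithmetic can be sketched in a few lines of Python. Only the 15 percent and 50 percent self-service rates come from this article; the call volume, the per-call costs, and the extra up-front investment are hypothetical figures chosen purely for illustration, so substitute your own.

```python
def monthly_savings(monthly_calls, dtmf_rate, speech_rate, agent_cost, automated_cost):
    """Savings from the additional calls that speech shifts away from agents."""
    extra_automated_calls = monthly_calls * (speech_rate - dtmf_rate)
    return extra_automated_calls * (agent_cost - automated_cost)


def payback_months(extra_upfront_cost, savings_per_month):
    """Months needed for the monthly savings to cover the extra up-front cost."""
    if savings_per_month <= 0:
        raise ValueError("no monthly savings, so the investment is never recovered")
    return extra_upfront_cost / savings_per_month


# Hypothetical inputs, for illustration only.
savings = monthly_savings(
    monthly_calls=100_000,   # assumed call volume
    dtmf_rate=0.15,          # DTMF self-service rate cited in the article
    speech_rate=0.50,        # speech self-service rate cited in the article
    agent_cost=4.00,         # assumed cost of an agent-handled call
    automated_cost=0.50,     # assumed cost of an automated call
)
months = payback_months(extra_upfront_cost=250_000, savings_per_month=savings)
```

Under these assumed numbers, speech automates 35,000 more calls each month than DTMF, saving about $122,500 monthly, so the extra investment is recovered in roughly two months. Different volumes and costs will move that figure considerably.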

Quite frankly, the projected savings generated by speech recognition are likely to leave you speechless. You owe it to your customers and to your bottom line to look at deploying speech recognition.

Chuck Raudonis is Vice President, ICT Global Interactive, of ICT Group, Inc., a provider of interactive voice response, call center, and back-office business process outsourcing solutions with operations in eight countries.

[From Connection Magazine June 2005]
