By Peter Lyle DeHaan, PhD
In the simplest of terms, speech recognition is the ability of computers to understand people. For the past 25 years, experts have been predicting that viable speech recognition was about “two years away.” Finally, that prediction has come to fruition.
Looking back, initial speech recognition systems were speaker-dependent: they worked only for the specific voices on which they had been programmed. Users had to “train” these pioneering systems by repeatedly recording common words or phrases. The computer would compare each sample, determine commonalities, and look for those patterns in future speech. Understandably, these systems had limited vocabularies.

The next advance came with speaker-independent systems. These early systems also had extremely limited vocabularies, since they had to be programmed to accommodate different pronunciations and accents; indeed, some accents could never be accommodated accurately. In addition, callers had to leave a distinct pause between each word, so the systems could not comprehend words as they are commonly spoken in a sentence, where pauses are minimal or essentially non-existent.
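The train-then-match cycle of those speaker-dependent systems can be sketched in a few lines. The feature vectors and the distance metric below are illustrative assumptions, not any specific vendor's method; real systems compared acoustic features, not two-number lists.

```python
# Illustrative sketch of early speaker-dependent template matching.
# The two-element "feature vectors" are made-up stand-ins for the
# acoustic measurements a real system would extract from audio.

def euclidean(a, b):
    # Distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def train_template(samples):
    # "Training": average repeated recordings of one word into a template.
    n = len(samples)
    return [sum(vals) / n for vals in zip(*samples)]

def recognize(utterance, templates):
    # Pick the vocabulary word whose stored template is nearest the input.
    return min(templates, key=lambda word: euclidean(utterance, templates[word]))

templates = {
    "yes": train_template([[1.0, 0.2], [0.9, 0.3], [1.1, 0.25]]),
    "no":  train_template([[0.1, 0.9], [0.2, 1.0], [0.15, 0.95]]),
}
print(recognize([0.95, 0.28], templates))  # nearest template wins: yes
```

The tiny fixed dictionary also illustrates why such systems were limited to small vocabularies: every word had to be individually recorded and stored per speaker.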
Fortunately, these limitations are in the past. Today’s speech recognition systems do not need to be trained to understand your voice, nor do they have limited vocabularies. They are also adept at handling wide variation, be it pronunciation, syllabication, accent, dialect, or even mumbling, and can accommodate continuous speech.
Speech recognition should not be confused with voice recognition (also known as voice authentication). While speech recognition refers to a system that processes and responds to spoken language, voice recognition “refers to identifying or screening a particular person by their voice print,” according to Amcom’s Steve Green. As such, speech recognition is a communication technology and voice recognition is an identification or verification technology.
There are three general classifications of speech recognition applications for the call center:
- Alternative to touch-tone: At its most basic level, speech recognition can replace or supplement touch-tone input in an IVR (Interactive Voice Response) or auto-attendant system. Callers can either press the appropriate key or say the number, which is a boon for callers without touch-tone phones.
- Speech-to-text conversion: An IVR system can answer a call and prompt the caller for information, such as an account number, phone number, or address. The system takes the response, converts it into text, and pre-populates a form or record. This information is then presented to an agent to complete the call. In some situations the entire interaction with the caller is done via IVR and speech recognition. The resulting data is written into a call record, which can then be forwarded to the appropriate individual, department, or even an external computer database.
- To access a database: This has the most diverse uses. In this instance, the caller is prompted for information, which is used to access a database. The database could be a directory of pager numbers, phone extensions, or on-call staff. The database could also contain records, such as orders, messages, documents, trouble reports, account balances, payment information, and so forth.
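The database-access pattern above amounts to using the recognized phrase as a lookup key. The sketch below assumes an on-call directory; the department names, staff, and pager numbers are invented example data, not from the article.

```python
# Hypothetical sketch: a recognized phrase drives a directory lookup.
# The on-call directory below is invented example data.

ON_CALL = {
    "cardiology": "Dr. Alvarez, pager 555-0101",
    "radiology": "Dr. Chen, pager 555-0102",
}

def handle_call(recognized_text):
    # Normalize the recognized phrase and use it as the database key.
    key = recognized_text.strip().lower()
    if key in ON_CALL:
        return f"Connecting you to {ON_CALL[key]}."
    # Fall back to a live agent when the lookup fails.
    return "Sorry, I did not find that department. Please hold for an agent."

print(handle_call("Cardiology"))
```

The same lookup-and-fallback shape applies whether the database holds pager numbers, orders, messages, or account balances.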
With speech recognition, there are several benefits:
- Answers calls on the first ring, 24/7, and never misses a call
- Achieves zero hold time
- Reduces the number of abandoned calls
- Saves money on salary and phone line costs
- Enhances traditional touch-tone driven IVR
- Automates simple calls
- Leaves agents available for more complex calls
- Provides the option for self-service
- Allows calls to be accurately and quickly self-routed
- Ensures consistency in call processing and responses
Steve Green indicated that experience has shown a phased introduction of the technology is the best approach. “This means that rolling out the product in a controlled and methodical process has provided time to adjust and make corrections as needed, resulting in a proven, tested, and fully functional speech application when completely deployed.”
Most implementations of speech recognition software are built on a speech engine, according to Wayne Scaggs, president of Alston Tascom. The speech engine is a toolkit that software developers use to design their speech recognition applications. It is both common and pragmatic for vendors to use a third-party speech engine: doing so saves time and money, allowing complex functionality based on proven technology to be delivered at a fraction of what it would cost to develop the entire project in-house.
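The engine-plus-application split Scaggs describes might look like the following. The class and method names here are hypothetical stand-ins, the stub engine merely echoes a transcript, and a real vendor would call a licensed engine's actual API instead.

```python
# Hypothetical illustration of vendor code layered on a third-party
# speech engine. "ThirdPartyEngine" stands in for a licensed toolkit.

class ThirdPartyEngine:
    """Stub for a licensed speech engine's recognition call."""
    def recognize(self, audio):
        # A real engine would decode audio; this stub echoes a transcript.
        return audio["transcript"]

class IVRApplication:
    """Vendor code: call-flow logic built on top of the engine."""
    def __init__(self, engine):
        self.engine = engine

    def route(self, audio):
        # The vendor's value-add is the application logic, not decoding.
        text = self.engine.recognize(audio).lower()
        if "balance" in text:
            return "account-balance-menu"
        return "agent-queue"

app = IVRApplication(ThirdPartyEngine())
print(app.route({"transcript": "check my balance"}))  # account-balance-menu
```

Because the application only depends on the engine's interface, a vendor could swap in a different engine without rewriting its call-flow logic, which is part of why the third-party approach is economical.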
Speech Recognition Vendors
For a list of Speech Recognition vendors who specialize in the Outsourcing Call Center industry, please see our current Speech Recognition Vendor Listing.
Peter Lyle DeHaan, PhD, is the publisher and editor-in-chief of Connections Magazine. He’s a passionate wordsmith whose goal is to change the world one word at a time.
[From Connection Magazine – March 2004]