A Call for Speech Recognition
By Dan Cropp
May 2007
My father never liked computers.
When asked why, his answer was one heard often about technology: "It's not
user-friendly." Dad wanted his computer to be simple to use, like his
telephone. When I think about automated phone systems and speech recognition
tools, I see the wisdom in my father's words.
Many people dislike automated
phone systems. Some gripe about going through menu after menu, only to be sent
back to the beginning. Others complain that phones with the keypad built into the
handset are awkward to use with such systems. The user usually has to put the phone to their ear to listen to
the prompts, move the phone away to read the digits and press the right one, and then
put the phone back to their ear to listen for the next prompt. Many automated
systems don't allow time for these gymnastics, so a call can take several
attempts to complete.
I can't use my cell phone with
such systems. The keypad is so small and hard to hit that I've stopped trying
to use it during a call. To be fair, capturing keypad digits has long been the only
reliable way to retrieve information from callers.
However, speech recognition is
rapidly altering this reality. Speech recognition is generally lumped into
three categories: speaker verification, speaker-dependent, and
speaker-independent.
Speaker verification is
used to verify that a particular person is calling. It's typically used for
security purposes to match a voice to a previously recorded voice.
Speaker-dependent recognition
describes a system that must be trained to recognize an individual voice. Once
trained, the system can recognize what the person is saying, word for word.
This type of recognition is typically found in transcription environments.
Speaker-independent
recognition is used in environments where anyone might call in and the
system must be able to recognize any voice. This type of speech recognition is
the most common in the telephone world. Speaker-independent recognition is
typically command-based. At any time, there is a limited set of commands and
phrases the system expects to hear.
For example, after a caller listens
to a message, a voicemail system might ask, "What would you like to do next?" and
expect a spoken action such as "Delete this message," "Play the next message," or
"Log out."
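To make the idea of a limited command set concrete, here is a minimal Python sketch. It assumes a recognition engine has already turned the caller's speech into text and simply checks that text against the phrases active in the current dialog state. The state names, phrase lists, and the `match_command` helper are hypothetical, and `difflib` string similarity is only a stand-in for the acoustic scoring a real engine performs.

```python
import difflib
from typing import Optional

# Hypothetical dialog states, each with the small set of phrases the system
# listens for while the caller is in that state.
ACTIVE_COMMANDS = {
    "after_message": ["delete this message", "play the next message", "log out"],
    "main_menu": ["play new messages", "change greeting", "log out"],
}

def match_command(transcript: str, state: str, cutoff: float = 0.6) -> Optional[str]:
    """Return the active phrase closest to the transcript, or None if nothing is close.

    `transcript` stands in for the text a recognition engine returns (an assumption);
    difflib's string similarity is only a stand-in for acoustic scoring.
    """
    phrase = " ".join(transcript.lower().split())  # normalize case and spacing
    best = difflib.get_close_matches(phrase, ACTIVE_COMMANDS[state], n=1, cutoff=cutoff)
    return best[0] if best else None
```

Because each state listens for only a handful of phrases, an out-of-grammar reply returns `None`, and the system can reprompt the caller instead of guessing.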
Speech recognition began hitting
the mainstream about a decade ago. It came with lots of promises, but it didn't
really deliver. I guess Star Trek led to unreasonable expectations for
such new technology. Early speech recognition systems required users to
spend hours training the software to recognize their individual voices. Even then,
it was correct at best 97 percent of the time.
I once asked every speech
recognition vendor at COMDEX if their technology would work in a phone system.
The vendors all politely pointed out that a phone call presents many challenges
they hadn't yet solved.
About five years ago, speech
recognition capabilities started appearing in automated phone systems. These
were little more than touch-tone (DTMF) systems that had been modified to let users say
digits instead of pressing them. These early systems were fun to try, but
offered no significant benefits to justify their high costs.
Last week, my home Internet
connection wasn't working. I called my Internet service provider, and an
automated voice said there would be a thirty-minute wait before talking to a
support technician. The voice asked if I would like to try the automated
support system while I waited.
I had work to do, my wife had my
car, and I needed to connect to the network at my office, so I said, "Yes." The
system asked me to describe the problem. "I can't connect to the Internet," I
replied. The system recognized what I said and began asking questions about my
hardware.
After a few questions, the system
prompted me to unplug the modem. I did and said, "The modem is unplugged." The
system said, "You must now wait sixty seconds and plug it back in. To help you,
we will let you know when this time has expired." After exactly a minute, the
system said, "Please plug the modem back in." I did this and said so. The
system told me to wait for the lights on the modem to stop flashing, then power
up my computer, and try connecting to the Internet.
I anxiously opened the Web
browser on my laptop. Much to my surprise, I was on-line. I could connect to
my office and get my work done. In a matter of a few minutes, I went from a
disgruntled customer to one singing the praises of the automated support system.
Besides making calls easier for
callers, there's another compelling reason to consider adopting speech
recognition. Numerous studies show that using a cell phone while driving is
unsafe. Fifteen states and many municipalities have enacted restrictions on
cell phone use while driving. Many more laws affecting cell phone use while at
the wheel are in the works. New York now prohibits drivers from using cell
phones unless they are hands-free devices. California will begin requiring
drivers to use hands-free phones in 2008.
Daily commute times have
increased in recent years and will continue to increase. While commuting, many
of us use cell phones to keep in touch with family and friends and to get work
done during our drive time. These impending cell phone laws will force vendors
to add speech recognition to their phone systems or risk losing business.
Still not convinced that speech
recognition is worth looking at? Then consider how speech recognition could
improve a basic voicemail system. With speech recognition, a voicemail system
could do anything we've become accustomed to doing with email systems.
Obviously, callers would be able
to issue simple voice commands, such as replying, saving, deleting, and
navigating through messages. But voicemail systems could be made to support
advanced commands, such as "Find all messages from John Smith," "Are there any
other messages from this person?" and "Play all messages recorded yesterday." A
speech recognition-enabled messaging system could also tie directly into multiple
email accounts, use the same contact list as the user's email, and do much
more.
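Once the recognizer has picked out the key detail in one of these advanced commands, such as a sender's name or a date, the command reduces to a filter over the message store. The sketch below assumes that extraction step has already happened; the `Message` fields and the helper names are hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Message:
    sender: str
    received: date
    source: str  # hypothetical field: "voicemail" or "email"

def find_from(messages, sender):
    """'Find all messages from <sender>': case-insensitive match on the sender name."""
    return [m for m in messages if m.sender.lower() == sender.lower()]

def more_from_same_sender(messages, current):
    """'Are there any other messages from this person?': same sender, excluding `current`."""
    return [m for m in messages if m.sender == current.sender and m is not current]

def recorded_yesterday(messages, today):
    """'Play all messages recorded yesterday': messages received the day before `today`."""
    yesterday = today - timedelta(days=1)
    return [m for m in messages if m.received == yesterday]
```

Since voicemail and email entries share one `Message` type here, the same commands work regardless of where a message came from.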
Imagine being able to call a
single number to retrieve all of your messages, regardless of their origin.
Text messages could be played back using text-to-speech conversion. Voice
messages and attachments, both text and audio, could be played back over the
phone.
Such a system could easily allow
you to issue a command, such as "Reply," for an email message. Your
voice would be recorded and then either attached to an email as an audio file or
translated into text using a speech-to-text engine.
Another benefit to speech
recognition systems is the ability to support multiple languages. A speech
recognition system could prompt callers to say their preferred language:
"English," "Español," or others. This requires the system to recognize just one
word up front. From this point on, the system would prompt only in the user's
selected language and, more importantly, it would recognize phrases and
pronunciations specific to that language. Multilingual systems require more
thought and input from someone familiar with your system and the languages your
callers speak.
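The one-word language step can be modeled as a tiny first grammar whose answer selects every later prompt and phrase list. In this minimal sketch, the locale tags, the Spanish wording, and the `choose_language` and `dialog_for` helpers are illustrative assumptions; a real system would load a separate recognition grammar for each language rather than a Python dictionary.

```python
from typing import Optional

# Hypothetical first-step grammar: the only words listened for before a language is chosen.
LANGUAGE_CHOICES = {"english": "en-US", "español": "es-US"}

# Hypothetical per-language prompts and command phrases used for the rest of the call.
DIALOG = {
    "en-US": {"prompt": "What would you like to do next?",
              "commands": ["delete this message", "play the next message", "log out"]},
    "es-US": {"prompt": "¿Qué desea hacer ahora?",
              "commands": ["borrar este mensaje", "siguiente mensaje", "salir"]},
}

def choose_language(word: str) -> Optional[str]:
    """Map the caller's one-word answer to a locale tag, or None to reprompt."""
    return LANGUAGE_CHOICES.get(word.strip().lower())

def dialog_for(word: str) -> Optional[dict]:
    """Return the prompt and command list to use for the rest of the call."""
    locale = choose_language(word)
    return DIALOG[locale] if locale else None
```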
There are some things that
speaker-independent recognition does not do well. At any given time, it uses a
list of words and phrases to compare the detected speech against. If a list
contains multiple words that sound alike, speech recognition has trouble
detecting the differences. For example, a list checking for "Bye" or "Dye" has
trouble with the distinction. So do people. Often, it's better to look for
phrases instead: "Good-bye," "Joseph Dye."
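One way to catch confusable entries before a list goes live is to compare every pair of phrases in it. The sketch below uses spelling similarity from Python's `difflib` as a rough, assumed proxy for how alike two phrases sound. It will miss homophones that are spelled differently, so treat it as a first-pass check, not a substitute for testing with real callers. The function name and the 0.6 threshold are arbitrary choices.

```python
from difflib import SequenceMatcher
from itertools import combinations

def confusable_pairs(phrases, threshold=0.6):
    """Return phrase pairs whose spellings are at least `threshold` similar."""
    flagged = []
    for a, b in combinations(phrases, 2):
        # ratio() is 2 * matching_chars / total_chars, in [0, 1]; 1.0 means identical.
        if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
            flagged.append((a, b))
    return flagged
```

With these settings, "Bye" and "Dye" are flagged (ratio about 0.67), while the longer phrases "Good-bye" and "Joseph Dye" fall well below the threshold.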
Speech recognition also struggles
with the alphabet. The alphabet consists of short syllables, and it's difficult
to recognize these over the phone. We've become accustomed to saying a letter
and a word that starts with that letter: "D as in Dan, A as in Adam, N as in
Nancy." However, there are ways around this problem. One is to have the system
ask the caller to say and spell the name: "Dan, D A N." In my experience,
this works sometimes, but not every time.
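Both workarounds can be checked mechanically once the engine has returned text. This minimal sketch assumes the recognized transcript is plain text. It decodes the "D as in Dan" style by cross-checking that each word starts with the letter it clarifies, and it confirms a "say and spell" answer against the spoken name. The regular expression and function names are hypothetical.

```python
import re
from typing import Optional

# Matches "<letter> as in <word>" in the recognized text (assumed engine output).
AS_IN = re.compile(r"\b([A-Za-z])\s+as\s+in\s+([A-Za-z]+)", re.IGNORECASE)

def decode_as_in(transcript: str) -> Optional[str]:
    """Turn 'D as in Dan, A as in Adam, N as in Nancy' into 'DAN'.

    Returns None if nothing matched, or if a word doesn't start with the
    letter it is supposed to clarify (a likely misrecognition).
    """
    letters = []
    for letter, word in AS_IN.findall(transcript):
        if word[0].lower() != letter.lower():
            return None
        letters.append(letter.upper())
    return "".join(letters) or None

def spelling_matches(name: str, spelled: str) -> bool:
    """Check a 'say and spell' answer, e.g. name 'Dan' with spelling 'D A N'."""
    return "".join(spelled.split()).lower() == name.lower()
```

When either check fails, the system can ask the caller to repeat or transfer the call to a live agent.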
The human brain and voice are
capable of so many combinations that speech recognition will never be able to
totally replace live phone agents. But just as computers were made to handle
repetitive tasks and improve human productivity, speech recognition can do the
same thing for phone systems.
Speech recognition isn't a fit
for every scenario. It really comes down to what is needed to handle each
call. Speech recognition should be viewed in the same light as any new
technology. When looking at speech recognition systems, make sure they are
flexible and able to meet your needs. Take time to "kick the tires" and "look
under the hood" of the system. Some processes may need to be tweaked after the
system is up and running.
The first question is, "Will it
be an improvement?" If your answer is "Yes," then ask yourself whether the benefits
are worth the cost of the resources required, the setup work, and the training
time.
If your agents gather lots of
personal information from callers, items like phone number, city, state, and
postal code can easily be handled by speech recognition. Items such as names
and addresses are more difficult, so it may be better to have a live agent
handle those.
Speech recognition pricing is
typically based on three factors: the number of voice ports, the types of recognition,
and the available languages. The type of recognition depends on the maximum number
of phrases that need to be recognized at any given time. For example, the
system could ask a caller to say the name of a person in a large directory,
solicit a "yes/no" response, or require a keypad digit.
The number of speech recognition
licenses needed is determined by the number of simultaneous calls to be handled
using speech recognition. If ten callers will be asked questions at the same
time, ten speech recognition licenses would be needed.
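Sizing licenses from call data amounts to finding the largest number of speech sessions that overlap at any instant. Here is a minimal sweep-line sketch, assuming you can export the speech-recognition portion of each call as a (start, end) pair in seconds; the data format and the `peak_concurrency` name are hypothetical.

```python
def peak_concurrency(sessions):
    """Return the maximum number of (start, end) sessions active at the same time."""
    events = []
    for start, end in sessions:
        events.append((start, 1))   # a session begins and needs a license
        events.append((end, -1))    # a session ends and frees its license
    # Tuples sort by time, then by delta, so at equal timestamps the -1 (end)
    # is processed before the +1 (start): a freed license can be reused at once.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak
```

For instance, ten callers whose speech sessions all overlap give a peak of ten, matching the ten licenses in the example above. Running the function over a busy day's data, plus some margin for growth, gives a reasonable starting estimate.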
If you haven't tried speech
recognition for a few years, it's time for another look. Automated phone
systems have undergone big changes. Dad is probably saying, "It's about time
they made it easy!"
Dan Cropp, a senior software
engineer at Amtelco (www.amtelco.com),
is the primary designer of Amtelco's "Just Say It" speech recognition and
interactive voice response products. He was the principal engineer in the
migration of Amtelco's Infinity Telephone Agent software from the DOS platform
to the Microsoft Windows operating system in the mid-1990s.