Speech Recognition
By Peter DeHaan, Ph.D.
March 2004
In the simplest of terms, speech recognition is the ability of computers to
understand what people say. For the past 25 years, experts have been predicting
that viable speech recognition was about "two years away." Finally, this
prediction has come true.
Looking back, initial speech recognition systems were speaker-dependent.
That meant they would only work for the voices of the people for whom they
were specifically programmed. Users would need to "train" these pioneering
systems by repeatedly recording common words or phrases. The computer would
compare each sample, determine commonalities, and look for those patterns in
future communications. These systems, understandably, had limited vocabularies.
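
The training loop described above can be illustrated with a toy sketch. It is not how any particular product worked. It assumes each recording has already been reduced to a short list of numbers (hypothetical "features"), averages the repeated recordings of each word into one template, and then matches a new utterance to the nearest template. The function names and sample values are invented for illustration.

```python
from math import dist
from statistics import fmean


def train_templates(samples):
    """Build one template per word from repeated recordings.

    samples maps each word to a list of equal-length feature vectors,
    one vector per recording (the feature values are made up here).
    """
    templates = {}
    for word, recordings in samples.items():
        # Averaging each feature position across the recordings stands in
        # for "determining commonalities" between the samples.
        templates[word] = [fmean(column) for column in zip(*recordings)]
    return templates


def recognize(templates, features):
    """Return the trained word whose template is closest to the new utterance."""
    return min(templates, key=lambda word: dist(templates[word], features))


# Three hypothetical recordings each of "yes" and "no" from one speaker.
samples = {
    "yes": [[1.0, 0.2, 0.1], [0.9, 0.3, 0.1], [1.1, 0.1, 0.2]],
    "no": [[0.1, 0.9, 0.8], [0.2, 1.0, 0.7], [0.1, 0.8, 0.9]],
}
templates = train_templates(samples)
print(recognize(templates, [0.95, 0.25, 0.15]))
```

Because the templates come from a single speaker's recordings, a different voice would produce different feature values, which is one way to picture why such systems were speaker-dependent and limited to the words that had been recorded.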
The next advance came with speaker-independent systems.
Again, these early systems had extremely limited vocabularies as they
needed to be programmed to accommodate different pronunciations and accents.
Indeed, some accents could never be accurately accommodated in these
systems. Also, there needed to be a
distinct pause between each word. As
such, systems were not able to comprehend words as they are commonly spoken in a
sentence, since pauses are normally minimal or essentially non-existent.
Fortunately, these limitations are in the past.
Today's speech recognition systems do not need to be trained to understand your
voice, nor do they have limited vocabularies. They are also adept at dealing
with large variations, be it pronunciation, syllabication, accent, dialect, or
even mumbling, and can accommodate continuous speech.
Speech recognition should not be confused with voice recognition (also known as voice
authentication). While speech
recognition refers to a system that processes and responds to spoken language,
voice recognition "refers to identifying or screening a particular person by
their voice print," according to Amcom's Steve Green.
As such, speech recognition is a communication technology and voice
recognition is an identification or verification technology.
There are three general classifications of speech recognition applications for the
call center:
- Alternative to touch-tone: At its most basic level, speech recognition can be
  used to replace or supplement touch-tone input in an IVR (Interactive Voice
  Response) or auto-attendant system. This gives callers the ability to press an
  appropriate key or say the number (which is great for callers without
  touch-tone phones).
- Speech-to-text conversion: An IVR system can answer a call and prompt the
  caller for information, such as an account number, phone number, or address.
  The system takes the response, converts it into text, and pre-populates a form
  or record. This information is then presented to an agent to complete the
  call. In some situations the entire interaction with the caller is done via
  IVR and speech recognition. The resulting data is written into a call record,
  which can then be forwarded to the appropriate individual, department, or even
  an external computer database.
- To access a database: This has the most diverse uses. In this instance, the
  caller is prompted for information, which is used to access a database. The
  database could be a directory of pager numbers, phone extensions, or on-call
  staff. The database could also contain records, such as orders, messages,
  documents, trouble reports, account balances, payment information, and so
  forth.
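
As a rough illustration of how these three uses fit together, the sketch below treats a key press and a spoken digit as the same input, turns a spoken number into text, pre-populates a simple call record, and looks the number up in a directory. It assumes the speech engine has already produced a text transcript. Every name here (`SPOKEN_DIGITS`, `DIRECTORY`, the functions, and the sample entry) is hypothetical and not taken from any vendor's product.

```python
# Spoken words a caller might use for each digit (illustrative, not exhaustive).
SPOKEN_DIGITS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}

# Hypothetical directory keyed by phone extension.
DIRECTORY = {"402": "Pat Example, on-call pager"}


def normalize_choice(caller_input):
    """Map a touch-tone key ("2") or a spoken digit ("two") to one digit."""
    token = caller_input.strip().lower()
    if token in SPOKEN_DIGITS:
        return SPOKEN_DIGITS[token]
    if len(token) == 1 and token in "0123456789":
        return token
    return None  # not a digit; a real system would re-prompt the caller


def digits_from_speech(transcript):
    """Convert a transcript such as "four oh two" into the text "402"."""
    digits = [normalize_choice(word) for word in transcript.split()]
    if not digits or None in digits:
        return None
    return "".join(digits)


def build_call_record(transcript):
    """Pre-populate a call record and attach the matching directory entry."""
    extension = digits_from_speech(transcript)
    return {
        "heard": transcript,
        "extension": extension,
        "directory_entry": DIRECTORY.get(extension),
    }


print(build_call_record("four oh two"))
```

The record returned by `build_call_record` is the kind of pre-populated data that could be presented to an agent or written to a call record; when the transcript is not a recognizable number, both `extension` and `directory_entry` are `None`, which a real application would treat as a cue to ask again or transfer the call.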
With speech recognition, there are several benefits:
- Answers calls on the first ring, 24/7, and never misses a call
- Achieves zero hold time
- Reduces the number of abandoned calls
- Saves money on salary and phone line costs
- Enhances traditional touch-tone driven IVR
- Automates simple calls
- Leaves agents available for more complex calls
- Provides the option for self-service
- Allows calls to be accurately and quickly self-routed
- Ensures consistency in call processing and responses
According to Steve Green, experience has shown that a phased introduction of
the technology is the best approach.
"This means that rolling out the product in a controlled and methodical
process has provided time to adjust and make corrections as needed, resulting in
a proven, tested, and fully functional speech application when completely
deployed."
Most implementations of speech recognition software are built on a speech
engine, according to Wayne Scaggs, president of Alston Tascom. The speech
engine is a toolkit that is available for software developers to use in
designing their speech recognition applications. It is both common and
pragmatic for vendors to use a third-party speech engine. This saves them time
and money, allowing complex results based on proven technology to be developed
at a fraction of what it would cost to develop the entire project in-house.