The basis for voice or speech identification technology was
pioneered by Texas Instruments in the 1960s (Ruggles,
T. 1998). Since that time, voice identification has
undergone aggressive research and development to bring it into
mainstream society.
There are many advantages to using voice identification,
including:
- Considered a "natural" biometric technology
- Eyes-free and hands-free operation
- Reliability
- Flexibility
- Time-saving data input
- Elimination of spelling errors
- Improved data accuracy

There are different methods for analyzing a person's speech
pattern, but all voice identification systems are developed within a
broader speech processing technology. Leading companies including
AT&T, ITT, France Telecom, Bellcore, Texas Instruments, and Siemens
have been actively involved in the development of verification
algorithms for voice identification systems.
A voice identification system, like other biometric technologies,
requires that a "voice reference template" be constructed so that it
can be compared against subsequent voice identifications. To
construct the "reference template", an individual must speak a set
phrase several times while the system builds the template. Voice
identification systems use several variables or parameters to
recognize a person's voice or speech pattern, including pitch,
dynamics, and waveform.
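The enrollment and comparison steps described above can be sketched roughly as follows. This is a minimal illustration, not how any particular product works: it assumes each spoken phrase has already been reduced to a short list of numbers, and the feature values, the averaging step, the Euclidean distance, and the `threshold` value are all assumptions made for the example.

```python
import math

def build_template(samples):
    """Average several enrollment feature vectors into one reference template."""
    count = len(samples)
    size = len(samples[0])
    return [sum(sample[i] for sample in samples) / count for i in range(size)]

def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def verify(template, attempt, threshold):
    """Accept the attempt if it lies close enough to the reference template."""
    return distance(template, attempt) <= threshold

# Hypothetical features per utterance: (pitch in Hz, dynamics, waveform measure).
# The numbers are invented for illustration only.
enrollment = [
    [120.0, 0.60, 0.30],
    [124.0, 0.64, 0.34],
    [122.0, 0.62, 0.32],
]
template = build_template(enrollment)

print(verify(template, [121.0, 0.61, 0.33], threshold=5.0))  # close to the template
print(verify(template, [180.0, 0.20, 0.90], threshold=5.0))  # far from the template
```

In practice the features would need to be scaled so that one of them (here, pitch) does not dominate the distance; the sketch leaves that out for brevity.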
A major concern for voice identification systems is how to
account for the variations in a person's voice each time a voice
identification occurs. The rate and pitch at which an individual
speaks at one moment are not always the same at the next. To help
reduce the effect of these types of variations during voice
identification, Hidden Markov Modeling is applied. The basis of this
approach is that the system (software) uses language models to
determine how many different words are likely to follow a particular
word. The advantage is that groups of words that sound alike
(matching word pools), for example "to", "two", and "too", are
drastically reduced and the actual words are recognized. Error rates
for systems that use this type of language modeling range from one
to 15 percent (Ruggles, T. 1998).
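The language-modeling idea above, choosing among sound-alike words based on which word is likely to follow the previous one, can be illustrated with a very small sketch. Note that this is a plain word-pair (bigram) count model, not a full Hidden Markov Model, and the tiny training corpus and function names are invented for the example.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for previous, following in zip(words, words[1:]):
            counts[previous][following] += 1
    return counts

def pick_word(counts, previous_word, candidates):
    """Among sound-alike candidates, pick the one seen most often after previous_word."""
    follow = counts[previous_word.lower()]
    return max(candidates, key=lambda word: follow[word])

# Hypothetical training text, far smaller than any real language model would use.
corpus = [
    "i want to go home",
    "she has two cats",
    "we want to leave",
    "he has two dogs",
    "i want that too",
]
model = train_bigrams(corpus)
sound_alikes = ["to", "two", "too"]

print(pick_word(model, "want", sound_alikes))
print(pick_word(model, "has", sound_alikes))
```

The acoustic signal alone cannot separate the three candidates, so the preceding word does the work: in this corpus "want" is most often followed by "to" and "has" by "two".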
There are five specific forms of voice identification
technologies that are currently available or under development:
1. Speaker Dependent - this type of technology involves "training"
the system to recognize your speech patterns. Systems employing
this technique can hold a vocabulary of between 30,000 and 120,000
words. It works best when used by one specific user.
2. Speaker Independent - this type of voice identification
technology can be used by anyone without having to train the
system. As a trade-off, the vocabulary is smaller and error rates
are higher.
3. Discrete Speech Input - this mode requires the speaker to make
small pauses, as short as 1/10 of a second, between words. This
allows the system to recognize where words begin and end.
4. Continuous Speech Input - users can speak at a continuous rate,
but the voice identification software can only recognize a limited
number of words and phrases. Systems of this type are also
referred to as "word-spotting" systems, because a user can be
speaking in long sentences or phrases and the system will only
recognize predetermined words.
5. Natural Speech Input - this is the most desired form of voice
identification, but it is still under development. Here the user
is able to speak freely and the system is able to interpret and
carry out commands on-the-fly.
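As a rough illustration of how the pauses in discrete speech input can mark word boundaries, the sketch below splits a sequence of per-frame loudness values wherever the signal stays quiet for at least 1/10 of a second. The frame length, the silence level, and the sample values are assumptions chosen for the example; real systems use more robust detection.

```python
def split_on_pauses(energies, frame_seconds=0.01, silence_level=0.1,
                    min_pause_seconds=0.1):
    """Return (start, end) frame ranges of words separated by long enough pauses."""
    min_pause_frames = round(min_pause_seconds / frame_seconds)
    segments = []
    start = None        # first frame of the word currently being tracked
    last_voiced = None  # most recent frame above the silence level
    silent_run = 0      # consecutive quiet frames since last_voiced
    for index, energy in enumerate(energies):
        if energy > silence_level:
            if start is None:
                start = index
            last_voiced = index
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                # The pause is long enough: close the current word.
                segments.append((start, last_voiced + 1))
                start = None
    if start is not None:
        segments.append((start, last_voiced + 1))
    return segments

# Hypothetical loudness per 10 ms frame: a word, a 100 ms pause, then a word
# containing a brief 30 ms dip that is too short to count as a boundary.
frames = [0.0] * 5 + [0.8] * 20 + [0.0] * 10 + [0.9] * 15 + [0.0] * 3 + [0.7] * 4 + [0.0] * 5
words = split_on_pauses(frames)
print(words)
```

With these values the 100 ms gap separates two words, while the 30 ms dip stays inside the second word.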
There are two application methodologies for voice identification:
dedicated hardware and software at the point of access, and dial-up
access to a PC host using regular phones.

Most applications of voice identification today fall under the
industries of call-answering and contact-management services.
Other markets that voice verification has penetrated recently
include medium-security access control and time and attendance
monitoring.
"Live" applications include:
- General Motors uses voice identification systems to restrict
access to some of its computer rooms
- Staff at a Chicago hospital are required to pass a voice
identification check to enter the newborn baby unit
- The Immigration and Naturalization Service has implemented voice
identification for frequent travelers who cross the Mexican
border
- Martin Marietta, GM, and Hertz are using voice identification
technology to protect their computer facilities
- Private estates all over the world are protected with voice
identification technology
- Used in telephone security-based applications
- Charles Schwab & Company, Sears, Roebuck and Company,
and the United Parcel Service of America Inc. have all implemented
voice identification systems for customer service situations
- Telephone commerce
- Telephony (hands-free dialing)
- Used by physicians to record patient data and make records
while conducting observations
- Used by disabled persons
- Used in the legal profession where legal research can be
conducted using voice commands to extract information from WESTLAW
and LEXIS-NEXIS database services
- Used in Automated Identification and Data Capture courses taught
at Purdue University

Voice identification technology is still slow to take off in many
markets. One reason is that voice identification is not as accurate
as other biometric technologies. For instance, voice systems tend to
have a high false reject rate because of background noise and other
variables. This type of disadvantage makes for an insecure
system, which can alienate it from large markets such as the
financial industry and government operations.
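For readers unfamiliar with the term, a false reject rate is the share of genuine users the system wrongly turns away. The sketch below shows how it, together with the matching false accept rate, could be computed from match scores at a chosen threshold. The scores and thresholds are invented for illustration and do not come from any measured system.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false reject rate, false accept rate) at the given threshold.

    A score at or above the threshold counts as an accepted identification.
    """
    false_rejects = sum(1 for score in genuine_scores if score < threshold)
    false_accepts = sum(1 for score in impostor_scores if score >= threshold)
    return (false_rejects / len(genuine_scores),
            false_accepts / len(impostor_scores))

# Hypothetical match scores between 0 and 1. The low genuine scores stand in
# for attempts degraded by background noise.
genuine = [0.91, 0.85, 0.40, 0.78, 0.55, 0.88, 0.35, 0.95, 0.81, 0.60]
impostor = [0.10, 0.22, 0.65, 0.30, 0.15, 0.05, 0.28, 0.41, 0.19, 0.33]

strict = error_rates(genuine, impostor, threshold=0.7)
lenient = error_rates(genuine, impostor, threshold=0.5)
print("strict threshold:", strict)
print("lenient threshold:", lenient)
```

Lowering the threshold reduces false rejects but lets more impostors through, which is the trade-off any deployment has to settle.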
However, voice identification technology continues to grow and
improve. In the future, voice identification will be used not only
for text dictation but also to open applications and carry out
predetermined commands. It has also been estimated that if
voice identification technology continues to progress as it has,
keyboards will become obsolete in ten years.
Microprocessor technology is also set to help voice recognition
become more widespread. With the release of the Pentium III
microprocessor, Intel built in a new set of instructions that
enhance speech recognition. The new instructions will help with
front-end audio processing and the throughput of the search
algorithms involved in pattern matching. The enhanced speech
recognition instructions will also reduce error rates and response
time (21st Century Eloquence, 1999).
It is estimated that revenues from voice/speech identification
systems and the telephony equipment and services sold in the United
States will increase from $356 million in 1997 to $22.6 billion in
2003 (Smith,
L. B. 1998).
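To put the revenue estimate above in perspective, the implied compound annual growth rate can be worked out directly from the two figures. The helper name below is made up for this sketch, and the calculation assumes smooth year-over-year growth across the six years from 1997 to 2003.

```python
def compound_annual_growth_rate(start_value, end_value, years):
    """Constant yearly growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1

start_revenue = 356e6   # $356 million in 1997
end_revenue = 22.6e9    # $22.6 billion in 2003
years = 2003 - 1997

cagr = compound_annual_growth_rate(start_revenue, end_revenue, years)
print(f"Implied growth: {cagr:.1%} per year")
```

The result is just under 100 percent per year, meaning the estimate assumes revenues roughly double every year over the period.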