Voice Identification

 

Background

The basis for voice or speech identification technology was pioneered by Texas Instruments in the 1960s (Ruggles, T. 1998). Since then, voice identification has undergone intensive research and development aimed at bringing it into mainstream use.

There are many advantages to using voice identification including:

  • Considered a "natural" biometric technology
  • Provides eyes and hands-free operation
  • Reliability
  • Flexibility
  • Time-saving data input
  • Elimination of spelling errors
  • Improved data accuracy 

 

Technology

Speech patterns can be analyzed by several methods, but all voice identification systems are built on broader speech processing technology. Leading companies including AT&T, ITT, France Telecom, Bellcore, Texas Instruments, and Siemens have been actively involved in developing verification algorithms for voice identification systems.

A voice identification system, like other biometric technologies, requires that a "voice reference template" be constructed so that subsequent voice samples can be compared against it. To construct the reference template, an individual must speak a set phrase several times while the system builds the template. Voice identification systems use several parameters to recognize a voice or speech pattern, including pitch, dynamics, and waveform.
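The enrollment and comparison steps can be sketched as follows. This is only a minimal illustration, not how any particular vendor's system works: it assumes each utterance has already been reduced to a small, normalized feature vector (the pitch, dynamics, and waveform values here are invented), that the template is a simple average, and that the function names and the acceptance threshold are hypothetical.

```python
import math


def build_template(enrollment_samples):
    """Average the feature vectors from repeated utterances of the set phrase."""
    count = len(enrollment_samples)
    if count == 0:
        raise ValueError("at least one enrollment sample is required")
    dims = len(enrollment_samples[0])
    return [sum(sample[i] for sample in enrollment_samples) / count
            for i in range(dims)]


def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def verify(template, sample, threshold):
    """Accept the sample if it lies within `threshold` of the template."""
    return distance(template, sample) <= threshold


# Hypothetical normalized features per utterance: [pitch, dynamics, waveform]
enrollment = [
    [0.52, 0.31, 0.77],
    [0.50, 0.29, 0.80],
    [0.48, 0.30, 0.77],
]
template = build_template(enrollment)

# Assumed threshold of 0.10; a real system would tune this on test data.
same_speaker = verify(template, [0.51, 0.30, 0.78], threshold=0.10)
other_speaker = verify(template, [0.80, 0.55, 0.40], threshold=0.10)
print(same_speaker, other_speaker)
```

Averaging several repetitions is one simple way to smooth out the utterance-to-utterance variation that a single sample would carry into the template.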

A major concern for voice identification systems is how to account for the variations in a speaker's voice from one identification to the next. The rate and pitch at which an individual speaks at one moment are not always the same the next. To help reduce the effect of these types of variation, a process based on Hidden Markov Modeling is applied. The basis of this approach is that the system (software) uses language models to determine how many different words are likely to follow a particular word. The advantage is that groups of words that sound alike (matching word pools), for example "to", "two", and "too", are drastically reduced and the actual words are recognized. Error rates for systems that use this type of language modeling range from one to 15 percent (Ruggles, T. 1998).
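The way a language model narrows a pool of sound-alike words can be illustrated with a small sketch. It uses a plain table of word-pair (bigram) counts, which is a deliberate simplification and not a full Hidden Markov Model; the counts, the `BIGRAM_COUNTS` table, and the `choose_word` function are all invented for illustration.

```python
# Hypothetical counts of how often each word followed the previous word
# in some training text. The numbers are made up for this example.
BIGRAM_COUNTS = {
    ("want", "to"): 120, ("want", "two"): 3, ("want", "too"): 1,
    ("the", "two"): 45, ("the", "to"): 0, ("the", "too"): 0,
    ("me", "too"): 30, ("me", "to"): 12, ("me", "two"): 4,
}

# A pool of candidates that sound alike to the acoustic front end.
HOMOPHONES = ["to", "two", "too"]


def choose_word(previous_word, candidates, counts=BIGRAM_COUNTS):
    """Pick the candidate most often seen after `previous_word`."""
    return max(candidates, key=lambda word: counts.get((previous_word, word), 0))


after_want = choose_word("want", HOMOPHONES)
after_the = choose_word("the", HOMOPHONES)
after_me = choose_word("me", HOMOPHONES)
print(after_want, after_the, after_me)
```

The acoustic signal alone cannot separate the three candidates, so the preceding word does the work: the same sound resolves to a different spelling depending on context.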

There are five specific forms of voice identification technologies that are currently available or under development:

1. Speaker Dependent

  • This type of technology involves "training" the system to recognize your speech patterns. Systems employing this technique can hold a vocabulary of between 30,000 and 120,000 words. It works best when used by one specific user.

2. Speaker Independent

  • This type of voice identification technology can be used by anyone without having to train the system. As a trade-off, the vocabulary is smaller and error rates are higher.

3. Discrete Speech Input

  • This mode requires the speaker to make small pauses, as short as 1/10 of a second, between words. This allows the system to recognize where words begin and end.

4. Continuous Speech Input

  • Users can speak at a continuous rate, but the voice identification software can recognize only a limited number of words and phrases. Systems of this type are also referred to as "word-spotting" systems because a user can speak in long sentences or phrases and the system will recognize only predetermined words.

5. Natural Speech Input

  • This is the most desired form of voice identification, but it is still under development. Here the user is able to speak freely and the system is able to interpret and carry out commands on the fly.
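For discrete speech input (item 3 above), the role of the pause between words can be sketched in a few lines. The example assumes the audio has already been reduced to one energy value per 10 ms frame; the frame length, the silence level, and the `segment_words` function are assumptions made for illustration, and only the 1/10 second minimum pause comes from the description above.

```python
def segment_words(frame_energies, frame_seconds=0.01,
                  silence_level=0.05, min_pause_seconds=0.1):
    """Return (start, end) frame index pairs, end exclusive, one per word.

    A word ends only when the signal stays at or below `silence_level`
    for at least `min_pause_seconds`.
    """
    min_pause_frames = round(min_pause_seconds / frame_seconds)
    segments = []
    start = None        # first frame of the word currently being tracked
    last_voiced = None  # most recent frame above the silence level
    for index, energy in enumerate(frame_energies):
        if energy > silence_level:
            if start is None:
                start = index
            last_voiced = index
        elif start is not None and index - last_voiced >= min_pause_frames:
            segments.append((start, last_voiced + 1))
            start = None
    if start is not None:
        # Close a word that runs to the end of the recording.
        segments.append((start, last_voiced + 1))
    return segments


# Invented energy track: a word, a 120 ms pause, then a second word that
# contains a brief 30 ms dip in energy.
frames = ([0.0] * 5 + [0.6] * 20 + [0.0] * 12 +
          [0.7] * 15 + [0.0] * 3 + [0.5] * 4 + [0.0] * 5)
segments = segment_words(frames)
print(segments)
```

The 120 ms gap is long enough to be treated as a word boundary, while the 30 ms dip is not, so the second stretch of speech stays together as one word.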

There are two application methodologies for voice identification: dedicated hardware and software at the point of access, and dial-up access to a PC host using regular phones.

 

Applications

Most applications of voice identification today fall within the call-answering and contact-management service industries. Other markets that voice verification has recently penetrated include medium-security access control and time-and-attendance monitoring.

"Live" applications include:

  • General Motors uses voice identification systems to restrict access to some of its computer rooms
  • Staff at a Chicago hospital are required to pass a voice system to enter the newborn baby unit
  • The Immigration and Naturalization Service has implemented voice identification for frequent travelers who cross the Mexican border
  • Martin Marietta, GM, and Hertz are using voice identification technology to protect their computer facilities
  • Private estates all over the world are protected with voice identification technology
  • Used in telephone security-based applications
  • Charles Schwab & Company, Sears & Roebuck and Company, and the United Parcel Service of America Inc. have all implemented voice identification systems for customer service situations
  • Telephone commerce
  • Telephony (hands-free dialing)
  • Used by physicians to record patient data and make records while conducting observations
  • Used by disabled persons
  • Used in the legal profession where legal research can be conducted using voice commands to extract information from WESTLAW and LEXIS-NEXIS database services
  • Voice identification technology is utilized in Automated Identification and Data Capture courses taught at Purdue University.

 

Trends

Voice identification technology is still slow to take off in many markets. One reason is that voice identification is not as accurate as other biometric technologies. For instance, voice systems tend to have a high false reject rate because of background noise and other variables. This disadvantage makes for an unreliable system, which can shut it out of large markets such as the financial industry and government operations.
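The false reject rate mentioned above is the fraction of genuine users the system turns away. A rough sketch of how it could be measured, alongside the false accept rate, is shown below. The similarity scores, the 0.70 threshold, and the `error_rates` function are invented for illustration; the noisy set simply lowers two genuine scores to mimic the effect of background noise.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false reject rate, false accept rate) at a given threshold.

    Scores are similarity values; a score at or above `threshold` is accepted.
    """
    false_rejects = sum(1 for score in genuine_scores if score < threshold)
    false_accepts = sum(1 for score in impostor_scores if score >= threshold)
    return (false_rejects / len(genuine_scores),
            false_accepts / len(impostor_scores))


# Hypothetical match scores for the enrolled speaker and for impostors.
genuine_quiet = [0.91, 0.88, 0.95, 0.90, 0.86]
genuine_noisy = [0.91, 0.62, 0.95, 0.58, 0.86]  # two attempts degraded by noise
impostors = [0.30, 0.45, 0.72, 0.25, 0.40]

frr_quiet, far = error_rates(genuine_quiet, impostors, threshold=0.70)
frr_noisy, _ = error_rates(genuine_noisy, impostors, threshold=0.70)
print(frr_quiet, frr_noisy, far)
```

Lowering the threshold would admit the noisy genuine attempts but would also admit more impostors, which is the trade-off that limits voice systems in high-security settings.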

However, voice identification technology continues to grow and improve. In the future, voice identification will be used not only for text dictation but also to open applications and carry out predetermined commands. It has also been estimated that if voice identification technology continues to progress as it has, keyboards will become obsolete in ten years.

Microprocessor technology is also set to help voice recognition become more widespread. With the release of the Pentium III microprocessor, Intel introduced a new set of instructions that enhance speech recognition. The new instructions will help with front-end audio processing and the throughput of the search algorithms involved in pattern matching. They will also reduce error rates and response time (21st Century Eloquence, 1999).

It is estimated that revenues from voice/speech identification systems and the telephony equipment and services sold in the United States will increase from $356 million in 1997 to $22.6 billion in 2003 (Smith, L. B. 1998).
