

Title:
VOICE RECOGNITION AND IDENTIFICATION SYSTEM
Document Type and Number:
WIPO Patent Application WO/1988/004463
Kind Code:
A1
Abstract:
Method and apparatus for a voice recognition and identification system. The range of an individual's vocal capabilities (i.e. physical capabilities relating to the voice box, throat, nasal cavities, tongue and lips, etc.) can be covered by five major sounds. It is highly unlikely that two individuals would be able to reproduce identically more than one of these sounds. The present invention compares spoken words covering a plurality of these sounds with stored representations of the words as previously spoken by an individual, in order to identify the individual.

Inventors:
JENNINGS CHRISTOPHER SORREL (GB)
Application Number:
PCT/GB1987/000893
Publication Date:
June 16, 1988
Filing Date:
December 09, 1987
Assignee:
JENNINGS CHRISTOPHER SORREL (GB)
International Classes:
E05B49/00; G10L15/08; G10L15/10; G10L17/00; (IPC1-7): G10L5/06
Other References:
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, Volume 51, No. 6, Part 2, June 1972, (New York, US), J.J. WOLF, "Efficient Acoustic Parameters for Speaker Recognition", pages 2044-2056.
1978 WESCON TECHNICAL PAPERS, Volume 22, WESTERN ELECTRONIC SHOW AND CONVENTION, ELECTRONIC CONVENTIONS, INC., (El Segundo, California, US), G.R. DODDINGTON, "Voice Authentication Parameters and Usage", pages 28/3-1 - 28/3-6.
Claims:
CLAIMS
1. A method of voice identification, comprising the steps of providing a plurality of words for enunciation, each respective word being chosen to produce, when spoken, a particular respective one of a plurality of major sounds for exercising vocal capabilities, detecting the sounds enunciated, converting the sounds into respective representations of the spoken words, and comparing the representations of the words enunciated with representation of corresponding words previously spoken by a user and stored, whereby to identify whether or not the enunciator is the user.
2. A method of voice identification in accordance with claim 1, comprising the further steps of selecting the plurality of words from a multiplicity of words and presenting the plurality of words in identifiable form for enunciation, there being provided a store of respective representations of each of said multiplicity of words as previously spoken by said user.
3. A method in accordance with claim 2, wherein the plurality of words is chosen from said multiplicity on a random basis.
4. A method in accordance with claim 2 or 3, wherein said multiplicity of words is divided into a plurality of groups, each respective group containing a plurality of words chosen to produce, when spoken, a particular respective one of a plurality of major sounds for exercising vocal capabilities, the selection of the plurality of words being made so as to take at least one word from each group.
5. A method in accordance with any preceding claim, wherein words are presented to cover up to five major sounds arranged to exercise the complete range of vocal capabilities, relating to the voice box, nasal cavities, throat, tongue and lips respectively.
6. Voice identification apparatus comprising means for storing respective representations of each of a plurality of words as spoken by a user, each respective word being chosen to produce, when spoken, a particular respective one of a plurality of major sounds for exercising vocal capabilities, means for detecting the sounds of corresponding words as the words are enunciated, means for converting the sounds into respective representations of the spoken words, and means for comparing the representations of the words enunciated with the respective stored representations of corresponding words, whereby to identify whether or not the enunciator is the user.
7. Apparatus in accordance with claim 6, wherein there are stored a respective representation of each of a multiplicity of words as spoken by said user, and there are provided means for selecting a plurality of the multiplicity for presentation for enunciation, and means for presenting the words in identifiable form for enunciation.
8. Apparatus in accordance with claim 7, wherein the means for selecting is arranged to select the plurality of words on a random basis.
9. Apparatus in accordance with claim 7 or 8, wherein the multiplicity of representations of words are stored in a plurality of separate groups, each respective group containing a plurality of representations of words chosen to produce, when spoken, a particular respective one of a plurality of major sounds for exercising vocal capabilities, the means for selecting being arranged to select at least one word from each group.
10. Apparatus in accordance with any of claims 6 to 9, wherein the stored representations relate to words arranged to cover up to five major sounds arranged to exercise the complete range of vocal capabilities, relating to the voice box, nasal cavities, throat, tongue and lips, respectively.
11. A security access system, comprising a first door arranged to open in response to a first security check operation, and a second door, access to which is available through the first door, the second door being arranged to open in response to a second security check operation, one of the first or second security check operations being a voice identification process in accordance with any of claims 1 to 5.
12. A security access system in accordance with claim 11, wherein a space between the first and second door is enclosed and the system is arranged to be responsive to a further security check comprising estimating the weight of the entity being checked for access.
Description:
Voice Recognition and Identification System

This invention relates to voice recognition and identification systems.

In certain situations it is necessary to not only detect the presence of a person, but to identify them as a particular individual. By way of example, one such situation is in a security entry system where an individual is not to be admitted to a "secure area" unless he has been identified as a person entitled to admission. It is well known for such security systems to operate by way of a personalised identity card, a unique personal identification number, or even a combination of the two. However, such systems have the disadvantage that they may be compromised if the card is lost or the personal identification number becomes known to a person other than its rightful owner. A personal identification number has the additional disadvantages that, as the number is fixed, it must be remembered and that it can be passed on to others. It is therefore desirable to be able to identify an individual by using a more sophisticated recognition means which can recognise an individual's characteristics, such as a voice recognition apparatus.

The accuracy of existing voice recognition apparatus in discriminating between one person and another is of the order of 90%. Whilst this is respectable, it is insufficiently accurate for use in security systems.

Using the concepts of phonetics and statistics it is possible to achieve an improved accuracy approaching 100% using the same apparatus.

Spoken words of any language are made up of a combination of any of a multiplicity of different sounds. For example, spoken words of the English language are made up of a combination of any of approximately 50 different sounds. Phonetic notation can be used to classify these sounds. For example, the sound ɑː as in alms, the sound [illegible] as in [illegible], the sound [illegible] as in [illegible], the sound θ as in theory, etc. Reference can be made to any good dictionary for a summary of the phonetic notation.

However, there are five major sounds which between them cover the range of an individual's physical capabilities (i.e. relating to the voice box, nasal cavities, throat, tongue and lips, etc., respectively). Examples of five words which cover the major sounds for the English language are as follows:

art (voice box predominantly)
easy (nasal predominantly)
grunt (throat predominantly)
lillee (tongue predominantly)
away (lips predominantly)

For ease of discussion these sounds will hereinafter be referred to as sounds A to E.

It may be possible for two or more different individuals to each produce one of these sounds identically. However, it is extremely unlikely that any two individuals could reproduce each of the five major sounds A to E identically.
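As a rough illustration of this argument, the following Python sketch multiplies per-word error rates together. It rests on two assumptions that are not stated in the specification: that a single-word comparison wrongly accepts an impostor about one time in ten (reading the 90% figure quoted earlier as a per-word rate), and that the five sounds behave as statistically independent tests. The function name and the constant are illustrative only.

```python
def false_accept_probability(per_word_rate: float, n_words: int) -> float:
    """Chance that an impostor matches every one of n_words independent words."""
    if not 0.0 <= per_word_rate <= 1.0:
        raise ValueError("per_word_rate must lie between 0 and 1")
    if n_words < 1:
        raise ValueError("n_words must be at least 1")
    return per_word_rate ** n_words


# Assumed figure: 10% false acceptance per word (illustrative only).
ASSUMED_PER_WORD_RATE = 0.1

if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        p = false_accept_probability(ASSUMED_PER_WORD_RATE, n)
        print(f"{n} word(s): {p:.6f}")
```

Under these assumptions, five words would give roughly one false acceptance in 100,000 attempts. Real voices are unlikely to be fully independent across sounds, so the figure is best read as an optimistic estimate.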

According to the present invention there is provided a method of voice identification, comprising the steps of providing a plurality of words for enunciation, each respective word being chosen to produce, when spoken, a particular respective one of a plurality of major sounds for exercising vocal capabilities, detecting the sounds enunciated, converting the sounds into respective representations of the spoken words, and comparing the representations of the words enunciated with representation of corresponding words previously spoken by a user and stored, whereby to identify whether or not the enunciator is the user.

According to the present invention there is further provided voice identification apparatus comprising means for storing respective representations of each of a plurality of words as spoken by a user, each respective word being chosen to produce, when spoken, a particular respective one of a plurality of major sounds for exercising vocal capabilities, means for detecting the sound of corresponding words as the words are enunciated, means for converting the sounds into respective representations of the spoken words, and means for comparing the representations of the words enunciated with the respective stored representations of corresponding words, whereby to identify whether or not the enunciator is the user.

The words provided are preferably multisyllable words having a major sound. Multisyllable words provide more data for a processor than monosyllabic words. Four to five syllables are preferred.

A specific embodiment of the invention will now be described by way of example with reference to the accompanying drawing, in which:-

Figure 1 shows a schematic plan view of a security entry system.

In the illustrated embodiment, the voice recognition/identification system is shown in conjunction with a double-door security access system where only a single door may be opened at any one time.

The system has a combined memory and processor means 1, which may be for example a computer. Programmed into the system memory is a list of words chosen to produce respective sounds A to E, as hereinbefore defined. For example, the memory may contain a range of words which are divided into sub-ranges, each of which sub-range comprises 500 words containing one of the five sounds A to E, to make a total range of 2500 words.

Each individual who is to be authorised to use the system is required to input into the system a selection of words from each of the sub-ranges. These words may be chosen from the range by the computer 1. For example, the individual may be required to input 50 words from each of the sub-ranges to make 250 words in total. These words may be input by an acousto-electric interface of a known type. A reference file is thus established against which any subsequent attempts to use the system may be referenced.
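The word store and the enrolment step described above might be organised as in the following Python sketch. The figures (five sub-ranges of 500 words, 50 words enrolled per sub-range) come from the text; everything else is an assumption made for illustration: the placeholder word names, the function names, and the `record` callable, which stands in for the acousto-electric interface and returns whatever representation the system stores.

```python
import random

SOUNDS = ("A", "B", "C", "D", "E")


def build_word_store(words_per_sound: int = 500) -> dict[str, list[str]]:
    """Return placeholder word lists, one sub-range per major sound."""
    return {
        sound: [f"{sound}-word-{i:03d}" for i in range(words_per_sound)]
        for sound in SOUNDS
    }


def choose_enrolment_words(store, per_sound: int = 50, rng=None):
    """Pick per_sound distinct words from every sub-range for a new user."""
    rng = rng or random.Random()
    return {sound: rng.sample(words, per_sound) for sound, words in store.items()}


def enrol(user_id, chosen, record):
    """Build a reference file: word -> stored representation, grouped by sound."""
    return {
        "user": user_id,
        "words": {
            sound: {word: record(word) for word in words}
            for sound, words in chosen.items()
        },
    }
```

With the default arguments the store holds 2500 words and each reference file holds 250 representations, matching the example figures in the description.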

Referring to the accompanying figure, a double-door security access system is shown, comprising a cubicle 2 which may be entered or left by one of two doors, a first door 3 and a second door 4. As shown, the cubicle 2 may be combined with a wall 5 to provide a means of transferral from a first area 6 on one side of the wall to a second area 7 on the other side. Provided adjacent the first door 3 is a badge reader 8 of known type.

Inside the cubicle 2 there is provided a display means 9 and a microphone 10, linked into the memory and processor means 1. The display means may be any suitable display such as an alpha-numeric display or a television type monitor.

For an individual to pass from one side of the wall to the other side, the system may operate as follows. Starting at area 6, the individual places an identity card in the badge reader 8 and, provided that the card is valid, the first door 3 is opened to allow the individual to enter the cubicle 2. Once the individual is inside, the door 3 is shut. To provide additional security, the floor of the cubicle may then be weighed and the weight compared against the individual's known weight from his identity card. This is to detect the entry of more than one person into the cubicle. The memory and processor means 1 then consults the individual's pre-recorded reference file and selects a number of words from each of the sub-ranges. For example, one word could be selected from each of the five sub-ranges and be displayed in turn in written form on the display means 9. The individual is required to read and speak each of these words as it appears, in his normal voice, into the microphone 10. Using known techniques, the memory and processor means 1 compares each of the newly spoken words against the corresponding word recorded at the time the individual established his reference file. If each of the newly spoken words compares favourably with that in the individual's reference file, the second door 4 may be opened to allow the individual access to area 7.
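The sequence of checks in this embodiment can be sketched in Python as follows. The specification leaves the comparison to "known techniques", so the Euclidean distance between feature vectors used here is only a stand-in, and the weight tolerance, the match threshold, the function names and the `speak` callable (which returns the representation of a word as just spoken) are all hypothetical.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_challenge(reference_words, rng=None):
    """Pick one enrolled word from each sub-range (sounds A to E)."""
    rng = rng or random.Random()
    return [rng.choice(sorted(words)) for _, words in sorted(reference_words.items())]


def check_access(badge_valid, measured_weight, known_weight, reference_words,
                 speak, rng=None, weight_tolerance=10.0, match_threshold=1.0):
    """Return True only if the badge, the weight and every spoken word pass."""
    if not badge_valid:
        return False  # first door stays shut
    if abs(measured_weight - known_weight) > weight_tolerance:
        return False  # possibly more than one person in the cubicle
    # Flatten the reference file: word -> stored representation.
    stored = {w: rep for words in reference_words.values() for w, rep in words.items()}
    for word in select_challenge(reference_words, rng):
        if distance(speak(word), stored[word]) > match_threshold:
            return False  # one poor match is enough to refuse entry
    return True  # second door may be opened
```

Here `reference_words` maps each sound to a dictionary of word to stored representation. Requiring every word to match, rather than a majority, follows the condition in the description that each newly spoken word must compare favourably.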

If five words are selected and these are spoken and correctly compared against those in the individual's reference file, the chance of an individual being inadvertently admitted is extremely low. The overall accuracy of the system may of course be improved by the use of a more sophisticated means of comparing the spoken words against the pre-recorded words, as well as by using an improved means for storing the pre-recorded words in the system memory.

It will be appreciated that although the apparatus and method are described in relation to a security entry system, their use is not intended to be limited to such a system. The apparatus and method are of general applicability and can be used almost anywhere it is desired to identify personnel and to operate apparatus, which need not necessarily be a door.

The present invention is not limited to the English language. Other languages can be applied as long as words are chosen to cover the respective major sounds which exercise the complete range of vocal capabilities for the language. It is also possible that invented words could be used for the invention. For example, a computer could be arranged to make up a word which produces a particular one of the sounds when spoken. It should be noted that words may be used which are made up of more than one of the major sounds. One word could even cover several of the sounds.

The enunciator being identified need not necessarily be a single individual. It could be any entity.

All five sounds need not necessarily be used for a particular identification. Identification will still be very accurate if words covering only two or three of the sounds are spoken. It is also possible that more than five sounds could be used in some cases, from the available sounds.

Although multisyllabic words are preferred, it is possible that a system could be devised which would operate effectively for monosyllabic words.