To provide a method by which recognition results are obtained quickly, without reducing recognition accuracy, even when the speech to be recognized contains many morphemes in the linguistic sense.
Voice signals 100 are converted into a time series of feature parameters by an acoustic analysis section 101. A phoneme collating section 104 recognizes the resulting feature parameter series as a sequence of recognition units, using the acoustic models of recognition units (phonemes, syllables, or words) stored in an acoustic model storage section 102 together with linguistic models of recognition-unit chains, and outputs the result. Whenever the search is expanded from the current recognition unit to the next one, a starting frame reasonableness discriminating section 105 judges whether the starting frame of the transition-source recognition unit is plausible. The search for the transition-destination recognition unit is started only if that starting frame is judged plausible.
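The gating step can be sketched in Python as follows. The abstract does not state what makes a starting frame "proper", so this sketch assumes, purely for illustration, a per-unit duration window: a source unit may hand off to its successors only if it has lasted between a minimum and a maximum number of frames since its starting frame. The names Hypothesis, start_frame_is_proper, expand, successors and duration_limits, as well as the toy data, are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    unit: str          # current recognition unit (e.g. a phoneme label)
    start_frame: int   # frame at which the search for this unit was started
    history: tuple     # recognition units decoded before this one


def start_frame_is_proper(hyp, current_frame, duration_limits):
    """Hypothetical plausibility test (an assumption, not the patent's rule):
    the source unit must have lasted between min and max frames, inclusive."""
    min_dur, max_dur = duration_limits.get(hyp.unit, (1, float("inf")))
    duration = current_frame - hyp.start_frame
    return min_dur <= duration <= max_dur


def expand(hyps, current_frame, successors, duration_limits):
    """Start the search for destination units only from source units whose
    starting frame passes the plausibility test."""
    new_hyps = []
    for hyp in hyps:
        if not start_frame_is_proper(hyp, current_frame, duration_limits):
            continue  # implausible starting frame: do not expand this transition
        # The toy linguistic model lists which units may follow the current one.
        for nxt in successors.get(hyp.unit, ()):
            new_hyps.append(Hypothesis(nxt, current_frame, hyp.history + (hyp.unit,)))
    return new_hyps


# Toy usage: "k" has lasted 6 frames (within 3..8) and is expanded;
# "s" has lasted only 2 frames (outside 5..12) and is pruned.
successors = {"k": ["a", "o"], "s": ["a"]}
limits = {"k": (3, 8), "s": (5, 12)}
hyps = [Hypothesis("k", 10, ()), Hypothesis("s", 14, ())]
result = expand(hyps, 16, successors, limits)
print([h.unit for h in result])
```

The design point the sketch is meant to show is that the check happens before any destination hypothesis is created, so implausible transitions never enter the search space. This is where the speed-up described in the abstract would come from.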