To provide an utterance section detecting device that maintains a consistent level of performance regardless of noise conditions.
The utterance section detecting device 80 includes a feature value calculation part 102 for calculating two or more kinds of feature values for each frame of speech data; a feature value integrating part 106, which weights the two or more kinds of calculated feature values with respective predetermined weights to calculate an integrated score; an utterance section discriminating part 108, which discriminates between an utterance section and a non-utterance section for each frame of the speech data; a reference data storage part 126 and a label file creating part 122, which prepare labelled data 124 in which each frame carries a label indicating the utterance section or the non-utterance section; and an initialization control part 130 and a weighting-updating part 128, which use the labelled data 124 as learning data and learn the weights applied to the two or more kinds of feature values in the feature value integrating part 106 so as to minimize discrimination errors in the utterance section discriminating part 108.
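The arrangement described above can be illustrated with a minimal Python sketch. It is not the claimed implementation, and several details are assumptions made only for illustration: the two feature values (log energy and zero-crossing rate), the bias term and the zero decision threshold, the zero initialization of the weights, and the learning rule. The abstract does not name the feature kinds or the update rule, so plain stochastic gradient descent on a logistic loss stands in here for whatever the weighting-updating part 128 actually does. All function names are hypothetical.

```python
import math


def frame_features(frame):
    """Two illustrative feature values for one frame (assumed, not from the source):
    log energy and zero-crossing rate."""
    energy = sum(x * x for x in frame) / len(frame)
    log_energy = math.log(energy + 1e-10)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1)
    return [log_energy, zcr]


def integrated_score(features, weights, bias):
    """Weighted sum of the feature values (role of the integrating part 106)."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def is_utterance(features, weights, bias):
    """Per-frame decision (role of the discriminating part 108).
    The zero threshold is an assumption of this sketch."""
    return integrated_score(features, weights, bias) > 0.0


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def learn_weights(labelled, epochs=200, lr=0.05):
    """Learn weights from (features, label) pairs, label 1 = utterance, 0 = non-utterance.
    Logistic-loss SGD is a stand-in for the unspecified update rule."""
    n_features = len(labelled[0][0])
    weights = [0.0] * n_features  # assumed initialization
    bias = 0.0
    for _ in range(epochs):
        for features, label in labelled:
            error = _sigmoid(integrated_score(features, weights, bias)) - label
            weights = [w - lr * error * f for w, f in zip(weights, features)]
            bias -= lr * error
    return weights, bias
```

In use, each frame of labelled speech data would be passed through `frame_features`, the resulting pairs handed to `learn_weights`, and the learned weights then applied frame by frame with `is_utterance`. Retraining on data recorded under different noise conditions would change only the weights, not the structure of the detector.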
KIDA YUSUKE