PURPOSE: To obtain a high-quality synthesized voice by joining waveforms smoothly with a window function, and to enable control of the voicing speed with a high degree of freedom in one-pitch units.
CONSTITUTION: The waveform segmentation/superposition part of a voicing speed conversion device determines the window length of the time window function (4002, 4003) from two positions among the local peak positions of the input speech extracted by a peak extraction part (4001): the peak position to which the window function is applied, and the zero-cross position immediately before the next peak position in the time series. It then generates a suitable window function (4004) for that window length. Waveform expansion and compression are performed in one-pitch units using the generated window function, and the segmented waveforms are thinned out or repeated according to the ratio between the time length of the input speech and that of the desired output speech, which adjusts the voicing speed. Further, when the segmented waveforms are superposed using the generated window functions, the end point of the flat portion of each window function is synchronized with the start point of the waveform to be superposed next.
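The window-length rule and the thinning/repetition step described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the peak threshold `min_height`, the use of rising zero crossings, and the index mapping in `select_segments` are all assumptions made for illustration. The window shape and the superposition step are left out because the abstract does not specify them.

```python
import math


def local_peaks(x, min_height=0.5):
    """Indices of local maxima above min_height (threshold is an assumption)."""
    return [i for i in range(1, len(x) - 1)
            if x[i] > min_height and x[i - 1] < x[i] >= x[i + 1]]


def zero_cross_before(x, idx):
    """Nearest index i <= idx where the signal rises through zero.

    Using the rising (negative-to-positive) crossing is an assumption.
    Returns 0 if no crossing is found.
    """
    for i in range(idx, 0, -1):
        if x[i - 1] < 0.0 <= x[i]:
            return i
    return 0


def window_lengths(x, peaks):
    """One window length per peak that has a successor.

    Each length spans from the current peak to the zero cross
    immediately before the next peak.
    """
    return [zero_cross_before(x, nxt) - cur
            for cur, nxt in zip(peaks, peaks[1:])]


def select_segments(n_segments, ratio):
    """Choose which one-pitch segments to emit.

    ratio = desired output length / input length (hypothetical parameter).
    ratio < 1 thins segments out; ratio > 1 repeats them.
    """
    n_out = max(1, round(n_segments * ratio))
    return [min(n_segments - 1, int(k / ratio)) for k in range(n_out)]
```

For a periodic test signal with a 20-sample pitch period, each window length is the distance from a peak to the zero cross before the next peak, and `select_segments(4, 0.5)` keeps every other segment while `select_segments(4, 2.0)` emits each segment twice.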
MURAKAMI NORIYA