Title:
VOICE PROCESSING DEVICE, VOICE PROCESSING METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM HAVING PROGRAM STORED THEREON
Document Type and Number:
WIPO Patent Application WO/2020/246041
Kind Code:
A1
Abstract:
This voice processing device (1) comprises: a first segmenting means (2_1) which divides a first voice into a plurality of first segment voices; a second segmenting means (2_2) which divides a second voice into a plurality of second segment voices; a primary speaker recognition means (3) which calculates scores indicating the degree of similarity between each of the plurality of first and second segment voices; a threshold value calculation means (4) which calculates a threshold value on the basis of the scores indicating the degree of similarity between the plurality of first segment voices, from among the plurality of scores calculated by the primary speaker recognition means (3); a speaker clustering means (5) which classifies, into one or a plurality of clusters, the second segment voices each having a higher degree of similarity than the degree of similarity indicated by the threshold value; and a secondary speaker recognition means (6) which calculates the degree of similarity between the first voice and each of the one or plurality of clusters and, on the basis of the calculation results, determines whether a voice which corresponds to the first voice is contained in any of the one or plurality of clusters.
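
The processing flow in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation; it rests on several assumptions the abstract does not state: each segment voice is already represented by a fixed-length feature vector, the similarity score is cosine similarity, the threshold is the lowest score between any two first segment voices, clustering is a simple greedy pass, and the final decision uses a hypothetical `decision_threshold` of 0.9. All function names are illustrative.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Similarity score between two feature vectors (assumed: cosine)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean, used here to represent a whole voice or cluster."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def threshold_from_first(first_segs):
    """Threshold from scores among first segments (assumed: the minimum)."""
    return min(cosine(a, b) for a, b in combinations(first_segs, 2))


def select_candidates(first_segs, second_segs, threshold):
    """Keep second segments whose best score against a first segment
    is higher than the threshold."""
    return [
        s for s in second_segs
        if max(cosine(f, s) for f in first_segs) > threshold
    ]


def cluster(segs, threshold):
    """Greedy clustering (assumed): join the first cluster whose mean
    scores above the threshold, otherwise start a new cluster."""
    clusters = []
    for s in segs:
        for c in clusters:
            if cosine(mean_vector(c), s) > threshold:
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters


def contains_first_speaker(first_segs, second_segs, decision_threshold=0.9):
    """Secondary recognition: score the first voice against each cluster
    and report whether any cluster matches, along with the scores."""
    threshold = threshold_from_first(first_segs)
    candidates = select_candidates(first_segs, second_segs, threshold)
    clusters = cluster(candidates, threshold)
    first_voice = mean_vector(first_segs)
    scores = [cosine(first_voice, mean_vector(c)) for c in clusters]
    return any(s > decision_threshold for s in scores), scores


# Toy 2-dimensional vectors standing in for segment features.
first = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05]]
second = [[0.97, 0.03], [0.0, 1.0], [0.1, 0.9], [0.98, 0.02]]
found, cluster_scores = contains_first_speaker(first, second)
```

In this toy input, the two second segments that point in nearly the same direction as the first segments pass the threshold and form a single cluster, while the other two are discarded before clustering. Deriving the threshold from the first voice itself, as the abstract describes, avoids having to tune a fixed filtering threshold by hand.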

Inventors:
GUO LING (JP)
YAMAMOTO HITOSHI (JP)
KOSHINAKA TAKAFUMI (JP)
Application Number:
PCT/JP2019/022805
Publication Date:
December 10, 2020
Filing Date:
June 07, 2019
Assignee:
NEC CORP (JP)
International Classes:
G10L17/08; G10L17/00; G10L17/02
Foreign References:
JP2000227800A, 2000-08-15
JPH0876790A, 1996-03-22
JP2005196035A, 2005-07-21
JP2019008131A, 2019-01-17
Other References:
GREGORY SELL, DAVID SNYDER, ALAN MCCREE, DANIEL GARCIA-ROMERO, JESUS VILLALBA, MATTHEW MACIEJEWSKI, VIMAL MANOHAR, NAJIM DEHAK, DANIEL POVEY, SHINJI: "Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge", PROC. INTERSPEECH, 2018, pages 2808 - 2812
DAVID SNYDER, DANIEL GARCIA-ROMERO, GREGORY SELL, ALAN MCCREE, DANIEL POVEY, SANJEEV KHUDANPUR: "Speaker recognition for multi-speaker conversations using x-vectors", IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019
JITENDRA AJMERA, IAIN MCCOWAN, HERVE BOURLARD: "Robust Speaker Change Detection", IEEE SIGNAL PROCESSING LETTERS, vol. 11, no. 8, 2004, pages 649 - 651
YIN RUIQING, HERVE BREDIN, CLAUDE BARRAS: "Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks", PROC. INTERSPEECH, 2017, pages 3827 - 3831, XP055716415, DOI: 10.21437/Interspeech.2017-65
See also references of EP 3982360A4
Attorney, Agent or Firm:
IEIRI Takeshi (JP)