VOICE PROCESSING DEVICE, VOICE PROCESSING METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM HAVING PROGRAM STORED THEREON

Document Type and Number:

WIPO Patent Application WO/2020/246041

Kind Code:

A1

Abstract:

This voice processing device (1) comprises: a first segmenting means (2_1) that divides a first voice into a plurality of first segment voices; a second segmenting means (2_2) that divides a second voice into a plurality of second segment voices; a primary speaker recognition means (3) that calculates a plurality of scores, each indicating the degree of similarity between segment voices among the plurality of first segment voices and the plurality of second segment voices; a threshold value calculation means (4) that calculates a threshold value on the basis of the scores indicating the degree of similarity between the plurality of first segment voices, from among the plurality of scores calculated by the primary speaker recognition means (3); a speaker clustering means (5) that classifies, into one or a plurality of clusters, those second segment voices whose degree of similarity is higher than the degree of similarity indicated by the threshold value; and a secondary speaker recognition means (6) that calculates the degree of similarity between the first voice and each of the one or plurality of clusters and, on the basis of the calculation results, determines whether a voice corresponding to the first voice is contained in any of the one or plurality of clusters.
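The processing flow in the abstract can be illustrated with a minimal Python sketch. It is not the implementation disclosed in the application. Several details are assumptions made only for illustration: each segment voice is taken to be already represented by an embedding vector, the score is cosine similarity, the threshold is the lowest pairwise score among the first segment voices, clustering is a simple greedy pass against cluster centroids, and every function name and the `decision_threshold` parameter are hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two embedding vectors (assumed score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def threshold_from_first(first):
    """Threshold derived from scores among the first segment voices.

    Assumption: the lowest pairwise score is used. The abstract only says
    the threshold is calculated on the basis of these scores.
    Requires at least two first segments.
    """
    return min(cosine(a, b) for a, b in combinations(first, 2))


def select_candidates(first, second, threshold):
    """Keep second segments whose best score against a first segment
    exceeds the threshold."""
    return [s for s in second
            if max(cosine(f, s) for f in first) > threshold]


def cluster(segments, threshold):
    """Greedy clustering (assumed): join the first cluster whose centroid
    scores above the threshold, otherwise start a new cluster."""
    clusters = []
    for seg in segments:
        for members in clusters:
            if cosine(mean_vector(members), seg) > threshold:
                members.append(seg)
                break
        else:
            clusters.append([seg])
    return clusters


def first_voice_present(first, second, decision_threshold):
    """Secondary recognition: compare the first voice, summarized by its
    mean embedding, with each cluster centroid."""
    threshold = threshold_from_first(first)
    candidates = select_candidates(first, second, threshold)
    clusters = cluster(candidates, threshold)
    enrolled = mean_vector(first)
    scores = [cosine(enrolled, mean_vector(c)) for c in clusters]
    return any(s > decision_threshold for s in scores), clusters
```

Calling `first_voice_present(first, second, 0.9)` with lists of segment embeddings returns a boolean decision together with the clusters that were formed. In this sketch, if no second segment exceeds the threshold, no clusters are formed and the decision is `False`.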

Inventors:

GUO LING (JP)
YAMAMOTO HITOSHI (JP)
KOSHINAKA TAKAFUMI (JP)

Application Number:

PCT/JP2019/022805

Publication Date:

December 10, 2020

Filing Date:

June 07, 2019

Assignee:

NEC CORP (JP)

International Classes:

G10L17/08; G10L17/00; G10L17/02

Foreign References:

JP2000227800A	2000-08-15
JPH0876790A	1996-03-22
JP2005196035A	2005-07-21
JP2019008131A	2019-01-17

Other References:

GREGORY SELL, DAVID SNYDER, ALAN MCCREE, DANIEL GARCIA-ROMERO, JESUS VILLALBA, MATTHEW MACIEJEWSKI, VIMAL MANOHAR, NAJIM DEHAK, DANIEL POVEY, SHINJI: "Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge", PROC. INTERSPEECH, 2018, pages 2808 - 2812
DAVID SNYDER, DANIEL GARCIA-ROMERO, GREGORY SELL, ALAN MCCREE, DANIEL POVEY, SANJEEV KHUDANPUR: "Speaker recognition for multi-speaker conversations using x-vectors", IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019
JITENDRA AJMERA, IAIN MCCOWAN, HERVE BOURLARD: "Robust Speaker Change Detection", IEEE SIGNAL PROCESSING LETTERS, vol. 11, no. 8, 2004, pages 649 - 651
YIN RUIQING, HERVE BREDIN, CLAUDE BARRAS: "Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks", PROC. INTERSPEECH, 2017, pages 3827 - 3831, XP055716415, DOI: 10.21437/Interspeech.2017-65
See also references of EP 3982360A4

Attorney, Agent or Firm:

IEIRI Takeshi (JP)
