Title:
METHOD FOR DETECTING A DIRECTION OF ARRIVAL OF AN ACOUSTIC TARGET SIGNAL
Document Type and Number:
WIPO Patent Application WO/2024/110036
Kind Code:
A1
Abstract:
The invention discloses a method for detecting a direction of arrival (α) of an acoustic target signal (14) by means of a plurality of microphones (ML1, ML2, MR1, MR2), said microphones (ML1, ML2, MR1, MR2) being distributed over a local device (LD) and a remote device (RD), said local device (LD) comprising at least a first local microphone (ML1) and a second local microphone (ML2), and said remote device (RD) comprising at least a first remote microphone (MR1), each of said microphones (ML1, ML2, MR1, MR2) configured to generate a corresponding microphone signal (xML1, xML2, xMR1, xMR2) from an environment sound (16), respectively, the method comprising the steps of: deriving a first local input signal (Loc1) by means of the first local microphone signal (xML1) and the second local microphone signal (xML2), deriving a second local input signal (Loc2) by means of the first local microphone signal (xML1) and/or the second local microphone signal (xML2), and deriving a first remote input signal (Rem1) by means of at least the first remote microphone signal (xMR1), said first and second local input signal (Loc1, Loc2) and first remote input signal (Rem1) forming a part of a set (18) of input signals, deriving a plurality of spatial feature quantities (Q1, Q2), said spatial feature quantities (Q1, Q2) each being derived from different respective pairs out of the set (18) of input signals, and being indicative of a spatial relation between the two corresponding input signals, using said spatial feature quantities (Q1, Q2) as an input to a neural network (20), and estimating, by means of said neural network (20), the direction of arrival (α) of said acoustic target signal (14).

Inventors:
KAMKAR-PARSI HOMAYOUN (DE)
BOUCHARD MARTIN (CA)
ALTAMIMI SA'DI (CA)
Application Number:
PCT/EP2022/083142
Publication Date:
May 30, 2024
Filing Date:
November 24, 2022
Assignee:
SIVANTOS PTE LTD (SG)
KAMKAR PARSI HOMAYOUN (DE)
International Classes:
H04R3/00; G01S3/808; H04R25/00
Foreign References:
US20220159403A1 (2022-05-19)
US20210020190A1 (2021-01-21)
Other References:
VARZANDEH REZA ET AL: "Exploiting Periodicity Features for Joint Detection and DOA Estimation of Speech Sources Using Convolutional Neural Networks", ICASSP 2020 - 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), IEEE, 4 May 2020 (2020-05-04), pages 566 - 570, XP033794406, DOI: 10.1109/ICASSP40776.2020.9054754
HAO YIYA ET AL: "Spectral Flux-Based Convolutional Neural Network Architecture for Speech Source Localization and its Real-Time Implementation", IEEE ACCESS, IEEE, USA, vol. 8, 26 October 2020 (2020-10-26), pages 197047 - 197058, XP011819491, DOI: 10.1109/ACCESS.2020.3033533
Attorney, Agent or Firm:
FDST PATENTANWÄLTE (DE)
Claims:
1. A method for detecting a direction of arrival (α) of an acoustic target signal (14) by means of a plurality of microphones (ML1, ML2, MR1, MR2), said microphones (ML1, ML2, MR1, MR2) being distributed over a local device (LD) and a remote device (RD), said local device (LD) comprising at least a first local microphone (ML1) and a second local microphone (ML2), and said remote device (RD) comprising at least a first remote microphone (MR1), each of said microphones (ML1, ML2, MR1, MR2) configured to generate a corresponding microphone signal (xML1, xML2, xMR1, xMR2) from an environment sound (16), respectively, the method comprising the steps of:
- deriving a first local input signal (Loc1) by means of the first local microphone signal (xML1) and the second local microphone signal (xML2), deriving a second local input signal (Loc2) by means of the first local microphone signal (xML1) and/or the second local microphone signal (xML2), and deriving a first remote input signal (Rem1) by means of at least the first remote microphone signal (xMR1), said first and second local input signal (Loc1, Loc2) and first remote input signal (Rem1) forming a part of a set (18) of input signals,
- deriving a plurality of spatial feature quantities (Q1, Q2), said spatial feature quantities (Q1, Q2) each being derived from different respective pairs out of the set (18) of input signals, and being indicative of a spatial relation between the two corresponding input signals,
- using said spatial feature quantities (Q1, Q2) as an input to a neural network (20), and
- estimating, by means of said neural network (20), the direction of arrival (α) of said acoustic target signal (14).

2. The method according to claim 1, wherein as a local device (LD), a first hearing instrument (Hy) of a binaural hearing system (1) is used, and as a remote device (RD), a second hearing instrument (Hz) of said binaural hearing system (1) is used.

3. The method according to claim 1 or claim 2, wherein each of said spatial feature quantities (Q1, Q2) is derived from the respective pair out of the set (18) of input signals by the same mathematical relation and/or algorithm, varying the respective pairs of input signals for different spatial feature quantities (Q1, Q2).

4. The method according to claim 3, wherein as said spatial feature quantities (Q1, Q2), corresponding intra-microphone responses (IMR) between the respective two input signals out of the set (18) of input signals are derived, each of said intra-microphone responses (IMR) being a function of the cross power spectral densities of the underlying pair of input signals and of an auto power spectral density of one out of the pair of input signals, or a function of the cross-correlations of the underlying pair of input signals and of an auto-correlation of one out of the pair of input signals.

5. The method according to any of the preceding claims, wherein the output of the neural network (20) is a vector (v), each vector component corresponding to a different angular range (Δαj).

6. The method according to claim 5, wherein each vector entry corresponds to a probability of a sound source (6) of the acoustic target signal (14) being present in the respective angular range (Δαj).

7. The method according to any of the preceding claims, wherein the first local input signal (Loc1) is derived from the first and second local microphone signal (xML1, xML2) by means of a beamformer (BF), and/or wherein the second local input signal (Loc2) is generated either from the first local microphone signal (xML1) or from the second local microphone signal (xML2).

8. The method according to claim 7, wherein the first local input signal (Loc1) is derived from the first and second local microphone signal (xML1, xML2) using the first local microphone signal (xML1) as a reference for the beamforming performed in said beamformer (BF), and wherein the second local input signal (Loc2) or a third local input signal (Loc3) is derived from the first and second local microphone signal (xML1, xML2) by means of another beamformer (BF’) using the second local microphone signal (xML2) as a reference for the beamforming performed in said other beamformer (BF’).

9. The method according to claim 7 or claim 8, wherein the first local input signal (Loc1) is derived from the first and second local microphone signal (xML1, xML2) by means of said beamformer (BF) applying a first target constraint (TarCons1), and/or wherein the or a third local input signal (Loc3), forming part of the set (18) of input signals, is derived from the first and second local microphone signal (xML1, xML2) by means of another beamformer or said beamformer (BF’) applying a second target constraint (TarCons2) and/or a first noise constraint (NCons1).

10. The method according to any of the preceding claims, wherein said remote device (RD) further comprises a second remote microphone (MR2) configured to generate a second remote microphone signal (xMR2) from the environment sound (16), and wherein said first remote input signal (Rem1) is derived from the first and second remote microphone signal (xMR1, xMR2) by means of a beamformer.

11. The method according to any of the preceding claims, wherein said first remote input signal (Rem1) and/or an auxiliary remote input signal (RemAux), said auxiliary remote input signal (RemAux) forming part of the set (18) of input signals, is derived in the local device (LD) by using at least the first remote microphone signal (xMR1) and the first local microphone signal (xML1).

12. The method according to any of the preceding claims, wherein the first remote input signal (Rem1) or the first remote microphone signal (xMR1) is transmitted from the remote device (RD) to the local device (LD), and/or wherein the neural network (20) is implemented in the local device (LD).
13. The method according to claim 12 in combination with claim 11, wherein the direction of arrival (α) of the acoustic target signal (14) is estimated in the local device (LD), wherein the first local input signal (Loc1) and/or the first local microphone signal (xML1) is transmitted from the local device (LD) to the remote device (RD), wherein a second remote input signal (Rem2) is derived by means of the first and/or the second remote microphone signal (xMR1, xMR2), wherein the direction of arrival (α’) of the acoustic target signal (14) is estimated in the remote device (RD) by means of the first and second remote input signal (Rem1, Rem2) and the first local input signal (Loc1) and/or an auxiliary local input signal derived in the remote device by using the first local microphone signal (xML1) and the first remote microphone signal (xMR1), wherein the estimation performed in the remote device (RD) is transmitted to the local device (LD), and wherein a final direction of arrival (α) is determined based on the estimation performed in the local device (LD) and the estimation performed in the remote device (RD).

14. The method according to any of the preceding claims, wherein the steps of
- deriving a first local input signal (Loc1), a second local input signal (Loc2) and a first remote input signal (Rem1) as a part of a set (18) of input signals,
- deriving a plurality of spatial feature quantities (Q1, Q2) from respective pairs of input signals, and
- using said spatial feature quantities (Q1, Q2) as an input to a neural network (20)
are performed individually in a plurality of frequency bands.

15. The method according to claim 14, wherein said steps, at least over a frequency range, are performed in non-adjacent frequency bands, and/or up to a frequency of 6 kHz.

16. The method according to any of the preceding claims, wherein as a neural network (20), a deep neural network and/or a recurrent neural network and/or a neural circuit policy and/or a temporal convolution network is used.

17. A binaural hearing system (1) with a first hearing instrument (Hy) and a second hearing instrument (Hz), said first hearing instrument (Hy) comprising at least a first local microphone (ML1) and a second local microphone (ML2), and said second hearing instrument (Hz) comprising at least a first remote microphone (MR1), said binaural hearing system (1) further comprising a neural network (20), and said binaural hearing system (1) being configured to perform the method according to any of claims 2 to 14.
Description:
Method for detecting a direction of arrival of an acoustic target signal

The invention is related to a method for detecting a direction of arrival of an acoustic target signal by means of a plurality of microphones, said microphones being distributed over a local device and a remote device, each of said microphones configured to generate a corresponding microphone signal from an environment sound, respectively, the method comprising the steps of deriving a set of input signals, deriving a plurality of spatial feature quantities from the set of input signals, and estimating the direction of arrival of said acoustic target signal on the basis of said spatial feature quantities.

In hearing system applications, such as hearing aids configured to correct for a hearing impairment of a user, often directional signal processing is applied to electric input signals derived by microphones of the hearing aid from an environment sound, in order to enhance a target signal and to attenuate noise in the environment sound. The aim is to provide the user with a higher signal-to-noise ratio (SNR). An application of this type of signal processing, however, is not limited to hearing aids; communication devices may also profit from directional processing.

In order to efficiently enhance a target signal over a noisy background, as a starting point, often a direction of arrival (DOA) of the target signal (with respect to a reference direction, e.g., a look direction of the wearer when wearing the hearing system as specified for use by the manufacturer) is estimated. To this end, typically, quantities indicative of the spatial cues of the target signal, such as level differences and/or time differences of the target signal components in the respective microphone signals of the hearing system, are determined or estimated. However, in a noisy environment and/or in situations with more than one target source, estimating the respective target signal components in the microphone signals for a derivation of said level and/or time differences over the different microphone signals is often difficult, in particular in the case of limited processing resources as in (mobile) hearing systems.

It is therefore an object of the invention to provide a method for detecting a DOA of an acoustic target signal by means of a plurality of microphones, in particular of a hearing system, that has a high detection/estimation accuracy while being robust against background noise, and preferably being capable of dealing with multiple targets.

According to the invention, this object is solved by a method for detecting a direction of arrival of an acoustic target signal by means of a plurality of microphones, said microphones being distributed over a local device and a remote device, said local device comprising at least a first local microphone and a second local microphone, and said remote device comprising at least a first remote microphone, each of said microphones configured to generate a corresponding microphone signal from an environment sound.
The method comprises the steps of deriving a first local input signal by means of the first local microphone signal and the second local microphone signal, deriving a second local input signal by means of the first local microphone signal and/or the second local microphone signal, and deriving a first remote input signal by means of at least the first remote microphone signal, said first and second local and first remote input signals forming a part of a set of input signals; deriving a plurality of spatial feature quantities, said spatial feature quantities each being derived from different respective pairs out of the set of input signals, and being indicative of a spatial relation between the two corresponding input signals; using said spatial feature quantities as an input to a neural network; and estimating, by means of said neural network, the direction of arrival of said acoustic target signal. Embodiments of particular advantage, which may be inventive in their own right, are outlined in the dependent claims and in the following description.

Preferably, the local device and the remote device form part of a hearing system. In this respect, a hearing system is to be understood as any system whatsoever configured to present a sound signal to the hearing of a user by means of at least one electro-acoustic transducer (such as, e.g., a speaker, a balanced metal-case receiver, or a bone conduction transducer). In particular, the hearing system may be given by a binaural hearing system such as a binaural hearing aid configured to correct a hearing impairment of the user and comprising a first and a second hearing instrument (each of which is to be worn at a different ear) as the local and remote devices, or may be given by a communication system with two devices (such as earplug- or earpod-like headphones), each of which is to be worn at a different ear.

Each of the microphones distributed over the local and the remote device is configured to generate a respective microphone signal from the environment sound. In particular, the first and second local microphones generate a first and second local microphone signal, and the first remote microphone generates a first remote microphone signal. Pre-processing steps such as pre-amplification or A/D conversion may be absorbed into the microphone signals. Preferably, the respective microphone signal represents acoustic pressure oscillations at the location of the underlying microphone in corresponding voltage and/or current oscillations.

From these microphone signals, a set of input signals is derived. In particular, a first local input signal may be derived from the first and second local microphone signal by means of a beamformer. In particular, the second local input signal may be generated either from the first local microphone signal or from the second local microphone signal, preferably without any signal components from the respective other local microphone signal (i.e., either from the signal components of the first local microphone signal alone, or from the signal components of the second local microphone signal alone).
The first remote input signal may be derived from the first remote microphone signal by means of a beamformer, using a second remote microphone signal (generated from the environment sound by a second remote microphone that is comprised in the remote device), or may be derived from the signal components of the first remote microphone signal alone. The set of input signals may consist of the first and second local input signal and the first remote input signal only, or may comprise further input signals.

The advantage of using a beamformer signal as the first local input signal (and possibly also for the first remote input signal) is that the beamforming allows for a local pre-processing that may already eliminate some noise, in particular directional noise (e.g., from the back hemisphere).

The first and second local input signal and the first remote input signal, and possibly other input signals, form part of the set of input signals. Now, different pairs of input signals are selected out of the set of input signals, and from each of these pairs, a respective spatial feature quantity is derived, the spatial feature quantity being indicative of a relation between the respective two input signals under consideration, so that a plurality of spatial feature quantities is obtained. In particular, one spatial feature quantity may be derived from the pair of the first and second local input signal, and another spatial feature quantity may be derived from the pair of the first local input signal and the first remote input signal. The spatial feature quantities shall be indicative of a spatial relation of the respective pair of input signals used to derive each spatial feature quantity.

The spatial feature quantities, preferably in an adequate representation such as a real and imaginary part or magnitude and phase representation for complex-valued quantities, are then used as an input to a neural network, in particular in the form of an input vector, the vector entries being the spatial feature quantities (or real and imaginary parts thereof). Preferably, the neural network is trained to estimate a DOA of the acoustic target signal from the spatial feature quantities as entries, i.e., to make predictions from the present spatial feature quantities about a possible DOA, based on “learned” spatial feature quantities of known DOAs in given situations.

In particular, the neural network may preferably output a vector with each vector component corresponding to a different angular range as said estimate of the DOA. Then, for a proper normalization, each vector entry preferably may correspond to a probability of a sound source of the acoustic target signal being present in the respective angular range. However, also other estimates for the DOA are possible as output of the neural network, e.g., an angle, a coarse-grained angle, or an angular range of maximum likelihood for the DOA.

Preferably, each of said spatial feature quantities is derived from the respective pair out of the set of input signals by the same mathematical relation and/or algorithm, varying the respective pairs of input signals for different spatial feature quantities.
This means that if a first spatial feature quantity Q1 is derived from the pair of the first local input signal Loc1 and the second local input signal Loc2 as a function Q1 = F(Loc1, Loc2), a second spatial feature quantity Q2 is derived from the corresponding pair of the first local input signal Loc1 and the first remote input signal Rem1 as Q2 = F(Loc1, Rem1), i.e., by the same mathematical function F(x, y) of two arguments x and y, changing only (at least) one of the input signals.

In an embodiment, as said spatial feature quantities, corresponding intra-microphone responses (IMR) between the respective pair of two input signals out of the set of input signals are derived, each of said intra-microphone responses being a function, in particular a ratio, of the cross power spectral densities of the underlying pair of input signals and of an auto power spectral density of one out of the pair of input signals, or a function, in particular a ratio, of the cross-correlations of the underlying pair of input signals and of an auto-correlation of one out of the pair of input signals.

In particular, for the pair of the first local input signal Loc1(n,k,j) and the second local input signal Loc2(n,k,j) (n being the frame index, j denoting the discrete time sample within the frame n, and k being the frequency band index), the IMR(Loc1, Loc2) may be calculated as

$$\mathrm{IMR}(Loc1, Loc2)(n,k) = \frac{\sum_{j=1}^{N} Loc1^{*}(n,k,j)\, Loc2(n,k,j)}{\sum_{j=1}^{N} Loc1^{*}(n,k,j)\, Loc1(n,k,j)}, \tag{i}$$

where Loc1* denotes the complex conjugate, and N denotes the number of samples in the subband k for the frame n. The sums in the above equation may also be substituted by moving averages (e.g., with exponential decay coefficients).
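Purely as an illustration (not part of the patent text), a minimal NumPy sketch of equation (i); the function name imr and the synthetic subband data are assumptions:

```python
import numpy as np

def imr(a: np.ndarray, b: np.ndarray) -> complex:
    """Intra-microphone response per equation (i): ratio of the cross
    power spectral density of (a, b) to the auto power spectral density
    of a, summed over the N samples of one frame in one subband."""
    return np.sum(np.conj(a) * b) / np.sum(np.conj(a) * a)

# Synthetic check: if b is a scaled, phase-shifted copy of a, the IMR
# recovers exactly that relative transfer (the spatial relation).
rng = np.random.default_rng(0)
N = 32                                   # samples per frame and subband
loc1 = rng.standard_normal(N) + 1j * rng.standard_normal(N)
loc2 = 0.8 * np.exp(-1j * 0.3) * loc1    # hypothetical level/phase offset
q1 = imr(loc1, loc2)
print(abs(q1), np.angle(q1))             # -> 0.8 and -0.3
```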
In an embodiment, the first local input signal and/or the second local input signal and/or a third local input signal is derived from the first and second local microphone signal by means of a beamformer applying a first target constraint, and/or the second local input signal or said third local input signal, forming part of the set of input signals, is derived from the first and second local microphone signal by means of a beamformer applying a second target constraint and/or a first noise constraint. In this respect, a target constraint may be given by an attenuation (determined, e.g., by a corresponding attenuation factor) in the respective target direction, e.g., an attenuation of 0 dB in the direction of 0° (frontal direction), or in another direction such as +/- 45° or +/- 90°. Likewise, a noise constraint may be given by a null direction for maximum attenuation of noise, e.g., a maximum attenuation (i.e., a gain of zero) in the direction of 180° (rear direction) or some other direction. The absence of a noise constraint in the beamformer for the respective local input signal corresponds to the assumption of diffuse noise. When multiple local input signals with different target constraints corresponding to different target directions are used, this may create additional local input signals.

The beamformer for deriving the first local input signal from the first and second local microphone signal then applies a target constraint, such as 0 dB in the direction of 0°. This beamformer may or may not have an additional noise constraint (fixing a null direction). Additional local input signals, such as a third local input signal, may be derived from the first and second local microphone signal by means of beamforming in a similar way as the first local input signal, just varying the direction of the target constraint (e.g., 0 dB at 15°) and possibly adding a noise constraint or (if the first local input signal is also generated from a beamformer with a noise constraint) optionally varying the direction of the noise constraint.

In particular, the first local input signal is derived from the first and second local microphone signal using the first local microphone signal as a reference for the beamforming performed in said beamformer, and the second local input signal or a third local input signal is derived from the first and second local microphone signal by means of another beamformer, using the second local microphone signal as a reference for the beamforming performed in said other beamformer. This means that the set of input signals comprises at least two local input signals derived from the first and second local microphone signal by means of beamforming, each of which has a different one of said first and second local microphone signals as a reference signal for the beamforming.

In an embodiment, said first remote input signal and/or an auxiliary remote input signal, said auxiliary remote input signal forming part of the set of input signals, is derived in the local device by using at least the first remote microphone signal and the first local microphone signal. This means that the first remote input signal may also be generated from the first (and possibly the second) remote microphone signal in the local device. To this end, said first and possibly second remote microphone signal is transmitted from the remote device to the local device.

In particular, the first remote input signal, which may be derived from the first and second remote microphone signal by means of a beamformer (be it in the local device or in the remote device), is used together with one of the local signals in another signal processing operation, preferably in another beamforming operation, e.g., with the first local input signal, in order to generate the auxiliary remote input signal. Said auxiliary remote input signal, or also a further auxiliary remote input signal, may be generated by transmitting the first remote microphone signal from the remote device to the local device, and applying a signal processing operation, preferably a beamforming operation, to the first remote microphone signal and a local signal, e.g., any of the first and second local microphone signals.

Preferably, the first remote input signal or the first remote microphone signal is transmitted from the remote device (in particular, the second hearing instrument) to the local device (in particular, the first hearing instrument), and/or the neural network is implemented in the local device. In particular, in case the neural network is implemented in the local device (the first hearing instrument), the first remote input signal is transmitted from the remote device to the local device.
Most preferably, the DoA is used directly in the local device (i.e., in its signal processing unit) for further signal processing (in particular, direction-sensitive signal processing such as beamforming and the like).

In an embodiment, the direction of arrival of the acoustic target signal is estimated in the local device (in particular, the first hearing instrument), wherein the first local input signal and/or the first local microphone signal is transmitted from the local device to the remote device (in particular, the second hearing instrument), wherein a second remote input signal is derived by means of the first and/or the second remote microphone signal, wherein the direction of arrival of the acoustic target signal is estimated in the remote device by means of the first and second remote input signal and the first local input signal and/or an auxiliary local input signal, said auxiliary local input signal being derived in the remote device by using the first local microphone signal (and the first remote microphone signal), wherein the estimation performed in the remote device is transmitted to the local device, and wherein a final direction of arrival is determined based on the estimation performed in the local device and the estimation performed in the remote device. This includes the case that the method is performed symmetrically, e.g., in the local device by means of the first remote input signal transmitted from the remote device (and the local input signals), and in the remote device by means of the first local input signal transmitted from the local device (and the remote input signals), with a transmission of the DoA estimation obtained in the remote device to the local device, as well as a comparison of the DoA estimation of the remote device with the DoA estimation of the local device. In particular, the final DoA may be an average or a weighted average of said DoA estimations of the local and the remote device. The generation of remote input signals in the remote device may be performed by means of beamforming, in particular using target and/or noise constraints as described above for the beamforming in the local device.

An important aspect of this embodiment is that for a binaural hearing aid with two hearing devices to be worn by a user at the left ear and the right ear, respectively, the hearing device on the left side may detect a DoA of a source located to the left side and near the frontal direction, while the hearing device on the right side may detect a DoA of a source located to the right side and near the frontal direction. This way, each hearing device may predict the DoA of a source essentially not attenuated by a head shadowing effect. Since both hearing devices are configured to provide a good DoA estimation for a source located near the frontal direction, the respective estimation in each device may be transmitted to the respective other device for generating a final DoA on each side, based on the two estimations from each device (e.g., by means of a possibly weighted average).
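Purely for illustration, a minimal sketch of such a fusion by weighted averaging; the vector alignment convention and all names are assumptions not prescribed by the text:

```python
import numpy as np

def fuse_doa(v_local: np.ndarray, v_remote: np.ndarray,
             overlap: int, w_local: float = 0.5) -> np.ndarray:
    """Weighted average of two DoA probability vectors in their overlap.

    v_local and v_remote are assumed to be ordered so that the first
    `overlap` entries of v_local and the last `overlap` entries of
    v_remote cover the same angular ranges around the frontal direction.
    Outside the overlap, each device's own estimate is kept.
    """
    fused = v_local.copy()
    fused[:overlap] = (w_local * v_local[:overlap]
                       + (1.0 - w_local) * v_remote[-overlap:])
    return fused

v = np.array([0.1, 0.7, 0.2, 0.1])        # local estimate (v)
v_prime = np.array([0.1, 0.2, 0.3, 0.9])  # remote estimate (v')
print(fuse_doa(v, v_prime, overlap=2))    # -> [0.2 0.8 0.2 0.1]
```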
In an embodiment, the steps of deriving a first local input signal, a second local input signal and a first remote input signal as a part of a set of input signals, deriving a plurality of spatial feature quantities from respective pairs of input signals, and using said spatial feature quantities as an input to a neural network are performed individually in a plurality of frequency bands. In particular, the underlying microphone signals each are divided into a plurality of frequency bands for this purpose, wherein the mentioned steps are performed on the microphone signals’ frequency band components. Preferably, estimating the direction of arrival of the acoustic target signal by means of the neural network is performed using the information of all available frequency bands, i.e., in a broadband manner.

Hereby, preferably, these steps are performed at least over a given frequency range in non-adjacent frequency bands, and/or up to a frequency of 6 kHz, preferably 5 kHz, i.e., there exists a frequency range over which the mentioned steps are performed in non-adjacent frequency bands. The frequency bands may have a bandwidth in the order of magnitude of, e.g., 1 kHz, wherein the center frequencies of adjacent frequency bands may be, e.g., 250 Hz apart. In particular, no frequency band with a center frequency of 0 Hz (“DC subband”) is used, and/or only every second frequency band is used up to a frequency of 6 kHz, preferably up to 5 kHz. This way, redundancies are avoided and calculation resources can be used more efficiently. Due to a high overlap between adjacent frequency bands, the spatial information in two adjacent frequency bands can be considered similar, so that taking only every second frequency band for performing the aforementioned method of DoA estimation may be sufficient.

Preferably, as a neural network, a deep neural network and/or a recurrent neural network and/or a neural circuit policy and/or a temporal convolution network is used. These types of neural networks are particularly suited for the given task.

The invention furthermore discloses a binaural hearing system, in particular a binaural hearing aid, with a first hearing instrument and a second hearing instrument, said first hearing instrument comprising at least a first local microphone and a second local microphone, and said second hearing instrument comprising at least a first remote microphone, said binaural hearing system further comprising a neural network, and said binaural hearing system being configured to perform the method described above.

The binaural hearing system according to the invention shares the advantages of the method for detecting a DoA according to the invention. Particular assets of the method and of its embodiments may be transferred, in an analogous way, to the binaural hearing system and its embodiments, and vice versa.

The attributes and properties as well as the advantages of the invention which have been described above are now illustrated with the help of drawings of embodiment examples.
In detail,

figure 1 shows a schematic top view of a binaural hearing aid with two hearing instruments in a hearing situation with a target source,

figure 2 shows a block diagram of a method for estimating, by means of the binaural hearing aid’s microphones, a DoA of the target source in the hearing situation of figure 1,

figure 3 schematically shows the temporal evolution of the raw DoA estimates of the method of figure 2, and the corresponding temporal evolution of a post-processed DoA,

figure 4 schematically shows, in a section of a block diagram, the step of spatial feature extraction for the method according to figure 2, and

figure 5 schematically shows, in a section of a block diagram, the step of spatial feature extraction in an alternative embodiment to figure 4.

Parts and variables corresponding to one another are provided with the same reference numerals in each case of occurrence for all figures.

In figure 1, a schematic top view of a binaural hearing system 1 is shown. The binaural hearing system 1 is given by a binaural hearing aid 2, which is worn by a user 4, and comprises a first hearing instrument Hy (worn at the left ear of the user 4 in the present embodiment) and a second hearing instrument Hz (worn at the right ear of the user 4 in the present embodiment). The first hearing instrument comprises two microphones Mfy, Mby, while the second hearing instrument comprises two microphones Mfz, Mbz (the indices f, b denote the “front” and “back” position in the respective hearing instrument, when properly worn as specified).

In the hearing situation depicted in figure 1, a target source 6 given by a target speaker 8 is located in a direction of arrival α slightly to the left of a frontal direction 10 of the user 4. The speech 12 of the target speaker 8 constitutes an acoustic target signal 14 for the binaural hearing aid 2. In order to efficiently enhance this target signal 14, the DoA, i.e., the direction α, is to be determined by the signal processing of the binaural hearing aid 2.

Figure 2 shows a block diagram of a method for estimating the DoA α of the acoustic target signal 14 shown in figure 1. As the method may be performed in an essentially symmetrical way in both hearing instruments Hy, Hz, but with a different signal processing of the signal of the corresponding hearing instrument (e.g., Hy) and the signal transmitted from the respective other hearing instrument (e.g., Hz), a change in the notation will be introduced for the present embodiment of the method, in which the main signal processing is performed in the first hearing instrument Hy (worn by the user 4 on his left ear). For the present embodiment of the method, the first hearing instrument Hy will be taken as a local device LD, while the second hearing instrument Hz will be taken as a remote device RD. The two microphones Mfy, Mby of the first hearing instrument Hy are now denoted as the first local microphone ML1 and the second local microphone ML2, and the two microphones Mfz, Mbz of the second hearing instrument Hz are now denoted as the first remote microphone MR1 and the second remote microphone MR2.
The first and second local and remote microphones ML1, ML2, MR1, MR2 generate respective first and second local and remote microphone signals xML1, xML2, xMR1, xMR2 from an environment sound 16 comprising the acoustic target signal 14 shown in figure 1 (the acoustic target signal 14 is not shown in figure 2). Each of the first and second local and remote microphone signals xML1, xML2, xMR1, xMR2 is split into a plurality of frequency bands by means of respective filter banks (not shown; e.g., a 48-channel filter bank). For the signal processing steps described below, only every second frequency band is used up to a threshold frequency of 5 kHz (the DC frequency band is discarded, as well as frequency bands above the threshold frequency).

From the first and second local microphone signals xML1, xML2, i.e., from their respective sub-band signals, a first local input signal Loc1 is generated in each of the relevant frequency bands by means of a beamformer BF, in a way yet to be described. Furthermore, the sub-band signals of the second local microphone signal xML2 (corresponding to the “back” microphone Mby/ML2 of the local device LD) are taken as a second local input signal Loc2, i.e., in each frequency band that is used, the signal components of the second local microphone signal xML2 are used as the second local input signal Loc2. Finally, from the first and second remote microphone signals xMR1, xMR2, a first remote input signal Rem1 is generated frequency-bandwise in the remote device RD by means of beamforming, in a similar way as the first local input signal Loc1, and is transmitted to the local device LD. Then, for the method performed in the local device LD, the first and second local input signals Loc1, Loc2 and the first remote input signal Rem1 constitute the set 18 of input signals (for embodiments with further local input signals, these also form part of the set 18 of input signals). For a symmetrical implementation of the signal processing steps implemented in the local device LD as shown in figure 2, the second remote microphone signal xMR2 (not shown; corresponding to the “back” microphone Mbz/MR2 of the remote device RD) may be taken as a second remote input signal Rem2.

Different pairs of input signals out of the set 18 of input signals are then used for deriving a plurality of corresponding spatial feature quantities Q1, Q2 indicative of a spatial relation between the respective input signals involved. As spatial feature quantities Q1, Q2, the so-called intra-microphone responses IMR (cf. equation (i)) between the two input signals of the respective pair are calculated. To this end, an intra-microphone response IMR(Loc1, Loc2) (= Q1) for the pair of the first and second local input signal is calculated, and another intra-microphone response IMR(Loc1, Rem1) (= Q2) for the pair of the first local and first remote input signal is calculated.

These spatial feature quantities Q1, Q2 (represented as a vector q of their respective real and imaginary parts) are then used as an input for a deep neural network (DNN) 20. The DNN 20, e.g., in the present case may have a Recurrent Neural Network (“RNN”), Neural Circuit Policy (“NCP”), or Temporal Convolutional Network (“TCN”) architecture.
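To make the data flow concrete, a hedged NumPy sketch of this feature-extraction stage; the array shapes, the helper imr() (restating equation (i)), and the function name dnn_input_vector are illustrative assumptions, not the patent’s implementation:

```python
import numpy as np

def imr(a: np.ndarray, b: np.ndarray) -> complex:
    """Intra-microphone response per equation (i) for one band/frame."""
    return np.sum(np.conj(a) * b) / np.sum(np.conj(a) * a)

def dnn_input_vector(loc1: np.ndarray, loc2: np.ndarray,
                     rem1: np.ndarray) -> np.ndarray:
    """Assemble the DNN input vector q for one frame.

    Each argument has shape (n_bands, N): the sub-band samples of the
    respective input signal in the used bands (here: every second band
    up to 5 kHz, DC band discarded). For each band, Q1 = IMR(Loc1, Loc2)
    and Q2 = IMR(Loc1, Rem1) are computed and stacked as real/imaginary
    parts into the real-valued vector q.
    """
    q = []
    for k in range(loc1.shape[0]):
        q1 = imr(loc1[k], loc2[k])    # spatial relation local/local
        q2 = imr(loc1[k], rem1[k])    # spatial relation local/remote
        q.extend((q1.real, q1.imag, q2.real, q2.imag))
    return np.asarray(q)
```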
The DNN 20 is trained to output a vector v, its entries vj being probabilities for the DoA α of the acoustic target signal 14 being located in a certain angular range Δαj with respect to the frontal direction 10 in figure 1. The angular ranges Δαj preferably may have a width of 5°. Thereby, the angular ranges Δαj in the present embodiment do not fully span the entire space (360°), but rather the frontal 90° quadrant at the ear at which the local device is worn (i.e., the frontal left quadrant for the local device LD being worn at the left ear), plus a certain overshoot (in the order of 10° or 15°). For example, for the local device LD being worn at the left ear,

$$v_j = P(\alpha \in \Delta\alpha_j), \qquad \Delta\alpha_j = [15^\circ - j \cdot 5^\circ,\; 15^\circ - (j-1) \cdot 5^\circ], \qquad j = 1 \ldots 24, \tag{ii}$$

with P(α ∈ Δαj) being the probability of the DoA α falling into the particular angular range Δαj. Note that most of the angular ranges cover negative angles (the left 90° quadrant by convention covers negative angles). It is important to note that the probabilities P(α ∈ Δαj) are not normalized to a sum of 1 over the angular ranges Δαj, since there may be more than one acoustic target signal 14. Rather, the probabilities P(α ∈ Δαj) are to be taken as the confidence of the prediction in each angular range Δαj, i.e., P(α ∈ Δαj) = 0 means that the DNN 20 excludes with absolute certainty that any acoustic target signal is present in the corresponding angular range Δαj, and P(α ∈ Δαj) = 1 means that the DNN 20 concludes with absolute certainty that an acoustic target signal is present in the corresponding angular range Δαj. By means of post-processing such as temporal smoothing, the DoA α for the estimation performed in the local device LD may be obtained from the vector v.

Furthermore, another vector v’ may be transmitted from the remote device RD to the local device (double-dashed arrow in figure 2). Such a vector v’ may contain estimations of probabilities, performed in the remote device RD, for the DoA α falling into a particular angular range Δαj in the respective “other” frontal quadrant (i.e., the frontal quadrant of the ear at which the remote device is worn, plus an additional overshoot of 10° to 15°). These estimations are performed in the remote device RD by means of the first local input signal Loc1, the first remote input signal Rem1 and a second remote input signal (not shown), given by the second remote microphone signal xMR2, in an analogous manner to the way described above for the local device LD. For the overlapping region of said two estimations (i.e., ±10° to ±15° around the frontal direction 10), the respective entries of the vectors v and v’ may be averaged in order to obtain the final estimation result (or the maximum of the two values may be kept). The vector v (and possibly the vector v’) may be subject to a post-processing 22 for obtaining, by the local device LD, the final DoA α in the frontal hemisphere.
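For illustration, a small sketch that enumerates the output grid of equation (ii) for the left-worn local device; the function name and its defaults merely restate the example values from the text:

```python
def angular_ranges(width: float = 5.0, overshoot: float = 15.0,
                   n: int = 24) -> list:
    """Angular ranges per equation (ii): range j covers
    [15 - j*5, 15 - (j-1)*5] degrees, j = 1..24, i.e., the frontal
    left quadrant (negative angles) plus the overshoot regions."""
    return [(overshoot - j * width, overshoot - (j - 1) * width)
            for j in range(1, n + 1)]

ranges = angular_ranges()
print(ranges[0])     # (10.0, 15.0): overshoot into the right half
print(ranges[-1])    # (-105.0, -100.0): overshoot beyond the left quadrant
```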
The post-processing 22 of the vector v (and possibly the vector v’) may comprise temporal smoothing. Such a post-processing is shown in figure 3. The upper image shows the temporal evolution of the direct entries of the vectors v and v’ (in the overlapping region of −15° to +15°, the overlapping entries vj, vj’ are averaged, or their maximum is taken), i.e., the respective probabilities for finding a target signal 14 in the corresponding angular range Δαj. In the lower image, temporal smoothing has been applied to said entries of the vectors v and v’, so that the DoA α is much better defined. Note that for most of the time t, there are two acoustic target signals present, and hence not only one DoA α, but also a second DoA α2 (corresponding to a possible cross-talk of two speakers close to the user) is recognized. At time instant t3, a third speaker also starts talking, such that a third DoA α3 is recognized. Note that taking into account the vector v’, transmitted from the remote device RD, for the estimation performed in the local device LD is optional.

The DoA α, as well as the second and third DoA α2, α3, may vary in the sense that at certain times, the respective DoA is detected in the adjacent angular range Δαj±1. This may be due to movements of the corresponding target signal source (e.g., a slight change of position of a speaker), or also due to a slight change of position/orientation of the user 4.
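The text does not specify the smoothing algorithm; shown here under that caveat is one common choice, a first-order recursive average of the probability vector over frames:

```python
import numpy as np

def smooth(v_frames: np.ndarray, coeff: float = 0.9) -> np.ndarray:
    """First-order recursive (exponential) smoothing of the per-range
    probabilities; v_frames has shape (n_frames, n_ranges). A coeff
    closer to 1 yields slower, better-defined DoA trajectories."""
    out = np.empty_like(v_frames)
    out[0] = v_frames[0]
    for n in range(1, len(v_frames)):
        out[n] = coeff * out[n - 1] + (1.0 - coeff) * v_frames[n]
    return out
```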
Figure 4 shows, in a section of a schematic block diagram, the step of spatial feature extraction in the method according to figure 2. The first and second local microphone signals xML1, xML2 are used to generate the first local input signal Loc1 by means of a beamformer BF which uses a first target constraint TarCons1, such as an attenuation of 0 dB at an angle of 0°. As shown in figure 2, furthermore, the second local microphone signal xML2 is used as the second local input signal Loc2. The first remote input signal Rem1 is generated in the remote device RD in a similar way as the first local input signal Loc1, and is transmitted from the remote device RD to the local device LD. These input signals form the set 18 of input signals. However, the set 18 of input signals may further comprise a third local input signal Loc3, generated from the first and second local microphone signal xML1, xML2 in another beamformer BF’ (dashed box and arrows). Said other beamformer BF’ may use a second target constraint TarCons2, such as an attenuation of 0 dB at an angle of 15° (or at any non-zero integer multiple of 15°, such as 45° or 90°), and possibly also a first noise constraint NCons1, such as a total attenuation at a given null direction (to be chosen, e.g., out of integer multiples of 15° other than the second target constraint TarCons2). Further local input signals, to be constructed by similar beamforming with varying target and/or noise constraints, are possible for the set 18 of input signals.

The beamforming in the beamformer BF may be performed as a so-called minimum variance distortionless response (MVDR) beamformer with a given “steering” direction θ0 (e.g., the frontal direction). Without loss of generality, this is described below for beamforming with two local microphones, but the method is applicable to an arbitrary number of microphones (local or remote). Then, a constant diffuse noise correlation matrix may be estimated from anechoic head related transfer functions d(f, θ) = (d1, d2)^T (f, θ) (f being the frequency index) as

$$\Gamma(f) = \frac{1}{N_\theta} \sum_{\theta} d(f, \theta)\, d^{H}(f, \theta).$$

The head related transfer function d1(f, θ) denotes the transfer function for a sound with frequency f, originating from an angle θ and propagating towards the first local microphone ML1. The sum is performed over a discrete set of a total of Nθ angles which span the entire space. The weights w(f) = (w1, w2)^T (f) of the beamformer BF in figure 4, i.e., the frequency-bandwise coefficients for the first and second local microphone signal xML1, xML2, can then be derived as

$$w(f) = \frac{\left(\Gamma(f) + \mu I\right)^{-1} d(f, \theta_0)}{d^{H}(f, \theta_0)\left(\Gamma(f) + \mu I\right)^{-1} d(f, \theta_0)},$$

with θ0 being the angle corresponding to the first target constraint, and µ being a small numerical value for regularization.
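A hedged sketch of the two equations above: the code uses synthetic free-field steering vectors in place of measured anechoic head related transfer functions, and all names and parameter values are placeholders:

```python
import numpy as np

def mvdr_weights(d: np.ndarray, steer_idx: int, mu: float = 1e-3) -> np.ndarray:
    """MVDR weights for one frequency band.

    d         - complex array of shape (n_angles, n_mics): transfer
                functions d(f, theta) on a grid spanning the entire space
    steer_idx - index of the steering angle theta_0 (target constraint)
    mu        - small regularization value (diagonal loading)
    """
    n_angles, n_mics = d.shape
    # Diffuse noise correlation matrix: (1/N_theta) * sum_theta d d^H
    gamma = sum(np.outer(dv, dv.conj()) for dv in d) / n_angles
    a = np.linalg.solve(gamma + mu * np.eye(n_mics), d[steer_idx])
    return a / (d[steer_idx].conj() @ a)    # 0 dB in direction theta_0

# Toy usage: 2-microphone free-field array, one band at 2 kHz.
c, spacing, f = 343.0, 0.012, 2000.0        # speed of sound, mic distance
theta = np.deg2rad(np.arange(0, 360, 15))
delay = spacing * np.cos(theta) / c          # relative delay at mic 2
d = np.stack([np.ones_like(theta),
              np.exp(-2j * np.pi * f * delay)], axis=1)
w = mvdr_weights(d, steer_idx=0)
print(abs(w.conj() @ d[0]))                  # -> 1.0 (distortionless)
```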
Figure 5 shows, in a section of a schematic block diagram, the step of spatial feature extraction in an alternative embodiment to figure 4. The first and second local microphone signals xML1, xML2 are used to generate the first local input signal Loc1 by means of a beamformer BF which uses a first target constraint TarCons1, such as an attenuation of 0 dB at an angle of 0°. In the generation of the first local input signal Loc1, the first local microphone signal xML1 is used as a reference signal. Furthermore, the first and second local microphone signals xML1, xML2 are used to generate the second local input signal Loc2, in a similar way as the first local input signal Loc1 (in particular, with the first target constraint TarCons1). However, now the second local microphone signal xML2 is used as a reference signal.

The third local input signal Loc3 is generated, in an analogous way as shown in figure 4, from the first and second local microphone signal xML1, xML2, using a second target constraint TarCons2 (such as an attenuation of 0 dB at an angle of 45° or 90° or another integer multiple of 15°) and a first noise constraint NCons1. Just like in the generation of the first local input signal Loc1, for the third local input signal Loc3 the first local microphone signal xML1 is used as a reference signal. A fourth local input signal (not shown) might be generated in a similar way as the third local input signal Loc3, but using the second local microphone signal xML2 as a reference signal (compare the relation between the first and second local input signal Loc1, Loc2 in the embodiment of figure 5). Furthermore, the second local microphone signal xML2 can be used as an additional local input signal LocAd.

Just as in the embodiment shown in figure 4, the first remote input signal Rem1 is generated in the remote device RD in a similar way as the first local input signal Loc1, and is transmitted from the remote device RD to the local device LD. The first remote input signal Rem1 itself forms part of the set 18 of input signals. However, the first remote input signal Rem1 is also used together with the first local input signal Loc1 to form an auxiliary remote input signal RemAux by means of beamforming, preferably using the first remote input signal Rem1 as a reference signal. However, also the first local input signal Loc1 may be taken as the reference signal. Further similar auxiliary remote input signals (not shown), also forming part of the set 18 of input signals, may be generated in the local device LD by means of the first remote input signal Rem1 and other local input signals such as the third local input signal Loc3. Then, the spatial feature extraction is performed on pairs of input signals out of the set 18 of input signals, as shown in figure 4.

Even though the invention has been illustrated and described in detail with the help of a preferred embodiment example, the invention is not restricted by this example. Other variations can be derived by a person skilled in the art without leaving the scope of protection of this invention.

Reference numerals

1 binaural hearing system
2 binaural hearing aid
4 user
6 target source
8 target speaker
10 frontal direction
12 speech
14 acoustic target signal
16 environment sound
18 set (of input signals)
20 DNN
22 post-processing
BF beamformer
BF’ other beamformer
Hy, Hz first/second hearing instrument
IMR intra-microphone response
LD local device
Loc1/2 first/second local input signal
Loc3 third local input signal
LocAd additional local input signal
Mfy, Mby microphones (of the first hearing instrument)
Mfz, Mbz microphones (of the second hearing instrument)
ML1/2 first/second local microphone
MR1/2 first/second remote microphone
NCons1 first noise constraint
RD remote device
Rem1 first remote input signal
RemAux auxiliary remote input signal
Q1/2 spatial feature quantity
q vector (input to the DNN)
t time
t3 time instant
TarCons1/2 first/second target constraint
v, v’ vector (output of the DNN)
xML1/2 first/second local microphone signal
xMR1/2 first/second remote microphone signal
α DoA
α2, α3 second/third DoA
Δαj angular range