1.
Are frog calls relatively difficult to locate by mammalian predators? J Comp Physiol A Neuroethol Sens Neural Behav Physiol 2023; 209:11-30. PMID: 36508005. DOI: 10.1007/s00359-022-01594-7.
Abstract
Frogs call in acoustically dense choruses to attract conspecific females. Their calls can potentially reveal their location to predators, many of which are mammals. However, frogs and mammals have very different acoustic receivers and mechanisms for determining sound source direction. We argue that frog calls may have been selected to be harder to locate with the direction-finding mechanisms of mammals. We focus on interaural time delay (ITD) estimation using delay-line coincidence detection (a place code) and on a binaural excitatory/inhibitory (E/I) ITD mechanism found in mammals with small heads (a population code). We identify four "strategies" that frogs may employ to exploit the weaknesses of either mechanism. The first two strategies confound delay estimation and increase directional ambiguity through highly periodic calls or narrowband calls. The third strategy relies on short pulses: the E/I mechanism is susceptible to noise, with sounds being pulled to the medial plane when the signal-to-noise ratio is low. Together, these three strategies compromise both ongoing and onset determination of location by either mechanism. Finally, frogs call in dense choruses using various means for controlling synchrony, maintaining chorus tenure, and abruptly switching off calling, all of which further confound location finding. Of these strategies, only chorusing adversely impacts the localization performance of the frogs' own acoustic receivers. We illustrate these strategies with an analysis of calls from three different frog species.
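The delay-line place code and the "highly periodic call" strategy can be illustrated with a toy coincidence detector. This is a minimal sketch under invented assumptions (an 8-sample true ITD, synthetic broadband and tonal signals), not a model from the paper:

```python
import math
import random

def coincidence_scores(left, right, max_lag):
    """Delay-line coincidence detection: cross-correlate the two 'ears'
    over a range of lags; each lag is one coincidence-detector channel."""
    n = len(left)
    return {
        lag: sum(left[i] * right[i + lag] for i in range(n) if 0 <= i + lag < n)
        for lag in range(-max_lag, max_lag + 1)
    }

def near_best_lags(scores, tol=0.9):
    """Lags responding within `tol` of the maximum. More than one entry
    means the place code cannot disambiguate the direction."""
    top = max(scores.values())
    return sorted(lag for lag, s in scores.items() if s >= tol * top)

def delayed(sig, d):
    return [0.0] * d + sig[:-d]

random.seed(0)
true_itd = 8  # samples (hypothetical)
click = [random.uniform(-1, 1) for _ in range(400)]          # broadband call
tone = [math.sin(2 * math.pi * i / 10) for i in range(400)]  # periodic call, 10-sample period

broadband = near_best_lags(coincidence_scores(click, delayed(click, true_itd), 20))
periodic = near_best_lags(coincidence_scores(tone, delayed(tone, true_itd), 20))
```

The broadband click yields a single winning lag, while the highly periodic call produces several near-equal peaks one period apart, i.e. the directional ambiguity the first strategy exploits.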
2.
Gerhardt HC, Bee MA, Christensen-Dalsgaard J. Neuroethology of sound localization in anurans. J Comp Physiol A Neuroethol Sens Neural Behav Physiol 2023; 209:115-129. PMID: 36201014. DOI: 10.1007/s00359-022-01576-9.
Abstract
Albert Feng pioneered the study of the neuroethology of sound localization in anurans by combining behavioral experiments on phonotaxis with detailed investigations of neural processing of sound direction from the periphery to the central nervous system. The main advantage of these studies is that many species of female frogs readily perform phonotaxis towards loudspeakers emitting the species-specific advertisement call. Behavioral studies using synthetic calls can identify which parameters are important for phonotaxis and can also quantify localization accuracy. Feng was the first to investigate binaural processing using single-unit recordings in the first two auditory nuclei of the central auditory pathway, and he later investigated the directional properties of auditory nerve fibers with free-field stimulation. These studies not only showed that the frog ear is inherently directional by virtue of acoustical coupling, or crosstalk, between the two eardrums, but also confirmed that there are extratympanic pathways that affect directionality in the low-frequency region of the frog's hearing range. Feng's recordings in the midbrain also showed that directional information is enhanced by cross-midline inhibition. An important contribution toward the end of his career involved his participation in neuroethological research with a team of scientists working with frogs that produce ultrasonic calls.
Affiliation(s)
- H Carl Gerhardt
- Division of Biological Sciences, University of Missouri, Columbia, MO, 65211, USA.
- Mark A Bee
- Department of Ecology, Evolution, and Behavior, University of Minnesota-Twin Cities, 1479 Gortner Ave, St. Paul, MN, 55108, USA
- Graduate Program in Neuroscience, University of Minnesota-Twin Cities, 321 Church Street SE, Minneapolis, MN, 55455, USA
3.
Yost WA. Spatial release from masking based on binaural processing for up to six maskers. J Acoust Soc Am 2017; 141:2093. PMID: 28372135. PMCID: PMC5848840. DOI: 10.1121/1.4978614.
Abstract
Spatial Release from Masking (SRM) was measured for identification of a female target word spoken in the presence of male masker words. Target words from a single loudspeaker located at midline were presented while two, four, or six masker words were presented either from the same source as the target or from spatially separated masker sources. All masker words were presented from loudspeakers located symmetrically around the centered target source in the front azimuth hemifield. Three masking conditions were employed: speech-in-speech masking (involving both informational and energetic masking), speech-in-noise masking (involving energetic masking), and filtered speech-in-filtered speech masking (involving informational masking). Psychophysical results were summarized as three-point psychometric functions relating proportion of correct word identification to target-to-masker ratio (in decibels) for both the co-located and the spatially separated source configurations. SRM was then calculated by comparing the slopes and intercepts of these functions. SRM decreased as the number of symmetrically placed masker sources increased from two to six, and the decrease was independent of the type of masking, with almost no SRM measured for six masker sources. These results suggest that when SRM depends primarily on binaural processing, it is effectively limited to fewer than six sound sources.
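SRM here is the shift, in dB, between the co-located and separated psychometric functions. As a minimal illustration (the three-point functions below are made-up numbers, not Yost's data), the threshold target-to-masker ratio can be read off each function by linear interpolation:

```python
def threshold_tmr(points, target=0.5):
    """Interpolate the TMR (dB) at which proportion correct reaches `target`
    on a monotone three-point psychometric function [(tmr_db, p_correct)]."""
    pts = sorted(points)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if y0 <= target <= y1:
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("target proportion not bracketed by the data")

# Hypothetical three-point functions: proportion correct vs. TMR (dB)
colocated = [(-12, 0.20), (-6, 0.55), (0, 0.90)]
separated = [(-18, 0.25), (-12, 0.60), (-6, 0.95)]

# Spatial release: threshold shift between the two configurations
srm_db = threshold_tmr(colocated) - threshold_tmr(separated)
```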
Affiliation(s)
- William A Yost
- Speech and Hearing Science, Arizona State University, P.O. Box 870102, Tempe, Arizona 85287, USA
4.
Clark B, Flint JA. Acoustical direction finding with time-modulated arrays. Sensors 2016; 16:2107. PMID: 27973432. PMCID: PMC5191087. DOI: 10.3390/s16122107.
Abstract
Time-Modulated Linear Arrays (TMLAs) offer useful efficiency savings over conventional phased arrays in parameter estimation applications. The present paper considers the application of TMLAs to acoustic systems and proposes an algorithm for efficiently deriving the arrival angle of a signal. The proposed technique operates in the frequency domain, where the signal and its harmonic content are captured. Using a weighted average of the harmonic amplitudes and their respective main beam angles, an estimate of the signal's direction of arrival can be determined. The method is demonstrated and evaluated using results from both numerical and practical implementations, and performance data are provided. The use of Micro-Electromechanical Systems (MEMS) sensors allows time-modulation techniques to be applied at ultrasonic frequencies. Theoretical predictions for an array of five isotropic elements with half-wavelength spacing and 1000 data samples suggest an accuracy of ±1° within an angular range of approximately ±50°. In experiments with a 40 kHz five-element microphone array, a Direction of Arrival (DoA) estimate within ±2.5° of the target signal is readily achieved inside a ±45° range using a single switched input stage and a simple hardware setup.
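At its core, the weighted-average step the authors describe is a centroid over (main-beam angle, harmonic amplitude) pairs. A minimal sketch with invented harmonic data:

```python
def weighted_doa(harmonics):
    """Estimate direction of arrival as the amplitude-weighted average of
    the main-beam angles associated with each sideband harmonic.
    `harmonics` is a list of (beam_angle_deg, amplitude) pairs."""
    total = sum(amp for _, amp in harmonics)
    return sum(amp * angle for angle, amp in harmonics) / total

# Hypothetical measurements: three harmonics whose beams straddle the source
measured = [(28.0, 0.2), (30.5, 0.5), (31.0, 0.3)]
doa_deg = weighted_doa(measured)
```

The stronger harmonics pull the estimate toward their beam angles, so the result lands between the individual beam directions.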
Affiliation(s)
- Ben Clark
- Wolfson School of Mechanical, Electrical and Manufacturing Engineering, Loughborough University, Leicestershire LE11 3TU, UK.
- James A Flint
- Wolfson School of Mechanical, Electrical and Manufacturing Engineering, Loughborough University, Leicestershire LE11 3TU, UK.
5.
Lee JC, Nam KW, Jang DP, Kim IY. A diagonal-steering-based binaural beamforming algorithm incorporating a diagonal speech localizer for persons with bilateral hearing impairment. Artif Organs 2015; 39:1061-8. DOI: 10.1111/aor.12488.
Affiliation(s)
- Jun Chang Lee
- Department of Biomedical Engineering, Hanyang University, Seoul, Korea
- Kyoung Won Nam
- Department of Biomedical Engineering, Hanyang University, Seoul, Korea
- Dong Pyo Jang
- Department of Biomedical Engineering, Hanyang University, Seoul, Korea
- In Young Kim
- Department of Biomedical Engineering, Hanyang University, Seoul, Korea
6.
Escolano J, Xiang N, Perez-Lorenzo JM, Cobos M, Lopez JJ. A Bayesian direction-of-arrival model for an undetermined number of sources using a two-microphone array. J Acoust Soc Am 2014; 135:742-753. PMID: 25234883. DOI: 10.1121/1.4861356.
Abstract
Sound source localization using a two-microphone array is an active area of research, with considerable potential for use in video conferencing, mobile devices, and robotics. Based on the observed time-differences of arrival between sound signals, a probability distribution over source locations is used to estimate the actual source positions. However, such algorithms typically assume a given number of sound sources. This paper describes an updated account of the solution presented in Escolano et al. [J. Acoust. Soc. Am. 132(3), 1257-1260 (2012)], where nested sampling is used to explore a probability distribution of the source position using a Laplacian mixture model, which allows both the number and the positions of speech sources to be inferred. Different experimental setups and scenarios demonstrate the viability of the proposed method, which is compared with some of the most popular sampling methods, showing that nested sampling is an accurate tool for speech localization.
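The core of the model is a Laplacian mixture over observed time-differences of arrival; nested sampling then explores the resulting posterior. The sketch below evaluates only the mixture log-likelihood (with made-up TDOA data and a fixed scale parameter) and compares a one-source against a two-source hypothesis, rather than running a sampler:

```python
import math

def laplace_pdf(x, mu, b):
    """Laplacian density with location mu and scale b."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)

def mixture_loglik(tdoas, source_mus, b=0.05):
    """Log-likelihood of observed TDOAs under an equal-weight Laplacian
    mixture with one component per hypothesized source."""
    k = len(source_mus)
    return sum(
        math.log(sum(laplace_pdf(t, mu, b) for mu in source_mus) / k)
        for t in tdoas
    )

# Hypothetical observations: TDOAs (ms) clustered around two true sources
observed = [-0.20 + 0.01 * i for i in range(-3, 4)] + \
           [0.30 + 0.01 * i for i in range(-3, 4)]

ll_two = mixture_loglik(observed, [-0.20, 0.30])  # correct two-source hypothesis
ll_one = mixture_loglik(observed, [0.05])         # single source in between
```

Because the data cluster around two distinct delays, the two-component hypothesis scores higher, which is how the model infers the number of sources.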
Affiliation(s)
- Ning Xiang
- Graduate Program in Architectural Acoustics, School of Architecture, Rensselaer Polytechnic Institute, Troy, New York 12180
- Jose M Perez-Lorenzo
- Multimedia and Multimodal Processing Research Group, University of Jaén, 23700, Linares, Spain
- Maximo Cobos
- Computer Science Department, University of Valencia, 46100, Burjassot, Spain
- Jose J Lopez
- Institute for Telecommunication and Multimedia Applications, Universidad Politécnica de Valencia, 46021, Valencia, Spain
7.
Side peak suppression in responses of an across-frequency integration model to stimuli of varying bandwidth as demonstrated analytically and by implementation. J Comput Neurosci 2013; 36:1-17. PMID: 23715909. DOI: 10.1007/s10827-013-0460-x.
Abstract
Multiplication-like sound localization models are subject to phase ambiguities for high-frequency tonal stimuli, as multiplication creates several equivalent response peaks in tuning curves. Increasing the bandwidth of the stimulus reduces phase ambiguities, an effect often referred to as side peak suppression. In this study we present a Jeffress-based sound localization model and determine side peak suppression analytically. The results were verified with an implementation of the same model and compared to physiological data from barn owls. Three types of stimuli were analyzed: pure-tone stimuli, two-tone complexes with varying frequency separations, and noise signals with variable bandwidths. As an additional parameter, we also determined the half-width of the main response peak to examine the scaling of tuning curves in azimuth. Results showed that side peak suppression depended not only on bandwidth, but also on the center frequency and on the distance of the side peak from the main response peak. In particular, the analytical model predicted that side peak suppression is a function of relative bandwidth, whereas half-width is inversely proportional to center frequency, with a proportionality factor depending on relative bandwidth. The analytical approach and the implementation yielded equivalent tuning curves (deviation < 1%). Moreover, the electrophysiological data recorded in barn owls closely matched the predicted tuning curves.
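The relative-bandwidth result can be reproduced with a toy across-frequency integration: average cosine ITD tuning curves over a band and inspect the response at the first phase-ambiguous delay, 1/fc. All parameters here are illustrative, not the paper's:

```python
import math

def integrated_response(delay_s, fc_hz, bw_hz, n=64):
    """Across-frequency integration: mean of cosine ITD tuning curves,
    cos(2*pi*f*delay), over n frequencies spanning the band."""
    freqs = [fc_hz - bw_hz / 2 + bw_hz * k / (n - 1) for k in range(n)]
    return sum(math.cos(2 * math.pi * f * delay_s) for f in freqs) / n

fc = 5000.0
side_peak_delay = 1.0 / fc  # first phase-ambiguous ITD at the center frequency

main = integrated_response(0.0, fc, 2000.0)                    # true-ITD response
side_narrow = integrated_response(side_peak_delay, fc, 200.0)  # 4% relative bandwidth
side_wide = integrated_response(side_peak_delay, fc, 2000.0)   # 40% relative bandwidth

# Same relative bandwidth at a different center frequency: same suppression
side_scaled = integrated_response(1.0 / 10000.0, 10000.0, 4000.0)
```

Widening the band lowers the side peak relative to the main peak, and the suppression depends on bw/fc rather than on absolute bandwidth, matching the model's prediction.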
8.
Yost WA, Brown CA. Localizing the sources of two independent noises: role of time varying amplitude differences. J Acoust Soc Am 2013; 133:2301-13. PMID: 23556597. PMCID: PMC3631260. DOI: 10.1121/1.4792155.
Abstract
Listeners localized the free-field sources of either one or two simultaneous and independently generated noise bursts. Listeners' localization performance was better when localizing one rather than two sound sources. With two sound sources, localization performance was better when the listener was provided prior information about the location of one of them. Listeners also localized two simultaneous noise bursts that had sinusoidal amplitude modulation (AM) applied, in which the modulation envelope was in-phase across the two source locations or was 180° out-of-phase. The AM was employed to investigate a hypothesis as to what process listeners might use to localize multiple sound sources. The results supported the hypothesis that localization of two sound sources might be based on temporal-spectral regions of the combined waveform in which the sound from one source was more intense than that from the other source. The interaural information extracted from such temporal-spectral regions might provide reliable estimates of the sound source location that produced the more intense sound in that temporal-spectral region.
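The hypothesis is that interaural cues are taken from temporal-spectral regions where one source dominates. A time-domain-only sketch (synthetic noises with 180° out-of-phase AM envelopes and invented sample-level ITDs) shows that cross-correlating just those "glimpses" recovers the dominant source's lag:

```python
import math
import random

random.seed(1)
N, ITD_A, ITD_B = 2000, 5, -5  # sample offsets for two sources (hypothetical)

# Two independent noises with 180-degree out-of-phase AM envelopes
env_a = [0.5 * (1 + math.sin(2 * math.pi * i / 200)) for i in range(N)]
env_b = [0.5 * (1 - math.sin(2 * math.pi * i / 200)) for i in range(N)]
src_a = [env_a[i] * random.uniform(-1, 1) for i in range(N)]
src_b = [env_b[i] * random.uniform(-1, 1) for i in range(N)]

def at(sig, i):
    return sig[i] if 0 <= i < len(sig) else 0.0

# Left ear hears the mixture; right ear hears each source at its own delay
left = [src_a[i] + src_b[i] for i in range(N)]
right = [at(src_a, i - ITD_A) + at(src_b, i - ITD_B) for i in range(N)]

def glimpse_lag(dominant_env, other_env, max_lag=10):
    """Cross-correlate only the samples where `dominant_env` is far above
    the other envelope; the winning lag is the dominant source's ITD."""
    idx = [i for i in range(N) if dominant_env[i] > 0.9 and other_env[i] < 0.1]
    score = lambda lag: sum(left[i] * at(right, i + lag) for i in idx)
    return max(range(-max_lag, max_lag + 1), key=score)

lag_a = glimpse_lag(env_a, env_b)
lag_b = glimpse_lag(env_b, env_a)
```

Restricting the correlation to segments dominated by one source yields that source's interaural delay, consistent with the temporal-spectral glimpsing account.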
Affiliation(s)
- William A Yost
- Spatial Hearing Laboratory, Department of Speech and Hearing Science, Arizona State University, P.O. Box 870102, Tempe, Arizona 85287-0102, USA.
9.
Torres AM, Cobos M, Pueo B, Lopez JJ. Robust acoustic source localization based on modal beamforming and time-frequency processing using circular microphone arrays. J Acoust Soc Am 2012; 132:1511-1520. PMID: 22978880. DOI: 10.1121/1.4740503.
Abstract
Uniform circular array processing has been shown to be a very useful tool for broadband acoustic source localization over 360°. Specifically, beamforming methods based on circular harmonics have attracted a lot of research attention in the last several years, as modal array signal processing is a very active research topic. On the other hand, due to the sparsity properties of speech, source localization methods in the time-frequency (T-F) domain have also demonstrated their capability to locate several simultaneous sources with high accuracy. In this paper, a localization framework based on circular harmonics beamforming and T-F processing that provides accurate localization performance under very adverse acoustic conditions is presented. Modal processing and sparsity-based localization are jointly addressed to estimate the direction-of-arrival of multiple concurrent speech sources. Experiments in real and simulated environments with different microphone setups are discussed, showing the validity of the proposed approach and comparing its performance with other state-of-the-art methods.
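As a simplified stand-in for the circular-harmonics beamformer (plain steered-response power on a uniform circular array, narrowband and free-field; array geometry and frequency are invented), the following locates a single source over the full 360°:

```python
import cmath
import math

M, RADIUS, SPEED, FREQ = 8, 0.05, 343.0, 2000.0  # mics, m, m/s, Hz (hypothetical)
MIC_ANGLES = [2 * math.pi * m / M for m in range(M)]

def steering(azimuth):
    """Far-field phase factors at each microphone of the circular array
    for a plane wave arriving from `azimuth` (radians)."""
    k = 2 * math.pi * FREQ / SPEED
    return [cmath.exp(1j * k * RADIUS * math.cos(azimuth - a)) for a in MIC_ANGLES]

def srp(snapshot, azimuth):
    """Steered-response power: magnitude of the matched inner product."""
    return abs(sum(x * s.conjugate() for x, s in zip(snapshot, steering(azimuth))))

true_az = math.radians(70)
snapshot = steering(true_az)  # one noise-free narrowband snapshot

grid = [math.radians(d) for d in range(0, 360, 2)]
estimate = max(grid, key=lambda az: srp(snapshot, az))
```

The circular geometry gives uniform coverage of azimuth with no front-back ambiguity, which is the property the modal methods build on.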
Affiliation(s)
- Ana M Torres
- I.E.E.A.C. Department, Universidad Castilla-La Mancha, 16071, Cuenca, Spain.
10.
Woodruff J, Wang D. Binaural localization of multiple sources in reverberant and noisy environments. IEEE Trans Audio Speech Lang Process 2012. DOI: 10.1109/tasl.2012.2183869.
11.
Mandel M, Weiss R, Ellis D. Model-based expectation-maximization source separation and localization. IEEE Trans Audio Speech Lang Process 2010. DOI: 10.1109/tasl.2009.2029711.
12.
Abstract
Summary: In nature, sounds from multiple sources, as well as reflections from the surfaces of the physical surroundings, arrive concurrently from different directions at the ears of a listener. Although all of these waveforms sum at the eardrums, humans with normal hearing can effortlessly segregate interesting sounds from echoes and other sources of background noise. This paper presents a two-microphone technique for localizing sound sources to effectively guide robotic navigation. Its fundamental structure is adopted from a binaural signal-processing scheme employed in biological systems for the localization of sources using interaural time differences (ITDs). The two input signals are analyzed for coincidences along left/right-channel delay-line pairs, and the coincidence time instants are expressed as a function of the interaural coherence (IC). Specifically, we build a spherical-head model for the selected robot and apply the binaural cue-selection mechanism observed in the mammalian auditory system to mitigate the effects of sound echoes. The sound source is found by determining the azimuth at which the probability density function (PDF) of the ITD cues is maximal, which eliminates the localization artifacts found during testing. The experimental results of a systematic evaluation demonstrate the superior performance of the proposed method.
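The "PDF maximum" step can be sketched with a Woodworth-style spherical-head ITD model and a kernel-density score over observed ITD cues. Head radius, the kernel, and the cue values below are all invented for illustration, not taken from the paper:

```python
import math

HEAD_RADIUS, SPEED = 0.0875, 343.0  # m, m/s (typical textbook values)

def model_itd(azimuth_rad):
    """Woodworth spherical-head ITD approximation."""
    return (HEAD_RADIUS / SPEED) * (azimuth_rad + math.sin(azimuth_rad))

def azimuth_from_itds(itd_samples, sigma=1e-5):
    """Pick the azimuth (deg) whose model ITD maximizes a kernel-density
    score over the observed ITD cues -- a stand-in for locating the
    maximum of the ITD-cue PDF."""
    def score(deg):
        mu = model_itd(math.radians(deg))
        return sum(math.exp(-((t - mu) / sigma) ** 2 / 2) for t in itd_samples)
    return max(range(-90, 91), key=score)

# Hypothetical frames: most ITD cues cluster at the 30-degree direct path,
# a few are echo-corrupted outliers
direct = model_itd(math.radians(30))
cues = [direct + k * 2e-6 for k in range(-5, 6)] + [direct + 1.5e-4] * 3

estimate = azimuth_from_itds(cues)
```

Because the direct-path cues dominate the density, the few echo-corrupted cues do not shift the peak, which is the point of selecting the PDF maximum rather than averaging the cues.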
13.
Jones DL, Ratnam R. Blind location and separation of callers in a natural chorus using a microphone array. J Acoust Soc Am 2009; 126:895-910. PMID: 19640054. PMCID: PMC2730713. DOI: 10.1121/1.3158924.
Abstract
Male frogs and toads call in dense choruses to attract females. Determining the vocal interactions and spatial distribution of the callers is important for understanding acoustic communication in such assemblies, but it has so far proved difficult to simultaneously locate and recover the vocalizations of individual callers. Here a microphone-array technique is developed for blindly locating callers using arrival-time delays at the microphones, estimating their steering vectors, and recovering the calls with a frequency-domain adaptive beamformer. The technique exploits the time-frequency sparseness of the signal space to recover sources even when there are more sources than sensors. The method is tested with data collected from a natural chorus of Gulf Coast toads (Bufo valliceps) and Northern cricket frogs (Acris crepitans). A spatial map of locations accurate to within a few centimeters is constructed, and the individual call waveforms are recovered for nine individual animals within a 9 × 9 m² area. These methods work well in low reverberation, when there are no reflectors other than the ground; they will require modifications to incorporate multi-path propagation, particularly for the estimation of time delays.
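The caller-mapping step can be sketched as a least-squares fit of candidate positions to the observed arrival-time delays. The grid search, four-microphone layout, and caller position below are invented stand-ins for the paper's blind estimation procedure:

```python
import math

MICS = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0), (3.0, 3.0)]  # hypothetical array (m)
SPEED = 343.0

def tdoas(point):
    """Arrival-time delays of `point` at each microphone, relative to mic 0."""
    d0 = math.dist(point, MICS[0])
    return [(math.dist(point, m) - d0) / SPEED for m in MICS[1:]]

def locate(observed, step=0.05, extent=9.0):
    """Least-squares grid search: the caller position whose predicted
    delays best explain the observed arrival-time delays."""
    best, best_err = None, math.inf
    steps = int(extent / step) + 1
    for xi in range(steps):
        for yi in range(steps):
            p = (xi * step, yi * step)
            err = sum((o - t) ** 2 for o, t in zip(observed, tdoas(p)))
            if err < best_err:
                best, best_err = p, err
    return best

caller = (1.25, 2.40)             # true position, chosen on the grid (made up)
estimate = locate(tdoas(caller))  # noise-free delays for the sketch
```

With noise-free delays the minimum lands on the grid point at the true position; with real chorus data the residual error grows with reverberation, which is why the authors note the ground-reflection limitation.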
Affiliation(s)
- Douglas L Jones
- Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, 1308 W. Main Street, Urbana, IL 61801, USA
14.
Nagata Y, Iwasaki S, Hariyama T, Fujioka T, Obara T, Wakatake T, Abe M. Binaural localization based on weighted Wiener gain improved by incremental source attenuation. IEEE Trans Audio Speech Lang Process 2009. DOI: 10.1109/tasl.2008.2006651.
15.
Calmes L, Lakemeyer G, Wagner H. Azimuthal sound localization using coincidence of timing across frequency on a robotic platform. J Acoust Soc Am 2007; 121:2034-48. PMID: 17471720. DOI: 10.1121/1.2709866.
Abstract
An algorithm for localizing a sound source with two microphones is introduced and used in real-time situations. This algorithm is inspired by biological computation of interaural time difference as occurring in the barn owl and is a modification of the algorithm proposed by Liu et al. [J. Acoust. Soc. Am. 110, 3218-3231 (2001)] in that it creates a three-dimensional map of coincidence location. This eliminates localization artifacts found during tests with the original algorithm. The source direction is found by determining the azimuth at which the minimum of the response in an azimuth-frequency matrix occurs. The system was tested with a pan-tilt unit in real-time in an office environment with signal types ranging from broadband noise to pure tones. Both open loop (pan-tilt unit stationary) and closed loop experiments (pan-tilt unit moving) were conducted. In real world situations, the algorithm performed well for all signal types except pure tones. Subsequent room simulations showed that localization accuracy decreases with decreasing direct-to-reverberant ratio.
Affiliation(s)
- Laurent Calmes
- Knowledge-based Systems Group, Chair of Computer Science V and Institute for Biology II, RWTH-Aachen University, Germany.
16.
Wagner H, Calmes L, Lakemeyer G. A bio-inspired sound-localization system for a mobile agent. Chem Ing Tech 2006. DOI: 10.1002/cite.200650449.
17.
Nix J, Hohmann V. Sound source localization in real sound fields based on empirical statistics of interaural parameters. J Acoust Soc Am 2006; 119:463-79. PMID: 16454301. DOI: 10.1121/1.2139619.
Abstract
The role of temporal fluctuations and systematic variations of interaural parameters in localization of sound sources in spatially distributed, nonstationary noise conditions was investigated. For this, Bayesian estimation was applied to interaural parameters calculated with physiologically plausible time and frequency resolution. Probability density functions (PDFs) of the interaural level differences (ILDs) and phase differences (IPDs) were estimated by measuring histograms for a directional sound source perturbed by several types of interfering noise at signal-to-noise ratios (SNRs) between -5 and +30 dB. A moment analysis of the PDFs reveals that the expected values shift and the standard deviations increase considerably with decreasing SNR, and that the PDFs have non-Gaussian shape at medium SNRs. A d' analysis of the PDFs indicates that elevation discrimination is possible even at low SNRs in the median plane by integrating information across frequency. Absolute sound localization was simulated by a Bayesian maximum a posteriori (MAP) procedure. The simulation is based on frequency integration of broadly tuned "detectors." Confusion patterns of real and estimated sound source directions are similar to those of human listeners. The results indicate that robust processing strategies are needed to exploit interaural parameters successfully in noise conditions due to their strong temporal fluctuations.
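The MAP step reduces to summing per-band log-likelihoods of the observed interaural parameters over candidate directions. The sketch below uses Gaussian per-direction ILD statistics for simplicity (the paper measures empirical, non-Gaussian PDFs) and entirely made-up numbers:

```python
import math

# Hypothetical per-direction ILD statistics (mean dB, std dB) for two
# frequency bands -- stand-ins for the measured interaural-parameter PDFs
ILD_STATS = {
    -45: [(-6.0, 2.0), (-9.0, 3.0)],
      0: [( 0.0, 2.0), ( 0.0, 3.0)],
     45: [( 6.0, 2.0), ( 9.0, 3.0)],
}

def log_gaussian(x, mean, sd):
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

def map_direction(observed_ilds):
    """MAP direction under a flat prior: sum the per-band log-likelihoods
    (integration across frequency) and take the best direction."""
    def log_post(direction):
        return sum(
            log_gaussian(obs, mean, sd)
            for obs, (mean, sd) in zip(observed_ilds, ILD_STATS[direction])
        )
    return max(ILD_STATS, key=log_post)

estimate = map_direction([5.1, 7.4])  # noisy ILDs near the +45 deg statistics
```

Summing across bands is what makes the estimate robust: a cue that is ambiguous in one band is usually disambiguated by the others, mirroring the frequency integration of broadly tuned detectors in the paper.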
Affiliation(s)
- Johannes Nix
- Medizinische Physik, Carl von Ossietzky Universität Oldenburg, D-26111 Oldenburg, Germany.
18.
Lockwood ME, Jones DL, Bilger RC, Lansing CR, O'Brien WD, Wheeler BC, Feng AS. Performance of time- and frequency-domain binaural beamformers based on recorded signals from real rooms. J Acoust Soc Am 2004; 115:379-391. PMID: 14759029. DOI: 10.1121/1.1624064.
Abstract
Extraction of a target sound source amidst multiple interfering sound sources is difficult when there are fewer sensors than sources, as is the case for human listeners in the classic cocktail-party situation. This study compares the signal extraction performance of five algorithms using recordings of speech sources made with three different two-microphone arrays in three rooms of varying reverberation time. Test signals, consisting of two to five speech sources, were constructed for each room and array. The signals were processed with each algorithm, and the signal extraction performance was quantified by calculating the signal-to-noise ratio of the output. A frequency-domain minimum-variance distortionless-response beamformer outperformed the time-domain based Frost beamformer and generalized sidelobe canceler for all tests with two or more interfering sound sources, and performed comparably to or better than the time-domain algorithms for tests with one interfering sound source. The frequency-domain minimum-variance algorithm offered performance comparable to that of the Peissig-Kollmeier binaural frequency-domain algorithm, but with much less distortion of the target signal. Comparisons were also made to a simple beamformer. In addition, computer simulations illustrate that, when processing speech signals, the chosen implementation of the frequency-domain minimum-variance technique adapts more quickly and accurately than time-domain techniques.
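The MVDR beamformer at a single frequency bin is a closed-form expression, w = R⁻¹d / (dᴴR⁻¹d). A self-contained two-microphone sketch (with an invented steering vector, interferer, and noise level, not the study's recorded data) verifies its two defining properties: unit gain toward the target and strong attenuation of the interferer:

```python
import cmath
import math

def inv2(m):
    """Inverse of a 2x2 complex matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def hdot(x, y):
    """Hermitian inner product x^H y."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))

def mvdr_weights(R, d):
    """MVDR: w = R^-1 d / (d^H R^-1 d). The target steering vector d passes
    with unit gain while output power, hence interference, is minimized."""
    rd = matvec(inv2(R), d)
    return [wi / hdot(d, rd) for wi in rd]

# Hypothetical narrowband two-microphone scene: target at broadside,
# one strong interferer off-axis, weak uncorrelated sensor noise
target = [1.0 + 0j, 1.0 + 0j]
interferer = [1.0 + 0j, cmath.exp(-1j * 2 * math.pi * 0.3)]
R = [[(0.01 if i == j else 0.0) + interferer[i] * interferer[j].conjugate()
      for j in range(2)] for i in range(2)]

w = mvdr_weights(R, target)
target_gain = abs(hdot(w, target))        # distortionless constraint
interferer_gain = abs(hdot(w, interferer))
```

The distortionless constraint is why the frequency-domain minimum-variance approach distorts the target less than algorithms that trade target fidelity for interference rejection.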
Affiliation(s)
- Michael E Lockwood
- Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, 405 North Mathews Ave., Urbana, Illinois 61801, USA.
19.
Liu C, Wheeler BC, O'Brien WD, Lansing CR, Bilger RC, Jones DL, Feng AS. A two-microphone dual delay-line approach for extraction of a speech sound in the presence of multiple interferers. J Acoust Soc Am 2001; 110:3218-3231. PMID: 11785823. DOI: 10.1121/1.1419090.
Abstract
This paper describes algorithms for signal extraction for use as a front end for telecommunication devices, speech recognition systems, and hearing aids that operate in noisy environments. The development was based on independent, hypothesized theories of the computational mechanics of biological systems, in which directional hearing is enabled mainly by binaural processing of interaural directional cues. Our system uses two microphones as input devices and a signal processing method based on the two input channels. The signal processing procedure comprises two major stages: (i) source localization and (ii) cancellation of noise sources based on knowledge of the locations of all sound sources. The source localization stage, detailed in our previous paper [Liu et al., J. Acoust. Soc. Am. 108, 1888 (2000)], was based on a well-recognized biological architecture comprising a dual delay-line and a coincidence detection mechanism. This paper focuses on the noise cancellation stage. We designed a simple subtraction method which, when strategically employed over the dual delay-line structure in a broadband manner, can effectively cancel multiple interfering sound sources and consequently enhance the desired signal. We obtained an 8-10 dB enhancement of the desired speech with four talkers in the anechoic acoustic test (or a 7-10 dB enhancement with six talkers in computer simulation) when all the sounds were equally intense and temporally aligned.
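The essence of the subtraction stage is null steering: align the two channels on an interferer's delay and subtract, so that the interferer cancels while the target passes (comb-filtered). This is a single-interferer toy version with invented delays, not the full broadband dual delay-line system:

```python
import random

random.seed(2)
N, DELAY = 1000, 4  # samples; interferer reaches mic 2 four samples late (made up)

def mic_signals(target, interferer, delay):
    """Target at broadside (identical at both mics); interferer delayed at mic 2."""
    mic1 = [target[i] + interferer[i] for i in range(N)]
    mic2 = [target[i] + (interferer[i - delay] if i >= delay else 0.0)
            for i in range(N)]
    return mic1, mic2

def shift_and_subtract(mic1, mic2, delay):
    """Align the two channels on the interferer's delay and subtract:
    the interferer cancels exactly; the target survives, comb-filtered."""
    return [mic1[i] - mic2[i + delay] for i in range(N - delay)]

noise = [random.uniform(-1, 1) for _ in range(N)]
speech = [random.uniform(-1, 1) for _ in range(N)]  # stand-in for the target talker

# Interferer-only input: the subtraction output is identically zero
residual = shift_and_subtract(*mic_signals([0.0] * N, noise, DELAY), DELAY)

# Target-only input: the target passes through with nonzero energy
passed = shift_and_subtract(*mic_signals(speech, [0.0] * N, DELAY), DELAY)
```

Repeating this null over the dual delay-line, one branch per estimated interferer direction, is what lets the full system cancel several interferers with only two microphones.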
Affiliation(s)
- C Liu
- Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana 61801, USA