1. Bidelman GM. Reply to Manley: Is there more to cochlear tuning than meets the ear? Hear Res 2025; 459:109218. PMID: 39965528. DOI: 10.1016/j.heares.2025.109218.
Abstract
Enhanced psychophysical and cochlear tuning observed in musicians is unlikely to be explained by mere differences in human cochlear length. A parsimonious account of our 2016 data is improved feedback from the medial olivocochlear efferent system, which adjusts the masking and tuning properties of the cochlea and is subject to attentional modulation, all functions reported to be enhanced in musically trained ears. Still, new experiments are needed to tease apart "nature" vs. "nurture" effects in music-related brain plasticity and to move beyond cross-sectional studies and definitions of "musicians" based solely on self-report.
Affiliation(s)
- Gavin M Bidelman
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, USA; Program in Neuroscience, Indiana University, Bloomington, IN, USA; Cognitive Science Program, Indiana University, Bloomington, IN, USA.
2. Carlyon RP, Deeks JM, Delgutte B, Chung Y, Vollmer M, Ohl FW, Kral A, Tillein J, Litovsky RY, Schnupp J, Rosskothen-Kuhl N, Goldsworthy RL. Limitations on Temporal Processing by Cochlear Implant Users: A Compilation of Viewpoints. Trends Hear 2025; 29:23312165251317006. PMID: 40095543; PMCID: PMC12076235. DOI: 10.1177/23312165251317006.
Abstract
Cochlear implant (CI) users are usually poor at using timing information to detect changes in either pitch or sound location. This deficit occurs even for listeners with good speech perception and even when the speech processor is bypassed to present simple, idealized stimuli to one or more electrodes. This article presents seven expert opinion pieces on the likely neural bases for these limitations, the extent to which they are modifiable by sensory experience and training, and the most promising ways to overcome them in the future. The article combines insights from physiology and psychophysics in cochlear-implanted humans and animals, highlights areas of agreement and controversy, and proposes new experiments that could resolve areas of disagreement.
Affiliation(s)
- Robert P. Carlyon
- Cambridge Hearing Group, MRC Cognition & Brain Sciences Unit, University of Cambridge, Cambridge, UK
- John M. Deeks
- Cambridge Hearing Group, MRC Cognition & Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Bertrand Delgutte
- Eaton-Peabody Laboratories, Massachusetts Eye and Ear, Boston, MA, USA
- Yoojin Chung
- Eaton-Peabody Laboratories, Massachusetts Eye and Ear, Boston, MA, USA
- Maike Vollmer
- Department of Experimental Audiology, University Clinic of Otolaryngology, Head and Neck Surgery, Otto von Guericke University Magdeburg, Magdeburg, Germany
- Frank W. Ohl
- Leibniz Institute for Neurobiology (LIN), Magdeburg, Germany
- Andrej Kral
- Institute of Audio-Neuro-Technology & Department of Experimental Otology, Clinics of Otolaryngology, Head and Neck Surgery, Hannover Medical School, Hannover, Germany
- Jochen Tillein
- Clinics of Otolaryngology, Head and Neck Surgery, J.W. Goethe University, Frankfurt, Germany
- MED-EL Company, Hannover, Germany
- Ruth Y. Litovsky
- Waisman Center, University of Wisconsin-Madison, Madison, WI, USA
- Jan Schnupp
- Gerald Choa Neuroscience Institute and Department of Otolaryngology, Chinese University of Hong Kong, Hong Kong SAR, China
- Nicole Rosskothen-Kuhl
- Neurobiological Research Laboratory, Section for Experimental and Clinical Otology, Department of Oto-Rhino-Laryngology, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Bernstein Center Freiburg & Faculty of Biology, University of Freiburg, Freiburg, Germany
- Raymond L. Goldsworthy
- Auditory Research Center, Caruso Department of Otolaryngology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
3. Saddler MR, McDermott JH. Models optimized for real-world tasks reveal the task-dependent necessity of precise temporal coding in hearing. Nat Commun 2024; 15:10590. PMID: 39632854; PMCID: PMC11618365. DOI: 10.1038/s41467-024-54700-5.
Abstract
Neurons encode information in the timing of their spikes in addition to their firing rates. Spike timing is particularly precise in the auditory nerve, where action potentials phase lock to sound with sub-millisecond precision, but its behavioral relevance remains uncertain. We optimized machine learning models to perform real-world hearing tasks with simulated cochlear input, assessing the precision of auditory nerve spike timing needed to reproduce human behavior. Models with high-fidelity phase locking exhibited more human-like sound localization and speech perception than models without, consistent with an essential role in human hearing. However, the temporal precision needed to reproduce human-like behavior varied across tasks, as did the precision that benefited real-world task performance. These effects suggest that perceptual domains incorporate phase locking to different extents depending on the demands of real-world hearing. The results illustrate how optimizing models for realistic tasks can clarify the role of candidate neural codes in perception.
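The paper's central manipulation, degrading the temporal fidelity of simulated auditory-nerve input, lends itself to a compact illustration. The sketch below is not the authors' model (they trained deep networks on a detailed phenomenological AN front end); the rates, filter order, and the device of lowpass-filtering the firing probability to impose a phase-locking limit are illustrative assumptions.

```python
# Toy sketch: spike timing precision controlled by lowpass-filtering the
# instantaneous firing probability before spike generation. Raising the
# cutoff restores phase locking to a 1-kHz tone, as measured by vector
# strength. All constants are illustrative.
import numpy as np
from scipy.signal import butter, filtfilt

fs = 100_000                              # sample rate (Hz)
t = np.arange(0, 0.5, 1 / fs)             # 500-ms tone
freq = 1000.0
stim = np.sin(2 * np.pi * freq * t)

def vector_strength(spike_times, f):
    """Magnitude of the mean phase vector of spikes at frequency f."""
    return np.abs(np.mean(np.exp(2j * np.pi * f * spike_times)))

rng = np.random.default_rng(0)
for cutoff in [50, 320, 1000, 3000]:      # imposed phase-locking limit (Hz)
    drive = np.maximum(stim, 0.0)         # half-wave rectification
    b, a = butter(2, cutoff / (fs / 2))   # lowpass = degraded spike timing
    rate = 200.0 * np.clip(filtfilt(b, a, drive), 0.0, None)   # spikes/s
    spikes = t[rng.random(t.size) < rate / fs]                 # Bernoulli ~ Poisson
    print(f"cutoff {cutoff:5d} Hz -> vector strength {vector_strength(spikes, freq):.2f}")
```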
Affiliation(s)
- Mark R Saddler
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, USA.
- McGovern Institute for Brain Research, MIT, Cambridge, MA, USA.
- Center for Brains, Minds, and Machines, MIT, Cambridge, MA, USA.
- Josh H McDermott
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, USA.
- McGovern Institute for Brain Research, MIT, Cambridge, MA, USA.
- Center for Brains, Minds, and Machines, MIT, Cambridge, MA, USA.
- Program in Speech and Hearing Biosciences and Technology, Harvard, Cambridge, MA, USA.
4. Saddler MR, McDermott JH. Models optimized for real-world tasks reveal the task-dependent necessity of precise temporal coding in hearing. bioRxiv [Preprint] 2024: 2024.04.21.590435. PMID: 38712054; PMCID: PMC11071365. DOI: 10.1101/2024.04.21.590435.
Abstract
Neurons encode information in the timing of their spikes in addition to their firing rates. Spike timing is particularly precise in the auditory nerve, where action potentials phase lock to sound with sub-millisecond precision, but its behavioral relevance remains uncertain. We optimized machine learning models to perform real-world hearing tasks with simulated cochlear input, assessing the precision of auditory nerve spike timing needed to reproduce human behavior. Models with high-fidelity phase locking exhibited more human-like sound localization and speech perception than models without, consistent with an essential role in human hearing. However, the temporal precision needed to reproduce human-like behavior varied across tasks, as did the precision that benefited real-world task performance. These effects suggest that perceptual domains incorporate phase locking to different extents depending on the demands of real-world hearing. The results illustrate how optimizing models for realistic tasks can clarify the role of candidate neural codes in perception.
Affiliation(s)
- Mark R Saddler
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, USA
- McGovern Institute for Brain Research, MIT, Cambridge, MA, USA
- Center for Brains, Minds, and Machines, MIT, Cambridge, MA, USA
- Josh H McDermott
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, USA
- McGovern Institute for Brain Research, MIT, Cambridge, MA, USA
- Center for Brains, Minds, and Machines, MIT, Cambridge, MA, USA
- Program in Speech and Hearing Biosciences and Technology, Harvard, Cambridge, MA, USA
5. Mackey CA, Hauser S, Schoenhaut AM, Temghare N, Ramachandran R. Hierarchical differences in the encoding of amplitude modulation in the subcortical auditory system of awake nonhuman primates. J Neurophysiol 2024; 132:1098-1114. PMID: 39140590; PMCID: PMC11427057. DOI: 10.1152/jn.00329.2024.
Abstract
Sinusoidal amplitude modulation (SAM) is a key feature of complex sounds. Psychophysical studies have characterized SAM perception, and neurophysiological studies in anesthetized animals report a transformation from a temporal code in the cochlear nucleus (CN; brainstem) to a rate code in the inferior colliculus (IC; midbrain), but none have used awake animals or nonhuman primates to compare CN and IC coding strategies with modulation-frequency perception. To address this, we recorded single-unit responses and compared derived neurometric measures in the CN and IC to psychometric measures of modulation frequency (MF) discrimination in macaques. IC and CN neurons often exhibited tuned responses to SAM in rate and spike-timing measures of modulation coding. Neurometric thresholds spanned a large range (2-200 Hz ΔMF). The lowest 40% of IC thresholds were less than or equal to psychometric thresholds, regardless of which code was used, whereas CN thresholds were greater than psychometric thresholds. Discrimination at 10-20 Hz could be explained by indiscriminately pooling 30 units in either structure, whereas discrimination at higher MFs was best explained by more selective pooling. This suggests that pooled CN activity was sufficient for AM discrimination. Psychometric and neurometric thresholds decreased as stimulus duration increased, but IC and CN thresholds were higher and more variable than behavior at short durations. This slower subcortical temporal integration compared with behavior was consistent with a drift diffusion model that reproduced individual differences in performance and can constrain future neurophysiological studies of temporal integration. These measures provide an account of AM perception at the neurophysiological, computational, and behavioral levels.

NEW & NOTEWORTHY In everyday environments, the brain is tasked with extracting information from sound envelopes, which involves both sensory encoding and perceptual decision-making. Different neural codes for envelope representation have been characterized in midbrain and cortex, but studies of brainstem nuclei such as the cochlear nucleus (CN) have usually been conducted under anesthesia in nonprimate species. Here, we found that subcortical activity in awake monkeys and a biologically plausible perceptual decision-making model accounted for sound envelope discrimination behavior.
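The decision stage invoked above is a standard drift diffusion model; a minimal simulation of that model class is sketched below. The drift, bound, and noise values are made-up, not the paper's fitted parameters.

```python
# Minimal drift-diffusion model: accumulate noisy evidence to a bound.
# Larger drift (stronger sensory evidence, e.g., a larger MF change)
# yields more correct and faster decisions; longer accumulation mimics
# improvement with stimulus duration.
import numpy as np

def ddm_trial(drift, bound=1.0, noise=1.0, dt=1e-3, rng=None):
    """Return (correct, reaction_time) for one simulated trial."""
    x, t = 0.0, 0.0
    while abs(x) < bound:
        x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return x > 0, t

rng = np.random.default_rng(1)
for drift in [0.5, 1.0, 2.0]:
    trials = [ddm_trial(drift, rng=rng) for _ in range(500)]
    pc = np.mean([c for c, _ in trials])
    mean_rt = np.mean([rt for _, rt in trials])
    print(f"drift {drift:.1f}: P(correct) = {pc:.2f}, mean RT = {mean_rt*1e3:.0f} ms")
```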
Affiliation(s)
- Chase A Mackey
- Neuroscience Graduate Program, Vanderbilt University, Nashville, Tennessee, United States
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States
- Samantha Hauser
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States
- Adriana M Schoenhaut
- Neuroscience Graduate Program, Vanderbilt University, Nashville, Tennessee, United States
- Namrata Temghare
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States
- Ramnarayan Ramachandran
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States
6. Joris PX, Verschooten E, Mc Laughlin M, Versteegh C, van der Heijden M. Frequency selectivity in monkey auditory nerve studied with suprathreshold multicomponent stimuli. Hear Res 2024; 443:108964. PMID: 38277882. DOI: 10.1016/j.heares.2024.108964.
Abstract
Data from non-human primates can help extend observations from non-primate species to humans. Here we report measurements on the auditory nerve of macaque monkeys in the context of a controversial topic important to human hearing. A range of techniques has been used to examine the claim, which is not generally accepted, that human frequency tuning is sharper than traditionally thought, and sharper than in commonly used animal models. Data from single auditory-nerve fibers occupy a pivotal position in examining this claim but are not available for humans. A previous study reported sharper tuning in auditory-nerve fibers of macaque relative to the cat. A limitation of these and other single-fiber data is that frequency selectivity was measured with tonal threshold-tuning curves, which do not directly assess spectral filtering and whose shape is sharpened by cochlear nonlinearity. Our aim was to measure spectral filtering with wideband suprathreshold stimuli in the macaque auditory nerve. We obtained responses of single nerve fibers of anesthetized macaque monkeys and cats to a suprathreshold, wideband, multicomponent stimulus designed to allow characterization of spectral filtering at any cochlear locus. Quantitatively, the differences between the two species are smaller than in previous studies, but, consistent with those studies, the filters show a trend of sharper tuning in macaque relative to the cat for fibers in the basal half of the cochlea. We also examined differences in group delay measured on the phase data near the characteristic frequency versus in the low-frequency tail. The phase data are consistent with the interpretation of sharper frequency tuning in monkey in the basal half of the cochlea. We conclude that the use of suprathreshold, wideband stimuli supports the interpretation of sharper frequency selectivity in macaque nerve fibers relative to the cat, although the difference is less marked than apparent from assessments with tonal threshold-based data.
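The logic of the suprathreshold multicomponent approach can be illustrated with a stand-in transfer function: present many tones at once and recover gain and phase at each primary frequency by Fourier analysis of the response. The component frequencies, the Gaussian "filter", and its phase lag below are assumptions for illustration, not the stimulus design of the paper.

```python
# Sketch: estimate a transfer function at many frequencies at once from
# the response to a multitone complex. A Gaussian gain profile plus a
# fixed phase lag stands in for a cochlear/neural filter.
import numpy as np

fs = 50_000
t = np.arange(0, 1.0, 1 / fs)                       # 1 s -> 1-Hz FFT bins
freqs = np.array([800, 1100, 1500, 2000, 2600, 3300])

cf, bw = 2000.0, 600.0                              # stand-in filter shape
gain = np.exp(-0.5 * ((freqs - cf) / bw) ** 2)
resp = sum(g * np.cos(2 * np.pi * f * t - 0.5) for g, f in zip(gain, freqs))

spec = np.fft.rfft(resp) / (t.size / 2)             # amplitude-normalized
fax = np.fft.rfftfreq(t.size, 1 / fs)
for f in freqs:
    k = int(np.argmin(np.abs(fax - f)))
    print(f"{f:4d} Hz: gain {abs(spec[k]):.2f}, phase {np.angle(spec[k]):+.2f} rad")
```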
Affiliation(s)
- P X Joris
- Lab of Auditory Neurophysiology, KU Leuven, O&N2 KU Leuven, Herestraat 49 bus 1021, Leuven B-3000, Belgium.
- E Verschooten
- Lab of Auditory Neurophysiology, KU Leuven, O&N2 KU Leuven, Herestraat 49 bus 1021, Leuven B-3000, Belgium
- M Mc Laughlin
- Lab of Auditory Neurophysiology, KU Leuven, O&N2 KU Leuven, Herestraat 49 bus 1021, Leuven B-3000, Belgium
- CPC Versteegh
- Department of Neuroscience, Erasmus MC, Rotterdam, the Netherlands
7. Deloche F, Parida S, Sivaprakasam A, Heinz MG. Estimation of Cochlear Frequency Selectivity Using a Convolution Model of Forward-Masked Compound Action Potentials. J Assoc Res Otolaryngol 2024; 25:35-51. PMID: 38278969; PMCID: PMC10907335. DOI: 10.1007/s10162-023-00922-1.
Abstract
PURPOSE: Frequency selectivity is a fundamental property of the peripheral auditory system; however, the invasiveness of auditory nerve (AN) experiments limits its study in the human ear. Compound action potentials (CAPs) associated with forward masking have been suggested as an alternative to assess cochlear frequency selectivity. Previous methods relied on an empirical comparison of AN and CAP tuning curves in animal models, arguably not taking full advantage of the information contained in forward-masked CAP waveforms.
METHODS: To improve the estimation of cochlear frequency selectivity based on the CAP, we introduce a convolution model to fit forward-masked CAP waveforms. The model generates masking patterns that, when convolved with a unitary response, can predict the masking of the CAP waveform induced by Gaussian noise maskers. Model parameters, including those characterizing frequency selectivity, are fine-tuned by minimizing waveform prediction errors across numerous masking conditions, yielding robust estimates.
RESULTS: The method was applied to click-evoked CAPs at the round window of anesthetized chinchillas using notched-noise maskers with various notch widths and attenuations. The estimated quality factor Q10 as a function of center frequency is shown to closely match the average quality factor obtained from AN fiber tuning curves, without the need for an empirical correction factor.
CONCLUSION: This study establishes a moderately invasive method for estimating cochlear frequency selectivity with potential applicability to other animal species or humans. Beyond the estimation of frequency selectivity, the proposed model proved to be remarkably accurate in fitting forward-masked CAP responses and could be extended to study more complex aspects of cochlear signal processing (e.g., compressive nonlinearities).
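The convolution at the heart of the METHODS paragraph can be sketched in a few lines; the damped-oscillation unitary response and the Gaussian latency patterns below are placeholder shapes, not the paper's fitted components.

```python
# Schematic convolution model: CAP(t) = excitation/masking pattern (as a
# function of latency) convolved with a unitary response. Dispersing the
# pattern over latency broadens and shrinks the predicted CAP.
import numpy as np

fs = 100_000
t = np.arange(0, 0.01, 1 / fs)                          # 10-ms window

ur = np.exp(-t / 5e-4) * np.sin(2 * np.pi * 1000 * t)   # placeholder unitary response

def latency_pattern(center_ms, width_ms):
    """Gaussian bump of synchronized firing versus latency (unit peak)."""
    return np.exp(-0.5 * ((t * 1e3 - center_ms) / width_ms) ** 2)

for width_ms in [0.2, 0.5, 1.0]:
    cap = np.convolve(latency_pattern(1.5, width_ms), ur)[: t.size] / fs
    print(f"pattern width {width_ms:.1f} ms -> CAP peak-to-peak {np.ptp(cap)*1e3:.3f} (a.u.)")
```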
Affiliation(s)
- François Deloche
- Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907, USA
- Satyabrata Parida
- Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907, USA
- Weldon School of Biomedical Engineering, Purdue University, 206 S. Martin Jischke Drive, West Lafayette, IN 47907, USA
- Andrew Sivaprakasam
- Weldon School of Biomedical Engineering, Purdue University, 206 S. Martin Jischke Drive, West Lafayette, IN 47907, USA
- Michael G Heinz
- Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907, USA
- Weldon School of Biomedical Engineering, Purdue University, 206 S. Martin Jischke Drive, West Lafayette, IN 47907, USA
8. Liu J, Stohl J, Lopez-Poveda EA, Overath T. Quantifying the Impact of Auditory Deafferentation on Speech Perception. Trends Hear 2024; 28:23312165241227818. PMID: 38291713; PMCID: PMC10832414. DOI: 10.1177/23312165241227818.
Abstract
The past decade has seen a wealth of research dedicated to determining which and how morphological changes in the auditory periphery contribute to people experiencing hearing difficulties in noise despite having clinically normal audiometric thresholds in quiet. Evidence from animal studies suggests that cochlear synaptopathy in the inner ear might lead to auditory nerve deafferentation, resulting in impoverished signal transmission to the brain. Here, we quantify the likely perceptual consequences of auditory deafferentation in humans via a physiologically inspired encoding-decoding model. The encoding stage simulates the processing of an acoustic input stimulus (e.g., speech) at the auditory periphery, while the decoding stage is trained to optimally regenerate the input stimulus from the simulated auditory nerve firing data. This allowed us to quantify the effect of different degrees of auditory deafferentation by measuring the extent to which the decoded signal supported the identification of speech in quiet and in noise. In a series of experiments, speech perception thresholds in quiet and in noise increased (worsened) significantly as a function of the degree of auditory deafferentation for modeled deafferentation greater than 90%. Importantly, this effect was significantly stronger in a noisy than in a quiet background. The encoding-decoding model thus captured the hallmark symptom of degraded speech perception in noise together with normal speech perception in quiet. As such, the model might function as a quantitative guide to evaluating the degree of auditory deafferentation in human listeners.
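A toy version of the quantitative logic reads as follows; it uses Poisson "fibers" and a naive average-and-rescale decoder, whereas the paper trains a decoder on a detailed periphery model (which is one reason deficits there emerge only beyond roughly 90% loss). All parameters are illustrative assumptions.

```python
# Toy encoding-decoding: N Poisson "fibers" encode a slow envelope; the
# decoder is a simple across-fiber average. Reconstruction SNR degrades
# gracefully (~3 dB per halving of the population), illustrating why
# massive deafferentation is needed before decoding collapses.
import numpy as np

rng = np.random.default_rng(0)
fs = 10_000
t = np.arange(0, 0.2, 1 / fs)
signal = 0.5 * (1 + np.sin(2 * np.pi * 10 * t))     # envelope in [0, 1]

def decoding_snr(n_fibers, peak_rate=100.0):
    p = peak_rate * signal / fs                     # per-bin spike probability
    spikes = rng.random((n_fibers, t.size)) < p
    estimate = spikes.mean(axis=0) * fs / peak_rate # unbiased rate decoder
    return 10 * np.log10(np.var(signal) / np.var(estimate - signal))

for surviving in [1.0, 0.5, 0.1, 0.05]:
    n = int(3000 * surviving)
    print(f"{surviving:4.0%} of 3000 fibers -> decoding SNR {decoding_snr(n):5.1f} dB")
```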
Affiliation(s)
- Jiayue Liu
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
- Joshua Stohl
- North American Research Laboratory, MED-EL Corporation, Durham, NC, USA
- Enrique A. Lopez-Poveda
- Instituto de Neurociencias de Castilla y León, University of Salamanca, Salamanca, Spain
- Departamento de Cirugía, Facultad de Medicina, University of Salamanca, Salamanca, Spain
- Instituto de Investigación Biomédica de Salamanca, Universidad de Salamanca, Salamanca, Spain
- Tobias Overath
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
9. Li YH, Joris PX. Case reopened: A temporal basis for harmonic pitch templates in the early auditory system? J Acoust Soc Am 2023; 154:3986-4003. PMID: 38149819. DOI: 10.1121/10.0023969.
Abstract
A fundamental assumption of rate-place models of pitch is the existence of harmonic templates in the central nervous system (CNS). Shamma and Klein [(2000). J. Acoust. Soc. Am. 107, 2631-2644] hypothesized that these templates have a temporal basis. Coincidences in the temporal fine-structure of neural spike trains, even in response to nonharmonic, stochastic stimuli, would be sufficient for the development of harmonic templates. The physiological plausibility of this hypothesis is tested. Responses to pure tones, low-pass noise, and broadband noise from auditory nerve fibers and brainstem "high-sync" neurons are studied. Responses to tones simulate the output of fibers with infinitely sharp filters: for these responses, harmonic structure in a coincidence matrix comparing pairs of spike trains is indeed found. However, harmonic template structure is not observed in coincidences across responses to broadband noise, which are obtained from nerve fibers or neurons with enhanced synchronization. Using a computer model based on that of Shamma and Klein, it is shown that harmonic templates only emerge when consecutive processing steps (cochlear filtering, lateral inhibition, and temporal enhancement) are implemented in extreme, physiologically implausible form. It is concluded that current physiological knowledge does not support the hypothesis of Shamma and Klein (2000).
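The coincidence analysis at the core of this test can be sketched with synthetic spike trains; harmonic-template structure would appear as above-chance coincidence counts at small-integer frequency ratios. The von Mises-like rate modulation, rates, and window are illustrative assumptions, not the recorded data or the paper's exact analysis.

```python
# Sketch: spike trains phase-locked to different frequencies are compared
# pairwise; counts above chance at small-integer frequency ratios are the
# raw material for a "temporal" harmonic template.
import numpy as np

rng = np.random.default_rng(2)
fs, dur = 100_000, 5.0
t = np.arange(0, dur, 1 / fs)

def phase_locked_train(freq, mean_rate=200.0, kappa=5.0):
    """Poisson spikes with sharply phase-locked (von Mises-shaped) rate."""
    mod = np.exp(kappa * np.cos(2 * np.pi * freq * t))
    rate = mean_rate * mod / mod.mean()
    return t[rng.random(t.size) < rate / fs]

def coincidences(ta, tb, win=2e-4):
    """Count spikes in ta with a partner in tb within +/- win seconds."""
    idx = np.searchsorted(tb, ta)
    lo = tb[np.clip(idx - 1, 0, tb.size - 1)]
    hi = tb[np.clip(idx, 0, tb.size - 1)]
    return int(np.sum(np.minimum(np.abs(ta - lo), np.abs(ta - hi)) < win))

ref = phase_locked_train(400.0)
for ratio in [1.0, 1.26, 1.5, 2.0, 2.31, 3.0]:   # integer vs. non-integer ratios
    n = coincidences(ref, phase_locked_train(400.0 * ratio))
    print(f"CF ratio {ratio:4.2f}: {n:4d} coincidences")
```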
Affiliation(s)
- Yi-Hsuan Li
- Laboratory of Auditory Neurophysiology, Medical School, Campus Gasthuisberg, University of Leuven, B-3000 Leuven, Belgium
- Philip X Joris
- Laboratory of Auditory Neurophysiology, Medical School, Campus Gasthuisberg, University of Leuven, B-3000 Leuven, Belgium
10. Wei L, Verschooten E, Joris PX. Enhancement of phase-locking in rodents. II. An axonal recording study in chinchilla. J Neurophysiol 2023; 130:751-767. PMID: 37609701. DOI: 10.1152/jn.00474.2022.
Abstract
The trapezoid body (TB) contains axons of neurons residing in the anteroventral cochlear nucleus (AVCN) that provide excitatory and inhibitory inputs to the main monaural and binaural nuclei in the superior olivary complex (SOC). To understand the monaural and binaural response properties of neurons in the medial and lateral superior olive (MSO and LSO), it is important to characterize the temporal firing properties of these inputs. Because of its exceptional low-frequency hearing, the chinchilla (Chinchilla lanigera) is one of the widely used small animal models for studies of hearing. However, the characterization of the output of its ventral cochlear nucleus to the nuclei of the SOC is fragmentary. We obtained responses of TB axons to stimuli typically used in binaural studies and compared these responses to those of auditory nerve (AN) fibers, with a focus on temporal coding. We found enhancement of phase-locking and entrainment, i.e., the ability of a neuron to fire action potentials at a certain stimulus phase for nearly every stimulus period, in TB axons relative to AN fibers. Enhancement in phase-locking and entrainment is quantitatively more modest than in the cat but greater than in the gerbil. As in these species, these phenomena occur not only in low-frequency neurons stimulated at their characteristic frequency but also in neurons tuned to higher frequencies when stimulated with low-frequency tones, for which complex phase-locking behavior, with multiple modes of firing per stimulus cycle, is frequently observed.

NEW & NOTEWORTHY The sensitivity of neurons to small time differences in sustained sounds to both ears is important for binaural hearing, and this sensitivity is critically dependent on phase-locking in the monaural pathways. Although studies in cat showed a marked improvement in phase-locking from the peripheral to the central auditory nervous system, the evidence in rodents is mixed. Here, we recorded from AN and TB of chinchilla and found temporal enhancement, though more limited than in cat.
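The two temporal metrics compared here have compact definitions. Minimal implementations follow (spike times in seconds; entrainment is taken as the fraction of stimulus cycles containing at least one spike, one common convention).

```python
# Vector strength quantifies phase locking; entrainment quantifies the
# ability to fire on (nearly) every stimulus cycle.
import numpy as np

def vector_strength(spike_times, freq):
    phases = 2 * np.pi * freq * np.asarray(spike_times)
    return np.abs(np.mean(np.exp(1j * phases)))

def entrainment(spike_times, freq, dur):
    cycles_with_spike = np.unique(np.floor(np.asarray(spike_times) * freq))
    return cycles_with_spike.size / int(np.floor(dur * freq))

# A train firing once per cycle at a fixed phase scores 1.0 on both
f0, dur = 500.0, 1.0
spikes = (np.arange(int(dur * f0)) + 0.1) / f0
print(vector_strength(spikes, f0), entrainment(spikes, f0, dur))
```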
Affiliation(s)
- Liting Wei
- Laboratory of Auditory Neurophysiology, KU Leuven, Leuven, Belgium
- Eric Verschooten
- Laboratory of Auditory Neurophysiology, KU Leuven, Leuven, Belgium
- Philip X Joris
- Laboratory of Auditory Neurophysiology, KU Leuven, Leuven, Belgium
11. Whiteford KL, Oxenham AJ. Sensitivity to Frequency Modulation is Limited Centrally. J Neurosci 2023; 43:3687-3695. PMID: 37028932; PMCID: PMC10198444. DOI: 10.1523/jneurosci.0995-22.2023.
Abstract
Modulations in both amplitude and frequency are prevalent in natural sounds and are critical in defining their properties. Humans are exquisitely sensitive to frequency modulation (FM) at the slow modulation rates and low carrier frequencies that are common in speech and music. This enhanced sensitivity to slow-rate and low-frequency FM has been widely believed to reflect precise, stimulus-driven phase locking to temporal fine structure in the auditory nerve. At faster modulation rates and/or higher carrier frequencies, FM is instead thought to be coded by coarser frequency-to-place mapping, where FM is converted to amplitude modulation (AM) via cochlear filtering. Here, we show that patterns of human FM perception that have classically been explained by limits in peripheral temporal coding are instead better accounted for by constraints in the central processing of fundamental frequency (F0) or pitch. We measured FM detection in male and female humans using harmonic complex tones with an F0 within the range of musical pitch but with resolved harmonic components that were all above the putative limits of temporal phase locking (>8 kHz). Listeners were more sensitive to slow than fast FM rates, even though all components were beyond the limits of phase locking. In contrast, AM sensitivity remained better at faster than slower rates, regardless of carrier frequency. These findings demonstrate that classic trends in human FM sensitivity, previously attributed to auditory nerve phase locking, may instead reflect the constraints of a unitary code that operates at a more central level of processing.

SIGNIFICANCE STATEMENT Natural sounds involve dynamic frequency and amplitude fluctuations. Humans are particularly sensitive to frequency modulation (FM) at slow rates and low carrier frequencies, which are prevalent in speech and music. This sensitivity has been ascribed to encoding of stimulus temporal fine structure (TFS) via phase-locked auditory nerve activity. To test this long-standing theory, we measured FM sensitivity using complex tones with a low F0 but only high-frequency harmonics beyond the limits of phase locking. Dissociating the F0 from TFS showed that FM sensitivity is limited not by peripheral encoding of TFS but rather by central processing of F0, or pitch. The results suggest a unitary code for FM detection limited by more central constraints.
Affiliation(s)
- Kelly L Whiteford
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455
- Andrew J Oxenham
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455
12. Oxenham AJ. Questions and controversies surrounding the perception and neural coding of pitch. Front Neurosci 2023; 16:1074752. PMID: 36699531; PMCID: PMC9868815. DOI: 10.3389/fnins.2022.1074752.
Abstract
Pitch is a fundamental aspect of auditory perception that plays an important role in our ability to understand speech, appreciate music, and attend to one sound while ignoring others. The questions surrounding how pitch is represented in the auditory system, and how our percept relates to the underlying acoustic waveform, have been a topic of inquiry and debate for well over a century. New findings and technological innovations have led to challenges of some long-standing assumptions and have raised new questions. This article reviews some recent developments in the study of pitch coding and perception and focuses on the topic of how pitch information is extracted from peripheral representations based on frequency-to-place mapping (tonotopy), stimulus-driven auditory-nerve spike timing (phase locking), or a combination of both. Although a definitive resolution has proved elusive, the answers to these questions have potentially important implications for mitigating the effects of hearing loss via devices such as cochlear implants.
Affiliation(s)
- Andrew J. Oxenham
- Center for Applied and Translational Sensory Science, University of Minnesota Twin Cities, Minneapolis, MN, United States
- Department of Psychology, University of Minnesota Twin Cities, Minneapolis, MN, United States
13. Shofner WP. Cochlear tuning and the peripheral representation of harmonic sounds in mammals. J Comp Physiol A Neuroethol Sens Neural Behav Physiol 2023; 209:145-161. PMID: 35867137. DOI: 10.1007/s00359-022-01560-3.
Abstract
Albert Feng was a prominent comparative neurophysiologist whose research provided numerous contributions towards understanding how the spectral and temporal characteristics of vocalizations underlie sound communication in frogs and bats. The present study is dedicated to Al's memory and compares the spectral and temporal representations of stochastic, complex sounds which underlie the perception of pitch strength in humans and chinchillas. Specifically, the pitch strengths of these stochastic sounds differ between humans and chinchillas, suggesting that humans and chinchillas may be using different cues. Outputs of auditory filterbank models based on human and chinchilla cochlear tuning were examined. Excitation patterns of harmonics are enhanced in humans as compared with chinchillas. In contrast, summary correlograms are degraded in humans as compared with chinchillas. Comparing summary correlograms and excitation patterns with corresponding behavioral data on pitch strength suggests that the dominant cue for pitch strength in humans is spectral (i.e., harmonic) structure, whereas the dominant cue for chinchillas is temporal (i.e., envelope) structure. The results support arguments that the broader cochlear tuning in non-human mammals emphasizes temporal cues for pitch perception, whereas the sharper cochlear tuning in humans emphasizes spectral cues.
Affiliation(s)
- William P Shofner
- Department of Speech, Language and Hearing Sciences, Indiana University, 2631 East Discovery Parkway, Bloomington, IN, 47408, USA.
14. Suresh CH, Krishnan A. Frequency-Following Response to Steady-State Vowel in Quiet and Background Noise Among Marching Band Participants With Normal Hearing. Am J Audiol 2022; 31:719-736. PMID: 35944059. DOI: 10.1044/2022_aja-21-00226.
Abstract
OBJECTIVE: Human studies enrolling individuals at high risk for cochlear synaptopathy (CS) have reported difficulties in speech perception in adverse listening conditions. The aim of this study was to determine whether these individuals show a degradation in the neural encoding of speech in quiet and in the presence of background noise, as reflected in neural phase-locking to both envelope periodicity and temporal fine structure (TFS). To our knowledge, there are no published reports that have specifically examined the neural encoding of both envelope periodicity and TFS of speech stimuli (in quiet and in adverse listening conditions) in a sample with a history of loud-sound exposure who are at risk for CS.
METHOD: Using the scalp-recorded frequency-following response (FFR), the authors evaluated the neural encoding of envelope periodicity (FFRENV) and TFS (FFRTFS) for a steady-state vowel (English back vowel /u/) in quiet and in the presence of speech-shaped noise presented at +5 and 0 dB SNR. Participants were young individuals with normal hearing who had participated in a marching band for at least 5 years (high-risk group) or who had a low noise exposure history and no marching band experience (low-risk group).
RESULTS: The results showed no group differences in the neural encoding of either the FFRENV or the first formant (F1) in the FFRTFS in quiet and in noise. Paradoxically, the high-risk group demonstrated enhanced representation of F2 harmonics across all stimulus conditions.
CONCLUSIONS: These results appear to be in line with a music experience-dependent enhancement of F2 harmonics. However, due to sound overexposure in the high-risk group, the role of homeostatic central compensation cannot be ruled out. A larger data set with different noise exposure backgrounds and longitudinal measurements with an array of behavioral and electrophysiological tests is needed to disentangle the complex interaction between the effects of central compensatory gain and experience-dependent enhancement.
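The FFRENV/FFRTFS decomposition assumed in this design is a standard polarity trick, sketched below. The synthetic "responses" simply place an F0 component in the envelope part and a formant-like component in the fine-structure part; real use would substitute polarity-wise averaged EEG epochs.

```python
# Standard FFR decomposition: average responses to opposite stimulus
# polarities, then add (envelope-dominated, FFR_ENV) or subtract
# (fine-structure-dominated, FFR_TFS).
import numpy as np

fs = 10_000
t = np.arange(0, 0.1, 1 / fs)
f0, formant = 100.0, 600.0
resp_pos = np.sin(2 * np.pi * f0 * t) + np.sin(2 * np.pi * formant * t)
resp_neg = np.sin(2 * np.pi * f0 * t) - np.sin(2 * np.pi * formant * t)

ffr_env = (resp_pos + resp_neg) / 2       # envelope periodicity component
ffr_tfs = (resp_pos - resp_neg) / 2       # temporal fine structure component

fax = np.fft.rfftfreq(t.size, 1 / fs)
peak = lambda x: fax[np.argmax(np.abs(np.fft.rfft(x)))]
print(f"FFR_ENV peak: {peak(ffr_env):.0f} Hz, FFR_TFS peak: {peak(ffr_tfs):.0f} Hz")
```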
Affiliation(s)
- Chandan H Suresh
- Department of Communication Disorders, California State University, Los Angeles
15. Mehta AH, Oxenham AJ. Role of perceptual integration in pitch discrimination at high frequencies. JASA Express Lett 2022; 2:084402. PMID: 37311192; PMCID: PMC10264831. DOI: 10.1121/10.0013429.
Abstract
At very high frequencies, fundamental-frequency difference limens (F0DLs) for five-component harmonic complex tones can be better than predicted by optimal integration of information, assuming performance is limited by noise at the peripheral level, but are in line with predictions based on more central sources of noise. This study investigates whether there is a minimum number of harmonic components needed for such super-optimal integration effects and if harmonic range or inharmonicity affects this super-optimal integration. Results show super-optimal integration, even with two harmonic components and for most combinations of consecutive harmonic, but not inharmonic, components.
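The optimal-integration benchmark implied here has a one-line form: with independent noise across components, combining K components with individual thresholds theta_i yields theta_opt = (sum_i theta_i^-2)^-1/2, and "super-optimal" performance beats this bound, pointing to a shared, more central noise source. The threshold values below are made-up.

```python
# Optimal (independent-noise) integration bound for combined thresholds.
import numpy as np

theta = np.array([8.0, 8.0])                  # two single-harmonic F0DLs (%)
theta_opt = np.sum(theta ** -2.0) ** -0.5
print(f"optimal 2-component threshold: {theta_opt:.2f}%")   # 8/sqrt(2) ~ 5.66%
```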
Affiliation(s)
- Anahita H Mehta
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455, USA
- Andrew J Oxenham
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455, USA
16. Joris PX. In praise of adventitious sounds. Hear Res 2022; 425:108592. DOI: 10.1016/j.heares.2022.108592.
17. Guest DR, Oxenham AJ. Human discrimination and modeling of high-frequency complex tones shed light on the neural codes for pitch. PLoS Comput Biol 2022; 18:e1009889. PMID: 35239639; PMCID: PMC8923464. DOI: 10.1371/journal.pcbi.1009889.
Abstract
Accurate pitch perception of harmonic complex tones is widely believed to rely on temporal fine structure information conveyed by the precise phase-locked responses of auditory-nerve fibers. However, accurate pitch perception remains possible even when spectrally resolved harmonics are presented at frequencies beyond the putative limits of neural phase locking, and it is unclear whether residual temporal information, or a coarser rate-place code, underlies this ability. We addressed this question by measuring human pitch discrimination at low and high frequencies for harmonic complex tones, presented either in isolation or in the presence of concurrent complex-tone maskers. We found that concurrent complex-tone maskers impaired performance at both low and high frequencies, although the impairment introduced by adding maskers at high frequencies relative to low frequencies differed between the tested masker types. We then combined simulated auditory-nerve responses to our stimuli with ideal-observer analysis to quantify the extent to which performance was limited by peripheral factors. We found that the worsening of both frequency discrimination and F0 discrimination at high frequencies could be well accounted for (in relative terms) by optimal decoding of all available information at the level of the auditory nerve. A Python package is provided to reproduce these results, and to simulate responses to acoustic stimuli from the three previously published models of the human auditory nerve used in our analyses.
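The spirit of the ideal-observer analysis can be shown with the Cramér-Rao bound for Poisson channels: the best possible frequency difference limen over duration T is 1/sqrt(T * sum_i r_i'(f)^2 / r_i(f)). The constant-Q Gaussian tuning curves below are stand-ins for the published AN models the paper actually uses.

```python
# Rate-place ideal observer: Cramer-Rao bound on frequency discrimination
# from a bank of Poisson channels with Gaussian tuning curves.
import numpy as np

cfs = np.logspace(np.log10(200), np.log10(16000), 300)    # channel CFs (Hz)

def rates(f, peak=100.0, q=8.0):
    bw = cfs / q                                          # constant-Q bandwidths
    return peak * np.exp(-0.5 * ((f - cfs) / bw) ** 2) + 1.0   # +1 = spontaneous

def freq_dl(f, T=0.2, df=1.0):
    drdf = (rates(f + df) - rates(f - df)) / (2 * df)     # numerical derivative
    fisher = T * np.sum(drdf ** 2 / rates(f))
    return 1.0 / np.sqrt(fisher)

for f in [500, 2000, 8000]:
    dl = freq_dl(f)
    print(f"{f:5d} Hz: rate-place bound ~ {dl:.2f} Hz ({100 * dl / f:.2f}%)")
```

With this scale-invariant (constant-Q) toy model the bound is a roughly constant Weber fraction across frequency; the paper's point is precisely how measured performance departs from such peripherally limited predictions.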
Affiliation(s)
- Daniel R. Guest
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States of America
- Andrew J. Oxenham
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States of America
18. Individualized Assays of Temporal Coding in the Ascending Human Auditory System. eNeuro 2022; 9:ENEURO.0378-21.2022. PMID: 35193890; PMCID: PMC8925652. DOI: 10.1523/eneuro.0378-21.2022.
Abstract
Neural phase-locking to temporal fluctuations is a fundamental and unique mechanism by which acoustic information is encoded by the auditory system. The perceptual role of this metabolically expensive mechanism, the neural phase-locking to temporal fine structure (TFS) in particular, is debated. Although hypothesized, it is unclear whether auditory perceptual deficits in certain clinical populations are attributable to deficits in TFS coding. Efforts to uncover the role of TFS have been impeded by the fact that there are no established assays for quantifying the fidelity of TFS coding at the individual level. While many candidates have been proposed, for an assay to be useful, it should not only intrinsically depend on TFS coding, but should also have the property that individual differences in the assay reflect TFS coding per se over and beyond other sources of variance. Here, we evaluate a range of behavioral and electroencephalogram (EEG)-based measures as candidate individualized measures of TFS sensitivity. Our comparisons of behavioral and EEG-based metrics suggest that extraneous variables dominate both behavioral scores and EEG amplitude metrics, rendering them ineffective. After adjusting behavioral scores using lapse rates, and extracting latency or percent-growth metrics from EEG, interaural timing sensitivity measures exhibit robust behavior-EEG correlations. Together with the fact that unambiguous theoretical links can be made relating binaural measures and phase-locking to TFS, our results suggest that these "adjusted" binaural assays may be well suited for quantifying individual TFS processing.
19. Heil P, Mohamed ESI, Matysiak A. Towards a unifying basis of auditory thresholds: Thresholds for multicomponent stimuli. Hear Res 2021; 410:108349. PMID: 34530356. DOI: 10.1016/j.heares.2021.108349.
Abstract
Sounds consisting of multiple simultaneous or consecutive components can be detected by listeners when the stimulus levels of the components are lower than those needed to detect the individual components alone. The mechanisms underlying such spectral, spectrotemporal, temporal, or across-ear integration are not completely understood. Here, we report threshold measurements from human subjects for multicomponent stimuli (tone complexes, tone sequences, diotic or dichotic tones) and for their individual sinusoidal components in quiet. We examine whether the data are compatible with the detection model developed by Heil, Matysiak, and Neubauer (HMN model) to account for temporal integration (Heil et al. 2017), and we compare its performance to that of the statistical summation model (Green 1958), the model commonly used to account for spectral and spectrotemporal integration. In addition, we compare the performance of both models with respect to previously published thresholds for sequences of identical tones and for diotic tones. The HMN model is similar to the statistical summation model but is based on the assumption that the decision variable is a number of sensory events generated by the components via independent Poisson point processes. The rate of events is low without stimulation and increases with stimulation. The increase is proportional to the time-varying amplitude envelope of the bandpass-filtered component(s) raised to an exponent of 3. For an ideal observer, the decision variable is the sum of the events from all channels carrying information, for as long as they carry information. We find that the HMN model provides a better account of the thresholds for multicomponent stimuli than the statistical summation model, and it offers a unifying account of spectral, spectrotemporal, temporal, and across-ear integration at threshold.
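The front end of the HMN model as described here reduces to a short skeleton: Poisson sensory events at a spontaneous rate plus a term proportional to the bandpass-filtered envelope raised to the exponent 3, summed over informative channels for as long as they carry information; detection corresponds to the event count exceeding a criterion. The constants below are illustrative assumptions, not the fitted values.

```python
# Skeleton of the HMN front end: event rate = r0 + c * sum(envelope**3).
import numpy as np

fs = 10_000
t = np.arange(0, 0.3, 1 / fs)
gate = ((t > 0.05) & (t < 0.25)).astype(float)     # 200-ms component envelope

def mean_events(envelopes, r0=20.0, c=50.0, exponent=3.0):
    drive = sum((e ** exponent for e in envelopes), np.zeros_like(t))
    rate = r0 + c * drive                          # events/s
    return rate.sum() / fs                         # integrate over time

print(f"baseline       : {mean_events([]):.1f} expected events")
print(f"one component  : {mean_events([0.7 * gate]):.1f} expected events")
print(f"two components : {mean_events([0.7 * gate, 0.7 * gate]):.1f} expected events")
```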
Affiliation(s)
- Peter Heil
- Department of Systems Physiology of Learning, Leibniz Institute for Neurobiology, Magdeburg 39118, Germany; Center for Behavioral Brain Sciences, Magdeburg, Germany.
- Esraa S I Mohamed
- Department of Systems Physiology of Learning, Leibniz Institute for Neurobiology, Magdeburg 39118, Germany
- Artur Matysiak
- Research Group Comparative Neuroscience, Leibniz Institute for Neurobiology, Magdeburg, Germany
20. Temporal Correlates to Monaural Edge Pitch in the Distribution of Interspike Interval Statistics in the Auditory Nerve. eNeuro 2021; 8:ENEURO.0292-21.2021. PMID: 34281977; PMCID: PMC8387151. DOI: 10.1523/eneuro.0292-21.2021.
Abstract
Pitch is a perceptual attribute enabling perception of melody. There is no consensus regarding the fundamental nature of pitch and its underlying neural code. A stimulus which has received much interest in psychophysical and computational studies is noise with a sharp spectral edge. High-pass (HP) or low-pass (LP) noise gives rise to a pitch near the edge frequency (monaural edge pitch; MEP). The simplicity of this stimulus, combined with its spectral and autocorrelation properties, makes it an interesting stimulus to examine spectral versus temporal cues that could underlie its pitch. We recorded responses of single auditory nerve (AN) fibers in chinchilla to MEP stimuli varying in edge frequency. Temporal cues were examined with shuffled autocorrelogram (SAC) analysis. Correspondence between the population's dominant interspike interval and reported pitch estimates was poor. A fuller analysis of the population interspike interval distribution, which incorporates not only the dominant but all intervals, results in good matches with behavioral results, but not for the entire range of edge frequencies that generates pitch. Finally, we examined temporal structure over a slower time scale, intermediate between average firing rate and interspike intervals, by studying the SAC envelope. We found that, in response to a given MEP stimulus, this feature also systematically varies with edge frequency, across fibers with different characteristic frequency (CF). Because neural mechanisms to extract envelope cues are well established, and because this cue is not limited by coding of stimulus fine-structure, this newly identified slower temporal cue is a more plausible basis for pitch than cues based on fine-structure.
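A minimal shuffled autocorrelogram pools forward intervals between spikes of different repetitions of the same stimulus, which isolates stimulus-locked temporal structure. The demo trains below (Poisson spikes locked to 500 Hz with 0.2-ms jitter) are synthetic stand-ins for recorded data.

```python
# Minimal SAC: histogram of absolute spike-time differences taken only
# across different repetitions ("shuffled"), peaking at the stimulus
# period and its multiples for phase-locked responses.
import numpy as np

def sac(trains, binwidth=5e-5, maxlag=0.02):
    edges = np.arange(0.0, maxlag + binwidth, binwidth)
    counts = np.zeros(edges.size - 1)
    for i, a in enumerate(trains):
        for j, b in enumerate(trains):
            if i != j:                              # across-train pairs only
                d = np.abs(a[:, None] - b[None, :]).ravel()
                counts += np.histogram(d[d <= maxlag], bins=edges)[0]
    return edges[:-1], counts

rng = np.random.default_rng(3)
f0, dur, n_reps = 500.0, 0.5, 10
trains = [np.sort(rng.integers(0, int(dur * f0), 50) / f0
                  + rng.normal(0, 2e-4, 50)) for _ in range(n_reps)]
lags, counts = sac(trains)
m = lags > 1e-3                                     # skip the zero-lag peak
print(f"first off-zero SAC peak at {lags[m][np.argmax(counts[m])] * 1e3:.2f} ms")  # ~2 ms
```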
21. Palandrani KN, Hoover EC, Stavropoulos T, Seitz AR, Isarangura S, Gallun FJ, Eddins DA. Temporal integration of monaural and dichotic frequency modulation. J Acoust Soc Am 2021; 150:745. PMID: 34470296; PMCID: PMC8337085. DOI: 10.1121/10.0005729.
Abstract
Frequency modulation (FM) detection at low modulation frequencies is commonly used as an index of temporal fine-structure processing. The present study evaluated the rate of improvement in monaural and dichotic FM across a range of test parameters. In experiment I, dichotic and monaural FM detection was measured as a function of duration and modulator starting phase. Dichotic FM thresholds were lower than monaural FM thresholds and the modulator starting phase had no effect on detection. Experiment II measured monaural FM detection for signals that differed in modulation rate and duration such that the improvement with duration in seconds (carrier) or cycles (modulator) was compared. Monaural FM detection improved monotonically with the number of modulation cycles, suggesting that the modulator is extracted prior to detection. Experiment III measured dichotic FM detection for shorter signal durations to test the hypothesis that dichotic FM relies primarily on the signal onset. The rate of improvement decreased as duration increased, which is consistent with the use of primarily onset cues for the detection of dichotic FM. These results establish that improvement with duration occurs as a function of the modulation cycles at a rate consistent with the independent-samples model for monaural FM, but later cycles contribute less to detection in dichotic FM.
Affiliation(s)
- Katherine N Palandrani
- Department of Communication Sciences and Disorders, University of Maryland, College Park, Maryland 20742, USA
- Eric C Hoover
- Department of Communication Sciences and Disorders, University of Maryland, College Park, Maryland 20742, USA
- Trevor Stavropoulos
- Brain Game Center, University of California Riverside, Riverside, California 92521, USA
- Aaron R Seitz
- Department of Psychology, University of California Riverside, Riverside, California 92521, USA
- Sittiprapa Isarangura
- Department of Communication Sciences and Disorders, Mahidol University, Phaya Thai, Bangkok 10400, Thailand
- Frederick J Gallun
- Oregon Hearing Research Center, Oregon Health and Science University, Portland, Oregon 97239, USA
- David A Eddins
- Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida 33620, USA
22. Gockel HE, Carlyon RP. On musical interval perception for complex tones at very high frequencies. J Acoust Soc Am 2021; 149:2644. PMID: 33940917; PMCID: PMC7612123. DOI: 10.1121/10.0004222.
Abstract
Listeners appear able to extract a residue pitch from high-frequency harmonics for which phase locking to the temporal fine structure is weak or absent. The present study investigated musical interval perception for high-frequency harmonic complex tones using the same stimuli as Lau, Mehta, and Oxenham [J. Neurosci. 37, 9013-9021 (2017)]. Nine young musically trained listeners with especially good high-frequency hearing adjusted various musical intervals using harmonic complex tones containing harmonics 6-10. The reference notes had fundamental frequencies (F0s) of 280 or 1400 Hz. Interval matches were possible, albeit markedly worse, even when all harmonic frequencies were above the presumed limit of phase locking. Matches showed significantly larger systematic errors and higher variability, and subjects required more trials to finish a match for the high than for the low F0. Additional absolute pitch judgments from one subject with absolute pitch, for complex tones containing harmonics 1-5 or 6-10 with a wide range of F0s, were perfect when the lowest frequency component was below about 7 kHz, but at least 50% of responses were incorrect when it was 8 kHz or higher. The results are discussed in terms of the possible effects of phase-locking information and familiarity with high-frequency stimuli on pitch.
23. de Cheveigné A. Harmonic Cancellation: A Fundamental of Auditory Scene Analysis. Trends Hear 2021; 25:23312165211041422. PMID: 34698574; PMCID: PMC8552394. DOI: 10.1177/23312165211041422.
Abstract
This paper reviews the hypothesis of harmonic cancellation according to which an interfering sound is suppressed or canceled on the basis of its harmonicity (or periodicity in the time domain) for the purpose of Auditory Scene Analysis. It defines the concept, discusses theoretical arguments in its favor, and reviews experimental results that support it, or not. If correct, the hypothesis may draw on time-domain processing of temporally accurate neural representations within the brainstem, as required also by the classic equalization-cancellation model of binaural unmasking. The hypothesis predicts that a target sound corrupted by interference will be easier to hear if the interference is harmonic than inharmonic, all else being equal. This prediction is borne out in a number of behavioral studies, but not all. The paper reviews those results, with the aim to understand the inconsistencies and come up with a reliable conclusion for, or against, the hypothesis of harmonic cancellation within the auditory system.
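The time-domain operation at the heart of the hypothesis is a delay-and-subtract comb filter tuned to the interferer's period: it nulls all harmonics of a periodic interferer while merely filtering (here, boosting by about 3 dB on average) an aperiodic target. The signals and parameters below are illustrative.

```python
# Harmonic cancellation in one line: y[n] = x[n] - x[n - T], with T equal
# to the interferer's period, nulls every harmonic of that period.
import numpy as np

rng = np.random.default_rng(4)
fs = 16_000
t = np.arange(0, 0.5, 1 / fs)
f0 = 200.0                                   # harmonic interferer F0
masker = sum(np.sin(2 * np.pi * f0 * k * t) for k in range(1, 11))
target = rng.standard_normal(t.size)         # aperiodic "target"

T = int(round(fs / f0))                      # one period = 80 samples here
comb = lambda x: x[T:] - x[:-T]
level = lambda x: 10 * np.log10(np.mean(x ** 2))

print(f"interferer: {level(masker):6.1f} dB -> {level(comb(masker)):7.1f} dB")  # nulled
print(f"target    : {level(target):6.1f} dB -> {level(comb(target)):7.1f} dB")  # ~ +3 dB
```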
Affiliation(s)
- Alain de Cheveigné
- Laboratoire des systèmes perceptifs, CNRS, Paris, France
- Département d’études cognitives, École normale supérieure, PSL University, Paris, France
- UCL Ear Institute, London, UK
24. Whiteford KL, Kreft HA, Oxenham AJ. The role of cochlear place coding in the perception of frequency modulation. eLife 2020; 9:e58468. PMID: 32996463; PMCID: PMC7556860. DOI: 10.7554/elife.58468.
Abstract
Natural sounds convey information via frequency and amplitude modulations (FM and AM). Humans are acutely sensitive to the slow rates of FM that are crucial for speech and music. This sensitivity has long been thought to rely on precise stimulus-driven auditory-nerve spike timing (time code), whereas a coarser code, based on variations in the cochlear place of stimulation (place code), represents faster FM rates. We tested this theory in listeners with normal and impaired hearing, spanning a wide range of place-coding fidelity. Contrary to predictions, sensitivity to both slow and fast FM correlated with place-coding fidelity. We also used incoherent AM on two carriers to simulate place coding of FM and observed poorer sensitivity at high carrier frequencies and fast rates, two properties of FM detection previously ascribed to the limits of time coding. The results suggest a unitary place-based neural code for FM across all rates and carrier frequencies.
Affiliation(s)
- Kelly L Whiteford
- Department of Psychology, University of Minnesota, Minneapolis, United States
- Heather A Kreft
- Department of Psychology, University of Minnesota, Minneapolis, United States
- Andrew J Oxenham
- Department of Psychology, University of Minnesota, Minneapolis, United States
25. Mehta AH, Oxenham AJ. Effect of lowest harmonic rank on fundamental-frequency difference limens varies with fundamental frequency. J Acoust Soc Am 2020; 147:2314. PMID: 32359332; PMCID: PMC7166120. DOI: 10.1121/10.0001092.
Abstract
This study investigated the relationship between fundamental frequency difference limens (F0DLs) and the lowest harmonic number present over a wide range of F0s (30-2000 Hz) for 12-component harmonic complex tones that were presented in either sine or random phase. For fundamental frequencies (F0s) between 100 and 400 Hz, a transition from low (∼1%) to high (∼5%) F0DLs occurred as the lowest harmonic number increased from about seven to ten, in line with earlier studies. At lower and higher F0s, the transition between low and high F0DLs occurred at lower harmonic numbers. The worsening performance at low F0s was reasonably well predicted by the expected decrease in spectral resolution below about 500 Hz. At higher F0s, the degradation in performance at lower harmonic numbers could not be predicted by changes in spectral resolution but remained relatively good (<2%-3%) in some conditions, even when all harmonics were above 8 kHz, confirming that F0 can be extracted from harmonics even when temporal envelope or fine-structure cues are weak or absent.
Affiliation(s)
- Anahita H Mehta
- Department of Psychology, University of Minnesota, 75 East River Parkway, Minneapolis, Minnesota 55455, USA
- Andrew J Oxenham
- Department of Psychology, University of Minnesota, 75 East River Parkway, Minneapolis, Minnesota 55455, USA
26
Deloche F. Fine-grained statistical structure of speech. PLoS One 2020; 15:e0230233. [PMID: 32196513] [PMCID: PMC7083313] [DOI: 10.1371/journal.pone.0230233]
Abstract
In spite of its acoustic diversity, the speech signal presents statistical regularities that can be exploited by biological or artificial systems for efficient coding. Independent Component Analysis (ICA) revealed that on small time scales (∼ 10 ms), the overall structure of speech is well captured by a time-frequency representation whose frequency selectivity follows the same power law in the high frequency range 1–8 kHz as cochlear frequency selectivity in mammals. Variations in the power-law exponent, i.e. different time-frequency trade-offs, have been shown to provide additional adaptation to phonetic categories. Here, we adopt a parametric approach to investigate the variations of the exponent at a finer level of speech. The estimation procedure is based on a measure that reflects the sparsity of decompositions in a set of Gabor dictionaries whose atoms are Gaussian-modulated sinusoids. We examine the variations of the exponent associated with the best decomposition, first at the level of phonemes, then at an intra-phonemic level. We show that this analysis offers a rich interpretation of the fine-grained statistical structure of speech, and that the exponent values can be related to key acoustic properties. Two main results are: i) for plosives, the exponent is lowered by the release bursts, concealing higher values during the opening phases; ii) for vowels, the exponent is bound to formant bandwidths and decreases with the degree of acoustic radiation at the lips. This work further suggests that an efficient coding strategy is to reduce frequency selectivity with sound intensity level, congruent with the nonlinear behavior of cochlear filtering.
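The estimation procedure can be caricatured in a few lines: for each candidate exponent, build a Gabor dictionary whose frequency selectivity follows the corresponding power law, decompose the signal, and keep the exponent whose coefficients are sparsest. The sketch below is a toy stand-in under those assumptions; the L2/L1 sparsity index and all parameter values are mine, not the paper's.

```python
import numpy as np

def gabor_atom(fs, f, sigma):
    # Gaussian-modulated sinusoid (Gabor atom), normalized to unit energy
    t = np.arange(-4 * sigma, 4 * sigma, 1 / fs)
    g = np.exp(-t**2 / (2 * sigma**2)) * np.cos(2 * np.pi * f * t)
    return g / np.linalg.norm(g)

def sparsity(c):
    # Crude sparsity index: L2/L1 ratio (larger = sparser coefficients)
    c = np.abs(c)
    return np.linalg.norm(c) / np.sum(c)

def best_exponent(x, fs, freqs, exponents, q0=4.0, f_ref=1000.0):
    # For each candidate exponent alpha, build a dictionary whose frequency
    # selectivity follows Q(f) = q0 * (f / f_ref)**alpha, decompose x by
    # correlation with each atom, and keep the alpha whose coefficients are
    # sparsest -- a toy stand-in for the paper's estimation procedure
    scores = []
    for alpha in exponents:
        coeffs = []
        for f in freqs:
            q = q0 * (f / f_ref) ** alpha
            sigma = q / (2 * np.pi * f)          # time scale from Q factor
            coeffs.append(np.correlate(x, gabor_atom(fs, f, sigma), "valid"))
        scores.append(sparsity(np.concatenate(coeffs)))
    return exponents[int(np.argmax(scores))]

fs = 16000
x = np.random.default_rng(0).standard_normal(fs // 4)  # stand-in "frame"
print(best_exponent(x, fs, np.geomspace(1000, 8000, 12),
                    np.linspace(0.2, 1.0, 5)))
```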
Affiliation(s)
- François Deloche
- Centre d’analyse et de mathématique sociales, CNRS, EHESS, Paris, France
27
Electrocochleography During Translabyrinthine Approach for Vestibular Schwannoma Removal. Otol Neurotol 2020; 41:e369-e377. [DOI: 10.1097/mao.0000000000002543]
28
Robust Rate-Place Coding of Resolved Components in Harmonic and Inharmonic Complex Tones in Auditory Midbrain. J Neurosci 2020; 40:2080-2093. [PMID: 31996454] [DOI: 10.1523/jneurosci.2337-19.2020]
Abstract
Harmonic complex tones (HCTs) commonly occurring in speech and music evoke a strong pitch at their fundamental frequency (F0), especially when they contain harmonics individually resolved by the cochlea. When all frequency components of an HCT are shifted by the same amount, the pitch of the resulting inharmonic tone (IHCT) can also shift, although the envelope repetition rate is unchanged. A rate-place code, whereby resolved harmonics are represented by local maxima in firing rates along the tonotopic axis, has been characterized in the auditory nerve and primary auditory cortex, but little is known about intermediate processing stages. We recorded single-neuron responses to HCT and IHCT with varying F0 and sound level in the inferior colliculus (IC) of unanesthetized rabbits of both sexes. Many neurons showed peaks in firing rate when a low-numbered harmonic aligned with the neuron's characteristic frequency, demonstrating "rate-place" coding. The IC rate-place code was most prevalent for F0 > 800 Hz, was only moderately dependent on sound level over a 40 dB range, and was not sensitive to stimulus harmonicity. A spectral receptive-field model incorporating broadband inhibition better predicted the neural responses than a purely excitatory model, suggesting an enhancement of the rate-place representation by inhibition. Some IC neurons showed facilitation in response to HCT relative to pure tones, similar to cortical "harmonic template neurons" (Feng and Wang, 2017), but to a lesser degree. Our findings shed light on the transformation of rate-place coding of resolved harmonics along the auditory pathway.
SIGNIFICANCE STATEMENT: Harmonic complex tones are ubiquitous in speech and music and produce strong pitch percepts when they contain frequency components that are individually resolved by the cochlea. Here, we characterize a "rate-place" code for resolved harmonics in the auditory midbrain that is more robust across sound levels than the peripheral rate-place code and insensitive to the harmonic relationships among frequency components. We use a computational model to show that inhibition may play an important role in shaping the rate-place code. Our study fills a major gap in understanding the transformations in neural representations of resolved harmonics along the auditory pathway.
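As a rough illustration of the receptive-field idea, the following toy model predicts a neuron's firing rate as narrow excitation centred on its characteristic frequency minus broad inhibition, summed over the components of a complex tone. All gains and bandwidths are invented for illustration and are not the paper's fitted values.

```python
import numpy as np

def predicted_rate(freqs, amps, cf, r0=10.0,
                   ge=60.0, bw_e=0.1, gi=3.0, bw_i=1.5):
    # Toy spectral receptive field: narrow excitation centred on the
    # characteristic frequency (CF) minus broad inhibition, summed over the
    # components of a complex tone (bandwidths in octaves); output is a
    # half-wave-rectified rate (spikes/s) around a spontaneous rate r0
    d = np.log2(np.asarray(freqs, float) / cf)
    exc = ge * np.exp(-0.5 * (d / bw_e) ** 2)
    inh = gi * np.exp(-0.5 * (d / bw_i) ** 2)
    return max(0.0, r0 + np.sum(np.asarray(amps) * (exc - inh)))

cf = 3000.0
for f0 in (1000.0, 1300.0):        # harmonic 3 of 1000 Hz sits on the CF
    comps = f0 * np.arange(1, 13)
    print(f0, predicted_rate(comps, np.ones(12), cf))
```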
29
Burton JA, Valero MD, Hackett TA, Ramachandran R. The use of nonhuman primates in studies of noise injury and treatment. J Acoust Soc Am 2019; 146:3770. [PMID: 31795680] [PMCID: PMC6881191] [DOI: 10.1121/1.5132709]
Abstract
Exposure to prolonged or high intensity noise increases the risk for permanent hearing impairment. Over several decades, researchers characterized the nature of harmful noise exposures and worked to establish guidelines for effective protection. Recent laboratory studies, primarily conducted in rodent models, indicate that the auditory system may be more vulnerable to noise-induced hearing loss (NIHL) than previously thought, driving renewed inquiries into the harmful effects of noise in humans. To bridge the translational gaps between rodents and humans, nonhuman primates (NHPs) may serve as key animal models. The phylogenetic proximity of NHPs to humans underlies tremendous similarity in many features of the auditory system (genomic, anatomical, physiological, behavioral), all of which are important considerations in the assessment and treatment of NIHL. This review summarizes the literature pertaining to NHPs as models of hearing and noise-induced hearing loss, discusses factors relevant to the translation of diagnostics and therapeutics from animals to humans, and concludes with some of the practical considerations involved in conducting NHP research.
Affiliation(s)
- Jane A Burton
- Neuroscience Graduate Program, Vanderbilt University, Nashville, Tennessee 37212, USA
- Michelle D Valero
- Eaton Peabody Laboratories at Massachusetts Eye and Ear, Boston, Massachusetts 02114, USA
- Troy A Hackett
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, Tennessee 37232, USA
- Ramnarayan Ramachandran
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, Tennessee 37232, USA
30
Su Y, Delgutte B. Pitch of harmonic complex tones: rate and temporal coding of envelope repetition rate in inferior colliculus of unanesthetized rabbits. J Neurophysiol 2019; 122:2468-2485. [PMID: 31664871] [DOI: 10.1152/jn.00512.2019]
Abstract
Harmonic complex tones (HCTs) found in speech, music, and animal vocalizations evoke strong pitch percepts at their fundamental frequencies. The strongest pitches are produced by HCTs that contain harmonics resolved by cochlear frequency analysis, but HCTs containing solely unresolved harmonics also evoke a weaker pitch at their envelope repetition rate (ERR). In the auditory periphery, neurons phase lock to the stimulus envelope, but this temporal representation of ERR degrades and gives way to rate codes along the ascending auditory pathway. To assess the role of the inferior colliculus (IC) in such transformations, we recorded IC neuron responses to HCT and sinusoidally modulated broadband noise (SAMN) with varying ERR from unanesthetized rabbits. Different interharmonic phase relationships of HCT were used to manipulate the temporal envelope without changing the power spectrum. Many IC neurons demonstrated band-pass rate tuning to ERR between 60 and 1,600 Hz for HCT and between 40 and 500 Hz for SAMN. The tuning was not related to the pure-tone best frequency of neurons but was dependent on the shape of the stimulus envelope, indicating a temporal rather than spectral origin. A phenomenological model suggests that the tuning may arise from peripheral temporal response patterns via synaptic inhibition. We also characterized temporal coding to ERR. Some IC neurons could phase lock to the stimulus envelope up to 900 Hz for either HCT or SAMN, but phase locking was weaker with SAMN. Together, the rate code and the temporal code represent a wide range of ERR, providing strong cues for the pitch of unresolved harmonics.
NEW & NOTEWORTHY: Envelope repetition rate (ERR) provides crucial cues for pitch perception of frequency components that are not individually resolved by the cochlea, but the neural representation of ERR for stimuli containing many harmonics is poorly characterized. Here we show that the pitch of stimuli with unresolved harmonics is represented by both a rate code and a temporal code for ERR in auditory midbrain neurons and propose possible underlying neural mechanisms with a computational model.
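The phase manipulation at the heart of the stimulus design is easy to reproduce: the sketch below (with illustrative parameter values, not the study's) builds harmonic complexes whose interharmonic phases differ while their power spectra are identical.

```python
import numpy as np

def harmonic_complex(f0, n_harm, dur, fs, phase="sine", rng=None):
    # Harmonic complex tone: equal-amplitude harmonics 1..n_harm of f0.
    # "sine" phase gives a peaky envelope; "random" phase flattens the
    # envelope while leaving the power spectrum unchanged
    t = np.arange(int(dur * fs)) / fs
    if phase == "sine":
        phi = np.zeros(n_harm)
    else:
        rng = rng or np.random.default_rng(0)
        phi = rng.uniform(0, 2 * np.pi, n_harm)
    k = np.arange(1, n_harm + 1)[:, None]
    return np.sum(np.sin(2 * np.pi * k * f0 * t + phi[:, None]), axis=0)

fs = 48000
sine_hct = harmonic_complex(200.0, 20, 0.2, fs, "sine")
rand_hct = harmonic_complex(200.0, 20, 0.2, fs, "random")
# Same magnitude spectrum (and ERR = f0) despite very different envelopes:
assert np.allclose(np.abs(np.fft.rfft(sine_hct)),
                   np.abs(np.fft.rfft(rand_hct)))
```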
Affiliation(s)
- Yaqing Su
- Eaton-Peabody Labs, Massachusetts Eye and Ear, Boston, Massachusetts; Department of Biomedical Engineering, Boston University, Boston, Massachusetts
- Bertrand Delgutte
- Eaton-Peabody Labs, Massachusetts Eye and Ear, Boston, Massachusetts; Department of Otolaryngology, Harvard Medical School, Boston, Massachusetts
31
Carcagno S, Lakhani S, Plack CJ. Consonance perception beyond the traditional existence region of pitch. J Acoust Soc Am 2019; 146:2279. [PMID: 31671967] [DOI: 10.1121/1.5127845]
Abstract
Some theories posit that the perception of consonance is based on neural periodicity detection, which is dependent on accurate phase locking of auditory nerve fibers to features of the stimulus waveform. In the current study, 15 listeners were asked to rate the pleasantness of complex-tone dyads (two-note chords) forming various harmonic intervals and bandpass filtered in a high-frequency region (all components >5.8 kHz), where phase locking to the rapid stimulus fine structure is thought to be severely degraded or absent. The two notes were presented to opposite ears. Consonant intervals (minor third and perfect fifth) received higher ratings than dissonant intervals (minor second and tritone). The results could not be explained in terms of phase locking to the slower waveform envelope because the preference for consonant intervals was higher when the stimuli were harmonic, compared to a condition in which they were made inharmonic by shifting their component frequencies by a constant offset, so as to preserve their envelope periodicity. Overall, the results indicate that, if phase locking is indeed absent at frequencies greater than ∼5 kHz, neural periodicity detection is not necessary for the perception of consonance.
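The harmonic/inharmonic manipulation can be sketched as follows. Component ranks, durations, and the shift value here are illustrative, not the study's exact parameters, and in the study the two notes of each dyad went to opposite ears.

```python
import numpy as np

def note(f0, n_low, n_high, shift, dur, fs):
    # Complex tone built from harmonics n_low..n_high of f0, each component
    # shifted by a constant offset (Hz); shift=0 gives the harmonic (H)
    # version, shift>0 an inharmonic (I) version whose envelope periodicity
    # is unchanged because the component spacing is still f0
    t = np.arange(int(dur * fs)) / fs
    comps = [np.sin(2 * np.pi * (k * f0 + shift) * t)
             for k in range(n_low, n_high + 1)]
    return np.sum(comps, axis=0)

fs = 48000
# Perfect-fifth dyad (3:2), components kept above ~5.8 kHz as in the study
low    = note(400.0, 15, 22, 0.0, 0.3, fs)      # 6.0-8.8 kHz, harmonic
high   = note(600.0, 10, 15, 0.0, 0.3, fs)      # 6.0-9.0 kHz, harmonic
low_i  = note(400.0, 15, 22, 100.0, 0.3, fs)    # inharmonic counterparts
high_i = note(600.0, 10, 15, 100.0, 0.3, fs)
```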
Affiliation(s)
- Samuele Carcagno
- Department of Psychology, Lancaster University, Lancaster, LA1 4YF, United Kingdom
- Saday Lakhani
- Department of Psychology, Lancaster University, Lancaster, LA1 4YF, United Kingdom
- Christopher J Plack
- Department of Psychology, Lancaster University, Lancaster, LA1 4YF, United Kingdom
32
Brown AD, Anbuhl KL, Gilmer JI, Tollin DJ. Between-ear sound frequency disparity modulates a brain stem biomarker of binaural hearing. J Neurophysiol 2019; 122:1110-1122. [PMID: 31314646] [PMCID: PMC6766741] [DOI: 10.1152/jn.00057.2019]
Abstract
The auditory brain stem response (ABR) is an evoked potential that indexes a cascade of neural events elicited by sound. In the present study we evaluated the influence of sound frequency on a derived component of the ABR known as the binaural interaction component (BIC). Specifically, we evaluated the effect of acoustic interaural (between-ear) frequency mismatch on BIC amplitude. Goals were to 1) increase basic understanding of sound features that influence this long-studied auditory potential and 2) gain insight about the persistence of the BIC with interaural electrode mismatch in human users of bilateral cochlear implants, presently a limitation on the prospective utility of the BIC in audiological settings. Data were collected in an animal model that is audiometrically similar to humans, the chinchilla (Chinchilla lanigera; 6 females). Frequency disparities and amplitudes of acoustic stimuli were varied over broad ranges, and associated variation of BIC amplitude was quantified. Subsequently, responses were simulated with the use of established models of the brain stem pathway thought to underlie the BIC. Collectively, the data demonstrate that at high sound intensities (≥85 dB SPL), the acoustically elicited BIC persisted with interaurally disparate stimulation (click frequencies ≥1.5 octaves apart). However, sharper tuning emerged at moderate sound intensities (65 dB SPL), with the largest BIC occurring for stimulus frequencies within ~0.8 octaves, equivalent to ±1 mm in cochlear place. Such responses were consistent with simulated responses of the presumed brain stem generator of the BIC, the lateral superior olive. The data suggest that leveraging focused electrical stimulation strategies could improve BIC-based bilateral cochlear implant fitting outcomes.
NEW & NOTEWORTHY: Traditional hearing tests evaluate each ear independently. Diagnosis and treatment of binaural hearing dysfunction remains a basic challenge for hearing clinicians. We demonstrate in an animal model that the prospective utility of a noninvasive electrophysiological signature of binaural function, the binaural interaction component (BIC), depends strongly on the intensity of auditory stimulation. Data suggest that more informative BIC measurements could be obtained with clinical protocols leveraging stimuli restricted in effective bandwidth.
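The equivalence quoted above between octaves of frequency separation and millimetres of cochlear place follows from a Greenwood-style frequency-place map. The sketch below uses the human map constants for illustration only; the study's chinchillas have a shorter cochlea with different constants.

```python
import numpy as np

# Greenwood frequency-place map, human constants (illustrative; the
# chinchilla map used in the study has different constants)
A, a, k, LENGTH_MM = 165.4, 2.1, 0.88, 35.0

def place_mm(f):
    # Distance from the cochlear apex (mm) for frequency f (Hz)
    x = np.log10(f / A + k) / a          # fractional position, 0 = apex
    return LENGTH_MM * x

f_center = 2000.0
for octaves in (0.4, 0.8, 1.5):
    lo = f_center / 2 ** (octaves / 2)
    hi = f_center * 2 ** (octaves / 2)
    print(f"{octaves:.1f} oct around {f_center:.0f} Hz -> "
          f"{place_mm(hi) - place_mm(lo):.2f} mm of cochlear place")
```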
Affiliation(s)
- Andrew D Brown
- Department of Speech and Hearing Sciences, University of Washington, Seattle, Washington
- Kelsey L Anbuhl
- Center for Neural Science, New York University, New York, New York
- Jesse I Gilmer
- Department of Physiology and Biophysics, University of Colorado School of Medicine, Aurora, Colorado
- Neuroscience Training Program, University of Colorado School of Medicine, Aurora, Colorado
- Daniel J Tollin
- Department of Physiology and Biophysics, University of Colorado School of Medicine, Aurora, Colorado
- Neuroscience Training Program, University of Colorado School of Medicine, Aurora, Colorado
- Department of Otolaryngology, University of Colorado School of Medicine, Aurora, Colorado
33
van der Heijden K, Rauschecker JP, de Gelder B, Formisano E. Cortical mechanisms of spatial hearing. Nat Rev Neurosci 2019; 20:609-623.
Abstract
Humans and other animals use spatial hearing to rapidly localize events in the environment. However, neural encoding of sound location is a complex process involving the computation and integration of multiple spatial cues that are not represented directly in the sensory organ (the cochlea). Our understanding of these mechanisms has increased enormously in the past few years. Current research is focused on the contribution of animal models for understanding human spatial audition, the effects of behavioural demands on neural sound location encoding, the emergence of a cue-independent location representation in the auditory cortex, and the relationship between single-source and concurrent location encoding in complex auditory scenes. Furthermore, computational modelling seeks to unravel how neural representations of sound source locations are derived from the complex binaural waveforms of real-life sounds. In this article, we review and integrate the latest insights from neurophysiological, neuroimaging and computational modelling studies of mammalian spatial hearing. We propose that the cortical representation of sound location emerges from recurrent processing taking place in a dynamic, adaptive network of early (primary) and higher-order (posterior-dorsal and dorsolateral prefrontal) auditory regions. This cortical network accommodates changing behavioural requirements and is especially relevant for processing the location of real-life, complex sounds and complex auditory scenes.
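Of the binaural cues discussed in this review, the interaural time difference (ITD) is the easiest to illustrate computationally. Below is a minimal sketch (lag range and test values are assumptions) that estimates ITD as the best-matching lag of a cross-correlation between the two ear signals.

```python
import numpy as np

def estimate_itd(left, right, fs, max_itd=800e-6):
    # Cross-correlate the ear signals over physiologically plausible lags
    # (~ +/-800 microseconds in humans); the best-matching lag is the ITD.
    # Positive values mean the right-ear signal lags (source to the left).
    n, max_lag = len(left), int(max_itd * fs)
    lags = np.arange(-max_lag, max_lag + 1)
    xc = [np.sum(left[max(0, -l):n - max(0, l)]
                 * right[max(0, l):n - max(0, -l)])
          for l in lags]
    return lags[int(np.argmax(xc))] / fs

fs = 48000
left = np.random.default_rng(1).standard_normal(fs // 2)
right = np.concatenate([np.zeros(12), left[:-12]])  # right delayed 250 us
print(estimate_itd(left, right, fs))                # ~ +2.5e-04 s
```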
34
Tarnowska E, Wicher A, Moore BCJ. The effect of musicianship, contralateral noise, and ear of presentation on the detection of changes in temporal fine structure. J Acoust Soc Am 2019; 146:1. [PMID: 31370621] [DOI: 10.1121/1.5114820]
Abstract
Musicians are better than non-musicians at discriminating changes in the fundamental frequency (F0) of harmonic complex tones. Such discrimination may be based on place cues derived from low resolved harmonics, envelope cues derived from high harmonics, and temporal fine structure (TFS) cues derived from both low and high harmonics. The present study compared the ability of highly trained violinists and non-musicians to discriminate changes in complex sounds that differed primarily in their TFS. The task was to discriminate harmonic (H) and frequency-shifted inharmonic (I) tones that were bandpass filtered such that the components were largely or completely unresolved. The effect of contralateral noise and ear of presentation was also investigated. It was hypothesized that contralateral noise would activate the efferent system, helping to preserve the neural representation of envelope fluctuations in the H and I stimuli, thereby improving their discrimination. Violinists were significantly better than non-musicians at discriminating the H and I tones. However, contralateral noise and ear of presentation had no effect. It is concluded that, compared to non-musicians, violinists have a superior ability to discriminate complex sounds based on their TFS, and this ability is unaffected by contralateral stimulation or ear of presentation.
Affiliation(s)
- Emilia Tarnowska
- Department of Psychoacoustics and Room Acoustics, Institute of Acoustics, Faculty of Physics, Adam Mickiewicz University, Umultowska 85, 61-614 Poznań, Poland
- Andrzej Wicher
- Department of Psychoacoustics and Room Acoustics, Institute of Acoustics, Faculty of Physics, Adam Mickiewicz University, Umultowska 85, 61-614 Poznań, Poland
- Brian C J Moore
- Department of Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, United Kingdom
35
Verschooten E, Shamma S, Oxenham AJ, Moore BCJ, Joris PX, Heinz MG, Plack CJ. The upper frequency limit for the use of phase locking to code temporal fine structure in humans: A compilation of viewpoints. Hear Res 2019; 377:109-121. [PMID: 30927686] [PMCID: PMC6524635] [DOI: 10.1016/j.heares.2019.03.011]
Abstract
The relative importance of neural temporal and place coding in auditory perception is still a matter of much debate. The current article is a compilation of viewpoints from leading auditory psychophysicists and physiologists regarding the upper frequency limit for the use of neural phase locking to code temporal fine structure in humans. While phase locking is used for binaural processing up to about 1500 Hz, there is disagreement regarding the use of monaural phase-locking information at higher frequencies. Estimates of the general upper limit proposed by the contributors range from 1500 to 10,000 Hz. The arguments depend on whether or not phase locking is needed to explain psychophysical discrimination performance at frequencies above 1500 Hz, and whether or not the phase-locked neural representation is sufficiently robust at these frequencies to provide usable information. The contributors suggest key experiments that may help to resolve this issue, and experimental findings that may cause them to change their minds. This issue is of crucial importance to our understanding of the neural basis of auditory perception in general, and of pitch perception in particular.
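The standard quantity behind such phase-locking limits is the vector strength of Goldberg and Brown (1969). A short sketch (test parameters are assumptions):

```python
import numpy as np

def vector_strength(spike_times, freq):
    # Vector strength (Goldberg & Brown, 1969): magnitude of the mean
    # phase vector of spikes relative to a sinusoid at `freq`;
    # 1 = perfect phase locking, near 0 = no phase locking
    phases = 2 * np.pi * freq * np.asarray(spike_times)
    return np.abs(np.mean(np.exp(1j * phases)))

# Jittered spikes: phase locking degrades as jitter approaches the period
rng = np.random.default_rng(0)
freq = 2000.0
spikes = np.arange(0, 0.5, 1 / freq) + rng.normal(0, 50e-6, 1000)
print(vector_strength(spikes, freq))   # ~0.8 for 50-us jitter at 2 kHz
```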
Affiliation(s)
- Eric Verschooten
- Laboratory of Auditory Neurophysiology, KU Leuven, B-3000 Leuven, Belgium
- Shihab Shamma
- Institute for Systems Research and Electrical and Computer Engineering, University of Maryland, College Park, MD 20742, USA; Laboratory of Sensory Perception, Department of Cognitive Studies, Ecole Normale Superieure, 29 Rue d'Ulm, 75005 Paris, France
- Andrew J Oxenham
- Department of Psychology, University of Minnesota, N218 Elliott Hall, 75 E. River Road, Minneapolis, MN 55455, USA
- Brian C J Moore
- Department of Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, UK
- Philip X Joris
- Laboratory of Auditory Neurophysiology, KU Leuven, B-3000 Leuven, Belgium
- Michael G Heinz
- Departments of Speech, Language, & Hearing Sciences and Biomedical Engineering, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907, USA
- Christopher J Plack
- Manchester Centre for Audiology and Deafness, The University of Manchester, Manchester Academic Health Science Centre, M13 9PL, UK; Department of Psychology, Lancaster University, Lancaster, LA1 4YF, UK
36
Hartmann WM, Cariani PA, Colburn HS. Noise edge pitch and models of pitch perception. J Acoust Soc Am 2019; 145:1993. [PMID: 31046377] [PMCID: PMC7112715] [DOI: 10.1121/1.5093546]
Abstract
Monaural noise edge pitch (NEP) is evoked by a broadband noise with a sharp falling edge in the power spectrum. The pitch is heard near the spectral edge frequency but shifted slightly into the frequency region of the noise. Thus, the pitch of a lowpass (LP) noise is matched by a pure tone typically 2%-5% below the edge, whereas the pitch of highpass (HP) noise is matched by a pure tone a comparable amount above the edge. Musically trained listeners can recognize musical intervals between NEPs. The pitches can be understood from a temporal pattern-matching model of pitch perception based on the peaks of a simplified autocorrelation function. The pitch shifts arise from limits on the autocorrelation window duration. An alternative place-theory approach explains the pitch shifts as the result of lateral inhibition. Psychophysical experiments using edge frequencies of 100 Hz and below find that LP-noise pitches exist but HP-noise pitches do not. The result is consistent with a temporal analysis in tonotopic regions outside the noise band. LP and HP experiments with high-frequency edges find that pitch tends to disappear as the edge frequency approaches 5000 Hz, as expected from a timing theory, though exceptional listeners can go an octave higher.
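The flavour of the autocorrelation account can be shown in a toy calculation (noise parameters are assumptions; the paper's pattern-matching model is more elaborate): the ripple in the autocorrelation of lowpass noise is spaced slightly more than one edge period apart, so the implied pitch falls slightly below the edge, in the direction of the matches reported above.

```python
import numpy as np

fs, dur, fc = 48000, 4.0, 1000.0
n = int(fs * dur)
noise = np.random.default_rng(0).standard_normal(n)

# Ideal lowpass noise: zero every spectral component above the edge at fc
spec = np.fft.rfft(noise)
spec[np.fft.rfftfreq(n, 1 / fs) > fc] = 0.0

# Autocorrelation via the Wiener-Khinchin theorem, normalized at lag zero
acf = np.fft.irfft(np.abs(spec) ** 2)
acf = acf[:n // 2] / acf[0]

# Positive local maxima of the ACF ripple within the first few edge periods
peaks = [i for i in range(1, int(3 * fs / fc))
         if acf[i] > 0 and acf[i] > acf[i - 1] and acf[i] > acf[i + 1]]
period = (peaks[1] - peaks[0]) / fs   # ripple spacing, slightly > 1/fc
print(f"edge {fc:.0f} Hz -> implied pitch {1 / period:.0f} Hz")
```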
Affiliation(s)
- William M Hartmann
- Department of Physics and Astronomy, Michigan State University, 567 Wilson Road, East Lansing, Michigan 48824, USA
- Peter A Cariani
- Hearing Research Center, Department of Biomedical Engineering, Boston University, 44 Cummington Street, Boston, Massachusetts 02115, USA
- H Steven Colburn
- Hearing Research Center, Department of Biomedical Engineering, Boston University, 44 Cummington Street, Boston, Massachusetts 02115, USA
37
Graves JE, Oxenham AJ. Pitch discrimination with mixtures of three concurrent harmonic complexes. J Acoust Soc Am 2019; 145:2072. [PMID: 31046318] [PMCID: PMC6469983] [DOI: 10.1121/1.5096639]
Abstract
In natural listening contexts, especially in music, it is common to hear three or more simultaneous pitches, but few empirical or theoretical studies have addressed how this is achieved. Place and pattern-recognition theories of pitch require at least some harmonics to be spectrally resolved for pitch to be extracted, but it is unclear how often such conditions exist when multiple complex tones are presented together. In three behavioral experiments, mixtures of three concurrent complexes were filtered into a single bandpass spectral region, and the relationship between the fundamental frequencies and spectral region was varied in order to manipulate the extent to which harmonics were resolved either before or after mixing. In experiment 1, listeners discriminated major from minor triads (a difference of 1 semitone in one note of the triad). In experiments 2 and 3, listeners compared the pitch of a probe tone with that of a subsequent target, embedded within two other tones. All three experiments demonstrated above-chance performance, even in conditions where the combinations of harmonic components were unlikely to be resolved after mixing, suggesting that fully resolved harmonics may not be necessary to extract the pitch from multiple simultaneous complexes.
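Whether components stay resolved after mixing can be roughly checked against auditory filter bandwidths. In the sketch below, the passband, the triad, and the use of the Glasberg and Moore (1990) ERB formula are my assumptions, not the paper's own analysis: adjacent components of the mixture that fall within about one ERB of each other are unlikely to be resolved, even if each complex alone had resolved harmonics.

```python
import numpy as np

def erb(f):
    # Equivalent rectangular bandwidth of the auditory filter at f (Hz),
    # after Glasberg & Moore (1990)
    return 24.7 * (4.37 * f / 1000 + 1)

def mixture_components(f0s, f_lo, f_hi):
    # All harmonic frequencies of the mixed complexes inside the passband
    comps = np.concatenate([f0 * np.arange(1, int(f_hi / f0) + 1)
                            for f0 in f0s])
    return np.sort(comps[(comps >= f_lo) & (comps <= f_hi)])

# C major triad F0s, mixed into a single 1.5-3 kHz band (illustrative)
f0s = (261.6, 329.6, 392.0)
comps = mixture_components(f0s, 1500.0, 3000.0)
gaps = np.diff(comps)
unresolved = gaps < erb(comps[:-1])
print(f"{unresolved.sum()} of {len(gaps)} adjacent gaps within one ERB")
```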
Affiliation(s)
- Jackson E Graves
- Department of Psychology, University of Minnesota, 75 East River Parkway, Minneapolis, Minnesota 55455, USA
- Andrew J Oxenham
- Department of Psychology, University of Minnesota, 75 East River Parkway, Minneapolis, Minnesota 55455, USA