1
Guo X, Benzaquén E, Holmes E, Berger JI, Brühl IC, Sedley W, Rushton SP, Griffiths T. Predicting speech-in-noise ability with static and dynamic auditory figure-ground analysis using structural equation modelling. Proc Biol Sci 2025; 292:20242503. [PMID: 40041963 PMCID: PMC11881018 DOI: 10.1098/rspb.2024.2503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2024] [Revised: 12/09/2024] [Accepted: 02/05/2025] [Indexed: 05/12/2025] Open
Abstract
Auditory figure-ground paradigms assess the ability to extract a foreground figure from a random background, a crucial part of central hearing. Previous studies have shown that the ability to extract static figures predicts speech-in-noise ability. In this study, we assessed both fixed and dynamic figures: the latter comprised component frequencies that vary over time like natural speech. We examined how well speech-in-noise ability (for words and sentences) could be predicted by age, peripheral hearing, static and dynamic figure-ground with 159 participants. Regression demonstrated that in addition to audiogram and age, low-frequency dynamic figure-ground accounted for an independent variance of both word- and sentence-in-noise perception, higher than the static figure-ground. The structural equation models showed that a combination of all types of figure-ground tasks and age and audiogram could explain up to 89% of the variance in speech-in-noise, and figure-ground predicted speech-in-noise with a higher effect size than the audiogram or age. Age influenced word perception in noise directly but sentence perception indirectly via effects on peripheral and central hearing. Overall, this study demonstrates that dynamic figure-ground predicts a significant variance in real-life listening better than the prototype figure-ground. The combination of figure-ground tasks predicts real-life listening better than audiogram or age.
Affiliation(s)
- Xiaoxuan Guo
- Biosciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Ester Benzaquén
- Biosciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Emma Holmes
- Department of Speech Hearing and Phonetic Sciences, UCL, London, UK
- Joel Isaac Berger
- Human Brain Research Laboratory, Department of Neurosurgery, University of Iowa Hospitals and Clinics, Iowa, IA, USA
- William Sedley
- Biosciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Tim Griffiths
- Biosciences Institute, Newcastle University, Newcastle upon Tyne, UK
2
Pande A, Mishra D. Assessment of Pepper Robot's Speech Recognition System through the Lens of Machine Learning. Biomimetics (Basel) 2024; 9:391. [PMID: 39056832 PMCID: PMC11274617 DOI: 10.3390/biomimetics9070391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Revised: 06/05/2024] [Accepted: 06/24/2024] [Indexed: 07/28/2024] Open
Abstract
Speech comprehension can be challenging due to multiple factors, causing inconvenience for both the speaker and the listener. In such situations, using a humanoid robot, Pepper, can be beneficial as it can display the corresponding text on its screen. However, prior to that, it is essential to carefully assess the accuracy of the audio recordings captured by Pepper. Therefore, in this study, an experiment is conducted with eight participants with the primary objective of examining Pepper's speech recognition system with the help of audio features such as Mel-Frequency Cepstral Coefficients, spectral centroid, spectral flatness, the Zero-Crossing Rate, pitch, and energy. Furthermore, the K-means algorithm was employed to create clusters based on these features with the aim of selecting the most suitable cluster with the help of the speech-to-text conversion tool Whisper. The selection of the best cluster is accomplished by finding the maximum accuracy data points lying in a cluster. A criterion of discarding data points with values of WER above 0.3 is imposed to achieve this. The findings of this study suggest that a distance of up to one meter from the humanoid robot Pepper is suitable for capturing the best speech recordings. In contrast, age and gender do not influence the accuracy of recorded speech. The proposed system will provide a significant strength in settings where subtitles are required to improve the comprehension of spoken statements.
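The WER criterion mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard definition of word error rate (word-level edit distance divided by the number of reference words); the abstract does not say how WER was computed or how text was normalized, so the function names, the lowercasing, and the whitespace tokenization here are all assumptions made for illustration.

```python
WER_THRESHOLD = 0.3  # cut-off stated in the abstract; recordings above it are discarded


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()   # assumed normalization: lowercase + whitespace split
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def keep_recording(reference: str, hypothesis: str,
                   threshold: float = WER_THRESHOLD) -> bool:
    """Hypothetical helper: keep a data point only if its WER does not exceed the threshold."""
    return word_error_rate(reference, hypothesis) <= threshold
```

Under these assumptions, a transcript with one wrong word out of six would be kept, while one with two wrong words out of three would be discarded.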
Affiliation(s)
- Deepti Mishra
- Educational Technology Laboratory, Intelligent System and Analytics Group, Department of Computer Science (IDI), Norwegian University of Science and Technology, 2815 Gjøvik, Norway
3
Abstract
Hearing in noise is a core problem in audition, and a challenge for hearing-impaired listeners, yet the underlying mechanisms are poorly understood. We explored whether harmonic frequency relations, a signature property of many communication sounds, aid hearing in noise for normal hearing listeners. We measured detection thresholds in noise for tones and speech synthesized to have harmonic or inharmonic spectra. Harmonic signals were consistently easier to detect than otherwise identical inharmonic signals. Harmonicity also improved discrimination of sounds in noise. The largest benefits were observed for two-note up-down "pitch" discrimination and melodic contour discrimination, both of which could be performed equally well with harmonic and inharmonic tones in quiet, but which showed large harmonic advantages in noise. The results show that harmonicity facilitates hearing in noise, plausibly by providing a noise-robust pitch cue that aids detection and discrimination.
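The contrast between harmonic and inharmonic spectra can be sketched as follows. The abstract does not describe how the inharmonic stimuli were generated, so the jittering scheme below (shifting each component above the fundamental by a random fraction of F0) is an assumption chosen for illustration, as are the function names and default parameter values.

```python
import math
import random


def harmonic_frequencies(f0, n_components):
    """Component frequencies at exact integer multiples of f0."""
    return [f0 * k for k in range(1, n_components + 1)]


def jittered_frequencies(f0, n_components, max_jitter=0.3, rng=None):
    """Assumed inharmonic scheme: shift each component above the fundamental
    by a random amount of up to +/- max_jitter * f0."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the example is reproducible
    freqs = [f0]  # the lowest component is left in place
    for k in range(2, n_components + 1):
        freqs.append(f0 * (k + rng.uniform(-max_jitter, max_jitter)))
    return freqs


def synthesize(freqs, duration_s=0.1, sample_rate=16000):
    """Sum equal-amplitude sinusoids and scale by the component count,
    so every sample stays within [-1, 1]."""
    n_samples = round(duration_s * sample_rate)
    return [
        sum(math.sin(2 * math.pi * f * t / sample_rate) for f in freqs) / len(freqs)
        for t in range(n_samples)
    ]
```

Both tone types share the same number of components and roughly the same spectral region; only the regular spacing of the components differs, which is the property the study manipulates.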
4
Lau BK, Oxenham AJ, Werner LA. Infant Pitch and Timbre Discrimination in the Presence of Variation in the Other Dimension. J Assoc Res Otolaryngol 2021; 22:693-702. [PMID: 34519951 DOI: 10.1007/s10162-021-00807-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/22/2021] [Accepted: 07/02/2021] [Indexed: 11/25/2022] Open
Abstract
Adult listeners perceive pitch with fine precision, with many adults capable of discriminating less than a 1 % change in fundamental frequency (F0). Although there is variability across individuals, this precise pitch perception is an ability ascribed to cortical functions that are also important for speech and music perception. Infants display neural immaturity in the auditory cortex, suggesting that pitch discrimination may improve throughout infancy. In two experiments, we tested the limits of F0 (pitch) and spectral centroid (timbre) perception in 66 infants and 31 adults. Contrary to expectations, we found that infants at both 3 and 7 months were able to reliably detect small changes in F0 in the presence of random variations in spectral content, and vice versa, to the extent that their performance matched that of adults with musical training and exceeded that of adults without musical training. The results indicate high fidelity of F0 and spectral-envelope coding in infants, implying that fully mature cortical processing is not necessary for accurate discrimination of these features. The surprising difference in performance between infants and musically untrained adults may reflect a developmental trajectory for learning natural statistical covariations between pitch and timbre that improves coding efficiency but results in degraded performance in adults without musical training when expectations for such covariations are violated.
Affiliation(s)
- Bonnie K Lau
- Institute for Language and Brain Sciences, University of Washington, 1715 NE Columbia Rd, Box 357988, Seattle, WA, 98195, USA.
- Department of Otolaryngology - Head and Neck Surgery, University of Washington, 1701 NE Columbia Rd, Box 357923, Seattle, WA, 98195, USA.
- Andrew J Oxenham
- Department of Psychology, University of Minnesota, 75 East River Parkway, Minneapolis, MN, 55455, USA
- Lynne A Werner
- Department of Speech and Hearing Sciences, University of Washington, 1417 NE 42nd Street, Box 354875, Seattle, WA, 98105, USA
5
McPherson MJ, McDermott JH. Diversity in pitch perception revealed by task dependence. Nat Hum Behav 2018; 2:52-66. [PMID: 30221202 PMCID: PMC6136452 DOI: 10.1038/s41562-017-0261-8] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 06/21/2017] [Accepted: 11/08/2017] [Indexed: 01/12/2023]
Abstract
Pitch conveys critical information in speech, music, and other natural sounds, and is conventionally defined as the perceptual correlate of a sound's fundamental frequency (F0). Although pitch is widely assumed to be subserved by a single F0 estimation process, real-world pitch tasks vary enormously, raising the possibility of underlying mechanistic diversity. To probe pitch mechanisms we conducted a battery of pitch-related music and speech tasks using conventional harmonic sounds and inharmonic sounds whose frequencies lack a common F0. Some pitch-related abilities - those relying on musical interval or voice recognition - were strongly impaired by inharmonicity, suggesting a reliance on F0. However, other tasks, including those dependent on pitch contours in speech and music, were unaffected by inharmonicity, suggesting a mechanism that tracks the frequency spectrum rather than the F0. The results suggest that pitch perception is mediated by several different mechanisms, only some of which conform to traditional notions of pitch.
Affiliation(s)
- Malinda J McPherson
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, USA.
- Josh H McDermott
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, USA
6
Lau BK, Mehta AH, Oxenham AJ. Superoptimal Perceptual Integration Suggests a Place-Based Representation of Pitch at High Frequencies. J Neurosci 2017; 37:9013-9021. [PMID: 28821642 PMCID: PMC5597982 DOI: 10.1523/jneurosci.1507-17.2017] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 05/31/2017] [Revised: 07/29/2017] [Accepted: 08/05/2017] [Indexed: 11/21/2022] Open
Abstract
Pitch, the perceptual correlate of sound repetition rate or frequency, plays an important role in speech perception, music perception, and listening in complex acoustic environments. Despite the perceptual importance of pitch, the neural mechanisms that underlie it remain poorly understood. Although cortical regions responsive to pitch have been identified, little is known about how pitch information is extracted from the inner ear itself. The two primary theories of peripheral pitch coding involve stimulus-driven spike timing, or phase locking, in the auditory nerve (time code), and the spatial distribution of responses along the length of the cochlear partition (place code). To rule out the use of timing information, we tested pitch discrimination of very high-frequency tones (>8 kHz), well beyond the putative limit of phase locking. We found that high-frequency pure-tone discrimination was poor, but when the tones were combined into a harmonic complex, a dramatic improvement in discrimination ability was observed that exceeded performance predicted by the optimal integration of peripheral information from each of the component frequencies. The results are consistent with the existence of pitch-sensitive neurons that rely only on place-based information from multiple harmonically related components. The results also provide evidence against the common assumption that poor high-frequency pure-tone pitch perception is the result of peripheral neural-coding constraints. The finding that place-based spectral coding is sufficient to elicit complex pitch at high frequencies has important implications for the design of future neural prostheses to restore hearing to deaf individuals.
SIGNIFICANCE STATEMENT: The question of how pitch is represented in the ear has been debated for over a century. Two competing theories involve timing information from neural spikes in the auditory nerve (time code) and the spatial distribution of neural activity along the length of the cochlear partition (place code). By using very high-frequency tones unlikely to be coded via time information, we discovered that information from the individual harmonics is combined so efficiently that performance exceeds theoretical predictions based on the optimal integration of information from each harmonic. The findings have important implications for the design of auditory prostheses because they suggest that enhanced spatial resolution alone may be sufficient to restore pitch via such implants.
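The notion of performance exceeding an optimal-integration prediction can be made concrete with a small sketch. It assumes the textbook signal-detection result that an ideal observer combining independent channels reaches a sensitivity equal to the root-sum-of-squares of the single-channel sensitivities (d'); the abstract does not state which model the authors used, and the function names and numbers in the usage note are hypothetical.

```python
import math


def optimal_integration_dprime(component_dprimes):
    """Predicted d' for an ideal observer combining independent channels:
    the square root of the summed squared single-channel d' values."""
    return math.sqrt(sum(d * d for d in component_dprimes))


def is_superoptimal(observed_dprime, component_dprimes):
    """True when measured sensitivity for the combined stimulus exceeds
    the independent-channel prediction."""
    return observed_dprime > optimal_integration_dprime(component_dprimes)
```

For example, two components with hypothetical d' values of 3 and 4 give a predicted combined d' of 5, so an observed d' of 6 for the complex would count as superoptimal under this model.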
Affiliation(s)
- Bonnie K Lau
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455
- Anahita H Mehta
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455
- Andrew J Oxenham
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455
7
Lau BK, Lalonde K, Oster MM, Werner LA. Infant pitch perception: Missing fundamental melody discrimination. J Acoust Soc Am 2017; 141:65. [PMID: 28147620 PMCID: PMC6581289 DOI: 10.1121/1.4973412] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 04/07/2016] [Revised: 11/11/2016] [Accepted: 12/08/2016] [Indexed: 06/06/2023]
Abstract
Although recent results show that 3-month-olds can discriminate complex tones by their missing fundamental, it is arguable whether they are discriminating on the basis of a perceived pitch. A defining characteristic of pitch is that it carries melodic information. This study investigated whether 3-month-olds, 7-month-olds, and adults can detect a change in a melody composed of missing fundamental complexes. Participants heard a seven-note melody and learned to respond to a change that violated the melodic contour. To ensure that participants were responding on the basis of pitch, the notes in the melody had missing fundamentals and varied in spectral content on each presentation. In experiment I, all melodies had the same absolute pitch, while in experiment II, the melodies were randomly transposed into one of three different keys on each presentation. Almost all participants learned to ignore the spectral changes and respond to the changed note of the melody in both experiments, strengthening the argument that complex tones elicit a sense of musical pitch in infants. These results provide evidence that complex pitch perception is functional by 3 months of age.
Affiliation(s)
- Bonnie K Lau
- Department of Speech and Hearing Sciences, University of Washington, 1417 Northeast 42nd Street, Seattle, Washington 98105, USA
- Kaylah Lalonde
- Department of Speech and Hearing Sciences, University of Washington, 1417 Northeast 42nd Street, Seattle, Washington 98105, USA
- Monika-Maria Oster
- Department of Speech and Hearing Sciences, University of Washington, 1417 Northeast 42nd Street, Seattle, Washington 98105, USA
- Lynne A Werner
- Department of Speech and Hearing Sciences, University of Washington, 1417 Northeast 42nd Street, Seattle, Washington 98105, USA
8
Complex pitch perception mechanisms are shared by humans and a New World monkey. Proc Natl Acad Sci U S A 2015; 113:781-6. [PMID: 26712015 DOI: 10.1073/pnas.1516120113] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Indexed: 11/18/2022] Open
Abstract
The perception of the pitch of harmonic complex sounds is a crucial function of human audition, especially in music and speech processing. Whether the underlying mechanisms of pitch perception are unique to humans, however, is unknown. Based on estimates of frequency resolution at the level of the auditory periphery, psychoacoustic studies in humans have revealed several primary features of central pitch mechanisms. It has been shown that (i) pitch strength of a harmonic tone is dominated by resolved harmonics; (ii) pitch of resolved harmonics is sensitive to the quality of spectral harmonicity; and (iii) pitch of unresolved harmonics is sensitive to the salience of temporal envelope cues. Here we show, for a standard musical tuning fundamental frequency of 440 Hz, that the common marmoset (Callithrix jacchus), a New World monkey with a hearing range similar to that of humans, exhibits all of the primary features of central pitch mechanisms demonstrated in humans. Thus, marmosets and humans may share similar pitch perception mechanisms, suggesting that these mechanisms may have emerged early in primate evolution.
9
Lau BK, Werner LA. Perception of the pitch of unresolved harmonics by 3- and 7-month-old human infants. J Acoust Soc Am 2014; 136:760-767. [PMID: 25096110 PMCID: PMC4144174 DOI: 10.1121/1.4887464] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/06/2013] [Revised: 06/22/2014] [Accepted: 06/25/2014] [Indexed: 06/03/2023]
Abstract
Three-month-olds discriminate resolved harmonic complexes on the basis of missing fundamental (MF) pitch. In view of reported difficulty in discriminating unresolved complexes at 7 months and striking changes in the organization of the auditory system during early infancy, infants' ability to discriminate unresolved complexes is of some interest. This study investigated the ability of 3-month-olds, 7-month-olds, and adults to discriminate the pitch of unresolved harmonic complexes using an observer-based method. Stimuli were MF complexes bandpass filtered with a -12 dB/octave slope, combined in random phase, presented at 70 dB sound pressure level (SPL) for 650 ms with a 50 ms rise/fall with a pink noise at 65 dB SPL. The conditions were (1) "LOW" unresolved harmonics (2500-4500 Hz) based on MFs of 160 and 200 Hz and (2) "HIGH" unresolved harmonics (4000-6000 Hz) based on MFs of 190 and 200 Hz. To demonstrate MF discrimination, participants had to ignore spectral changes in complexes with the same fundamental and respond only when the fundamental changed. Nearly all infants tested categorized complexes by MF pitch suggesting discrimination of pitch extracted from unresolved harmonics by 3 months. Adults also categorized the complexes by MF pitch, although musically trained adults were more successful than musically untrained adults.
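The stimulus conditions in this abstract can be unpacked by listing which harmonic numbers of each missing fundamental fall inside the stated bands. The sketch below treats the band edges as hard cut-offs, which is a simplification: the abstract describes bandpass filtering with a -12 dB/octave slope, so components just outside the band would be attenuated rather than absent. The function name is hypothetical.

```python
import math


def harmonics_in_band(f0_hz, low_hz, high_hz):
    """Return (harmonic number, frequency in Hz) pairs for the harmonics of
    f0_hz lying inside [low_hz, high_hz], using hard band edges."""
    first = math.ceil(low_hz / f0_hz)
    last = math.floor(high_hz / f0_hz)
    return [(n, n * f0_hz) for n in range(first, last + 1)]


# Conditions taken from the abstract: band edges and missing fundamentals (Hz).
CONDITIONS = {
    "LOW": ((2500, 4500), (160, 200)),
    "HIGH": ((4000, 6000), (190, 200)),
}

for name, ((low, high), fundamentals) in CONDITIONS.items():
    for f0 in fundamentals:
        components = harmonics_in_band(f0, low, high)
        print(name, f0, "Hz: harmonics", components[0][0], "to", components[-1][0])
```

With these band edges, every component is the 13th harmonic or higher, which is in the range conventionally regarded as unresolved by the auditory periphery.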
Affiliation(s)
- Bonnie K Lau
- Department of Speech and Hearing Sciences, University of Washington, 1417 NE 42nd Street, Seattle, Washington 98105
- Lynne A Werner
- Department of Speech and Hearing Sciences, University of Washington, 1417 NE 42nd Street, Seattle, Washington 98105
10
Abstract
Pitch, our perception of how high or low a sound is on a musical scale, is a fundamental perceptual attribute of sounds and is important for both music and speech. After more than a century of research, the exact mechanisms used by the auditory system to extract pitch are still being debated. Theoretically, pitch can be computed using either spectral or temporal acoustic features of a sound. We have investigated how cues derived from the temporal envelope and spectrum of an acoustic signal are used for pitch extraction in the common marmoset (Callithrix jacchus), a vocal primate species, by measuring pitch discrimination behaviorally and examining pitch-selective neuronal responses in auditory cortex. We find that pitch is extracted by marmosets using temporal envelope cues for lower pitch sounds composed of higher-order harmonics, whereas spectral cues are used for higher pitch sounds with lower-order harmonics. Our data support dual-pitch processing mechanisms, originally proposed by psychophysicists based on human studies, whereby pitch is extracted using a combination of temporal envelope and spectral cues.
11
Lau BK, Werner LA. Perception of missing fundamental pitch by 3- and 4-month-old human infants. J Acoust Soc Am 2012; 132:3874-82. [PMID: 23231118 PMCID: PMC3528754 DOI: 10.1121/1.4763991] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 12/12/2011] [Revised: 09/29/2012] [Accepted: 10/04/2012] [Indexed: 06/01/2023]
Abstract
A hallmark of complex pitch perception is that the pitch of a harmonic complex is the same whether or not the fundamental frequency is present. By 7 months, infants appear to discriminate on the basis of the pitch of the missing fundamental (MF). Although electrophysiological cortical responses to MF pitch changes have been recorded in infants younger than 7 months, no psychophysical studies have been published. This study investigated the ability of 3- and 4-month-olds to perceive the pitch of MF harmonic complexes based on fundamentals of 160 Hz and 200 Hz using an observer-based method. In experiment I, to demonstrate MF pitch discrimination, 3- and 4-month-olds were required to ignore spectral changes in complexes with the same fundamental and to respond only when the fundamental changed. In experiment II, a 60-260 Hz noise was presented with complexes to mask combination tones at the fundamental frequency. In experiment III, complexes were bandpass filtered with a -12 dB/octave slope to limit use of spectral edge cues and presented with a pink noise to mask all distortion products. Nearly all infants tested categorized complexes by MF pitch in these experiments, suggesting perception of the missing fundamental at 3 months.
Affiliation(s)
- Bonnie K Lau
- Department of Speech and Hearing Sciences, University of Washington, 1417 NE 42nd Street, Seattle, Washington 98105, USA.
12
Shofner WP, Campbell J. Pitch strength of noise-vocoded harmonic tone complexes in normal-hearing listeners. J Acoust Soc Am 2012; 132:EL398-404. [PMID: 23145701 PMCID: PMC3482252 DOI: 10.1121/1.4757697] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 08/05/2012] [Accepted: 09/18/2012] [Indexed: 06/01/2023]
Abstract
To study the role of harmonic structure in pitch perception, normal-hearing listeners were tested using noise-vocoded harmonic tone complexes. When tested in a magnitude judgment procedure using vocoded versions generated with 2-128 channels, judgments of pitch strength increased systematically as the number of channels increased and reflected acoustic cues based on harmonic peak-to-valley ratio, but not cues based on periodicity strength. When tested in a fundamental frequency discrimination task, listeners correctly recognized the direction of pitch change with as few as eight noise-vocoded channels. The results suggest that spectral processing contributes substantially to pitch perception in normal-hearing listeners.
Affiliation(s)
- William P Shofner
- Department of Speech and Hearing Sciences, Indiana University, 200 South Jordan Avenue, Bloomington, Indiana 47405, USA.