1
Benz KR, Hauswald A, Weisz N. Influence of visual analogue of speech envelope, formants, and word onsets on word recognition is not pronounced. Hear Res 2025; 460:109237. [PMID: 40096812] [DOI: 10.1016/j.heares.2025.109237]
Abstract
In noisy environments, filtering the relevant speech signal out of the background noise is a major challenge. Visual cues, such as lip movements, can improve speech understanding, which suggests that lip movements carry information about speech features (e.g., the speech envelope, formants, word onsets) that can be used to aid speech understanding. Moreover, isolated visual or tactile presentation of the speech envelope can also aid word recognition. However, the evidence in this area is rather mixed, and formants and word onsets have not been studied in this context. This online study investigates the effect of different visually presented speech features (speech envelope, formants, word onsets) on word recognition during two-talker audio. Each speech feature was presented as a circle whose size was modulated over time according to that feature's dynamics. The circle was modulated according to the speech features of the target speaker, the distractor speaker, or an unrelated control sentence. After each sentence, participants' word recognition was tested by having them write down what they heard. We show that word recognition is not enhanced for any of the visual features relative to the visual control condition.
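For orientation, the stimulus construction described above can be prototyped in a few lines. The sketch below is ours rather than the authors' code: it extracts a low-pass amplitude envelope from a speech recording and maps it onto a per-frame circle radius. The file name, frame rate, and radius range are illustrative assumptions.

```python
# Minimal sketch (not the study's code): speech envelope -> circle radius per video frame.
import numpy as np
from scipy.io import wavfile
from scipy.signal import hilbert, butter, filtfilt, resample

def envelope_to_radius(wav_path, frame_rate=30, r_min=50, r_max=200):
    sr, audio = wavfile.read(wav_path)              # load speech signal
    audio = audio.astype(float)
    if audio.ndim > 1:
        audio = audio[:, 0]                         # keep one channel if stereo
    env = np.abs(hilbert(audio))                    # broadband amplitude envelope
    b, a = butter(4, 8 / (sr / 2), btype="low")     # keep slow (<8 Hz) fluctuations
    env = filtfilt(b, a, env)
    n_frames = int(len(env) / sr * frame_rate)      # one radius value per video frame
    env = resample(env, n_frames)
    env = (env - env.min()) / (env.max() - env.min() + 1e-12)
    return r_min + env * (r_max - r_min)            # radius in pixels for each frame

# radii = envelope_to_radius("target_sentence.wav")  # hypothetical file name
```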
Affiliation(s)
- Kaja Rosa Benz
- Centre for Cognitive Neuroscience, Department of Psychology, Paris-Lodron-University of Salzburg, Salzburg, Austria
- Anne Hauswald
- Centre for Cognitive Neuroscience, Department of Psychology, Paris-Lodron-University of Salzburg, Salzburg, Austria
- Nathan Weisz
- Centre for Cognitive Neuroscience, Department of Psychology, Paris-Lodron-University of Salzburg, Salzburg, Austria; Neuroscience Institute, Christian Doppler University Hospital, Paracelsus Medical University Salzburg, Salzburg, Austria

2
Daeglau M, Otten J, Grimm G, Mirkovic B, Hohmann V, Debener S. Neural speech tracking in a virtual acoustic environment: audio-visual benefit for unscripted continuous speech. Front Hum Neurosci 2025; 19:1560558. [PMID: 40270565] [PMCID: PMC12014754] [DOI: 10.3389/fnhum.2025.1560558]
Abstract
The audio-visual benefit in speech perception, whereby congruent visual input enhances auditory processing, is well documented across age groups, particularly in challenging listening conditions and among individuals with varying hearing abilities. However, most studies rely on highly controlled laboratory environments with scripted stimuli. Here, we examine the audio-visual benefit using unscripted, natural speech from untrained speakers within a virtual acoustic environment. Using electroencephalography (EEG) and cortical speech tracking, we assessed neural responses across audio-visual, audio-only, visual-only, and masked-lip conditions to isolate the role of lip movements. Additionally, we analysed individual differences in acoustic and visual features of the speakers, including pitch, jitter, and lip openness, to explore their influence on the audio-visual speech tracking benefit. Results showed a significant audio-visual enhancement in speech tracking with background noise, with the masked-lip condition performing similarly to the audio-only condition, emphasizing the importance of lip movements in adverse listening situations. Our findings demonstrate the feasibility of cortical speech tracking with naturalistic stimuli and underscore the impact of individual speaker characteristics on audio-visual integration in real-world listening contexts.
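As a rough illustration of the cortical speech-tracking approach mentioned above, the following sketch fits a forward (encoding) temporal response function with MNE-Python's ReceptiveField, predicting EEG from a time-lagged speech envelope. The data arrays, sampling rate, lag window, and ridge parameter are placeholder assumptions, not the study's pipeline.

```python
# Hedged sketch of envelope-based neural speech tracking with a forward model.
import numpy as np
from mne.decoding import ReceptiveField

sfreq = 64.0
eeg = np.random.randn(64 * 60, 32)        # placeholder: 60 s of 32-channel EEG
envelope = np.random.randn(64 * 60, 1)    # placeholder: matching speech envelope

# Encoding model: predict each EEG channel from time-lagged envelope values.
trf = ReceptiveField(tmin=-0.1, tmax=0.4, sfreq=sfreq,
                     feature_names=["envelope"], estimator=1.0,
                     scoring="corrcoef")
n_train = int(0.8 * len(envelope))
trf.fit(envelope[:n_train], eeg[:n_train])

# Tracking accuracy = correlation between predicted and measured EEG per channel;
# in a real analysis this would be compared across audio-only vs. audio-visual conditions.
scores = trf.score(envelope[n_train:], eeg[n_train:])
print(np.mean(scores))
```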
Affiliation(s)
- Mareike Daeglau
- Neuropsychology Lab, Department of Psychology, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
- Jürgen Otten
- Department of Medical Physics and Acoustics, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
- Giso Grimm
- Department of Medical Physics and Acoustics, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
- Bojana Mirkovic
- Neuropsychology Lab, Department of Psychology, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
- Volker Hohmann
- Department of Medical Physics and Acoustics, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
- Stefan Debener
- Neuropsychology Lab, Department of Psychology, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany

3
Zanin J, Vaisberg J, Swann S, Rance G. Evaluating benefits of remote microphone technology for adults with hearing loss using behavioural and predictive metrics. Int J Audiol 2025; 64:327-335. [PMID: 38767343] [DOI: 10.1080/14992027.2024.2354500]
Abstract
OBJECTIVE To investigate the benefit of remote-microphone (RM) systems for adults with sensory hearing loss. DESIGN Speech recognition in quiet and in background noise was assessed. Participants with hearing loss underwent testing in two device conditions: hearing aids (HAs) alone and HAs with an RM. Normal-hearing participants completed testing unaided. Predictive speech intelligibility modelling using the Hearing-Aid Speech Perception Index (HASPI) was also performed on recordings of HA-processed test material. STUDY SAMPLE Twenty adults with sensory hearing loss and 10 adults with normal hearing participated. RESULTS Speech recognition for participants with hearing loss improved significantly when using the RM compared with HAs alone fitted to Phonak's proprietary prescription. The largest benefits were observed in the most challenging conditions. At the lowest signal-to-noise ratio, participants with hearing loss using an RM outperformed normal-hearing listeners. Predicted intelligibility scores produced by HASPI were strongly correlated with behavioural results. CONCLUSIONS Adults using HAs who have significant difficulty understanding speech in noise will experience considerable benefit from the addition of an RM. Improvements in speech recognition were observed for all participants using RM systems, including those with relatively mild hearing loss. HASPI modelling reliably predicted the speech perception difficulties experienced.
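The comparison between HASPI predictions and behavioural scores reported above amounts to a simple correlation analysis; a minimal sketch with made-up numbers (purely for demonstration, not study data) is shown below.

```python
# Illustrative sketch (not the study's analysis code) of comparing predicted
# intelligibility indices such as HASPI with behavioural word-recognition scores.
import numpy as np
from scipy.stats import pearsonr, spearmanr

haspi_pred = np.array([0.42, 0.55, 0.61, 0.73, 0.80, 0.88])   # hypothetical model predictions (0-1)
behavioural = np.array([38.0, 52.0, 60.0, 71.0, 79.0, 90.0])  # hypothetical % words correct

r, p = pearsonr(haspi_pred, behavioural)
rho, p_rank = spearmanr(haspi_pred, behavioural)
print(f"Pearson r = {r:.2f} (p = {p:.3f}), Spearman rho = {rho:.2f}")
```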
Affiliation(s)
- Julien Zanin
- Department of Audiology and Speech Pathology, The University of Melbourne, Melbourne, Victoria, Australia
- Jonathan Vaisberg
- Innovation Centre Toronto, Sonova Canada Inc., Mississauga, Ontario, Canada
- Sarah Swann
- Department of Audiology and Speech Pathology, The University of Melbourne, Melbourne, Victoria, Australia
- Gary Rance
- Department of Audiology and Speech Pathology, The University of Melbourne, Melbourne, Victoria, Australia

4
Micula A, Holmer E, Ning R, Danielsson H. Relationships Between Hearing Status, Cognitive Abilities, and Reliance on Visual and Contextual Cues. Ear Hear 2025; 46:433-443. [PMID: 39307930] [PMCID: PMC11825487] [DOI: 10.1097/aud.0000000000001596]
Abstract
OBJECTIVES Visual and contextual cues facilitate speech recognition in suboptimal listening conditions (e.g., background noise, hearing loss, hearing aid signal processing). Moreover, successful speech recognition in challenging listening conditions is linked to cognitive abilities such as working memory and fluid intelligence. However, it is unclear which cognitive abilities facilitate the use of visual and contextual cues in individuals with normal hearing and hearing aid users. The first aim was to investigate whether hearing aid users rely on visual and contextual cues to a higher degree than individuals with normal hearing in a speech-in-noise recognition task. The second aim was to investigate whether working memory and fluid intelligence are associated with the use of visual and contextual cues in these groups. DESIGN Groups of participants with normal hearing and hearing aid users with bilateral, symmetrical mild to severe sensorineural hearing loss were included (n = 169 per group). The Samuelsson and Rönnberg task was administered to measure speech recognition in speech-shaped noise. The task consists of an equal number of sentences administered in the auditory and audiovisual modalities, as well as without and with contextual cues (a visually presented word preceding the sentence, e.g., "Restaurant"). The signal-to-noise ratio was individually set to 1 dB below the level obtained for 50% correct speech recognition in the hearing-in-noise test administered in the auditory modality. The Reading Span test was used to measure working memory capacity and the Raven test was used to measure fluid intelligence. The data were analyzed using linear mixed-effects modeling. RESULTS Both groups exhibited significantly higher speech recognition performance when visual and contextual cues were available. Although the hearing aid users performed significantly worse than those with normal hearing in the auditory modality, both groups reached similar performance levels in the audiovisual modality. In addition, a significant positive relationship was found between the Raven test score and speech recognition performance only for the hearing aid users in the audiovisual modality. There was no significant relationship between Reading Span test score and performance. CONCLUSIONS Both participants with normal hearing and hearing aid users benefitted from contextual cues, regardless of cognitive abilities. The hearing aid users relied on visual cues to compensate for their perceptual difficulties, reaching a performance level similar to that of the participants with normal hearing when visual cues were available, despite worse performance in the auditory modality. Notably, hearing aid users with higher fluid intelligence were able to capitalize on visual cues more successfully than those with poorer fluid intelligence, resulting in better speech-in-noise recognition performance.
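The linear mixed-effects analysis described above can be sketched as follows; the data file and column names are our assumptions, and the formula is a simplified stand-in for the authors' actual model (fixed effects of group, modality, and context, with a random intercept per participant).

```python
# Minimal sketch of a linear mixed-effects model with statsmodels (not the study's code).
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per sentence/trial.
df = pd.read_csv("speech_in_noise_trials.csv")
# assumed columns: subject, group (NH/HA), modality (A/AV), context (no/yes), score

model = smf.mixedlm("score ~ group * modality * context",
                    data=df, groups=df["subject"])
result = model.fit()
print(result.summary())
```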
Affiliation(s)
- Andreea Micula
- National Institute of Public Health, University of Southern Denmark, Copenhagen, Denmark
- Eriksholm Research Centre, Snekkersten, Denmark
- Department of Behavioural Sciences and Learning, Linköping University, Linköping, Sweden
- Emil Holmer
- Department of Behavioural Sciences and Learning, Linköping University, Linköping, Sweden
- Ruijing Ning
- Department of Behavioural Sciences and Learning, Linköping University, Linköping, Sweden
- Henrik Danielsson
- Department of Behavioural Sciences and Learning, Linköping University, Linköping, Sweden

5
5
|
von Seth J, Aller M, Davis MH. Unimodal speech perception predicts stable individual differences in audiovisual benefit for phonemes, words and sentences. J Acoust Soc Am 2025; 157:1554-1576. [PMID: 40029090] [DOI: 10.1121/10.0034846]
Abstract
There are substantial individual differences in the benefit that can be obtained from visual cues during speech perception. Here, 113 normally hearing participants aged 18 to 60 years completed a three-part experiment investigating the reliability and predictors of individual audiovisual benefit for acoustically degraded speech. Audiovisual benefit was calculated as the relative intelligibility (at the individual level) of approximately matched (at the group level) auditory-only and audiovisual speech for materials at three levels of linguistic structure: meaningful sentences, monosyllabic words, and consonants in minimal syllables. This measure of audiovisual benefit was stable across sessions and materials, suggesting that a shared mechanism of audiovisual integration operates across levels of linguistic structure. Information transmission analyses suggested that this may be related to simple phonetic cue extraction: sentence-level audiovisual benefit was reliably predicted by the relative ability to discriminate place of articulation at the consonant level. Finally, whereas unimodal speech perception was related to cognitive measures (matrix reasoning and vocabulary) and demographics (age and gender), audiovisual benefit was predicted only by unimodal speech perceptual abilities: better lipreading ability and subclinically poorer hearing (speech reception thresholds) independently predicted enhanced audiovisual benefit. This work has implications for practices in quantifying audiovisual benefit and for research identifying strategies to enhance multimodal communication in hearing loss.
Affiliation(s)
- Jacqueline von Seth
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge CB2 7EF, United Kingdom
- Máté Aller
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge CB2 7EF, United Kingdom
- Matthew H Davis
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge CB2 7EF, United Kingdom

6
Wang X, Bouton S, Kojovic N, Giraud AL, Schaer M. Atypical audio-visual neural synchrony and speech processing in early autism. J Neurodev Disord 2025; 17:9. [PMID: 39966708] [PMCID: PMC11837391] [DOI: 10.1186/s11689-025-09593-w]
Abstract
BACKGROUND Children with autism spectrum disorder (ASD) often exhibit communication difficulties that may stem from basic auditory temporal integration impairment but may also be aggravated by an audio-visual integration deficit, resulting in a lack of interest in face-to-face communication. This study addresses whether speech processing anomalies in young autistic children (mean age 3.09 years) are associated with alterations of audio-visual temporal integration. METHODS We used high-density electroencephalography (HD-EEG) and eye tracking to record brain activity and gaze patterns in 31 children with ASD (6 females) and 33 typically developing (TD) children (11 females) while they watched cartoon videos. Neural responses to temporal audio-visual stimuli were analyzed using temporal response function (TRF) models, and phase analyses were used to assess audiovisual temporal coordination. RESULTS The reconstructability of speech signals from auditory responses was reduced in children with ASD compared to TD children, but despite more restricted gaze patterns in ASD, it was similar for visual responses in both groups. Speech reception was most strongly affected when visual speech information was also present, an interference that was not seen in TD children. These differences were associated with a broader phase angle distribution (exceeding pi/2) in the EEG theta range in children with ASD, signaling reduced reliability of audio-visual temporal alignment. CONCLUSION These findings show that speech processing anomalies in ASD do not stand alone: already at a very early developmental stage, they are associated with an audio-visual imbalance characterized by poor auditory response encoding and disrupted audio-visual temporal coordination.
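As a hedged illustration of the theta-band phase analysis mentioned above, the sketch below computes per-sample phase differences between a band-pass-filtered EEG channel and a speech envelope; the band limits, sampling rate, and placeholder data are assumptions, not the study's exact pipeline.

```python
# Sketch: theta-band phase lag between an EEG channel and the speech envelope.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def theta_phase_lag(eeg_channel, speech_envelope, sfreq, band=(4.0, 8.0)):
    """Return per-sample phase difference (radians) between theta-band EEG
    and the theta-band speech envelope."""
    b, a = butter(4, [band[0] / (sfreq / 2), band[1] / (sfreq / 2)], btype="band")
    eeg_theta = filtfilt(b, a, eeg_channel)
    env_theta = filtfilt(b, a, speech_envelope)
    phase_diff = np.angle(hilbert(eeg_theta)) - np.angle(hilbert(env_theta))
    return np.angle(np.exp(1j * phase_diff))       # wrap to (-pi, pi]

# The spread of these angles (e.g., their circular concentration) can then be
# compared between groups; a broader distribution indicates less reliable
# audio-visual/auditory temporal coordination.
sfreq = 128.0
eeg = np.random.randn(int(sfreq) * 60)             # placeholder data
env = np.random.randn(int(sfreq) * 60)
angles = theta_phase_lag(eeg, env, sfreq)
print(np.abs(np.mean(np.exp(1j * angles))))        # phase-locking value (0-1)
```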
Affiliation(s)
- Xiaoyue Wang
- Auditory Language Group, Department of Basic Neuroscience, University of Geneva, Geneva, Switzerland
- Institut Pasteur, Université Paris Cité, Hearing Institute, Paris, France
- Department of Medical Psychology and Ethics, School of Basic Medical Sciences, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, China
- Sophie Bouton
- Institut Pasteur, Université Paris Cité, Hearing Institute, Paris, France
- Nada Kojovic
- Autism Brain & Behavior Lab, Department of Psychiatry, University of Geneva, Geneva, Switzerland
- Anne-Lise Giraud
- Auditory Language Group, Department of Basic Neuroscience, University of Geneva, Geneva, Switzerland
- Institut Pasteur, Université Paris Cité, Hearing Institute, Paris, France
- Marie Schaer
- Autism Brain & Behavior Lab, Department of Psychiatry, University of Geneva, Geneva, Switzerland

7
Cooper JK, Vanthornhout J, van Wieringen A, Francart T. Objectively Measuring Audiovisual Effects in Noise Using Virtual Human Speakers. Trends Hear 2025; 29:23312165251333528. [PMID: 40221967] [PMCID: PMC12033406] [DOI: 10.1177/23312165251333528]
Abstract
Speech intelligibility in challenging listening environments relies on the integration of audiovisual cues. Measuring the effectiveness of audiovisual integration in such environments can be difficult because of their complexity. The Audiovisual True-to-Life Assessment of Auditory Rehabilitation (AVATAR) is a paradigm developed to provide an ecological environment that captures both the audio and visual aspects of speech intelligibility measures. Previous research has shown that the benefit from audiovisual cues can be measured using behavioral (e.g., word recognition) and electrophysiological (e.g., neural tracking) measures. The current study examines whether, when using the AVATAR paradigm, electrophysiological measures of speech intelligibility yield outcomes similar to behavioral measures. We hypothesized that visual cues would enhance both behavioral and electrophysiological scores as the signal-to-noise ratio (SNR) of the speech signal decreased. Twenty young (18-25 years old) participants (1 male and 19 female) with normal hearing took part in our study. For the behavioral experiment, we administered lists of sentences using an adaptive procedure to estimate a speech reception threshold (SRT). For the electrophysiological experiment, we administered 35 lists of sentences randomized across five SNR levels (silence, 0, -3, -6, and -9 dB) and two visual conditions (audio-only and audiovisual). We used a neural tracking decoder to measure reconstruction accuracies for each participant. Most participants had higher reconstruction accuracies for the audiovisual condition than for the audio-only condition at moderate to high levels of noise. We found that the electrophysiological measure may correlate with the behavioral measure that shows audiovisual benefit.
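The neural tracking decoder referred to above is typically a backward (stimulus-reconstruction) model. The sketch below, under our own assumptions about sampling rate, lag window, and regularisation, reconstructs the speech envelope from time-lagged EEG with ridge regression and scores it by its correlation with the true envelope.

```python
# Minimal backward-model sketch (not the study's decoder).
import numpy as np
from sklearn.linear_model import Ridge

def lagged(eeg, max_lag):
    """Stack EEG copies shifted by 0..max_lag samples (EEG lags the stimulus)."""
    n_times, _ = eeg.shape
    cols = [np.roll(eeg, -lag, axis=0) for lag in range(max_lag + 1)]
    X = np.concatenate(cols, axis=1)
    return X[: n_times - max_lag]                # drop wrap-around rows

sfreq = 64
eeg = np.random.randn(sfreq * 120, 32)           # placeholder: 2 min of 32-channel EEG
envelope = np.random.randn(sfreq * 120)          # placeholder speech envelope

max_lag = int(0.4 * sfreq)                       # 0-400 ms integration window
X = lagged(eeg, max_lag)
y = envelope[: len(X)]

split = int(0.8 * len(X))
decoder = Ridge(alpha=1000.0).fit(X[:split], y[:split])
recon = decoder.predict(X[split:])
accuracy = np.corrcoef(recon, y[split:])[0, 1]   # reconstruction accuracy
print(accuracy)
```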
Affiliation(s)
- Tom Francart
- ExpORL, Department of Neurosciences, KU Leuven, Leuven, Belgium

8
Choi A, Kim H, Jo M, Kim S, Joung H, Choi I, Lee K. The impact of visual information in speech perception for individuals with hearing loss: a mini review. Front Psychol 2024; 15:1399084. [PMID: 39380752] [PMCID: PMC11458425] [DOI: 10.3389/fpsyg.2024.1399084]
Abstract
This review examines how visual information enhances speech perception in individuals with hearing loss, focusing on the impact of age, linguistic stimuli, and specific hearing loss factors on the effectiveness of audiovisual (AV) integration. While existing studies offer varied and sometimes conflicting findings regarding the use of visual cues, our analysis shows that these key factors can distinctly shape AV speech perception outcomes. For instance, younger individuals and those who receive early intervention tend to benefit more from visual cues, particularly when linguistic complexity is lower. Additionally, languages with dense phoneme spaces demonstrate a higher dependency on visual information, underscoring the importance of tailoring rehabilitation strategies to specific linguistic contexts. By considering these influences, we highlight areas where understanding is still developing and suggest how personalized rehabilitation strategies and supportive systems could be tailored to better meet individual needs. Furthermore, this review brings attention to important aspects that warrant further investigation, aiming to refine theoretical models and contribute to more effective, customized approaches to hearing rehabilitation.
Affiliation(s)
- Ahyeon Choi
- Music and Audio Research Group, Department of Intelligence and Information, Seoul National University, Seoul, Republic of Korea
- Hayoon Kim
- Music and Audio Research Group, Department of Intelligence and Information, Seoul National University, Seoul, Republic of Korea
- Mina Jo
- Music and Audio Research Group, Department of Intelligence and Information, Seoul National University, Seoul, Republic of Korea
- Subeen Kim
- Music and Audio Research Group, Department of Intelligence and Information, Seoul National University, Seoul, Republic of Korea
- Haesun Joung
- Music and Audio Research Group, Department of Intelligence and Information, Seoul National University, Seoul, Republic of Korea
- Inyong Choi
- Music and Audio Research Group, Department of Intelligence and Information, Seoul National University, Seoul, Republic of Korea
- Department of Communication Sciences and Disorders, University of Iowa, Iowa City, IA, United States
- Kyogu Lee
- Music and Audio Research Group, Department of Intelligence and Information, Seoul National University, Seoul, Republic of Korea
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, Republic of Korea
- Artificial Intelligence Institute, Seoul National University, Seoul, Republic of Korea

9
9
|
Fantoni M, Federici A, Camponogara I, Handjaras G, Martinelli A, Bednaya E, Ricciardi E, Pavani F, Bottari D. The impact of face masks on face-to-face neural tracking of speech: Auditory and visual obstacles. Heliyon 2024; 10:e34860. [PMID: 39157360] [PMCID: PMC11328033] [DOI: 10.1016/j.heliyon.2024.e34860]
Abstract
Face masks provide fundamental protection against the transmission of respiratory viruses but hamper communication. We estimated auditory and visual obstacles generated by face masks on communication by measuring the neural tracking of speech. To this end, we recorded the EEG while participants were exposed to naturalistic audio-visual speech, embedded in 5-talker noise, in three contexts: (i) no-mask (audio-visual information was fully available), (ii) virtual mask (occluded lips, but intact audio), and (iii) real mask (occluded lips and degraded audio). Neural tracking of lip movements and of the sound envelope of speech was measured through backward modeling, that is, by reconstructing stimulus properties from neural activity. Behaviorally, face masks increased perceived listening difficulty and phonological errors in speech content retrieval. At the neural level, we observed that the occlusion of the mouth abolished lip tracking and dampened neural tracking of the speech envelope at the earliest processing stages. By contrast, degraded acoustic information related to face mask filtering altered neural tracking of speech envelope at later processing stages. Finally, a consistent link emerged between the increment of perceived listening difficulty and the drop in reconstruction performance of speech envelope when attending to a speaker wearing a face mask. Results clearly dissociated the visual and auditory impact of face masks on the neural tracking of speech. While the visual obstacle related to face masks hampered the ability to predict and integrate audio-visual speech, the auditory filter generated by face masks impacted neural processing stages typically associated with auditory selective attention. The link between perceived difficulty and neural tracking drop also provides evidence of the impact of face masks on the metacognitive levels subtending face-to-face communication.
Affiliation(s)
- M. Fantoni
- MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
- A. Federici
- MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
- G. Handjaras
- MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
- E. Bednaya
- MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
- E. Ricciardi
- MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
- F. Pavani
- Centro Interdipartimentale Mente/Cervello–CIMEC, University of Trento, Italy
- Centro Interuniversitario di Ricerca “Cognizione Linguaggio e Sordità”–CIRCLeS, University of Trento, Italy
- D. Bottari
- MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy

10
Kim J, Hazan V, Tuomainen O, Davis C. Partner-directed gaze and co-speech hand gestures: effects of age, hearing loss and noise. Front Psychol 2024; 15:1324667. [PMID: 38882511] [PMCID: PMC11178134] [DOI: 10.3389/fpsyg.2024.1324667]
Abstract
Research on the adaptations talkers make to different communication conditions during interactive conversations has primarily focused on speech signals. We extended this type of investigation to two other important communicative signals, i.e., partner-directed gaze and iconic co-speech hand gestures, with the aim of determining whether the adaptations made by older adults differ from those of younger adults across communication conditions. We recruited 57 pairs of participants, comprising 57 primary talkers and 57 secondary ones. Primary talkers consisted of three groups: 19 older adults with mild hearing loss (older adult-HL); 17 older adults with normal hearing (older adult-NH); and 21 younger adults. The DiapixUK "spot the difference" conversation-based task was used to elicit conversations in participant pairs. One easy (No Barrier: NB) and three difficult communication conditions were tested. The three difficult conditions consisted of two in which the primary talker could hear clearly but the secondary talker could not, due to multi-talker babble noise (BAB1) or a less familiar hearing loss simulation (HLS), and one in which both the primary and secondary talkers heard each other in babble noise (BAB2). For primary talkers, we measured the mean number of partner-directed gazes, the mean total gaze duration, and the mean number of co-speech hand gestures. We found a robust effect of communication condition that interacted with participant group. Effects of age were found for both gaze and gesture in BAB1, i.e., older adult-NH participants looked and gestured less than younger adults did when the secondary talker experienced babble noise. For hearing status, a difference in gaze between older adult-NH and older adult-HL was found for the BAB1 condition; for gesture, this difference was significant in all three difficult communication conditions (older adult-HL gazed and gestured more). We propose that the age effect may be due to a decline in older adults' attention to cues signaling how well a conversation is progressing. To explain the hearing status effect, we suggest that older adults' attentional decline is offset by hearing loss because these participants have learned to pay greater attention to visual cues for understanding speech.
Affiliation(s)
- Jeesun Kim
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, NSW, Australia
- Valerie Hazan
- Speech Hearing and Phonetic Sciences, University College London, London, United Kingdom
- Outi Tuomainen
- Department of Linguistics, University of Potsdam, Potsdam, Germany
- Chris Davis
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, NSW, Australia

11
Frei V, Schmitt R, Meyer M, Giroud N. Processing of Visual Speech Cues in Speech-in-Noise Comprehension Depends on Working Memory Capacity and Enhances Neural Speech Tracking in Older Adults With Hearing Impairment. Trends Hear 2024; 28:23312165241287622. [PMID: 39444375] [PMCID: PMC11520018] [DOI: 10.1177/23312165241287622]
Abstract
Comprehending speech in noise (SiN) poses a challenge for older hearing-impaired listeners, requiring auditory and working memory resources. Visual speech cues provide additional sensory information supporting speech understanding, while the extent of such visual benefit is characterized by large variability, which might be accounted for by individual differences in working memory capacity (WMC). In the current study, we investigated behavioral and neurofunctional (i.e., neural speech tracking) correlates of auditory and audio-visual speech comprehension in babble noise and the associations with WMC. Healthy older adults with hearing impairment quantified by pure-tone hearing loss (threshold average: 31.85-57 dB, N = 67) listened to sentences in babble noise in audio-only, visual-only and audio-visual speech modality and performed a pattern matching and a comprehension task, while electroencephalography (EEG) was recorded. Behaviorally, no significant difference in task performance was observed across modalities. However, we did find a significant association between individual working memory capacity and task performance, suggesting a more complex interplay between audio-visual speech cues, working memory capacity and real-world listening tasks. Furthermore, we found that the visual speech presentation was accompanied by increased cortical tracking of the speech envelope, particularly in a right-hemispheric auditory topographical cluster. Post-hoc, we investigated the potential relationships between the behavioral performance and neural speech tracking but were not able to establish a significant association. Overall, our results show an increase in neurofunctional correlates of speech associated with congruent visual speech cues, specifically in a right auditory cluster, suggesting multisensory integration.
Affiliation(s)
- Vanessa Frei
- Computational Neuroscience of Speech and Hearing, Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
- International Max Planck Research School for the Life Course: Evolutionary and Ontogenetic Dynamics (LIFE), Berlin, Germany
- Raffael Schmitt
- Computational Neuroscience of Speech and Hearing, Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
- International Max Planck Research School for the Life Course: Evolutionary and Ontogenetic Dynamics (LIFE), Berlin, Germany
- Competence Center Language & Medicine, Center of Medical Faculty and Faculty of Arts and Sciences, University of Zurich, Zurich, Switzerland
- Martin Meyer
- Competence Center Language & Medicine, Center of Medical Faculty and Faculty of Arts and Sciences, University of Zurich, Zurich, Switzerland
- University of Zurich, University Research Priority Program Dynamics of Healthy Aging, Zurich, Switzerland
- Center for Neuroscience Zurich, University and ETH of Zurich, Zurich, Switzerland
- Evolutionary Neuroscience of Language, Department of Comparative Language Science, University of Zurich, Zurich, Switzerland
- Cognitive Psychology Unit, Alpen-Adria University, Klagenfurt, Austria
- Nathalie Giroud
- Computational Neuroscience of Speech and Hearing, Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
- International Max Planck Research School for the Life Course: Evolutionary and Ontogenetic Dynamics (LIFE), Berlin, Germany
- Competence Center Language & Medicine, Center of Medical Faculty and Faculty of Arts and Sciences, University of Zurich, Zurich, Switzerland
- Center for Neuroscience Zurich, University and ETH of Zurich, Zurich, Switzerland

12
Jansen T, Hartog L, Oetting D, Hohmann V, Kayser H. Benefit of Hearing-Aid Amplification and Signal Enhancement for Speech Reception in Complex Listening Situations. Trends Hear 2024; 28:23312165241271407. [PMID: 39397631] [PMCID: PMC11475295] [DOI: 10.1177/23312165241271407]
Abstract
A major goal of hearing-device provision is to improve communication in daily life. However, there is still a large gap between the user's daily-life aided listening experience and hearing-aid benefit as assessed with established speech reception measurements in the laboratory and clinical practice. For a more realistic assessment, hearing-aid provision needs to be tested in suitable acoustic environments. In this study, using virtual acoustics, we developed complex acoustic scenarios to measure the speech-intelligibility and listening-effort benefit obtained from hearing-aid amplification and signal enhancement strategies. Measurements were conducted using the participants' own devices and a research hearing aid, the Portable Hearing Laboratory (PHL). On the PHL, in addition to amplification, a monaural and a binaural directional filter, as well as a spectral filter, were employed. We assessed the benefit from different signal enhancement strategies at the group and the individual level. At the group level, signal enhancement including directional filtering provided a higher hearing-aid benefit in terms of speech intelligibility in challenging acoustic scenarios than amplification alone or amplification combined with spectral filtering. However, no difference between monaural and binaural signal enhancement occurred. At the individual level, we found large differences in hearing-aid benefit between participants: some benefitted from signal-enhancement algorithms, whereas others benefitted from amplification alone, with additional signal enhancement having a detrimental effect for them. This shows the importance of individually selecting signal enhancement strategies as part of the hearing-aid fitting process.
Affiliation(s)
- Theresa Jansen
- Hörzentrum Oldenburg gGmbH, Oldenburg, Germany
- Cluster of Excellence "Hearing4all", Oldenburg, Germany
- Laura Hartog
- Hörzentrum Oldenburg gGmbH, Oldenburg, Germany
- Cluster of Excellence "Hearing4all", Oldenburg, Germany
- Dirk Oetting
- Hörzentrum Oldenburg gGmbH, Oldenburg, Germany
- Cluster of Excellence "Hearing4all", Oldenburg, Germany
- Volker Hohmann
- Hörzentrum Oldenburg gGmbH, Oldenburg, Germany
- Cluster of Excellence "Hearing4all", Oldenburg, Germany
- Department of Medical Physics and Acoustics, Auditory Signal Processing and Hearing Devices, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
- Hendrik Kayser
- Hörzentrum Oldenburg gGmbH, Oldenburg, Germany
- Cluster of Excellence "Hearing4all", Oldenburg, Germany
- Department of Medical Physics and Acoustics, Auditory Signal Processing and Hearing Devices, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany

13
Haider CL, Park H, Hauswald A, Weisz N. Neural Speech Tracking Highlights the Importance of Visual Speech in Multi-speaker Situations. J Cogn Neurosci 2024; 36:128-142. [PMID: 37977156] [DOI: 10.1162/jocn_a_02059]
Abstract
Visual speech plays a powerful role in facilitating auditory speech processing and received widespread public attention with the wide usage of face masks during the COVID-19 pandemic. In a previous magnetoencephalography study, we showed that occluding the mouth area significantly impairs neural speech tracking. To rule out the possibility that this deterioration is because of degraded sound quality, in the present follow-up study we presented participants with audiovisual (AV) and audio-only (A) speech. We further independently manipulated the trials by adding a face mask and a distractor speaker. Our results clearly show that face masks only affect speech tracking in AV conditions, not in A conditions. This shows that face masks primarily impact speech processing by blocking visual speech rather than by acoustic degradation. We further characterize how the spectrogram, lip movements, and lexical units are tracked at the sensor level. Visual benefits for tracking the spectrogram emerge especially in the multi-speaker condition. Lip movements only show additional improvement and visual benefit over tracking of the spectrogram in clear speech conditions, whereas lexical units (phonemes and word onsets) show no visual enhancement at all. We hypothesize that in young normal-hearing individuals, information from visual input is used less for specific feature extraction and acts more as a general resource for guiding attention.
Affiliation(s)
- Nathan Weisz
- Paris Lodron Universität Salzburg
- Paracelsus Medical University Salzburg

14
Yi H, Choudhury M, Hicks C. A Transparent Mask and Clear Speech Benefit Speech Intelligibility in Individuals With Hearing Loss. J Speech Lang Hear Res 2023; 66:4558-4574. [PMID: 37788660] [DOI: 10.1044/2023_jslhr-22-00636]
Abstract
PURPOSE The purpose of the study was to investigate the impact of a surgical mask and a transparent mask on audio-only and audiovisual speech intelligibility in noise (i.e., 0 dB signal-to-noise ratio) in individuals with mild-to-profound hearing loss. The study also examined whether individuals with hearing loss can benefit from using a transparent mask and clear speech for speech understanding in noise. METHOD Thirty-one individuals with hearing loss (22 to 74 years old) completed keyword identification tasks to measure face-masked speech intelligibility in noise. A mixed-effects logistic regression model was used to examine the effects of face mask (no mask, transparent mask, surgical mask), presentation mode (audio-only, audiovisual), speaking style (conversational, clear), noise type (speech-shaped noise [SSN], four-talker babble [4-T babble]), hearing group (mild hearing loss [MHL], greater than mild hearing loss [GHL]), and their interactions on the binary accuracy of keyword identification. RESULTS In the audio-only mode, the GHL group showed reduced speech intelligibility regardless of other factors, whereas the MHL group showed decreased speech intelligibility for the transparent mask more than for the surgical mask. The use of a transparent mask was advantageous for both hearing loss groups. Clear speech remediated the detrimental effects of face masks on speech intelligibility in noise. Both groups tended to perform better in SSN than in 4-T babble. CONCLUSIONS The findings indicate that, when face masks are used, either a transparent mask or a surgical mask negatively affects speech understanding in noise for individuals with hearing loss. Using a transparent mask and clear speech could be a potential solution to improve speech intelligibility when communicating with face masks in noise.
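For illustration, the accuracy model described above can be approximated as follows; note that this sketch fits a plain fixed-effects logistic regression (random participant effects omitted for brevity), and the file and column names are assumed rather than taken from the study.

```python
# Simplified sketch of modelling binary keyword accuracy (not the study's mixed model).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("masked_speech_keywords.csv")  # hypothetical file
# assumed columns: correct (0/1), mask (none/transparent/surgical), mode (AO/AV),
#                  style (conversational/clear), noise (SSN/babble), group (MHL/GHL)

logit = smf.logit("correct ~ C(mask) * C(mode) * C(group) + C(style) + C(noise)",
                  data=df).fit()
print(logit.summary())
```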
Affiliation(s)
- Hoyoung Yi
- Department of Speech, Language, and Hearing Sciences, Texas Tech University Health Sciences Center, Lubbock
- Moumita Choudhury
- Department of Speech, Language, and Hearing Sciences, Texas Tech University Health Sciences Center, Lubbock
- Candace Hicks
- Department of Speech, Language, and Hearing Sciences, Texas Tech University Health Sciences Center, Lubbock

15
Wang B, Xu X, Niu Y, Wu C, Wu X, Chen J. EEG-based auditory attention decoding with audiovisual speech for hearing-impaired listeners. Cereb Cortex 2023; 33:10972-10983. [PMID: 37750333] [DOI: 10.1093/cercor/bhad325]
Abstract
Auditory attention decoding (AAD) can be used to determine the attended speaker during an auditory selective attention task. However, the auditory factors modulating AAD remain unclear for hearing-impaired (HI) listeners. In this study, scalp electroencephalogram (EEG) was recorded with an auditory selective attention paradigm, in which HI listeners were instructed to attend to one of two simultaneous speech streams with or without congruent visual input (articulation movements), and at a high or low target-to-masker ratio (TMR). Meanwhile, behavioral hearing tests (i.e., audiogram, speech reception threshold, temporal modulation transfer function) were used to assess listeners' individual auditory abilities. The results showed that both visual input and increasing TMR significantly enhanced the cortical tracking of the attended speech and AAD accuracy. Further analysis revealed that the audiovisual (AV) gain in attended-speech cortical tracking was significantly correlated with listeners' auditory amplitude modulation (AM) sensitivity, and the TMR gain in attended-speech cortical tracking was significantly correlated with listeners' hearing thresholds. Temporal response function analysis revealed that subjects with higher AM sensitivity demonstrated more AV gain over the right occipitotemporal and bilateral frontocentral scalp electrodes.
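The decision stage of correlation-based auditory attention decoding can be summarized in a few lines; the sketch below is a generic illustration and may differ from the decoder actually used in the study.

```python
# Generic AAD decision rule: given an envelope reconstructed from EEG (e.g., by a
# linear backward model) and the envelopes of the two competing talkers, label as
# "attended" the talker whose envelope correlates more strongly with the reconstruction.
import numpy as np

def decode_attention(reconstructed_env, env_talker1, env_talker2):
    r1 = np.corrcoef(reconstructed_env, env_talker1)[0, 1]
    r2 = np.corrcoef(reconstructed_env, env_talker2)[0, 1]
    return (1, r1, r2) if r1 >= r2 else (2, r1, r2)

# AAD accuracy over trials = proportion of trials where the decoded talker
# matches the instructed (attended) talker.
```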
Affiliation(s)
- Bo Wang
- Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- Xiran Xu
- Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- Yadong Niu
- Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- Chao Wu
- School of Nursing, Peking University, Beijing 100191, China
- Xihong Wu
- Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- National Biomedical Imaging Center, College of Future Technology, Beijing 100871, China
- Jing Chen
- Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- National Biomedical Imaging Center, College of Future Technology, Beijing 100871, China

16
Schmitt R, Meyer M, Giroud N. Improvements in naturalistic speech-in-noise comprehension in middle-aged and older adults after 3 weeks of computer-based speechreading training. NPJ Sci Learn 2023; 8:32. [PMID: 37666837] [PMCID: PMC10477252] [DOI: 10.1038/s41539-023-00179-6]
Abstract
Problems understanding speech in noisy environments are characteristic of age-related hearing loss. Since hearing aids do not mitigate these communication problems in every case, potential alternatives for a clinical rehabilitation plan need to be explored. This study investigates whether computer-based speechreading training improves audiovisual speech perception in noise in a sample of middle-aged and older adults (N = 62, 47-83 years), with 32 participants completing a speechreading training and 30 participants of an active control group completing a foreign language training. Before and after training, participants performed a speech-in-noise task mimicking real-life communication settings, in which they were required to answer a speaker's questions. Using generalized linear mixed-effects models, we found a significant improvement in audiovisual speech perception in noise in the speechreading training group. This is of great relevance, as these results highlight the potential of a low-cost and easy-to-implement intervention for a problem as profound and widespread as impaired speech-in-noise comprehension.
Affiliation(s)
- Raffael Schmitt
- Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
- International Max Planck Research School on the Life Course: Evolutionary and Ontogenetic Dynamics (LIFE), Zurich, Switzerland
- Language & Medicine Centre Zurich, Competence Centre of Medical Faculty and Faculty of Arts and Sciences, University of Zurich, Zurich, Switzerland
- Martin Meyer
- Department of Comparative Language Science, University of Zurich, Zurich, Switzerland
- Center for the Interdisciplinary Study of Language Evolution (ISLE), University of Zurich, Zurich, Switzerland
- Cognitive Psychology Unit, Alpen-Adria University, Klagenfurt, Austria
- Nathalie Giroud
- Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
- International Max Planck Research School on the Life Course: Evolutionary and Ontogenetic Dynamics (LIFE), Zurich, Switzerland
- Language & Medicine Centre Zurich, Competence Centre of Medical Faculty and Faculty of Arts and Sciences, University of Zurich, Zurich, Switzerland
- Neuroscience Center Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland

17
Pepper JL, Nuttall HE. Age-Related Changes to Multisensory Integration and Audiovisual Speech Perception. Brain Sci 2023; 13:1126. [PMID: 37626483] [PMCID: PMC10452685] [DOI: 10.3390/brainsci13081126]
Abstract
Multisensory integration is essential for the quick and accurate perception of our environment, particularly in everyday tasks like speech perception. Research has highlighted the importance of investigating bottom-up and top-down contributions to multisensory integration and how these change as a function of ageing. Specifically, perceptual factors like the temporal binding window and cognitive factors like attention and inhibition appear to be fundamental in the integration of visual and auditory information, an integration that may become less efficient as we age. These factors have been linked to brain areas like the superior temporal sulcus, with neural oscillations in the alpha-band frequency also being implicated in multisensory processing. Age-related changes in multisensory integration may have significant consequences for the well-being of our increasingly ageing population, affecting their ability to communicate with others and safely move through their environment; it is crucial that the evidence surrounding this subject continues to be carefully investigated. This review will discuss research into age-related changes in the perceptual and cognitive mechanisms of multisensory integration and the impact that these changes have on speech perception and fall risk. The role of oscillatory alpha activity is of particular interest, as it may be key in the modulation of multisensory integration.
Affiliation(s)
- Helen E. Nuttall
- Department of Psychology, Lancaster University, Bailrigg LA1 4YF, UK

18
Kral A, Sharma A. Crossmodal plasticity in hearing loss. Trends Neurosci 2023; 46:377-393. [PMID: 36990952] [PMCID: PMC10121905] [DOI: 10.1016/j.tins.2023.02.004]
Abstract
Crossmodal plasticity is a textbook example of the ability of the brain to reorganize based on use. We review evidence from the auditory system showing that such reorganization has significant limits, is dependent on pre-existing circuitry and top-down interactions, and that extensive reorganization is often absent. We argue that the evidence does not support the hypothesis that crossmodal reorganization is responsible for closing critical periods in deafness, and crossmodal plasticity instead represents a neuronal process that is dynamically adaptable. We evaluate the evidence for crossmodal changes in both developmental and adult-onset deafness, which start as early as mild-moderate hearing loss and show reversibility when hearing is restored. Finally, crossmodal plasticity does not appear to affect the neuronal preconditions for successful hearing restoration. Given its dynamic and versatile nature, we describe how this plasticity can be exploited for improving clinical outcomes after neurosensory restoration.
Affiliation(s)
- Andrej Kral
- Institute of AudioNeuroTechnology and Department of Experimental Otology, Otolaryngology Clinics, Hannover Medical School, Hannover, Germany; Australian Hearing Hub, School of Medicine and Health Sciences, Macquarie University, Sydney, NSW, Australia
- Anu Sharma
- Department of Speech Language and Hearing Science, Center for Neuroscience, Institute of Cognitive Science, University of Colorado Boulder, Boulder, CO, USA

19
Hadley LV, Culling JF. Timing of head turns to upcoming talkers in triadic conversation: Evidence for prediction of turn ends and interruptions. Front Psychol 2022; 13:1061582. [PMID: 36605274] [PMCID: PMC9807761] [DOI: 10.3389/fpsyg.2022.1061582]
Abstract
In conversation, people are able to listen to an utterance and respond within only a few hundred milliseconds. It takes substantially longer to prepare even a simple utterance, suggesting that interlocutors may make use of predictions about when the talker is about to end. But it is not only the upcoming talker that needs to anticipate the prior talker ending; listeners who are simply following the conversation could also benefit from predicting the turn end in order to shift attention appropriately with the turn switch. In this paper, we examined whether people predict upcoming turn ends when watching conversational turns switch between others by analysing natural conversations. These conversations were between triads of older adults in different levels and types of noise. The analysis focused on the observer during turn switches between the other two parties, using head orientation (i.e., saccades from one talker to the next) to identify when their focus moved from one talker to the next. For non-overlapping utterances, observers started to turn to the upcoming talker before the prior talker had finished speaking in 17% of turn switches (going up to 26% when accounting for motor-planning time). For overlapping utterances, observers started to turn towards the interrupter before they interrupted in 18% of turn switches (going up to 33% when accounting for motor-planning time). The timing of head turns was more precise at lower than higher noise levels and was not affected by noise type. These findings demonstrate that listeners in natural group conversation situations often exhibit head movements that anticipate the end of one conversational turn and the beginning of another. Furthermore, this work demonstrates the value of analysing head movement as a cue to social attention, which could be relevant for advancing communication technology such as hearing devices.
Affiliation(s)
- Lauren V. Hadley
- Hearing Sciences – Scottish Section, School of Medicine, University of Nottingham, Glasgow, United Kingdom
- John F. Culling
- School of Psychology, Cardiff University, Cardiff, United Kingdom

20
Choi JH, Choi HJ, Kim DH, Park JH, An YH, Shim HJ. Effect of face masks on speech perception in noise of individuals with hearing aids. Front Neurosci 2022; 16:1036767. [PMID: 36532290] [PMCID: PMC9754666] [DOI: 10.3389/fnins.2022.1036767]
Abstract
Although several previous studies have confirmed that listeners find it difficult to perceive the speech of face-mask-wearing speakers, there has been little research into how masks affect hearing-impaired individuals using hearing aids. Therefore, the aim of this study was to compare the effects of masks on the speech perception in noise of hearing-impaired individuals and normal-hearing individuals. We also investigated the effect of masks on the gain conferred by hearing aids. The hearing-impaired group included 24 listeners (age: M = 69.5, SD = 8.6; M:F = 13:11) who had used hearing aids in everyday life for >1 month (M = 20.7, SD = 24.0) and the normal-hearing group included 26 listeners (age: M = 57.9, SD = 11.1; M:F = 13:13). Speech perception in noise was measured under no mask-auditory-only (no-mask-AO), no mask-auditory-visual (no-mask-AV), and mask-AV conditions at five signal-to-noise ratios (SNRs; -16, -12, -8, -4, 0 dB) using five lists of 25 monosyllabic Korean words. Video clips that included a female speaker's face and sound or the sound only were presented through a monitor and a loudspeaker located 1 m in front of the listener in a sound-attenuating booth. The degree of deterioration in speech perception caused by the mask (no-mask-AV minus mask-AV) was significantly greater for hearing-impaired vs. normal-hearing participants only at 0 dB SNR (Bonferroni's corrected p < 0.01). When the effects of a mask on speech perception, with and without hearing aids, were compared in the hearing-impaired group, the degree of deterioration in speech perception caused by the mask was significantly reduced by the hearing aids compared with that without hearing aids at 0 and -4 dB SNR (Bonferroni's corrected p < 0.01). The improvement conferred by hearing aids (unaided speech perception score minus aided speech perception score) was significantly greater at 0 and -4 dB SNR than at -16 dB SNR in the mask-AV group (Bonferroni's corrected p < 0.01). These results demonstrate that hearing aids still improve speech perception when the speaker is masked, and that hearing aids partly offset the effect of a mask at relatively low noise levels.
Affiliation(s)
- Hyun Joon Shim
- Department of Otorhinolaryngology-Head and Neck Surgery, Nowon Eulji Medical Center, Eulji University School of Medicine, Seoul, South Korea
21
Xiu B, Paul BT, Chen JM, Le TN, Lin VY, Dimitrijevic A. Neural responses to naturalistic audiovisual speech are related to listening demand in cochlear implant users. Front Hum Neurosci 2022; 16:1043499. [DOI: 10.3389/fnhum.2022.1043499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2022] [Accepted: 10/21/2022] [Indexed: 11/09/2022] Open
Abstract
There is a weak relationship between clinical and self-reported speech perception outcomes in cochlear implant (CI) listeners. Such poor correspondence may be due to differences in clinical and “real-world” listening environments and stimuli. Speech in the real world is often accompanied by visual cues, background environmental noise, and is generally in a conversational context, all factors that could affect listening demand. Thus, our objectives were to determine if brain responses to naturalistic speech could index speech perception and listening demand in CI users. Accordingly, we recorded high-density electroencephalogram (EEG) while CI users watched and listened to a naturalistic stimulus (i.e., the television show, “The Office”). We used continuous EEG to quantify “speech neural tracking” (i.e., TRFs, temporal response functions) to the show’s soundtrack and 8–12 Hz (alpha) brain rhythms commonly related to listening effort. Background noise at three different signal-to-noise ratios (SNRs; +5, +10, and +15 dB) was presented to vary the difficulty of following the television show, mimicking a natural noisy environment. The task also included an audio-only (no video) condition. After each condition, participants subjectively rated listening demand and the degree of words and conversations they felt they understood. Fifteen CI users reported progressively higher degrees of listening demand and fewer words and conversations understood with increasing background noise. Listening demand and conversation understanding in the audio-only condition were comparable to those of the highest noise condition (+5 dB). Increasing background noise affected speech neural tracking at a group level, in addition to eliciting strong individual differences. Mixed-effects modeling showed that listening demand and conversation understanding were correlated with early cortical speech tracking, such that high demand and low conversation understanding occurred with lower-amplitude TRFs. In the high-noise condition, greater listening demand was negatively correlated with parietal alpha power, such that higher demand was related to lower alpha power. No significant correlations were observed between TRF/alpha measures and clinical speech perception scores. These results are similar to previous findings showing little relationship between clinical speech perception and quality of life in CI users. However, physiological responses to complex natural speech may provide an objective measure of aspects of quality-of-life measures like self-perceived listening demand.
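As a concrete illustration of the forward "speech neural tracking" (TRF) approach mentioned above, the sketch below fits a ridge regression from a time-lagged speech envelope to multichannel EEG. It is a generic toy with simulated data and assumed parameters, not the study's pipeline, which would normally rely on a dedicated toolbox with cross-validated regularization.

```python
# Minimal forward TRF sketch: ridge regression from a lagged stimulus envelope to EEG.
# Data, lag range, and regularization strength are illustrative assumptions.
import numpy as np

def lagged_design(stim, lags):
    """Design matrix whose columns are the stimulus shifted by each lag (in samples)."""
    X = np.zeros((len(stim), len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = stim[:len(stim) - lag]
        else:
            X[:lag, j] = stim[-lag:]
    return X

def fit_trf(stim, eeg, lags, alpha=1.0):
    """Ridge solution w = (X'X + alpha*I)^-1 X'Y, one weight series per EEG channel."""
    X = lagged_design(stim, lags)
    XtX = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ eeg)        # shape: (n_lags, n_channels)

fs = 100                                           # Hz after downsampling
rng = np.random.default_rng(0)
envelope = np.abs(rng.standard_normal(fs * 60))    # stand-in for a speech envelope
eeg = rng.standard_normal((fs * 60, 32))           # stand-in for 32-channel EEG
lags = np.arange(0, int(0.4 * fs))                 # 0-400 ms stimulus-to-brain lags
print(fit_trf(envelope, eeg, lags).shape)          # (40, 32)
```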
22
Cross-Modal Reorganization From Both Visual and Somatosensory Modalities in Cochlear Implanted Children and Its Relationship to Speech Perception. Otol Neurotol 2022; 43:e872-e879. [PMID: 35970165 DOI: 10.1097/mao.0000000000003619] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
HYPOTHESIS We hypothesized that children with cochlear implants (CIs) who demonstrate cross-modal reorganization by vision also demonstrate cross-modal reorganization by somatosensation and that these processes are interrelated and impact speech perception. BACKGROUND Cross-modal reorganization, which occurs when a deprived sensory modality's cortical resources are recruited by other intact modalities, has been proposed as a source of variability underlying speech perception in deaf children with CIs. Visual and somatosensory cross-modal reorganization of auditory cortex have been documented separately in CI children, but reorganization in these modalities has not been documented within the same subjects. Our goal was to examine the relationship between cross-modal reorganization from both visual and somatosensory modalities within a single group of CI children. METHODS We analyzed high-density electroencephalogram responses to visual and somatosensory stimuli and current density reconstruction of brain activity sources. Speech perception in noise testing was performed. Current density reconstruction patterns were analyzed within the entire subject group and across groups of CI children exhibiting good versus poor speech perception. RESULTS Positive correlations between visual and somatosensory cross-modal reorganization suggested that neuroplasticity in different sensory systems may be interrelated. Furthermore, CI children with good speech perception did not show recruitment of frontal or auditory cortices during visual processing, unlike CI children with poor speech perception. CONCLUSION Our results reflect changes in cortical resource allocation in pediatric CI users. Cross-modal recruitment of auditory and frontal cortices by vision, and cross-modal reorganization of auditory cortex by somatosensation, may underlie variability in speech and language outcomes in CI children.
23
Zhang L, Du Y. Lip movements enhance speech representations and effective connectivity in auditory dorsal stream. Neuroimage 2022; 257:119311. [PMID: 35589000 DOI: 10.1016/j.neuroimage.2022.119311] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2022] [Revised: 05/09/2022] [Accepted: 05/11/2022] [Indexed: 11/25/2022] Open
Abstract
Viewing speaker's lip movements facilitates speech perception, especially under adverse listening conditions, but the neural mechanisms of this perceptual benefit at the phonemic and feature levels remain unclear. This fMRI study addressed this question by quantifying regional multivariate representation and network organization underlying audiovisual speech-in-noise perception. Behaviorally, valid lip movements improved recognition of place of articulation to aid phoneme identification. Meanwhile, lip movements enhanced neural representations of phonemes in left auditory dorsal stream regions, including frontal speech motor areas and supramarginal gyrus (SMG). Moreover, neural representations of place of articulation and voicing features were promoted differentially by lip movements in these regions, with voicing enhanced in Broca's area while place of articulation better encoded in left ventral premotor cortex and SMG. Next, dynamic causal modeling (DCM) analysis showed that such local changes were accompanied by strengthened effective connectivity along the dorsal stream. Moreover, the neurite orientation dispersion of the left arcuate fasciculus, the bearing skeleton of auditory dorsal stream, predicted the visual enhancements of neural representations and effective connectivity. Our findings provide novel insight to speech science that lip movements promote both local phonemic and feature encoding and network connectivity in the dorsal pathway and the functional enhancement is mediated by the microstructural architecture of the circuit.
Affiliation(s)
- Lei Zhang
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China 100101; Department of Psychology, University of Chinese Academy of Sciences, Beijing, China 100049
- Yi Du
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China 100101; Department of Psychology, University of Chinese Academy of Sciences, Beijing, China 100049; CAS Center for Excellence in Brain Science and Intelligence Technology, Shanghai, China 200031; Chinese Institute for Brain Research, Beijing, China 102206.
24
Haider CL, Suess N, Hauswald A, Park H, Weisz N. Masking of the mouth area impairs reconstruction of acoustic speech features and higher-level segmentational features in the presence of a distractor speaker. Neuroimage 2022; 252:119044. [PMID: 35240298 DOI: 10.1016/j.neuroimage.2022.119044] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2021] [Revised: 02/26/2022] [Accepted: 02/27/2022] [Indexed: 11/29/2022] Open
Abstract
Multisensory integration enables stimulus representation even when the sensory input in a single modality is weak. In the context of speech, when confronted with a degraded acoustic signal, congruent visual inputs promote comprehension. When this input is masked, speech comprehension consequently becomes more difficult. However, it remains unclear which levels of speech processing are affected, and under which circumstances, by occluding the mouth area. To answer this question, we conducted an audiovisual (AV) multi-speaker experiment using naturalistic speech. In half of the trials, the target speaker wore a (surgical) face mask, while we measured the brain activity of normal-hearing participants via magnetoencephalography (MEG). We also added a distractor speaker in half of the trials to create an ecologically difficult listening situation. A decoding model was trained on the clear AV speech and used to reconstruct crucial speech features in each condition. We found significant main effects of face masks on the reconstruction of acoustic features, such as the speech envelope and spectral speech features (i.e. pitch and formant frequencies), while reconstruction of higher-level speech segmentation features (phoneme and word onsets) was especially impaired by masks in difficult listening situations. As we used surgical face masks, which have only mild effects on speech acoustics, we interpret our findings as the result of the missing visual input. Our findings extend previous behavioural results by demonstrating the complex contextual effects of occluding relevant visual information on speech processing.
Affiliation(s)
- Chandra Leon Haider
- Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria.
- Nina Suess
- Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria
- Anne Hauswald
- Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria
- Hyojin Park
- School of Psychology & Centre for Human Brain Health (CHBH), University of Birmingham, Birmingham, UK
- Nathan Weisz
- Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria; Neuroscience Institute, Christian Doppler University Hospital, Paracelsus Medical University, Salzburg, Austria
25
Asymmetrical cross-modal influence on neural encoding of auditory and visual features in natural scenes. Neuroimage 2022; 255:119182. [PMID: 35395403 DOI: 10.1016/j.neuroimage.2022.119182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2021] [Revised: 03/24/2022] [Accepted: 04/04/2022] [Indexed: 11/22/2022] Open
Abstract
Natural scenes contain multi-modal information, which is integrated to form a coherent percept. Previous studies have demonstrated that cross-modal information can modulate the neural encoding of low-level sensory features. These studies, however, mostly focus on the processing of single sensory events or rhythmic sensory sequences. Here, we investigate how the neural encoding of basic auditory and visual features is modulated by cross-modal information when participants watch movie clips primarily composed of non-rhythmic events. We presented audiovisually congruent and audiovisually incongruent movie clips, and since attention can modulate cross-modal interactions, we separately analyzed high- and low-arousal movie clips. We recorded neural responses using electroencephalography (EEG) and employed the temporal response function (TRF) to quantify the neural encoding of auditory and visual features. The neural encoding of the sound envelope is enhanced in the audiovisual congruent condition compared with the incongruent condition, but this effect is only significant for high-arousal movie clips. In contrast, audiovisual congruency does not significantly modulate the neural encoding of visual features, e.g., luminance or visual motion. In summary, our findings demonstrate asymmetrical cross-modal interactions during the processing of natural scenes that lack rhythmicity: congruent visual information enhances low-level auditory processing, while congruent auditory information does not significantly modulate low-level visual processing.
26
Varano E, Vougioukas K, Ma P, Petridis S, Pantic M, Reichenbach T. Speech-Driven Facial Animations Improve Speech-in-Noise Comprehension of Humans. Front Neurosci 2022; 15:781196. [PMID: 35069100 PMCID: PMC8766421 DOI: 10.3389/fnins.2021.781196] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2021] [Accepted: 11/29/2021] [Indexed: 12/02/2022] Open
Abstract
Understanding speech becomes a demanding task when the environment is noisy. Comprehension of speech in noise can be substantially improved by looking at the speaker’s face, and this audiovisual benefit is even more pronounced in people with hearing impairment. Recent advances in AI have made it possible to synthesize photorealistic talking faces from a speech recording and a still image of a person’s face in an end-to-end manner. However, it has remained unknown whether such facial animations improve speech-in-noise comprehension. Here we consider facial animations produced by a recently introduced generative adversarial network (GAN) and show that humans cannot distinguish between the synthesized and the natural videos. Importantly, we then show that the end-to-end synthesized videos significantly aid humans in understanding speech in noise, although natural facial motions yield an even higher audiovisual benefit. We further find that an audiovisual speech recognizer (AVSR) benefits from the synthesized facial animations as well. Our results suggest that synthesizing facial motions from speech can be used to aid speech comprehension in difficult listening environments.
Affiliation(s)
- Enrico Varano
- Department of Bioengineering and Centre for Neurotechnology, Imperial College London, London, United Kingdom
- Pingchuan Ma
- Department of Computing, Imperial College London, London, United Kingdom
- Stavros Petridis
- Department of Computing, Imperial College London, London, United Kingdom
- Maja Pantic
- Department of Computing, Imperial College London, London, United Kingdom
- Tobias Reichenbach
- Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nuremberg, Erlangen, Germany
27
Palana J, Schwartz S, Tager-Flusberg H. Evaluating the Use of Cortical Entrainment to Measure Atypical Speech Processing: A Systematic Review. Neurosci Biobehav Rev 2021; 133:104506. [PMID: 34942267 DOI: 10.1016/j.neubiorev.2021.12.029] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2020] [Revised: 12/12/2021] [Accepted: 12/18/2021] [Indexed: 11/30/2022]
Abstract
BACKGROUND Cortical entrainment has emerged as promising means for measuring continuous speech processing in young, neurotypical adults. However, its utility for capturing atypical speech processing has not been systematically reviewed. OBJECTIVES Synthesize evidence regarding the merit of measuring cortical entrainment to capture atypical speech processing and recommend avenues for future research. METHOD We systematically reviewed publications investigating entrainment to continuous speech in populations with auditory processing differences. RESULTS In the 25 publications reviewed, most studies were conducted on older and/or hearing-impaired adults, for whom slow-wave entrainment to speech was often heightened compared to controls. Research conducted on populations with neurodevelopmental disorders, in whom slow-wave entrainment was often reduced, was less common. Across publications, findings highlighted associations between cortical entrainment and speech processing performance differences. CONCLUSIONS Measures of cortical entrainment offer useful means of capturing speech processing differences and future research should leverage them more extensively when studying populations with neurodevelopmental disorders.
Affiliation(s)
- Joseph Palana
- Department of Psychological and Brain Sciences, Boston University, 64 Cummington Mall, Boston, MA, 02215, USA; Laboratories of Cognitive Neuroscience, Division of Developmental Medicine, Harvard Medical School, Boston Children's Hospital, 1 Autumn Street, Boston, MA, 02215, USA
- Sophie Schwartz
- Department of Psychological and Brain Sciences, Boston University, 64 Cummington Mall, Boston, MA, 02215, USA
- Helen Tager-Flusberg
- Department of Psychological and Brain Sciences, Boston University, 64 Cummington Mall, Boston, MA, 02215, USA.
28
Straetmans L, Holtze B, Debener S, Jaeger M, Mirkovic B. Neural tracking to go: auditory attention decoding and saliency detection with mobile EEG. J Neural Eng 2021; 18. [PMID: 34902846 DOI: 10.1088/1741-2552/ac42b5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2021] [Accepted: 12/13/2021] [Indexed: 11/11/2022]
Abstract
OBJECTIVE Neuro-steered assistive technologies have been suggested to offer a major advancement in future devices like neuro-steered hearing aids. Auditory attention decoding methods would in that case allow for identification of an attended speaker within complex auditory environments, exclusively from neural data. Decoding the attended speaker using neural information has so far only been done in controlled laboratory settings. Yet, it is known that ever-present factors like distraction and movement are reflected in the neural signal parameters related to attention. APPROACH Thus, in the current study we applied a two-competing speaker paradigm to investigate performance of a commonly applied EEG-based auditory attention decoding (AAD) model outside of the laboratory during leisure walking and distraction. Unique environmental sounds were added to the auditory scene and served as distractor events. MAIN RESULTS The current study shows, for the first time, that the attended speaker can be accurately decoded during natural movement. At a temporal resolution of as short as 5-seconds and without artifact attenuation, decoding was found to be significantly above chance level. Further, as hypothesized, we found a decrease in attention to the to-be-attended and the to-be-ignored speech stream after the occurrence of a salient event. Additionally, we demonstrate that it is possible to predict neural correlates of distraction with a computational model of auditory saliency based on acoustic features. CONCLUSION Taken together, our study shows that auditory attention tracking outside of the laboratory in ecologically valid conditions is feasible and a step towards the development of future neural-steered hearing aids.
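The decision step at the core of envelope-based auditory attention decoding (AAD) can be sketched very simply: given an envelope reconstructed from EEG by some previously trained backward model, each short window is assigned to whichever talker's envelope it correlates with more strongly. The snippet below is a hedged illustration with simulated data; the window length, function names, and the pre-existing reconstruction are assumptions, not the authors' online pipeline.

```python
# Window-based attention decoding sketch (illustrative only). 'reconstructed' stands in
# for an envelope estimated from EEG by a pre-trained backward model.
import numpy as np

def decode_attention(reconstructed, env_a, env_b, fs, win_s=5.0):
    """Label each window 'A' or 'B' according to the higher Pearson correlation."""
    win = int(win_s * fs)
    labels = []
    for start in range(0, len(reconstructed) - win + 1, win):
        sl = slice(start, start + win)
        r_a = np.corrcoef(reconstructed[sl], env_a[sl])[0, 1]
        r_b = np.corrcoef(reconstructed[sl], env_b[sl])[0, 1]
        labels.append('A' if r_a >= r_b else 'B')
    return labels

fs = 64
rng = np.random.default_rng(1)
env_a = np.abs(rng.standard_normal(fs * 30))                       # attended talker envelope
env_b = np.abs(rng.standard_normal(fs * 30))                       # ignored talker envelope
reconstructed = 0.6 * env_a + 0.4 * rng.standard_normal(fs * 30)   # noisy estimate of talker A
print(decode_attention(reconstructed, env_a, env_b, fs))           # mostly 'A'
```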
Affiliation(s)
- Lisa Straetmans
- Department of Psychology, Carl von Ossietzky Universität Oldenburg Fakultät für Medizin und Gesundheitswissenschaften, Ammerländer Heerstraße 114-118, Oldenburg, Niedersachsen, 26129, GERMANY
- B Holtze
- Department of Psychology, Carl von Ossietzky Universität Oldenburg Fakultät für Medizin und Gesundheitswissenschaften, Ammerländer Heerstr. 114-118, Oldenburg, Niedersachsen, 26129, GERMANY
- Stefan Debener
- Department of Psychology, Carl von Ossietzky Universität Oldenburg Fakultät für Medizin und Gesundheitswissenschaften, Ammerländer Heerstr. 114-118, Oldenburg, Niedersachsen, 26129, GERMANY
- Manuela Jaeger
- Department of Psychology, Carl von Ossietzky Universität Oldenburg Fakultät für Medizin und Gesundheitswissenschaften, Ammerländer Heerstr. 114-118, Oldenburg, Niedersachsen, 26129, GERMANY
- Bojana Mirkovic
- Department of Psychology, Carl von Ossietzky Universität Oldenburg Fakultät für Medizin und Gesundheitswissenschaften, Ammerländer Heerstr. 114-118, Oldenburg, Niedersachsen, 26129, GERMANY
29
Rosemann S, Gieseler A, Tahden M, Colonius H, Thiel CM. Treatment of Age-Related Hearing Loss Alters Audiovisual Integration and Resting-State Functional Connectivity: A Randomized Controlled Pilot Trial. eNeuro 2021; 8:ENEURO.0258-21.2021. [PMID: 34759049 PMCID: PMC8658542 DOI: 10.1523/eneuro.0258-21.2021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2021] [Revised: 09/23/2021] [Accepted: 10/14/2021] [Indexed: 11/21/2022] Open
Abstract
Untreated age-related hearing loss increases audiovisual integration and impacts resting state functional brain connectivity. Further, there is a relation between crossmodal plasticity and audiovisual integration strength in cochlear implant patients. However, it is currently unclear whether amplification of the auditory input by hearing aids influences audiovisual integration and resting state functional brain connectivity. We conducted a randomized controlled pilot study to investigate how the McGurk illusion, a common measure for audiovisual integration, and resting state functional brain connectivity of the auditory cortex are altered by six-month hearing aid use. Thirty-two older participants with slight-to-moderate, symmetric, age-related hearing loss were allocated to a treatment or waiting control group and measured one week before and six months after hearing aid fitting with functional magnetic resonance imaging. Our results showed a statistical trend for an increased McGurk illusion after six months of hearing aid use. We further demonstrated that an increase in McGurk susceptibility is related to a decreased hearing aid benefit for auditory speech intelligibility in noise. No significant interaction between group and time point was obtained in the whole-brain resting state analysis. However, a region of interest (ROI)-to-ROI analysis indicated that hearing aid use of six months was associated with a decrease in resting state functional connectivity between the auditory cortex and the fusiform gyrus and that this decrease was related to an increase of perceived McGurk illusions. Our study, therefore, suggests that even short-term hearing aid use alters audiovisual integration and functional brain connectivity between auditory and visual cortices.
Affiliation(s)
- Stephanie Rosemann
- Biological Psychology, Department of Psychology, School of Medicine and Health Sciences, Carl von Ossietzky Universität Oldenburg, Oldenburg 26111, Germany
- Cluster of Excellence "Hearing4all," Carl von Ossietzky Universität Oldenburg, Oldenburg 26111, Germany
- Anja Gieseler
- Cluster of Excellence "Hearing4all," Carl von Ossietzky Universität Oldenburg, Oldenburg 26111, Germany
- Cognitive Psychology, Department of Psychology, School of Medicine and Health Sciences, Carl von Ossietzky Universität Oldenburg, Oldenburg 26111, Germany
- Maike Tahden
- Cluster of Excellence "Hearing4all," Carl von Ossietzky Universität Oldenburg, Oldenburg 26111, Germany
- Cognitive Psychology, Department of Psychology, School of Medicine and Health Sciences, Carl von Ossietzky Universität Oldenburg, Oldenburg 26111, Germany
- Hans Colonius
- Cluster of Excellence "Hearing4all," Carl von Ossietzky Universität Oldenburg, Oldenburg 26111, Germany
- Cognitive Psychology, Department of Psychology, School of Medicine and Health Sciences, Carl von Ossietzky Universität Oldenburg, Oldenburg 26111, Germany
- Christiane M Thiel
- Biological Psychology, Department of Psychology, School of Medicine and Health Sciences, Carl von Ossietzky Universität Oldenburg, Oldenburg 26111, Germany
- Cluster of Excellence "Hearing4all," Carl von Ossietzky Universität Oldenburg, Oldenburg 26111, Germany
30
Generalizable EEG Encoding Models with Naturalistic Audiovisual Stimuli. J Neurosci 2021; 41:8946-8962. [PMID: 34503996 DOI: 10.1523/jneurosci.2891-20.2021] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2020] [Revised: 08/24/2021] [Accepted: 08/29/2021] [Indexed: 11/21/2022] Open
Abstract
In natural conversations, listeners must attend to what others are saying while ignoring extraneous background sounds. Recent studies have used encoding models to predict electroencephalography (EEG) responses to speech in noise-free listening situations, sometimes referred to as "speech tracking." Researchers have analyzed how speech tracking changes with different types of background noise. It is unclear, however, whether neural responses from acoustically rich, naturalistic environments with and without background noise can be generalized to more controlled stimuli. If encoding models for acoustically rich, naturalistic stimuli are generalizable to other tasks, this could aid in data collection from populations of individuals who may not tolerate listening to more controlled and less engaging stimuli for long periods of time. We recorded noninvasive scalp EEG while 17 human participants (8 male/9 female) listened to speech without noise and audiovisual speech stimuli containing overlapping speakers and background sounds. We fit multivariate temporal receptive field encoding models to predict EEG responses to pitch, the acoustic envelope, phonological features, and visual cues in both stimulus conditions. Our results suggested that neural responses to naturalistic stimuli were generalizable to more controlled datasets. EEG responses to speech in isolation were predicted accurately using phonological features alone, while responses to speech in a rich acoustic background were more accurate when including both phonological and acoustic features. Our findings suggest that naturalistic audiovisual stimuli can be used to measure receptive fields that are comparable and generalizable to more controlled audio-only stimuli.SIGNIFICANCE STATEMENT Understanding spoken language in natural environments requires listeners to parse acoustic and linguistic information in the presence of other distracting stimuli. However, most studies of auditory processing rely on highly controlled stimuli with no background noise, or with background noise inserted at specific times. Here, we compare models where EEG data are predicted based on a combination of acoustic, phonetic, and visual features in highly disparate stimuli-sentences from a speech corpus and speech embedded within movie trailers. We show that modeling neural responses to highly noisy, audiovisual movies can uncover tuning for acoustic and phonetic information that generalizes to simpler stimuli typically used in sensory neuroscience experiments.
31
Tremblay P, Basirat A, Pinto S, Sato M. Visual prediction cues can facilitate behavioural and neural speech processing in young and older adults. Neuropsychologia 2021; 159:107949. [PMID: 34228997 DOI: 10.1016/j.neuropsychologia.2021.107949] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2020] [Revised: 06/16/2021] [Accepted: 07/01/2021] [Indexed: 02/06/2023]
Abstract
The ability to process speech evolves over the course of the lifespan. Understanding speech at low acoustic intensity and in the presence of background noise becomes harder, and the ability for older adults to benefit from audiovisual speech also appears to decline. These difficulties can have important consequences on quality of life. Yet, a consensus on the cause of these difficulties is still lacking. The objective of this study was to examine the processing of speech in young and older adults under different modalities (i.e. auditory [A], visual [V], audiovisual [AV]) and in the presence of different visual prediction cues (i.e., no predictive cue (control), temporal predictive cue, phonetic predictive cue, and combined temporal and phonetic predictive cues). We focused on recognition accuracy and four auditory evoked potential (AEP) components: P1-N1-P2 and N2. Thirty-four right-handed French-speaking adults were recruited, including 17 younger adults (28 ± 2 years; 20-42 years) and 17 older adults (67 ± 3.77 years; 60-73 years). Participants completed a forced-choice speech identification task. The main findings of the study are: (1) The faciliatory effect of visual information was reduced, but present, in older compared to younger adults, (2) visual predictive cues facilitated speech recognition in younger and older adults alike, (3) age differences in AEPs were localized to later components (P2 and N2), suggesting that aging predominantly affects higher-order cortical processes related to speech processing rather than lower-level auditory processes. (4) Specifically, AV facilitation on P2 amplitude was lower in older adults, there was a reduced effect of the temporal predictive cue on N2 amplitude for older compared to younger adults, and P2 and N2 latencies were longer for older adults. Finally (5) behavioural performance was associated with P2 amplitude in older adults. Our results indicate that aging affects speech processing at multiple levels, including audiovisual integration (P2) and auditory attentional processes (N2). These findings have important implications for understanding barriers to communication in older ages, as well as for the development of compensation strategies for those with speech processing difficulties.
Affiliation(s)
- Pascale Tremblay
- Département de Réadaptation, Faculté de Médecine, Université Laval, Quebec City, Canada; Cervo Brain Research Centre, Quebec City, Canada.
- Anahita Basirat
- Univ. Lille, CNRS, UMR 9193 - SCALab - Sciences Cognitives et Sciences Affectives, Lille, France
- Serge Pinto
- Aix Marseille Univ, CNRS, LPL, Aix-en-Provence, France
- Marc Sato
- Aix Marseille Univ, CNRS, LPL, Aix-en-Provence, France
32
MEG Intersubject Phase Locking of Stimulus-Driven Activity during Naturalistic Speech Listening Correlates with Musical Training. J Neurosci 2021; 41:2713-2722. [PMID: 33536196 DOI: 10.1523/jneurosci.0932-20.2020] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2020] [Revised: 11/13/2020] [Accepted: 11/17/2020] [Indexed: 12/26/2022] Open
Abstract
Musical training is associated with increased structural and functional connectivity between auditory sensory areas and higher-order brain networks involved in speech and motor processing. Whether such changed connectivity patterns facilitate the cortical propagation of speech information in musicians remains poorly understood. We here used magnetoencephalography (MEG) source imaging and a novel seed-based intersubject phase-locking approach to investigate the effects of musical training on the interregional synchronization of stimulus-driven neural responses during listening to naturalistic continuous speech presented in silence. MEG data were obtained from 20 young human subjects (both sexes) with different degrees of musical training. Our data show robust bilateral patterns of stimulus-driven interregional phase synchronization between auditory cortex and frontotemporal brain regions previously associated with speech processing. Stimulus-driven phase locking was maximal in the delta band, but was also observed in the theta and alpha bands. The individual duration of musical training was positively associated with the magnitude of stimulus-driven alpha-band phase locking between auditory cortex and parts of the dorsal and ventral auditory processing streams. These findings provide evidence for a positive relationship between musical training and the propagation of speech-related information between auditory sensory areas and higher-order processing networks, even when speech is presented in silence. We suggest that the increased synchronization of higher-order cortical regions to auditory cortex may contribute to the previously described musician advantage in processing speech in background noise.SIGNIFICANCE STATEMENT Musical training has been associated with widespread structural and functional brain plasticity. It has been suggested that these changes benefit the production and perception of music but can also translate to other domains of auditory processing, such as speech. We developed a new magnetoencephalography intersubject analysis approach to study the cortical synchronization of stimulus-driven neural responses during the perception of continuous natural speech and its relationship to individual musical training. Our results provide evidence that musical training is associated with higher synchronization of stimulus-driven activity between brain regions involved in early auditory sensory and higher-order processing. We suggest that the increased synchronized propagation of speech information may contribute to the previously described musician advantage in processing speech in background noise.
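Intersubject phase locking of the kind quantified here is typically computed from band-limited analytic phases. The following is a rough, generic phase-locking value (PLV) sketch on synthetic delta-band signals; it assumes simple sensor-level time series and standard SciPy filtering, and is not the seed-based MEG source analysis used in the study.

```python
# Generic phase-locking value between two signals in one frequency band (illustrative).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def plv(x, y, fs, band=(1.0, 4.0)):
    """Band-pass both signals, extract analytic phase, and average phase-difference vectors."""
    sos = butter(4, band, btype='band', fs=fs, output='sos')
    phx = np.angle(hilbert(sosfiltfilt(sos, x)))
    phy = np.angle(hilbert(sosfiltfilt(sos, y)))
    return np.abs(np.mean(np.exp(1j * (phx - phy))))

fs = 200
t = np.arange(0, 20, 1 / fs)
rng = np.random.default_rng(2)
x = np.sin(2 * np.pi * 2 * t) + 0.5 * rng.standard_normal(t.size)        # subject 1, 2 Hz drive
y = np.sin(2 * np.pi * 2 * t + 0.3) + 0.5 * rng.standard_normal(t.size)  # subject 2, constant lag
print(round(plv(x, y, fs), 3))   # high value ~ shared stimulus-driven phase
```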
33
Bell L, Peng ZE, Pausch F, Reindl V, Neuschaefer-Rube C, Fels J, Konrad K. fNIRS Assessment of Speech Comprehension in Children with Normal Hearing and Children with Hearing Aids in Virtual Acoustic Environments: Pilot Data and Practical Recommendations. CHILDREN (BASEL, SWITZERLAND) 2020; 7:E219. [PMID: 33171753 PMCID: PMC7695031 DOI: 10.3390/children7110219] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/29/2020] [Revised: 11/02/2020] [Accepted: 11/05/2020] [Indexed: 11/16/2022]
Abstract
The integration of virtual acoustic environments (VAEs) with functional near-infrared spectroscopy (fNIRS) offers novel avenues to investigate behavioral and neural processes of speech-in-noise (SIN) comprehension in complex auditory scenes. Particularly in children with hearing aids (HAs), the combined application might offer new insights into the neural mechanism of SIN perception in simulated real-life acoustic scenarios. Here, we present first pilot data from six children with normal hearing (NH) and three children with bilateral HAs to explore the potential applicability of this novel approach. Children with NH received a speech recognition benefit from low room reverberation and target-distractors' spatial separation, particularly when the pitch of the target and the distractors was similar. On the neural level, the left inferior frontal gyrus appeared to support SIN comprehension during effortful listening. Children with HAs showed decreased SIN perception across conditions. The VAE-fNIRS approach is critically compared to traditional SIN assessments. Although the current study shows that feasibility still needs to be improved, the combined application potentially offers a promising tool to investigate novel research questions in simulated real-life listening. Future modified VAE-fNIRS applications are warranted to replicate the current findings and to validate its application in research and clinical settings.
Affiliation(s)
- Laura Bell
- Child Neuropsychology Section, Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, Medical Faculty, RWTH Aachen University, 52074 Aachen, Germany; (V.R.); (K.K.)
- Z. Ellen Peng
- Teaching and Research Area of Medical Acoustics, Institute of Technical Acoustics, RWTH Aachen University, 52074 Aachen, Germany; (F.P.); (J.F.)
- Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, USA;
- Florian Pausch
- Teaching and Research Area of Medical Acoustics, Institute of Technical Acoustics, RWTH Aachen University, 52074 Aachen, Germany; (F.P.); (J.F.)
- Vanessa Reindl
- Child Neuropsychology Section, Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, Medical Faculty, RWTH Aachen University, 52074 Aachen, Germany; (V.R.); (K.K.)
- JARA-Brain Institute II, Molecular Neuroscience and Neuroimaging, RWTH Aachen & Research Centre Juelich, 52428 Juelich, Germany
- Christiane Neuschaefer-Rube
- Clinic of Phoniatrics, Pedaudiology, and Communication Disorders, Medical Faculty, RWTH Aachen University, 52074 Aachen, Germany;
- Janina Fels
- Teaching and Research Area of Medical Acoustics, Institute of Technical Acoustics, RWTH Aachen University, 52074 Aachen, Germany; (F.P.); (J.F.)
- Kerstin Konrad
- Child Neuropsychology Section, Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, Medical Faculty, RWTH Aachen University, 52074 Aachen, Germany; (V.R.); (K.K.)
- JARA-Brain Institute II, Molecular Neuroscience and Neuroimaging, RWTH Aachen & Research Centre Juelich, 52428 Juelich, Germany
34
Helfer KS, Jesse A. Hearing and speech processing in midlife. Hear Res 2020; 402:108097. [PMID: 33706999 DOI: 10.1016/j.heares.2020.108097] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/12/2020] [Revised: 09/29/2020] [Accepted: 10/13/2020] [Indexed: 12/20/2022]
Abstract
Middle-aged adults often report a decline in their ability to understand speech in adverse listening situations. However, there has been relatively little research devoted to identifying how early aging affects speech processing, as the majority of investigations into senescent changes in speech understanding compare performance in groups of young and older adults. This paper provides an overview of research on hearing and speech perception in middle-aged adults. Topics covered include both objective and subjective (self-perceived) hearing and speech understanding, listening effort, and audiovisual speech perception. This review ends with justification for future research needed to define the nature, consequences, and remediation of hearing problems in middle-aged adults.
Affiliation(s)
- Karen S Helfer
- Department of Communication Disorders, University of Massachusetts Amherst, 358 N. Pleasant St., Amherst, MA 01003, USA.
- Alexandra Jesse
- Department of Psychological and Brain Sciences, University of Massachusetts Amherst, 135 Hicks Way, Amherst, MA 01003, USA.
35
Campbell J, Sharma A. Frontal Cortical Modulation of Temporal Visual Cross-Modal Re-organization in Adults with Hearing Loss. Brain Sci 2020; 10:brainsci10080498. [PMID: 32751543 PMCID: PMC7465622 DOI: 10.3390/brainsci10080498] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2020] [Revised: 07/24/2020] [Accepted: 07/27/2020] [Indexed: 11/19/2022] Open
Abstract
Recent research has demonstrated frontal cortical involvement to co-occur with visual re-organization, suggestive of top-down modulation of cross-modal mechanisms. However, it is unclear whether top-down modulation of visual re-organization takes place in mild hearing loss, or is dependent upon greater degrees of hearing loss severity. Thus, the purpose of this study was to determine if frontal top-down modulation of visual cross-modal re-organization increased across hearing loss severity. We recorded visual evoked potentials (VEPs) in response to apparent motion stimuli in 17 adults with mild-moderate hearing loss using 128-channel high-density electroencephalography (EEG). Current density reconstructions (CDRs) were generated using sLORETA to visualize VEP generators in both groups. VEP latency and amplitude in frontal regions of interest (ROIs) were compared between groups and correlated with auditory behavioral measures. Activation of frontal networks in response to visual stimulation increased across mild to moderate hearing loss, with simultaneous activation of the temporal cortex. In addition, group differences in VEP latency and amplitude correlated with auditory behavioral measures. Overall, these findings support the hypothesis that frontal top-down modulation of visual cross-modal re-organization is dependent upon hearing loss severity.
Affiliation(s)
- Julia Campbell
- Central Sensory Processes Laboratory, Department of Communication Sciences and Disorders, University of Texas at Austin, 2504 Whitis Ave a1100, Austin, TX 78712, USA;
- Anu Sharma
- Brain and Behavior Laboratory, Institute of Cognitive Science, Department of Speech, Language and Hearing Science, University of Colorado at Boulder, 409 UCB, 2501 Kittredge Loop Drive, Boulder, CO 80309, USA
36
Plass J, Brang D, Suzuki S, Grabowecky M. Vision perceptually restores auditory spectral dynamics in speech. Proc Natl Acad Sci U S A 2020; 117:16920-16927. [PMID: 32632010 PMCID: PMC7382243 DOI: 10.1073/pnas.2002887117] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Visual speech facilitates auditory speech perception, but the visual cues responsible for these benefits and the information they provide remain unclear. Low-level models emphasize basic temporal cues provided by mouth movements, but these impoverished signals may not fully account for the richness of auditory information provided by visual speech. High-level models posit interactions among abstract categorical (i.e., phonemes/visemes) or amodal (e.g., articulatory) speech representations, but require lossy remapping of speech signals onto abstracted representations. Because visible articulators shape the spectral content of speech, we hypothesized that the perceptual system might exploit natural correlations between midlevel visual (oral deformations) and auditory speech features (frequency modulations) to extract detailed spectrotemporal information from visual speech without employing high-level abstractions. Consistent with this hypothesis, we found that the time-frequency dynamics of oral resonances (formants) could be predicted with unexpectedly high precision from the changing shape of the mouth during speech. When isolated from other speech cues, speech-based shape deformations improved perceptual sensitivity for corresponding frequency modulations, suggesting that listeners could exploit this cross-modal correspondence to facilitate perception. To test whether this type of correspondence could improve speech comprehension, we selectively degraded the spectral or temporal dimensions of auditory sentence spectrograms to assess how well visual speech facilitated comprehension under each degradation condition. Visual speech produced drastically larger enhancements during spectral degradation, suggesting a condition-specific facilitation effect driven by cross-modal recovery of auditory speech spectra. The perceptual system may therefore use audiovisual correlations rooted in oral acoustics to extract detailed spectrotemporal information from visual speech.
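The audiovisual correspondence at issue, visible mouth opening tracking formant-like frequency modulations, can be illustrated with a toy least-squares fit; the synthetic aperture and formant series below are invented for illustration and are unrelated to the measurements reported in the paper.

```python
# Toy regression from mouth opening to a formant-like trajectory (synthetic data only).
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0, 2, 200)
lip_aperture = 0.5 + 0.3 * np.sin(2 * np.pi * 3 * t)                # mouth opening over time
f2 = 1500 + 900 * lip_aperture + 50 * rng.standard_normal(t.size)   # correlated resonance (Hz)

X = np.column_stack([np.ones_like(lip_aperture), lip_aperture])     # intercept + aperture
coef, *_ = np.linalg.lstsq(X, f2, rcond=None)
pred = X @ coef
r = np.corrcoef(pred, f2)[0, 1]
print(f"slope = {coef[1]:.0f} Hz per unit aperture, r = {r:.2f}")
```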
Affiliation(s)
- John Plass
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109;
- Department of Psychology, Northwestern University, Evanston, IL 60208
- David Brang
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109
- Satoru Suzuki
- Department of Psychology, Northwestern University, Evanston, IL 60208
- Interdepartmental Neuroscience Program, Northwestern University, Chicago, IL 60611
- Marcia Grabowecky
- Department of Psychology, Northwestern University, Evanston, IL 60208
- Interdepartmental Neuroscience Program, Northwestern University, Chicago, IL 60611
37
Greenlaw KM, Puschmann S, Coffey EBJ. Decoding of Envelope vs. Fundamental Frequency During Complex Auditory Stream Segregation. NEUROBIOLOGY OF LANGUAGE (CAMBRIDGE, MASS.) 2020; 1:268-287. [PMID: 37215227 PMCID: PMC10158587 DOI: 10.1162/nol_a_00013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/26/2019] [Accepted: 04/25/2020] [Indexed: 05/24/2023]
Abstract
Hearing-in-noise perception is a challenging task that is critical to human function, but how the brain accomplishes it is not well understood. A candidate mechanism proposes that the neural representation of an attended auditory stream is enhanced relative to background sound via a combination of bottom-up and top-down mechanisms. To date, few studies have compared neural representation and its task-related enhancement across frequency bands that carry different auditory information, such as a sound's amplitude envelope (i.e., syllabic rate or rhythm; 1-9 Hz), and the fundamental frequency of periodic stimuli (i.e., pitch; >40 Hz). Furthermore, hearing-in-noise in the real world is frequently both messier and richer than the majority of tasks used in its study. In the present study, we use continuous sound excerpts that simultaneously offer predictive, visual, and spatial cues to help listeners separate the target from four acoustically similar simultaneously presented sound streams. We show that while both lower and higher frequency information about the entire sound stream is represented in the brain's response, the to-be-attended sound stream is strongly enhanced only in the slower, lower frequency sound representations. These results are consistent with the hypothesis that attended sound representations are strengthened progressively at higher level, later processing stages, and that the interaction of multiple brain systems can aid in this process. Our findings contribute to our understanding of auditory stream separation in difficult, naturalistic listening conditions and demonstrate that pitch and envelope information can be decoded from single-channel EEG data.
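The two descriptors contrasted in this work, a slow amplitude envelope (roughly 1-9 Hz) and energy at the fundamental frequency (>40 Hz), can be separated with routine filtering. The sketch below does this for a synthetic harmonic tone; the cutoffs, the 120 Hz fundamental, and the 4 Hz modulation are assumptions chosen for illustration, not the study's stimuli or preprocessing.

```python
# Separating a slow amplitude envelope from fundamental-frequency energy (illustrative).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
f0 = 120.0
carrier = sum(np.sin(2 * np.pi * k * f0 * t) for k in range(1, 4))   # harmonic complex
slow_env = 0.5 * (1 + np.sin(2 * np.pi * 4 * t))                     # 4 Hz "syllabic" rhythm
signal = slow_env * carrier

env = np.abs(hilbert(signal))                                        # broadband amplitude envelope
sos_env = butter(4, 9, btype='low', fs=fs, output='sos')             # keep the slow (<9 Hz) range
envelope_slow = sosfiltfilt(sos_env, env)

sos_f0 = butter(4, [f0 - 20, f0 + 20], btype='band', fs=fs, output='sos')
f0_band = sosfiltfilt(sos_f0, signal)                                # narrow band around the fundamental
print(envelope_slow.shape, f0_band.shape)
```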
Affiliation(s)
- Keelin M. Greenlaw
- Department of Psychology, Concordia University, Montreal, QC, Canada
- International Laboratory for Brain, Music and Sound Research (BRAMS)
- The Centre for Research on Brain, Language and Music (CRBLM)
38
Jaeger M, Mirkovic B, Bleichner MG, Debener S. Decoding the Attended Speaker From EEG Using Adaptive Evaluation Intervals Captures Fluctuations in Attentional Listening. Front Neurosci 2020; 14:603. [PMID: 32612507 PMCID: PMC7308709 DOI: 10.3389/fnins.2020.00603] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2019] [Accepted: 05/15/2020] [Indexed: 11/13/2022] Open
Abstract
Listeners differ in their ability to attend to a speech stream in the presence of a competing sound. Differences in speech intelligibility in noise cannot be fully explained by the hearing ability which suggests the involvement of additional cognitive factors. A better understanding of the temporal fluctuations in the ability to pay selective auditory attention to a desired speech stream may help in explaining these variabilities. In order to better understand the temporal dynamics of selective auditory attention, we developed an online auditory attention decoding (AAD) processing pipeline based on speech envelope tracking in the electroencephalogram (EEG). Participants had to attend to one audiobook story while a second one had to be ignored. Online AAD was applied to track the attention toward the target speech signal. Individual temporal attention profiles were computed by combining an established AAD method with an adaptive staircase procedure. The individual decoding performance over time was analyzed and linked to behavioral performance as well as subjective ratings of listening effort, motivation, and fatigue. The grand average attended speaker decoding profile derived in the online experiment indicated performance above chance level. Parameters describing the individual AAD performance in each testing block indicated significant differences in decoding performance over time to be closely related to the behavioral performance in the selective listening task. Further, an exploratory analysis indicated that subjects with poor decoding performance reported higher listening effort and fatigue compared to good performers. Taken together our results show that online EEG based AAD in a complex listening situation is feasible. Adaptive attended speaker decoding profiles over time could be used as an objective measure of behavioral performance and listening effort. The developed online processing pipeline could also serve as a basis for future EEG based near real-time auditory neurofeedback systems.
Affiliation(s)
- Manuela Jaeger
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Fraunhofer Institute for Digital Media Technology IDMT, Division Hearing, Speech and Audio Technology, Oldenburg, Germany
- Bojana Mirkovic
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany
- Martin G Bleichner
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Neurophysiology of Everyday Life Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
- Stefan Debener
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany; Research Center for Neurosensory Science, University of Oldenburg, Oldenburg, Germany
39
Age-related hearing loss influences functional connectivity of auditory cortex for the McGurk illusion. Cortex 2020; 129:266-280. [PMID: 32535378 DOI: 10.1016/j.cortex.2020.04.022] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2019] [Revised: 03/30/2020] [Accepted: 04/09/2020] [Indexed: 01/23/2023]
Abstract
Age-related hearing loss affects hearing at high frequencies and is associated with difficulties in understanding speech. Increased audio-visual integration has recently been found in age-related hearing impairment, the brain mechanisms that contribute to this effect are however unclear. We used functional magnetic resonance imaging in elderly subjects with normal hearing and mild to moderate uncompensated hearing loss. Audio-visual integration was studied using the McGurk task. In this task, an illusionary fused percept can occur if incongruent auditory and visual syllables are presented. The paradigm included unisensory stimuli (auditory only, visual only), congruent audio-visual and incongruent (McGurk) audio-visual stimuli. An illusionary precept was reported in over 60% of incongruent trials. These McGurk illusion rates were equal in both groups of elderly subjects and correlated positively with speech-in-noise perception and daily listening effort. Normal-hearing participants showed an increased neural response in left pre- and postcentral gyri and right middle frontal gyrus for incongruent stimuli (McGurk) compared to congruent audio-visual stimuli. Activation patterns were however not different between groups. Task-modulated functional connectivity differed between groups showing increased connectivity from auditory cortex to visual, parietal and frontal areas in hard of hearing participants as compared to normal-hearing participants when comparing incongruent stimuli (McGurk) with congruent audio-visual stimuli. These results suggest that changes in functional connectivity of auditory cortex rather than activation strength during processing of audio-visual McGurk stimuli accompany age-related hearing loss.
40
Chen WC, Xiong LM, Gao L, Cheng Q. [Current status of initial diagnosis of speech sound disorder in a child healthcare clinic]. ZHONGGUO DANG DAI ER KE ZA ZHI = CHINESE JOURNAL OF CONTEMPORARY PEDIATRICS 2020; 22:499-504. [PMID: 32434648 PMCID: PMC7389400 DOI: 10.7499/j.issn.1008-8830.1911106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Subscribe] [Scholar Register] [Received: 11/20/2019] [Accepted: 03/26/2020] [Indexed: 06/11/2023]
Abstract
OBJECTIVE To investigate the understanding of speech sound disorder (SSD) among child health practitioners. METHODS The clinical data of 506 children with an initial diagnosis of SSD from January 2017 to May 2019 were retrospectively analyzed. RESULTS Of the 506 SSD children, 90.5% had a description of developmental behavior in their medical records; 97.6% received a developmental-behavioral evaluation, mostly intellectual and developmental screening tests, which were given to 95.8% (485/506) of the total children. A total of 116 (22.9%) children also had neurodevelopmental disorders, commonly presenting with language disorder, global developmental delay, and intellectual disability; however, 53 (45.7%) of the 116 children had no history records of such abnormal developmental behavior. The incidence of neurodevelopmental disorders was significantly higher in the children with abnormal hearing reported by their families than in the children with normal hearing reported by their families (P<0.001). The children with abnormal response to sound stimulation on physical examination had significantly more frequent neurodevelopmental disorders than those with normal response to sound stimulation (P<0.05). Among the 506 children with SSD, hearing condition was ignored in 33.2% in history records, and in 31.2% on physical examination. Ninety-two children (18.2%) completed the diagnostic hearing test, 12% (11/92) of whom were diagnosed with hearing loss. Of the 11 children with hearing loss, three had passed a hearing screening, three had family-reported normal hearing, and seven had normal response to sound stimulation on physical examination. CONCLUSIONS SSD is frequently comorbid with neurodevelopmental disorders in children. Children's communication performance is a key to the diagnosis of neurodevelop-mental disorders. It's necessary to the diagnosis of SSD to perform a medical history collection about neuropsychological development and a developmental-behavior evaluation. There is a high proportion of children with SSD receiving the developmental-behavioral evaluation, suggesting that child health practitioners pay close attention to the neuropsychological development of SSD children, but mostly, the evaluation merely involves intellectual developmental screening tests. The detection rate of hearing loss in children with SSD is high. However, child health practitioners underestimate this problem, and have an insufficient understanding of the importance of the diagnostic hearing test. The diagnostic hearing test should be the preferred recommendation for assessing hearing ability rather than past hearing screening results or children's response to sound stimulation in life scenes.
Collapse
Affiliation(s)
- Wen-Cong Chen
- Department of Child Health Care, Children's Hospital of Chongqing Medical University/Ministry of Education Key Laboratory of Child Development and Disorders/National Clinical Research Center for Child Health and Disorders/China International Science and Technology Cooperation Base of Child Development and Critical Disorders/Chongqing Key Laboratory of Translational Medical Research in Cognitive Development and Learning and Memory Disorders, Chongqing 400014, China.
| | | | | | | |
Collapse
|
41
|
Paul BT, Uzelac M, Chan E, Dimitrijevic A. Poor early cortical differentiation of speech predicts perceptual difficulties of severely hearing-impaired listeners in multi-talker environments. Sci Rep 2020; 10:6141. [PMID: 32273536 PMCID: PMC7145807 DOI: 10.1038/s41598-020-63103-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2020] [Accepted: 03/24/2020] [Indexed: 11/23/2022] Open
Abstract
Hearing impairment disrupts processes of selective attention that help listeners attend to one sound source over competing sounds in the environment. Hearing prostheses (hearing aids and cochlear implants, CIs) do not fully remedy these issues. In normal hearing, mechanisms of selective attention arise through the facilitation and suppression of neural activity that represents sound sources. However, it is unclear how hearing impairment affects these neural processes, which is key to understanding why listening difficulty remains. Here, severely impaired listeners treated with a CI, and age-matched normal-hearing controls, attended to one of two identical but spatially separated talkers while multichannel EEG was recorded. Whereas neural representations of attended and ignored speech were differentiated at early (~150 ms) cortical processing stages in controls, differentiation of talker representations only occurred later (~250 ms) in CI users. CI users, but not controls, also showed evidence for spatial suppression of the ignored talker through lateralized alpha (7-14 Hz) oscillations. However, CI users' perceptual performance was predicted only by early-stage talker differentiation. We conclude that multi-talker listening difficulty remains for impaired listeners due to deficits in early-stage separation of cortical speech representations, despite neural evidence that they use spatial information to guide selective attention.
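To make the lateralized-alpha measure concrete, the sketch below computes an alpha-band (7-14 Hz) lateralization index from multichannel EEG, of the kind used to probe spatial suppression of an ignored talker. It is an illustrative approximation on synthetic data; the sampling rate, channel groupings, and index definition are assumptions, not the published pipeline.

```python
# Minimal sketch (assumptions, not the authors' pipeline): alpha (7-14 Hz)
# lateralization index from multichannel EEG. Data, sampling rate and channel
# indices are placeholders.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 250.0                                       # sampling rate in Hz (assumed)
rng = np.random.default_rng(1)
eeg = rng.standard_normal((64, int(fs * 10)))    # 64 channels x 10 s of fake data

# Band-pass filter in the alpha range reported in the abstract (7-14 Hz)
b, a = butter(4, [7.0, 14.0], btype="bandpass", fs=fs)
alpha = filtfilt(b, a, eeg, axis=1)
alpha_power = np.abs(hilbert(alpha, axis=1)) ** 2   # instantaneous alpha power per channel

# Hypothetical left/right parieto-occipital channel groups
left_chans, right_chans = [20, 21, 22], [40, 41, 42]
left_pow = alpha_power[left_chans].mean()
right_pow = alpha_power[right_chans].mean()

# Lateralization index: positive values = more alpha power over the right hemisphere
ali = (right_pow - left_pow) / (right_pow + left_pow)
print(f"alpha lateralization index: {ali:.3f}")
```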
Collapse
Affiliation(s)
- Brandon T Paul
- Evaluative Clinical Sciences Platform, Sunnybrook Research Institute, Toronto, ON, M4N 3M5, Canada.
- Otolaryngology-Head and Neck Surgery, Sunnybrook Health Sciences Centre, Toronto, ON, M4N 3M5, Canada.
| | - Mila Uzelac
- Evaluative Clinical Sciences Platform, Sunnybrook Research Institute, Toronto, ON, M4N 3M5, Canada
| | - Emmanuel Chan
- Evaluative Clinical Sciences Platform, Sunnybrook Research Institute, Toronto, ON, M4N 3M5, Canada
| | - Andrew Dimitrijevic
- Evaluative Clinical Sciences Platform, Sunnybrook Research Institute, Toronto, ON, M4N 3M5, Canada.
- Otolaryngology-Head and Neck Surgery, Sunnybrook Health Sciences Centre, Toronto, ON, M4N 3M5, Canada.
- Faculty of Medicine, Otolaryngology-Head and Neck Surgery, University of Toronto, Toronto, ON, M5S 1A1, Canada.
| |
Collapse
|
42
|
Fu Z, Wu X, Chen J. Congruent audiovisual speech enhances auditory attention decoding with EEG. J Neural Eng 2019; 16:066033. [PMID: 31505476 DOI: 10.1088/1741-2552/ab4340] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
OBJECTIVE The auditory attention decoding (AAD) approach can be used to determine the identity of the attended speaker during an auditory selective attention task by analyzing electroencephalography (EEG) measurements. The AAD approach has the potential to guide the design of speech enhancement algorithms in hearing aids, i.e. to identify the speech stream of the listener's interest so that hearing-aid algorithms can amplify the target speech and attenuate other distracting sounds. This would in turn improve speech understanding and communication and reduce cognitive load. The present work investigated whether additional visual input (i.e. lipreading) enhances AAD performance for normal-hearing listeners. APPROACH In a two-talker scenario, where auditory stimuli of audiobooks narrated by two speakers were presented, multichannel EEG signals were recorded while participants selectively attended to one speaker and ignored the other. The speakers' mouth movements were recorded during narration to provide the visual stimuli. Stimulus conditions included audio-only, visual input congruent with either the attended or the unattended speaker, and visual input incongruent with both speakers. The AAD approach was applied separately to each condition to evaluate the effect of additional visual input on decoding. MAIN RESULTS Relative to the audio-only condition, AAD performance improved with visual input only when it was congruent with the attended speech stream, with an improvement of about 14 percentage points in decoding accuracy. Cortical envelope tracking in both auditory and visual cortex was stronger in the congruent audiovisual speech condition than in the other conditions. In addition, AAD was more robust in the congruent audiovisual condition, achieving higher accuracy than the audio-only condition even with fewer channels and shorter trial durations. SIGNIFICANCE The present work complements previous studies and further demonstrates the feasibility of AAD-guided design of hearing aids for daily face-to-face conversations. It also provides guidance for designing a low-density EEG setup for the AAD approach.
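For readers unfamiliar with how AAD is commonly implemented, the sketch below shows the widely used linear stimulus-reconstruction variant: a ridge decoder maps time-lagged multichannel EEG to a speech envelope, and the attended talker is taken to be the one whose envelope correlates best with the reconstruction. This is a hedged, synthetic-data illustration of that general technique, not the code or exact model used in this study.

```python
# Minimal sketch of linear stimulus-reconstruction AAD on synthetic data
# (an illustrative assumption, not this study's implementation).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
fs, n_ch, n_s = 64, 32, 64 * 60              # 64 Hz features, 32 channels, 60 s
env_att = np.abs(rng.standard_normal(n_s))   # attended-speech envelope (placeholder)
env_ign = np.abs(rng.standard_normal(n_s))   # ignored-speech envelope (placeholder)
eeg = 0.5 * env_att[None, :] + rng.standard_normal((n_ch, n_s))  # toy EEG tracking the attended talker

def lagged(x, n_lags=16):
    """Stack time-lagged copies of the EEG: samples x (channels * lags).
    Negative shifts expose EEG at and after time t, since the neural response lags the stimulus."""
    return np.column_stack([np.roll(x, -lag, axis=1).T for lag in range(n_lags)])

X = lagged(eeg)
half = n_s // 2
decoder = Ridge(alpha=1.0).fit(X[:half], env_att[:half])   # train decoder on the first half
recon = decoder.predict(X[half:])                          # reconstruct the envelope on the test half

# Decode attention: pick the candidate envelope most correlated with the reconstruction
r_att = np.corrcoef(recon, env_att[half:])[0, 1]
r_ign = np.corrcoef(recon, env_ign[half:])[0, 1]
print(f"attended r={r_att:.2f}, ignored r={r_ign:.2f} -> decoded: "
      f"{'attended' if r_att > r_ign else 'ignored'} talker")
```

In the audiovisual extension studied here, the same decoding scheme is applied per stimulus condition, so any gain from congruent lip movements shows up directly as higher decoding accuracy.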
Collapse
Affiliation(s)
- Zhen Fu
- Department of Machine Intelligence, Speech and Hearing Research Center, and Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing 100871, People's Republic of China
| | | | | |
Collapse
|