1. Maguinness C, Schall S, Mathias B, Schoemann M, von Kriegstein K. Prior multisensory learning can facilitate auditory-only voice-identity and speech recognition in noise. Q J Exp Psychol (Hove) 2024:17470218241278649. PMID: 39164830. DOI: 10.1177/17470218241278649.
Abstract
Seeing the visual articulatory movements of a speaker, while hearing their voice, helps with understanding what is said. This multisensory enhancement is particularly evident in noisy listening conditions. A multisensory benefit also occurs in auditory-only conditions: auditory-only speech and voice-identity recognition are superior for speakers previously learned with their face, compared to control learning; an effect termed the "face-benefit." Whether the face-benefit can assist in maintaining robust perception in increasingly noisy listening conditions, similar to concurrent multisensory input, is unknown. Here, in two behavioural experiments, we examined this hypothesis. In each experiment, participants learned a series of speakers' voices together with their dynamic face or a control image. Following learning, participants listened to auditory-only sentences spoken by the same speakers and recognised the content of the sentences (speech recognition, Experiment 1) or the voice-identity of the speaker (Experiment 2) in increasing levels of auditory noise. For speech recognition, 14 of 30 participants (47%) showed a face-benefit; for voice-identity recognition, 19 of 25 participants (76%) did. For those participants who demonstrated a face-benefit, the benefit increased with auditory noise levels. Taken together, the results support an audio-visual model of auditory communication and suggest that the brain can develop a flexible system in which learned facial characteristics are used to deal with varying auditory uncertainty.
Affiliation(s)
- Corrina Maguinness: Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Sonja Schall: Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Brian Mathias: Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany; School of Psychology, University of Aberdeen, Aberdeen, United Kingdom
- Martin Schoemann: Chair of Psychological Methods and Cognitive Modelling, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
- Katharina von Kriegstein: Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
2. Gómez-Vicente V, Esquiva G, Lancho C, Benzerdjeb K, Jerez AA, Ausó E. Importance of Visual Support Through Lipreading in the Identification of Words in Spanish Language. Language and Speech 2024:238309241270741. PMID: 39189455. DOI: 10.1177/00238309241270741.
Abstract
We sought to examine the contribution of visual cues, such as lipreading, to the identification of familiar (words) and unfamiliar (phonemes) stimuli in terms of percent accuracy. For that purpose, in this retrospective study, we presented lists of words and phonemes (recorded by a healthy adult female voice) in auditory (A) and audiovisual (AV) modalities to 65 Spanish normal-hearing male and female listeners classified into four age groups. Our results showed a remarkable benefit of AV information for word and phoneme recognition. Regarding gender, women exhibited better performance than men in both A and AV modalities, although we only found significant differences for words, not for phonemes. Concerning age, significant differences were detected in word recognition in the A modality only between the youngest (18-29 years old) and oldest (⩾50 years old) groups. We conclude that visual information enhances word and phoneme recognition and that women are more influenced by visual signals than men in AV speech perception. In contrast, it seems that, overall, age is not a limiting factor for word recognition, with no significant differences observed in the AV modality.
Affiliation(s)
- Gema Esquiva: Department of Optics, Pharmacology and Anatomy, University of Alicante, Spain; Alicante Institute for Health and Biomedical Research (ISABIAL), Spain
- Carmen Lancho: Data Science Laboratory, University Rey Juan Carlos, Spain
- Eva Ausó: Department of Optics, Pharmacology and Anatomy, University of Alicante, Spain
3. Li S, Wang Y, Yu Q, Feng Y, Tang P. The Effect of Visual Articulatory Cues on the Identification of Mandarin Tones by Children With Cochlear Implants. Journal of Speech, Language, and Hearing Research 2024; 67:2106-2114. PMID: 38768072. DOI: 10.1044/2024_jslhr-23-00559.
Abstract
PURPOSE This study explored the facilitatory effect of visual articulatory cues on the identification of Mandarin lexical tones by children with cochlear implants (CIs) in both quiet and noisy environments. It also explored whether early implantation is associated with better use of visual cues in tonal identification. METHOD Participants included 106 children with CIs and 100 normal-hearing (NH) controls. A tonal identification task was employed using a two-alternative forced-choice picture-pointing paradigm. Participants' tonal identification accuracies were compared between audio-only (AO) and audiovisual (AV) modalities. Correlations between implantation ages and visual benefits (accuracy differences between AO and AV modalities) were also examined. RESULTS Children with CIs demonstrated improved identification accuracy from the AO to the AV modality in the noisy environment. Additionally, earlier implantation was significantly correlated with a greater visual benefit in noise. CONCLUSIONS These findings indicated that children with CIs benefited from visual cues in tonal identification in noise and that early implantation enhanced this visual benefit. These results thus have practical implications for tonal perception interventions for Mandarin-speaking children with CIs.
Affiliation(s)
- Shanpeng Li: MIIT Key Lab for Language Information Processing and Applications, School of Foreign Studies, Nanjing University of Science and Technology, China
- Yinuo Wang: Department of English, Linguistics and Theatre Studies, Faculty of Arts & Social Sciences, National University of Singapore
- Qianxi Yu: MIIT Key Lab for Language Information Processing and Applications, School of Foreign Studies, Nanjing University of Science and Technology, China
- Yan Feng: MIIT Key Lab for Language Information Processing and Applications, School of Foreign Studies, Nanjing University of Science and Technology, China
- Ping Tang: MIIT Key Lab for Language Information Processing and Applications, School of Foreign Studies, Nanjing University of Science and Technology, China
4. Moberly AC, Pisoni DB, Tamati TN. Audiovisual Processing Skills Before Cochlear Implantation Predict Postoperative Speech Recognition in Adults. Ear Hear 2024; 45:617-625. PMID: 38143302. PMCID: PMC11025067. DOI: 10.1097/aud.0000000000001450.
Abstract
OBJECTIVES Adults with hearing loss (HL) demonstrate greater benefits of adding visual cues to auditory cues (i.e., "visual enhancement" [VE]) during recognition of speech presented in a combined audiovisual (AV) fashion when compared with normal-hearing peers. For patients with moderate-to-profound sensorineural HL who receive cochlear implants (CIs), it is unclear whether the restoration of audibility results in a decrease in the VE provided by visual cues during AV speech recognition. Moreover, it is unclear whether increased VE during the experience of HL before CI is beneficial or maladaptive to ultimate speech recognition abilities after implantation. It is conceivable that greater VE before implantation contributes to the enormous variability in speech recognition outcomes demonstrated among patients with CIs. This study took a longitudinal approach to test two hypotheses: (H1) Adult listeners with HL who receive CIs would demonstrate a decrease in VE after implantation; and (H2) The magnitude of pre-CI VE would predict post-CI auditory-only speech recognition abilities 6 months after implantation, with the direction of that relation supporting a beneficial, redundant, or maladaptive effect on outcomes. DESIGN Data were collected from 30 adults at two time points: immediately before CI surgery and 6 months after device activation. Pre-CI speech recognition performance was measured in auditory-only (A-only), visual-only, and combined AV fashion for City University of New York (CUNY) sentences. Scores of VE during AV sentence recognition were computed. At 6 months after CI activation, participants were again tested on CUNY sentence recognition in the same conditions as pre-CI. H1 was tested by comparing post- versus pre-CI VE scores. At 6 months of CI use, additional open-set speech recognition measures were also obtained in the A-only condition, including isolated words, words in meaningful AzBio sentences, and words in AzBio sentences in multitalker babble. To test H2, correlation analyses were performed to assess the relation between post-CI A-only speech recognition scores and pre-CI VE scores. RESULTS Inconsistent with H1, after CI, participants did not demonstrate a significant decrease in VE scores. Consistent with H2, preoperative VE scores positively predicted postoperative scores of A-only sentence recognition for both sentences in quiet and in babble (rho = 0.40 to 0.45, p < 0.05), supporting a beneficial effect of pre-CI VE on post-CI auditory outcomes. Pre-CI VE was not significantly related to post-CI isolated word recognition. The raw pre-CI CUNY AV scores also predicted post-CI A-only speech recognition scores to a similar degree as VE scores. CONCLUSIONS After implantation, CI users do not demonstrate a decrease in VE from before surgery. The degree of VE during AV speech recognition before CI positively predicts A-only sentence recognition outcomes after implantation, suggesting the potential value of AV testing of CI patients preoperatively to help predict and set expectations for postoperative outcomes.
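The abstract does not spell out how the visual enhancement (VE) scores were computed. As a hedged sketch only, one common convention in the audiovisual speech literature (not necessarily the authors' exact formula) expresses VE as the audiovisual gain normalized by the room left for improvement over the auditory-only score:

\[ \mathrm{VE} = \frac{P_{AV} - P_{A}}{1 - P_{A}} \]

where \(P_{AV}\) and \(P_{A}\) are proportion-correct scores in the audiovisual and auditory-only conditions. For example, a listener scoring 0.40 auditory-only and 0.70 audiovisually would have VE = (0.70 - 0.40)/(1 - 0.40) = 0.50.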
Affiliation(s)
- Aaron C. Moberly: Department of Otolaryngology – Head & Neck Surgery, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- David B. Pisoni: Department of Psychological and Brain Sciences, Indiana University, Bloomington, Indiana, USA
- Terrin N. Tamati: Department of Otolaryngology – Head & Neck Surgery, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
5. Mutlu Aİ, Yüksel M. Listening effort, fatigue, and streamed voice quality during online university courses. Logop Phoniatr Voco 2024:1-8. PMID: 38440900. DOI: 10.1080/14015439.2024.2317789.
Abstract
With the growing prevalence of online classrooms as a mode of instruction, understanding the impact of listening effort (LE) and fatigue has become increasingly crucial to optimizing the learning experience. The purpose of this study was to investigate the LE, fatigue, and voice quality experienced by students during online and face-to-face class sessions. A total of 110 participants with an average age of 20.76 years (range 18-28), comprising first-year undergraduate students in Speech and Language Therapy and Audiology programs in Turkey, rated their LE during the 2022-2023 spring semester using the Listening Effort Screening Questionnaire (LESQ) and assessed their fatigue with the Multidimensional Fatigue Inventory (MFI-20). Voice quality of lecturers was assessed using smoothed cepstral peak prominence (CPPS) measurements. Data were collected from both online and face-to-face sessions. The results revealed that participants reported increased LE and fatigue during online sessions compared to face-to-face sessions, and the differences were statistically significant. Correlation analysis showed significant relationships (p < 0.05) between audio-video streaming quality and LE-related items in the LESQ, as well as MFI sub-scales and total scores. The findings revealed a relationship between an increased preference for face-to-face classrooms and higher levels of LE and fatigue, emphasizing the significance of these factors in shaping the learning experience. CPPS measurements indicated a dysphonic voice quality during online classroom audio streaming. These findings highlight the challenges of online classes in terms of increased LE, fatigue, and voice quality issues. Understanding these factors is crucial for improving online instruction and the student experience.
Affiliation(s)
- Ayşe İlayda Mutlu: School of Health Sciences, Department of Speech and Language Therapy, Lokman Hekim University, Ankara, Turkey
- Mustafa Yüksel: School of Health Sciences, Department of Audiology, Ankara Medipol University, Ankara, Turkey
6. Hood KE, Hurley LM. Listening to your partner: serotonin increases male responsiveness to female vocal signals in mice. Front Hum Neurosci 2024; 17:1304653. PMID: 38328678. PMCID: PMC10847236. DOI: 10.3389/fnhum.2023.1304653.
Abstract
The context surrounding vocal communication can have a strong influence on how vocal signals are perceived. The serotonergic system is well-positioned for modulating the perception of communication signals according to context, because serotonergic neurons are responsive to social context, influence social behavior, and innervate auditory regions. Animals like lab mice can be excellent models for exploring how serotonin affects the primary neural systems involved in vocal perception, including within central auditory regions like the inferior colliculus (IC). Within the IC, serotonergic activity reflects not only the presence of a conspecific, but also the valence of a given social interaction. To assess whether serotonin can influence the perception of vocal signals in male mice, we manipulated serotonin systemically with an injection of its precursor 5-HTP, and locally in the IC with an infusion of fenfluramine, a serotonin reuptake blocker. Mice then participated in a behavioral assay in which males suppress their ultrasonic vocalizations (USVs) in response to the playback of female broadband vocalizations (BBVs), used in defensive aggression by females when interacting with males. Both 5-HTP and fenfluramine increased the suppression of USVs during BBV playback relative to controls. 5-HTP additionally decreased the baseline production of a specific type of USV and male investigation, but neither drug treatment strongly affected male digging or grooming. These findings show that serotonin modifies behavioral responses to vocal signals in mice, in part by acting in auditory brain regions, and suggest that mouse vocal behavior can serve as a useful model for exploring the mechanisms of context in human communication.
Affiliation(s)
- Kayleigh E. Hood: Hurley Lab, Department of Biology, Indiana University, Bloomington, IN, United States; Center for the Integrative Study of Animal Behavior, Indiana University, Bloomington, IN, United States
- Laura M. Hurley: Hurley Lab, Department of Biology, Indiana University, Bloomington, IN, United States; Center for the Integrative Study of Animal Behavior, Indiana University, Bloomington, IN, United States
7. Vitevitch MS, Pisoni DB, Soehlke L, Foster TA. Using Complex Networks in the Hearing Sciences. Ear Hear 2024; 45:1-9. PMID: 37316992. PMCID: PMC10721731. DOI: 10.1097/aud.0000000000001395.
Abstract
In this Point of View, we review a number of recent discoveries from the emerging, interdisciplinary field of Network Science, which uses graph-theoretic techniques to understand complex systems. In the network science approach, nodes represent entities in a system, and connections are placed between nodes that are related to each other to form a web-like network. We discuss several studies that demonstrate how the micro-, meso-, and macro-level structure of a network of phonological word-forms influences spoken word recognition in listeners with normal hearing and in listeners with hearing loss. Given the discoveries made possible by this new approach and the influence of several complex network measures on spoken word recognition performance, we argue that speech recognition measures, originally developed in the late 1940s and routinely used in clinical audiometry, should be revised to reflect our current understanding of spoken word recognition. We also discuss other ways in which the tools of network science can be used in Speech and Hearing Sciences and Audiology more broadly.
8. Batterink LJ, Mulgrew J, Gibbings A. Rhythmically Modulating Neural Entrainment during Exposure to Regularities Influences Statistical Learning. J Cogn Neurosci 2024; 36:107-127. PMID: 37902580. DOI: 10.1162/jocn_a_02079.
Abstract
The ability to discover regularities in the environment, such as syllable patterns in speech, is known as statistical learning. Previous studies have shown that statistical learning is accompanied by neural entrainment, in which neural activity temporally aligns with repeating patterns over time. However, it is unclear whether these rhythmic neural dynamics play a functional role in statistical learning or whether they largely reflect the downstream consequences of learning, such as the enhanced perception of learned words in speech. To better understand this issue, we manipulated participants' neural entrainment during statistical learning using continuous rhythmic visual stimulation. Participants were exposed to a speech stream of repeating nonsense words while viewing either (1) a visual stimulus with a "congruent" rhythm that aligned with the word structure, (2) a visual stimulus with an incongruent rhythm, or (3) a static visual stimulus. Statistical learning was subsequently measured using both an explicit and implicit test. Participants in the congruent condition showed a significant increase in neural entrainment over auditory regions at the relevant word frequency, over and above effects of passive volume conduction, indicating that visual stimulation successfully altered neural entrainment within relevant neural substrates. Critically, during the subsequent implicit test, participants in the congruent condition showed an enhanced ability to predict upcoming syllables and stronger neural phase synchronization to component words, suggesting that they had gained greater sensitivity to the statistical structure of the speech stream relative to the incongruent and static groups. This learning benefit could not be attributed to strategic processes, as participants were largely unaware of the contingencies between the visual stimulation and embedded words. These results indicate that manipulating neural entrainment during exposure to regularities influences statistical learning outcomes, suggesting that neural entrainment may functionally contribute to statistical learning. Our findings encourage future studies using non-invasive brain stimulation methods to further understand the role of entrainment in statistical learning.
9. Kestens K, Keppler H, Ceuleers D, Lecointre S, De Langhe F, Degeest S. The effect of age on the hearing-related quality of life in normal-hearing adults. Journal of Communication Disorders 2023; 106:106386. PMID: 37918084. DOI: 10.1016/j.jcomdis.2023.106386.
Abstract
INTRODUCTION Recently, a new holistic Patient Reported Outcome Measure (PROM) to assess hearing-related quality of life was developed, named the hearing-related quality of life questionnaire for Auditory-VIsual, COgnitive and Psychosocial functioning (hAVICOP). The purpose of the current study was to evaluate whether the hAVICOP is sufficiently sensitive to detect an age effect on hearing-related quality of life. METHODS One hundred and thirteen normal-hearing participants (mean age: 42.13 years; range: 19 to 69 years) completed the entire hAVICOP questionnaire online through the Research Electronic Data Capture (REDCap) platform. The hAVICOP consists of 27 statements across three major subdomains (auditory-visual, cognitive, and psychosocial functioning), which are rated on a visual analogue scale ranging from 0 (rarely to never) to 100 (almost always). Mean scores were calculated for each subdomain separately as well as combined within a total score; the worse one's hearing-related quality of life, the lower the score. Linear regression models were run to predict the hAVICOP total score and the three subdomain scores from age and sex. RESULTS A significant main effect of age was observed for the total hAVICOP and all three subdomain scores, indicating a decrease in hearing-related quality of life with increasing age. No significant sex effect was found in any of the analyses. CONCLUSION The hAVICOP is sufficiently sensitive to detect an age effect on hearing-related quality of life within a large group of normal-hearing adults, emphasizing its clinical utility. This age effect might be related to the interplay of age-related changes in the bottom-up and top-down processes involved during speech processing.
Affiliation(s)
- Katrien Kestens: Department of Rehabilitation Sciences, Ghent University, Corneel Heymanslaan 10 (2P1), 9000 Ghent, Belgium
- Hannah Keppler: Department of Rehabilitation Sciences, Ghent University, Corneel Heymanslaan 10 (2P1), 9000 Ghent, Belgium; Department of Oto-rhino-laryngology, Ghent University Hospital, Corneel Heymanslaan 10 (2P1), 9000 Ghent, Belgium
- Dorien Ceuleers: Department of Head and Skin, Ghent University, Corneel Heymanslaan 10 (2P1), 9000 Ghent, Belgium
- Stephanie Lecointre: Department of Rehabilitation Sciences, Ghent University, Corneel Heymanslaan 10 (2P1), 9000 Ghent, Belgium
- Flore De Langhe: Department of Rehabilitation Sciences, Ghent University, Corneel Heymanslaan 10 (2P1), 9000 Ghent, Belgium
- Sofie Degeest: Department of Rehabilitation Sciences, Ghent University, Corneel Heymanslaan 10 (2P1), 9000 Ghent, Belgium
10. Zeng Y, Leung KKW, Jongman A, Sereno JA, Wang Y. Multi-modal cross-linguistic perception of Mandarin tones in clear speech. Front Hum Neurosci 2023; 17:1247811. PMID: 37829822. PMCID: PMC10565566. DOI: 10.3389/fnhum.2023.1247811.
Abstract
Clearly enunciated speech (relative to conversational, plain speech) involves articulatory and acoustic modifications that enhance auditory-visual (AV) segmental intelligibility. However, little research has explored clear-speech effects on the perception of suprasegmental properties such as lexical tone, particularly involving visual (facial) perception. Since tone production does not primarily rely on vocal tract configurations, tones may be less visually distinctive. Questions thus arise as to whether clear speech can enhance visual tone intelligibility and, if so, whether any intelligibility gain is attributable to tone-specific category-enhancing (code-based) clear-speech cues or to tone-general saliency-enhancing (signal-based) cues. The present study addresses these questions by examining the identification of clear and plain Mandarin tones with visual-only, auditory-only, and AV input modalities by native (Mandarin) and nonnative (English) perceivers. Results show that code-based visual and acoustic clear-tone modifications, although limited, affect both native and nonnative intelligibility, with category-enhancing cues increasing intelligibility and category-blurring cues decreasing intelligibility. In contrast, signal-based cues, which are extensively available, do not benefit native intelligibility, although they contribute to nonnative intelligibility gains. These findings demonstrate that linguistically relevant visual tonal cues do exist. In clear speech, such tone category-enhancing cues are combined with saliency-enhancing cues across AV modalities to improve intelligibility.
Affiliation(s)
- Yuyu Zeng: Department of Linguistics, University of Kansas, Lawrence, KS, United States
- Keith K. W. Leung: Department of Linguistics, Simon Fraser University, Burnaby, BC, Canada
- Allard Jongman: Department of Linguistics, University of Kansas, Lawrence, KS, United States
- Joan A. Sereno: Department of Linguistics, University of Kansas, Lawrence, KS, United States
- Yue Wang: Department of Linguistics, Simon Fraser University, Burnaby, BC, Canada
11. Pepper JL, Nuttall HE. Age-Related Changes to Multisensory Integration and Audiovisual Speech Perception. Brain Sci 2023; 13:1126. PMID: 37626483. PMCID: PMC10452685. DOI: 10.3390/brainsci13081126.
Abstract
Multisensory integration is essential for the quick and accurate perception of our environment, particularly in everyday tasks like speech perception. Research has highlighted the importance of investigating bottom-up and top-down contributions to multisensory integration and how these change as a function of ageing. Specifically, perceptual factors like the temporal binding window and cognitive factors like attention and inhibition appear to be fundamental in the integration of visual and auditory information, an integration that may become less efficient as we age. These factors have been linked to brain areas like the superior temporal sulcus, with neural oscillations in the alpha-band frequency also being implicated in multisensory processing. Age-related changes in multisensory integration may have significant consequences for the well-being of our increasingly ageing population, affecting their ability to communicate with others and to move safely through their environment; it is crucial that the evidence surrounding this subject continues to be carefully investigated. This review discusses research into age-related changes in the perceptual and cognitive mechanisms of multisensory integration and the impact that these changes have on speech perception and fall risk. The role of oscillatory alpha activity is of particular interest, as it may be key in the modulation of multisensory integration.
Affiliation(s)
- Helen E. Nuttall: Department of Psychology, Lancaster University, Bailrigg LA1 4YF, UK
12. Irwin J, Harwood V, Kleinman D, Baron A, Avery T, Turcios J, Landi N. Neural and Behavioral Differences in Speech Perception for Children With Autism Spectrum Disorders Within an Audiovisual Context. Journal of Speech, Language, and Hearing Research 2023; 66:2390-2403. PMID: 37390407. PMCID: PMC10468115. DOI: 10.1044/2023_jslhr-22-00661.
Abstract
PURPOSE Reduced use of visible articulatory information on a speaker's face has been implicated as a possible contributor to language deficits in autism spectrum disorders (ASD). We employ an audiovisual (AV) phonemic restoration paradigm to measure behavioral performance (button press) and event-related potentials (ERPs) of visual speech perception in children with ASD and their neurotypical peers to assess potential neural substrates that contribute to group differences. METHOD Two sets of speech stimuli, /ba/-"/a/" ("/a/" was created from the /ba/ token by reducing the initial consonant) and /ba/-/pa/, were presented within an auditory oddball paradigm to children aged 6-13 years with ASD (n = 17) and typical development (TD; n = 33) within two conditions. The AV condition contained a fully visible speaking face; the pixelated (PX) condition included a face, but the mouth and jaw were pixelated, removing all articulatory information. When articulatory features were present for the /ba/-"/a/" contrast, it was expected that the influence of the visual articulators would facilitate a phonemic restoration effect in which "/a/" would be perceived as /ba/. ERPs were recorded during the experiment while children were required to press a button for the deviant sound for both sets of speech contrasts within both conditions. RESULTS Button press data revealed that TD children were more accurate than the ASD group in discriminating between the /ba/-"/a/" and /ba/-/pa/ contrasts in the PX condition. ERPs in response to the /ba/-/pa/ contrast within both AV and PX conditions differed between children with ASD and TD children (earlier P300 responses for children with ASD). CONCLUSION Children with ASD differ from TD peers in the underlying neural mechanisms responsible for speech processing within an AV context.
Affiliation(s)
- Julia Irwin: Department of Psychology, Southern Connecticut State University, New Haven; Haskins Laboratories, Yale University, New Haven, CT
- Vanessa Harwood: Department of Communicative Disorders, University of Rhode Island, Kingston
- Alisa Baron: Department of Communicative Disorders, University of Rhode Island, Kingston
- Jacqueline Turcios: Department of Speech-Language Pathology, University of New Haven, West Haven, CT
- Nicole Landi: Haskins Laboratories, Yale University, New Haven, CT; Department of Psychological Sciences, University of Connecticut, Storrs
13. Harwood V, Baron A, Kleinman D, Campanelli L, Irwin J, Landi N. Event-Related Potentials in Assessing Visual Speech Cues in the Broader Autism Phenotype: Evidence from a Phonemic Restoration Paradigm. Brain Sci 2023; 13:1011. PMID: 37508944. PMCID: PMC10377560. DOI: 10.3390/brainsci13071011.
Abstract
Audiovisual speech perception includes the simultaneous processing of auditory and visual speech. Deficits in audiovisual speech perception are reported in autistic individuals; however, less is known regarding audiovisual speech perception within the broader autism phenotype (BAP), which includes individuals with elevated, yet subclinical, levels of autistic traits. We investigate the neural indices of audiovisual speech perception in adults exhibiting a range of autism-like traits using event-related potentials (ERPs) in a phonemic restoration paradigm. In this paradigm, we consider conditions where the speech articulators (mouth and jaw) are present (AV condition) or obscured by a pixelated mask (PX condition). These two face conditions were included in both passive (simply viewing a speaking face) and active (participants were required to press a button for a specific consonant-vowel stimulus) experiments. The results revealed an N100 ERP component which was present for all listening contexts and conditions; however, it was attenuated in the active AV condition, where participants were able to view the speaker's face, including the mouth and jaw. The P300 ERP component was present within the active experiment only, and was significantly greater within the AV condition compared to the PX condition. This suggests increased neural effort for detecting deviant stimuli when visible articulation was present, and an influence of visual information on perception. Finally, the P300 response was negatively correlated with autism-like traits, such that higher autistic traits were associated with generally smaller P300 responses in the active AV and PX conditions. These findings support the view that atypical audiovisual processing may be characteristic of the BAP in adults.
Affiliation(s)
- Vanessa Harwood: Department of Communicative Disorders, University of Rhode Island, Kingston, RI 02881, USA
- Alisa Baron: Department of Communicative Disorders, University of Rhode Island, Kingston, RI 02881, USA
- Luca Campanelli: Department of Communicative Disorders, University of Alabama, Tuscaloosa, AL 35487, USA
- Julia Irwin: Haskins Laboratories, New Haven, CT 06519, USA; Department of Psychology, Southern Connecticut State University, New Haven, CT 06515, USA
- Nicole Landi: Haskins Laboratories, New Haven, CT 06519, USA; Department of Psychological Sciences, University of Connecticut, Storrs, CT 06269, USA
14. Baron A, Harwood V, Kleinman D, Campanelli L, Molski J, Landi N, Irwin J. Where on the face do we look during phonemic restoration: An eye-tracking study. Front Psychol 2023; 14:1005186. PMID: 37303890. PMCID: PMC10249372. DOI: 10.3389/fpsyg.2023.1005186.
Abstract
Face-to-face communication typically involves audio and visual components to the speech signal. To examine the effect of task demands on gaze patterns in response to a speaking face, adults participated in two eye-tracking experiments with an audiovisual condition (articulatory information from the mouth was visible) and a pixelated condition (articulatory information was not visible). Further, task demands were manipulated by having listeners respond in a passive (no response) or an active (button press response) context. The active experiment required participants to discriminate between speech stimuli and was designed to mimic environmental situations that require one to use visual information to disambiguate the speaker's message, simulating different listening conditions in real-world settings. Stimuli included a clear exemplar of the syllable /ba/ and a second exemplar in which the initial consonant was reduced, creating an /a/-like consonant. Consistent with our hypothesis, results revealed that fixations to the mouth were greatest in the audiovisual active experiment, and visual articulatory information led to a phonemic restoration effect for the /a/ speech token. In the pixelated condition, participants fixated on the eyes, and discrimination of the deviant token within the active experiment was significantly greater than in the audiovisual condition. These results suggest that, when required to disambiguate changes in speech, adults may look to the mouth for additional cues to support processing when such cues are available.
Affiliation(s)
- Alisa Baron: Department of Communicative Disorders, University of Rhode Island, Kingston, RI, United States
- Vanessa Harwood: Department of Communicative Disorders, University of Rhode Island, Kingston, RI, United States
- Luca Campanelli: Department of Communicative Disorders, The University of Alabama, Tuscaloosa, AL, United States
- Joseph Molski: Department of Communicative Disorders, University of Rhode Island, Kingston, RI, United States
- Nicole Landi: Haskins Laboratories, New Haven, CT, United States; Department of Psychological Sciences, University of Connecticut, Storrs, CT, United States
- Julia Irwin: Haskins Laboratories, New Haven, CT, United States; Department of Psychology, Southern Connecticut State University, New Haven, CT, United States
15. Fullerton AM, Vickers DA, Luke R, Billing AN, McAlpine D, Hernandez-Perez H, Peelle JE, Monaghan JJM, McMahon CM. Cross-modal functional connectivity supports speech understanding in cochlear implant users. Cereb Cortex 2023; 33:3350-3371. PMID: 35989307. PMCID: PMC10068270. DOI: 10.1093/cercor/bhac277.
Abstract
Sensory deprivation can lead to cross-modal cortical changes, whereby sensory brain regions deprived of input may be recruited to perform atypical function. Enhanced cross-modal responses to visual stimuli observed in the auditory cortex of postlingually deaf cochlear implant (CI) users are hypothesized to reflect increased activation of cortical language regions, but it is unclear whether this cross-modal activity is "adaptive" or "maladaptive" for speech understanding. To determine whether increased activation of language regions is correlated with better speech understanding in CI users, we assessed task-related activation and functional connectivity of auditory and visual cortices to auditory and visual speech and non-speech stimuli in CI users (n = 14) and normal-hearing listeners (n = 17), using functional near-infrared spectroscopy to measure hemodynamic responses. We used visually presented speech and non-speech to investigate neural processes related to linguistic content and observed that CI users show beneficial cross-modal effects. Specifically, an increase in connectivity between the left auditory and visual cortices (presumed primary sites of cortical language processing) was positively correlated with CI users' abilities to understand speech in background noise. Cross-modal activity in the auditory cortex of postlingually deaf CI users may reflect adaptive activity of a distributed, multimodal speech network recruited to enhance speech understanding.
Affiliation(s)
- Amanda M Fullerton: Department of Linguistics and Macquarie University Hearing, Australian Hearing Hub, Macquarie University, Sydney 2109, Australia
- Deborah A Vickers: Cambridge Hearing Group, Sound Lab, Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0SZ, United Kingdom; Speech, Hearing and Phonetic Sciences, University College London, London WC1N 1PF, United Kingdom
- Robert Luke: Department of Linguistics and Macquarie University Hearing, Australian Hearing Hub, Macquarie University, Sydney 2109, Australia
- Addison N Billing: Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, United Kingdom; DOT-HUB, Department of Medical Physics and Biomedical Engineering, University College London, London WC1E 6BT, United Kingdom
- David McAlpine: Department of Linguistics and Macquarie University Hearing, Australian Hearing Hub, Macquarie University, Sydney 2109, Australia
- Heivet Hernandez-Perez: Department of Linguistics and Macquarie University Hearing, Australian Hearing Hub, Macquarie University, Sydney 2109, Australia
- Jonathan E Peelle: Department of Otolaryngology, Washington University in St. Louis, St. Louis, MO 63110, United States
- Jessica J M Monaghan: National Acoustic Laboratories, Australian Hearing Hub, Sydney 2109, Australia; Department of Linguistics and Macquarie University Hearing, Australian Hearing Hub, Macquarie University, Sydney 2109, Australia
- Catherine M McMahon: Department of Linguistics and Macquarie University Hearing, Australian Hearing Hub, Macquarie University, Sydney 2109, Australia; HEAR Centre, Macquarie University, Sydney 2109, Australia
16. Beadle J, Kim J, Davis C. Visual Speech Improves Older and Younger Adults' Response Time and Accuracy for Speech Comprehension in Noise. Trends Hear 2022; 26:23312165221145006. PMID: 36524310. PMCID: PMC9761220. DOI: 10.1177/23312165221145006.
Abstract
Past research suggests that older adults expend more cognitive resources when processing visual speech than younger adults. If so, given resource limitations, older adults may not get as large a visual speech benefit as younger ones on a resource-demanding speech processing task. We tested this using a speech comprehension task that required attention across two talkers and a simple response (i.e., the question-and-answer task) and measured response time and accuracy. Specifically, we compared the size of visual speech benefit for older and younger adults. We also examined whether the presence of a visual distractor would reduce the visual speech benefit more for older than younger adults. Twenty-five older adults (12 females, MAge = 72) and 25 younger adults (17 females, MAge = 22) completed the question-and-answer task under time pressure. The task included the following conditions: auditory and visual (AV) speech; AV speech plus visual distractor; and auditory speech with static face images. Both age groups showed a visual speech benefit regardless of whether a visual distractor was also presented. Likewise, the size of the visual speech benefit did not significantly interact with age group for accuracy or the potentially more sensitive response time measure.
Affiliation(s)
- Julie Beadle: The MARCS Institute for Brain, Behaviour, and Development, Western Sydney University, Sydney, Australia; The HEARing CRC, Australia
- Jeesun Kim: The MARCS Institute for Brain, Behaviour, and Development, Western Sydney University, Sydney, Australia
- Chris Davis: The MARCS Institute for Brain, Behaviour, and Development, Western Sydney University, Sydney, Australia; The HEARing CRC, Australia (corresponding author: Westmead Innovation Quarter, Building U, Level 4, 160 Hawkesbury Road, Westmead NSW 2145, Australia)
17. Benefits of Text Supplementation on Sentence Recognition and Subjective Ratings With and Without Facial Cues for Listeners With Normal Hearing. Ear Hear 2022:00003446-990000000-00088. PMID: 36534697. DOI: 10.1097/aud.0000000000001316.
Abstract
OBJECTIVES Recognizing speech through telecommunication can be challenging in unfavorable listening conditions. Text supplementation or provision of facial cues can facilitate speech recognition under some circumstances. However, our understanding of the combined benefit of text and facial cues in telecommunication is limited. The purpose of this study was to investigate the potential benefit of text supplementation for sentence recognition scores and subjective ratings of spoken speech with and without facial cues available. DESIGN Twenty adult females (M = 24 years, range 21 to 29 years) with normal hearing performed a sentence recognition task and also completed a subjective rating questionnaire in 24 conditions. The conditions varied by integrity of the available facial cues (clear facial cues, slight distortion facial cues, great distortion facial cues, no facial cues), signal-to-noise ratio (quiet, +1 dB, -3 dB), and text availability (with text, without text). When present, the text was an 86 to 88% accurate transcription of the auditory signal presented at a 500 ms delay relative to the auditory signal. RESULTS The benefits of text supplementation were largest when facial cues were not available and when the signal-to-noise ratio was unfavorable. Although no recognition score benefit was present in quiet, recognition benefit was significant in all levels of background noise for all levels of facial cue integrity. Moreover, participant subjective ratings of text benefit were robust and present even in the absence of recognition benefit. Consistent with previous literature, facial cues were beneficial for sentence recognition scores in the most unfavorable signal-to-noise ratio, even when greatly distorted. It is interesting that, although all levels of facial cues were beneficial for recognition scores, participants rated a significant benefit only with clear facial cues. CONCLUSIONS The benefit of text for auditory-only and auditory-visual speech recognition is evident in recognition scores and subjective ratings; the benefit is larger and more robust for subjective ratings than for scores. Therefore, text supplementation might provide benefit that extends beyond speech recognition scores. Combined, these findings support the use of text supplementation in telecommunication, even when facial cues are concurrently present, such as during teleconferencing or watching television.
18. Van Engen KJ, Dey A, Sommers MS, Peelle JE. Audiovisual speech perception: Moving beyond McGurk. The Journal of the Acoustical Society of America 2022; 152:3216. PMID: 36586857. PMCID: PMC9894660. DOI: 10.1121/10.0015262.
Abstract
Although it is clear that sighted listeners use both auditory and visual cues during speech perception, the manner in which multisensory information is combined is a matter of debate. One approach to measuring multisensory integration is to use variants of the McGurk illusion, in which discrepant auditory and visual cues produce auditory percepts that differ from those based on unimodal input. Not all listeners show the same degree of susceptibility to the McGurk illusion, and these individual differences are frequently used as a measure of audiovisual integration ability. However, despite their popularity, we join the voices of others in the field to argue that McGurk tasks are ill-suited for studying real-life multisensory speech perception: McGurk stimuli are often based on isolated syllables (which are rare in conversations) and necessarily rely on audiovisual incongruence that does not occur naturally. Furthermore, recent data show that susceptibility to McGurk tasks does not correlate with performance during natural audiovisual speech perception. Although the McGurk effect is a fascinating illusion, truly understanding the combined use of auditory and visual information during speech perception requires tasks that more closely resemble everyday communication: namely, words, sentences, and narratives with congruent auditory and visual speech cues.
Affiliation(s)
- Kristin J Van Engen: Department of Psychological and Brain Sciences, Washington University, St. Louis, Missouri 63130, USA
- Avanti Dey: PLOS ONE, 1265 Battery Street, San Francisco, California 94111, USA
- Mitchell S Sommers: Department of Psychological and Brain Sciences, Washington University, St. Louis, Missouri 63130, USA
- Jonathan E Peelle: Department of Otolaryngology, Washington University, St. Louis, Missouri 63130, USA
19. Begau A, Arnau S, Klatt LI, Wascher E, Getzmann S. Using visual speech at the cocktail-party: CNV evidence for early speech extraction in younger and older adults. Hear Res 2022; 426:108636. DOI: 10.1016/j.heares.2022.108636.
20. Fogerty D, Madorskiy R, Vickery B, Shafiro V. Recognition of Interrupted Speech, Text, and Text-Supplemented Speech by Older Adults: Effect of Interruption Rate. Journal of Speech, Language, and Hearing Research 2022; 65:4404-4416. PMID: 36251884. PMCID: PMC9940893. DOI: 10.1044/2022_jslhr-22-00247.
Abstract
PURPOSE Studies of speech and text interruption indicate that the interruption rate influences the perceptual information available, from whole words at slow rates to subphonemic cues at faster interruptions rates. In young adults, the benefit obtained from text supplementation of speech may depend on the type of perceptual information available in either modality. Age commonly reduces temporal aspects of information processing, which may influence the benefit older adults obtain from text-supplemented speech across interruption rates. METHOD Older adults were tested unimodally and multimodally with spoken and printed sentences that were interrupted by silence or white space at various rates. RESULTS Results demonstrate U-shaped performance-rate functions for all modality conditions, with minimal performance around interruption rates of 2-4 Hz. Comparison to previous studies with younger adults indicates overall poorer recognition for interrupted materials by the older adults. However, as a group, older adults can integrate information between the two modalities to a similar degree as younger adults. Individual differences in multimodal integration were noted. CONCLUSION Overall, these results indicate that older adults, while demonstrating poorer overall performance in comparison to younger adults, successfully combine distributed partial information across speech and text modalities to facilitate sentence recognition.
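To make the interruption-rate manipulation concrete, the sketch below gates a signal with periodic silence at a chosen rate; the square-wave gating and the parameter values are assumptions for illustration, not the authors' stimulus-generation code.

    import numpy as np

    def interrupt(signal, fs, rate_hz, duty=0.5):
        # Periodically replace portions of a signal with silence.
        # rate_hz: interruption cycles per second; duty: fraction of each cycle kept.
        t = np.arange(len(signal)) / fs      # time of each sample in seconds
        gate = ((t * rate_hz) % 1.0) < duty  # square-wave on/off pattern
        return signal * gate                 # zeroed during the "off" phase

    # Example: a 2 Hz rate alternates roughly 250 ms of signal with 250 ms of silence
    fs = 16000
    tone = np.sin(2 * np.pi * 440 * np.arange(2 * fs) / fs)  # stand-in for a speech signal
    gated = interrupt(tone, fs, rate_hz=2.0)

At slow rates each retained segment spans roughly whole words, whereas at fast rates only brief sub-phonemic glimpses survive, which is the contrast across rates that the study exploits.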
Affiliation(s)
- Daniel Fogerty: Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign, Champaign
- Rachel Madorskiy: Department of Speech, Language, Hearing, and Occupational Sciences, University of Montana, Missoula
- Blythe Vickery: Department of Communication Sciences and Disorders, University of South Carolina, Columbia
- Valeriy Shafiro: Department of Communication Disorders and Sciences, Rush University Medical Center, Chicago, IL
21. Yang W, Guo A, Yao H, Yang X, Li Z, Li S, Chen J, Ren Y, Yang J, Wu J, Zhang Z. Effect of aging on audiovisual integration: Comparison of high- and low-intensity conditions in a speech discrimination task. Front Aging Neurosci 2022; 14:1010060. DOI: 10.3389/fnagi.2022.1010060.
Abstract
Audiovisual integration is an essential process that influences speech perception in conversation. However, it is still debated whether older individuals benefit more from audiovisual integration than younger individuals. This ambiguity is likely due to stimulus features, such as stimulus intensity. The purpose of the current study was to explore the effect of aging on audiovisual integration, using event-related potentials (ERPs) at different stimulus intensities. The results showed greater audiovisual integration in older adults at 320–360 ms. Conversely, at 460–500 ms, older adults displayed attenuated audiovisual integration in the frontal, fronto-central, central, and centro-parietal regions compared to younger adults. In addition, we found older adults had greater audiovisual integration at 200–230 ms under the low-intensity condition compared to the high-intensity condition, suggesting inverse effectiveness occurred. However, inverse effectiveness was not found in younger adults. Taken together, the results suggested that there was age-related dissociation in audiovisual integration and inverse effectiveness, indicating that the neural mechanisms underlying audiovisual integration differed between older adults and younger adults.
22. Begau A, Klatt LI, Schneider D, Wascher E, Getzmann S. The role of informational content of visual speech in an audiovisual cocktail party: Evidence from cortical oscillations in young and old participants. Eur J Neurosci 2022; 56:5215-5234. PMID: 36017762. DOI: 10.1111/ejn.15811.
Abstract
Age-related differences in the processing of audiovisual speech in a multi-talker environment were investigated by analysing event-related spectral perturbations (ERSPs), focusing on theta, alpha and beta oscillations that are assumed to reflect conflict processing, multisensory integration and attentional mechanisms, respectively. Eighteen older and 21 younger healthy adults completed a two-alternative forced-choice word discrimination task, responding to audiovisual speech stimuli. In a cocktail-party scenario with two competing talkers (located at -15° and 15° azimuth), target words (/yes/ or /no/) appeared at a pre-defined (attended) position and distractor words at the other position. In two audiovisual conditions, acoustic speech was combined either with informative or with uninformative visual speech. While a behavioural benefit for informative visual speech occurred for both age groups, differences between audiovisual conditions in the theta and beta bands were only present for older adults. A stronger increase in theta perturbations for stimuli containing uninformative visual speech could be associated with early conflict processing, while a stronger suppression of beta perturbations for informative visual speech could be associated with audiovisual integration. Compared to the younger group, the older group showed generally stronger beta perturbations. No condition differences in the alpha band were found. Overall, the findings suggest age-related differences in audiovisual speech integration in a multi-talker environment. While the behavioural benefit of informative visual speech was unaffected by age, older adults had a stronger need for cognitive control when processing conflicting audiovisual speech input. Furthermore, mechanisms of audiovisual integration are differently engaged depending on the informational content of the visual signal.
Collapse
Affiliation(s)
- Alexandra Begau
- Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
| | - Laura-Isabelle Klatt
- Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
| | - Daniel Schneider
- Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
| | - Edmund Wascher
- Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
| | - Stephan Getzmann
- Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
| |
Collapse
|
23
|
Francis AL. Adding noise is a confounded nuisance. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 152:1375. [PMID: 36182286 DOI: 10.1121/10.0013874] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/05/2022] [Accepted: 08/15/2022] [Indexed: 06/16/2023]
Abstract
A wide variety of research and clinical assessments involve presenting speech stimuli in the presence of some kind of noise. Here, I selectively review two theoretical perspectives and discuss ways in which these perspectives may help researchers understand the consequences for listeners of adding noise to a speech signal. I argue that adding noise changes more about the listening task than merely making the signal more difficult to perceive. To fully understand the effects of an added noise on speech perception, we must consider not just how much the noise affects task difficulty, but also how it affects all of the systems involved in understanding speech: increasing message uncertainty, modifying attentional demand, altering affective response, and changing motivation to perform the task.
Collapse
Affiliation(s)
- Alexander L Francis
- Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, Indiana 47907, USA
| |
Collapse
|
24
|
Wilbiks JMP, Brown VA, Strand JF. Speech and non-speech measures of audiovisual integration are not correlated. Atten Percept Psychophys 2022; 84:1809-1819. [PMID: 35610409 PMCID: PMC10699539 DOI: 10.3758/s13414-022-02517-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/09/2022] [Indexed: 11/08/2022]
Abstract
Many natural events generate both visual and auditory signals, and humans are remarkably adept at integrating information from those sources. However, individuals appear to differ markedly in their ability or propensity to combine what they hear with what they see. Individual differences in audiovisual integration have been established using a range of materials, including speech stimuli (seeing and hearing a talker) and simpler audiovisual stimuli (seeing flashes of light combined with tones). Although multiple tasks in the literature are referred to as "measures of audiovisual integration," the tasks differ widely with respect to both the type of stimuli used (speech versus non-speech) and the nature of the task itself (e.g., some tasks use conflicting auditory and visual stimuli whereas others use congruent stimuli). It is not clear whether these varied tasks are actually measuring the same underlying construct: audiovisual integration. This study tested the relationships among four commonly used measures of audiovisual integration, two of which use speech stimuli (susceptibility to the McGurk effect and a measure of audiovisual benefit), and two of which use non-speech stimuli (the sound-induced flash illusion and audiovisual integration capacity). We replicated previous work showing large individual differences in each measure but found no significant correlations among any of the measures. These results suggest that tasks that are commonly referred to as measures of audiovisual integration may be tapping into different parts of the same process or different constructs entirely.
Collapse
Affiliation(s)
| | - Violet A Brown
- Department of Psychological & Brain Sciences, Washington University in St. Louis, Saint Louis, MO, USA
| | - Julia F Strand
- Department of Psychology, Carleton College, Northfield, MN, USA
| |
Collapse
|
25
|
Bernstein LE, Jordan N, Auer ET, Eberhardt SP. Lipreading: A Review of Its Continuing Importance for Speech Recognition With an Acquired Hearing Loss and Possibilities for Effective Training. Am J Audiol 2022; 31:453-469. [PMID: 35316072 PMCID: PMC9524756 DOI: 10.1044/2021_aja-21-00112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2021] [Revised: 10/25/2021] [Accepted: 12/30/2021] [Indexed: 11/09/2022] Open
Abstract
PURPOSE The goal of this review article is to reinvigorate interest in lipreading and lipreading training for adults with acquired hearing loss. Most adults benefit from being able to see the talker when speech is degraded; however, the effect size is related to their lipreading ability, which is typically poor in adults who have experienced normal hearing through most of their lives. Lipreading training has been viewed as a possible avenue for rehabilitation of adults with an acquired hearing loss, but most training approaches have not been particularly successful. Here, we describe lipreading and theoretically motivated approaches to its training, as well as examples of successful training paradigms. We discuss some extensions to auditory-only (AO) and audiovisual (AV) speech recognition. METHOD Visual speech perception and word recognition are described. Traditional and contemporary views of training and perceptual learning are outlined. We focus on the roles of external and internal feedback and the training task in perceptual learning, and we describe results of lipreading training experiments. RESULTS Lipreading is commonly characterized as limited to viseme perception. However, evidence demonstrates subvisemic perception of visual phonetic information. Lipreading words also relies on lexical constraints, not unlike auditory spoken word recognition. Lipreading has been shown to be difficult to improve through training, but under specific feedback and task conditions, training can be successful, and learning can generalize to untrained materials, including AV sentence stimuli in noise. The results on lipreading have implications for AO and AV training and for use of acoustically processed speech in face-to-face communication. CONCLUSION Given its importance for speech recognition with a hearing loss, we suggest that the research and clinical communities integrate lipreading in their efforts to improve speech recognition in adults with acquired hearing loss.
Collapse
Affiliation(s)
- Lynne E. Bernstein
- Department of Speech, Language & Hearing Sciences, George Washington University, Washington, DC
| | - Nicole Jordan
- Department of Speech, Language & Hearing Sciences, George Washington University, Washington, DC
| | - Edward T. Auer
- Department of Speech, Language & Hearing Sciences, George Washington University, Washington, DC
| | - Silvio P. Eberhardt
- Department of Speech, Language & Hearing Sciences, George Washington University, Washington, DC
| |
Collapse
|
26
|
Wilms V, Drijvers L, Brouwer S. The Effects of Iconic Gestures and Babble Language on Word Intelligibility in Sentence Context. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2022; 65:1822-1838. [PMID: 35439423 DOI: 10.1044/2022_jslhr-21-00387] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
PURPOSE This study investigated to what extent iconic co-speech gestures help word intelligibility in sentence context in two different linguistic maskers (native vs. foreign). It was hypothesized that sentence recognition improves with the presence of iconic co-speech gestures and with foreign compared to native babble. METHOD Thirty-two native Dutch participants performed a Dutch word recognition task in sentence context in which they were presented with videos in which an actress uttered short Dutch sentences (e.g., Ze begint te openen, "She starts to open"). Participants were presented with a total of six audiovisual conditions: no background noise (i.e., clear condition) without gesture, no background noise with gesture, French babble without gesture, French babble with gesture, Dutch babble without gesture, and Dutch babble with gesture; they were asked to type out what the actress had said. The accurate identification of the action verbs at the end of the target sentences was measured. RESULTS The results demonstrated that performance on the task was better in the gesture than in the nongesture conditions (i.e., a gesture enhancement effect). In addition, performance was better in French babble than in Dutch babble. CONCLUSIONS Listeners benefit from iconic co-speech gestures during communication and are disrupted less by foreign than by native background speech. These insights into multimodal communication may be valuable to anyone who engages in multimodal communication, and especially to those who often communicate in public places where competing speech is present in the background.
Collapse
Affiliation(s)
- Veerle Wilms
- Centre for Language Studies, Radboud University, Nijmegen, the Netherlands
| | - Linda Drijvers
- Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
| | - Susanne Brouwer
- Centre for Language Studies, Radboud University, Nijmegen, the Netherlands
| |
Collapse
|
27
|
Basharat A, Thayanithy A, Barnett-Cowan M. A Scoping Review of Audiovisual Integration Methodology: Screening for Auditory and Visual Impairment in Younger and Older Adults. Front Aging Neurosci 2022; 13:772112. [PMID: 35153716 PMCID: PMC8829696 DOI: 10.3389/fnagi.2021.772112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2021] [Accepted: 12/17/2021] [Indexed: 11/13/2022] Open
Abstract
With the rise of the aging population, many scientists studying multisensory integration have turned toward understanding how this process may change with age. This scoping review was conducted to understand and describe the scope and rigor with which researchers studying audiovisual sensory integration screen for hearing and vision impairment. A structured search in three licensed databases (Scopus, PubMed, and PsycINFO) using the key concepts of multisensory integration, audiovisual modality, and aging revealed 2,462 articles, which were screened for inclusion by two reviewers. Articles were included if they (1) tested healthy older adults (minimum mean or median age of 60) with younger adults as a comparison (mean or median age between 18 and 35), (2) measured auditory and visual integration, (3) were written in English, and (4) reported behavioral outcomes. Articles were excluded if they (1) tested taste exclusively, (2) tested olfaction exclusively, (3) tested somatosensation exclusively, (4) tested emotion perception, (5) were not written in English, (6) were clinical commentaries, editorials, interviews, letters, newspaper articles, abstracts only, or non-peer-reviewed literature (e.g., theses), or (7) focused on neuroimaging without a behavioral component. Data pertaining to the details of the study (e.g., country of publication, year of publication) were extracted; however, of greater importance to our research question, data pertaining to screening measures used for hearing and vision impairment (e.g., type of test used, whether hearing and visual aids were worn, thresholds used) were extracted, collated, and summarized. Our search revealed that only 64% of studies screened for age-abnormal hearing impairment, 51% screened for age-abnormal vision impairment, and that consistent definitions of normal or abnormal vision and hearing were not used among the studies that screened for sensory abilities. A total of 1,624 younger adults and 4,778 older adults were included in the scoping review, with males composing approximately 44% and females composing 56% of the total sample; most of the data were obtained from only four countries. We recommend that studies investigating the effects of aging on multisensory integration screen for normal vision and hearing by using the World Health Organization's (WHO) hearing loss and visual impairment cut-off scores in order to maintain consistency with other aging researchers. Because mild cognitive impairment (MCI) has been defined as a “transitional” or a “transitory” stage between normal aging and dementia, and because approximately 3–5% of the aging population will develop MCI each year, it is important that researchers who aim to study a healthy aging population appropriately screen for MCI. One of our secondary aims was to determine how often researchers were screening for cognitive impairment and the types of tests that were used to do so. Our results revealed that only 55 out of 72 studies tested for neurological and cognitive function, and only a subset used standardized tests. Additionally, among the studies that used standardized tests, the cut-off scores used were not always adequate for screening out mild cognitive impairment.
An additional secondary aim of this scoping review was to determine whether a meta-analysis could feasibly be conducted in the future to further evaluate the results quantitatively (i.e., whether the findings obtained from studies using self-reported vision and hearing impairment screening methods differ significantly from those measuring vision and hearing impairment in the lab) and to assess the scope of this problem. We found that it may not be feasible to conduct a meta-analysis with the entire dataset of this scoping review. However, a meta-analysis can be conducted if stricter parameters are used (e.g., focusing on accuracy or response time data only). Systematic Review Registration: https://doi.org/10.17605/OSF.IO/GTUHD.
Collapse
|
28
|
Revisiting the relationship between implicit racial bias and audiovisual benefit for nonnative-accented speech. Atten Percept Psychophys 2022; 84:2074-2086. [PMID: 34988904 DOI: 10.3758/s13414-021-02423-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 11/30/2021] [Indexed: 01/25/2023]
Abstract
Speech intelligibility is improved when the listener can see the talker in addition to hearing their voice. Notably, though, previous work has suggested that this "audiovisual benefit" for nonnative (i.e., foreign-accented) speech is smaller than the benefit for native speech, an effect that may be partially accounted for by listeners' implicit racial biases (Yi et al., 2013, The Journal of the Acoustical Society of America, 134[5], EL387-EL393). In the present study, we sought to replicate these findings in a considerably larger sample of online participants. In a direct replication of Yi et al. (Experiment 1), we found that audiovisual benefit was indeed smaller for nonnative-accented relative to native-accented speech. However, our results did not support the conclusion that implicit racial biases, as measured with two types of implicit association tasks, were related to these differences in audiovisual benefit for native and nonnative speech. In a second experiment, we addressed a potential confound in the experimental design: to ensure that the difference in audiovisual benefit was caused by a difference in accent rather than a difference in overall intelligibility, we reversed the overall difficulty of each accent condition by presenting them at different signal-to-noise ratios. Even when native speech was presented at a much more difficult intelligibility level than nonnative speech, audiovisual benefit for nonnative speech remained smaller. In light of these findings, we discuss alternative explanations of reduced audiovisual benefit for nonnative speech, as well as methodological considerations for future work examining the intersection of social, cognitive, and linguistic processes.
Collapse
|
29
|
Bilinguals Show Proportionally Greater Benefit From Visual Speech Cues and Sentence Context in Their Second Compared to Their First Language. Ear Hear 2021; 43:1316-1326. [PMID: 34966162 DOI: 10.1097/aud.0000000000001182] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
Abstract
OBJECTIVES Speech perception in noise is challenging, but evidence suggests that it may be facilitated by visual speech cues (e.g., lip movements) and supportive sentence context in native speakers. Comparatively few studies have investigated speech perception in noise in bilinguals, and little is known about the impact of visual speech cues and supportive sentence context in a first language compared to a second language within the same individual. The current study addresses this gap by directly investigating the extent to which bilinguals benefit from visual speech cues and supportive sentence context under similarly noisy conditions in their first and second language. DESIGN Thirty young adult English-French/French-English bilinguals were recruited from the undergraduate psychology program at Concordia University and from the Montreal community. They completed a speech perception in noise task during which they were presented with video-recorded sentences and instructed to repeat the last word of each sentence out loud. Sentences were presented in three different modalities: visual-only, auditory-only, and audiovisual. Additionally, sentences had one of two levels of context: moderate (e.g., "In the woods, the hiker saw a bear.") and low (e.g., "I had not thought about that bear."). Each participant completed this task in both their first and second language; crucially, the level of background noise was calibrated individually for each participant and was the same throughout the first language and second language (L2) portions of the experimental task. RESULTS Overall, speech perception in noise was more accurate in bilinguals' first language compared to the second. However, participants benefited from visual speech cues and supportive sentence context to a proportionally greater extent in their second language compared to their first. At the individual level, performance during the speech perception in noise task was related to aspects of bilinguals' experience in their second language (i.e., age of acquisition, relative balance between the first and the second language). CONCLUSIONS Bilinguals benefit from visual speech cues and sentence context in their second language during speech in noise and do so to a greater extent than in their first language given the same level of background noise. Together, this indicates that L2 speech perception can be conceptualized within an inverse effectiveness hypothesis framework with a complex interplay of sensory factors (i.e., the quality of the auditory speech signal and visual speech cues) and linguistic factors (i.e., presence or absence of supportive context and L2 experience of the listener).
Collapse
|
30
|
Beadle J, Kim J, Davis C. Effects of Age and Uncertainty on the Visual Speech Benefit in Noise. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2021; 64:5041-5060. [PMID: 34762813 DOI: 10.1044/2021_jslhr-20-00495] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
PURPOSE Listeners understand significantly more speech in noise when the talker's face can be seen (visual speech) in comparison to an auditory-only baseline (a visual speech benefit). This study investigated whether the visual speech benefit is reduced when the correspondence between auditory and visual speech is uncertain, and whether any reduction is affected by listener age (older vs. younger) and by how severely the auditory signal is masked. METHOD Older and younger adults completed a speech recognition in noise task that included an auditory-only condition and four auditory-visual (AV) conditions in which one, two, four, or six silent talking face videos were presented. One face always matched the auditory signal; the other face(s) did not. Auditory speech was presented in noise at -6 and -1 dB signal-to-noise ratio (SNR). RESULTS When the SNR was -6 dB, for both age groups, the standard-sized visual speech benefit decreased as more talking faces were presented. When the SNR was -1 dB, younger adults received the standard-sized visual speech benefit even when two talking faces were presented, whereas older adults did not. CONCLUSIONS The size of the visual speech benefit obtained by older adults was always smaller when AV correspondence was uncertain; this was not the case for younger adults. Difficulty establishing AV correspondence may be a factor that limits older adults' speech recognition in noisy AV environments. Supplemental Material https://doi.org/10.23641/asha.16879549.
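The visual speech benefit reported above is conventionally computed as the difference between auditory-visual and auditory-only accuracy at a matched SNR. A minimal sketch of that computation follows; the proportion-correct values are illustrative placeholders, not data from the study.

```python
# Illustrative sketch: visual speech benefit = AV accuracy - AO accuracy at a matched SNR.
# All accuracy values below are hypothetical placeholders.

def visual_speech_benefit(av_accuracy, ao_accuracy):
    """Difference between auditory-visual and auditory-only proportion correct."""
    return av_accuracy - ao_accuracy

scores = {
    "-6 dB SNR": {"AO": 0.35, "AV one face": 0.75, "AV six faces": 0.55},
    "-1 dB SNR": {"AO": 0.60, "AV one face": 0.90, "AV six faces": 0.80},
}

for snr, s in scores.items():
    for condition in ("AV one face", "AV six faces"):
        benefit = visual_speech_benefit(s[condition], s["AO"])
        print(f"{snr}, {condition}: benefit = {benefit:.2f}")
```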
Collapse
Affiliation(s)
- Julie Beadle
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, New South Wales, Australia
- The HEARing Cooperative Research Centre, Carlton, Victoria, Australia
| | - Jeesun Kim
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, New South Wales, Australia
| | - Chris Davis
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, New South Wales, Australia
- The HEARing Cooperative Research Centre, Carlton, Victoria, Australia
| |
Collapse
|
31
|
Sandhya, Vinay, Manchaiah V. Perception of Incongruent Audiovisual Speech: Distribution of Modality-Specific Responses. Am J Audiol 2021; 30:968-979. [PMID: 34499528 DOI: 10.1044/2021_aja-20-00213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022] Open
Abstract
PURPOSE Multimodal sensory integration in audiovisual (AV) speech perception is a naturally occurring phenomenon. Modality-specific responses such as auditory left, auditory right, and visual responses to dichotic incongruent AV speech stimuli help in understanding AV speech processing through each input modality. The distribution of activity in the frontal motor areas involved in speech production has been shown to correlate with how subjects perceive the same syllable differently or perceive different syllables. This study investigated the distribution of modality-specific responses to dichotic incongruent AV speech stimuli by simultaneously presenting consonant-vowel (CV) syllables with different places of articulation to the participants' left and right ears and visually. DESIGN A dichotic experimental design was adopted. Six stop CV syllables /pa/, /ta/, /ka/, /ba/, /da/, and /ga/ were assembled to create dichotic incongruent AV speech material. Participants included 40 native speakers of Norwegian (20 women, M age = 22.6 years, SD = 2.43 years; 20 men, M age = 23.7 years, SD = 2.08 years). RESULTS Findings of this study showed that, under dichotic listening conditions, velar CV syllables resulted in the highest scores in the respective ears, and this might be explained by stimulus dominance of velar consonants, as shown in previous studies. However, this study, in which dichotic auditory stimuli were accompanied by an incongruent video segment, demonstrated that a visually distinct video segment possibly draws some participants' attention to the video, thereby reducing the overall recognition of the dominant syllable. Furthermore, the findings here suggest that response times to incongruent AV stimuli may be shorter in females than in males. CONCLUSION The identification of the left audio, right audio, and visual segments in dichotic incongruent AV stimuli depends on place of articulation, stimulus dominance, and voice onset time of the CV syllables.
Collapse
Affiliation(s)
- Sandhya
- Department of Neuromedicine and Movement Science, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
| | - Vinay
- Department of Neuromedicine and Movement Science, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
| | - Manchaiah, V
- Department of Speech and Hearing Sciences, Lamar University, Beaumont, TX
| |
Collapse
|
32
|
Myerson J, Tye-Murray N, Spehar B, Hale S, Sommers M. Predicting Audiovisual Word Recognition in Noisy Situations: Toward Precision Audiology. Ear Hear 2021; 42:1656-1667. [PMID: 34320527 PMCID: PMC8545708 DOI: 10.1097/aud.0000000000001072] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
OBJECTIVE Spoken communication is better when one can see as well as hear the talker. Tye-Murray and colleagues found that even when age-related deficits in audiovisual (AV) speech perception were observed, AV performance could be accurately predicted from auditory-only (A-only) and visual-only (V-only) performance, and that knowing individuals' ages did not increase the accuracy of prediction. This finding contradicts conventional wisdom, according to which age-related differences in AV speech perception are due to deficits in the integration of auditory and visual information, and our primary goal was to determine whether Tye-Murray et al.'s finding with a closed-set test generalizes to situations more like those in everyday life. A second goal was to test a new predictive model that has important implications for audiological assessment. DESIGN Participants (N = 109; ages 22-93 years), previously studied by Tye-Murray et al., were administered our new, open-set Lex-List test to assess their auditory, visual, and audiovisual perception of individual words. All testing was conducted in six-talker babble (three males and three females) presented at approximately 62 dB SPL. The level of the audio for the Lex-List items, when presented, was approximately 59 dB SPL because pilot testing suggested that this signal-to-noise ratio would avoid ceiling performance under the AV condition. RESULTS Multiple linear regression analyses revealed that A-only and V-only performance accounted for 87.9% of the variance in AV speech perception, and that the contribution of age failed to reach significance. Our new parabolic model accounted for even more (92.8%) of the variance in AV performance, and again, the contribution of age was not significant. Bayesian analyses revealed that for both linear and parabolic models, the present data were almost 10 times as likely to occur under a reduced model (without age) as under a full model (with age as a predictor). Furthermore, comparison of the two reduced models revealed that the data were more than 100 times as likely to occur under the parabolic model as under the linear regression model. CONCLUSIONS The present results strongly support Tye-Murray et al.'s hypothesis that AV performance can be accurately predicted from unimodal performance and that knowing individuals' ages does not increase the accuracy of that prediction. Our results represent an important initial step in extending Tye-Murray et al.'s findings to situations more like those encountered in everyday communication. The accuracy with which speech perception was predicted in this study foreshadows a form of precision audiology in which determining individual strengths and weaknesses in unimodal and multimodal speech perception facilitates identification of targets for rehabilitative efforts aimed at recovering and maintaining speech perception abilities critical to the quality of an older adult's life.
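The abstract's central claim, that AV word recognition can be predicted from A-only and V-only scores without knowing listener age, can be illustrated with a simple least-squares fit. The sketch below uses simulated scores, and the interaction term is only a stand-in, since the exact form of the authors' parabolic model is not given in the abstract.

```python
# Hedged sketch: predicting AV speech perception from unimodal (A-only, V-only) scores.
# Data are simulated; the regression structure, not the coefficients, is the point.
import numpy as np

rng = np.random.default_rng(0)
n = 109
a_only = rng.uniform(0.1, 0.8, n)   # auditory-only proportion correct
v_only = rng.uniform(0.0, 0.4, n)   # visual-only (lipreading) proportion correct
av = np.clip(a_only + v_only * (1 - a_only) + rng.normal(0, 0.05, n), 0, 1)

# Linear model: AV ~ intercept + A + V
X_lin = np.column_stack([np.ones(n), a_only, v_only])
beta_lin, *_ = np.linalg.lstsq(X_lin, av, rcond=None)

# Illustrative curvilinear model with an A x V term (a stand-in for the
# "parabolic" model, whose exact form is not specified in the abstract).
X_cur = np.column_stack([np.ones(n), a_only, v_only, a_only * v_only])
beta_cur, *_ = np.linalg.lstsq(X_cur, av, rcond=None)

def r_squared(y, X, beta):
    pred = X @ beta
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

print("Linear R^2:", round(r_squared(av, X_lin, beta_lin), 3))
print("Curvilinear R^2:", round(r_squared(av, X_cur, beta_cur), 3))
```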
Collapse
Affiliation(s)
- Joel Myerson
- Department of Psychological and Brain Sciences, Washington University, Saint Louis, Missouri, U.S.A
| | - Nancy Tye-Murray
- Department of Otolaryngology, Washington University School of Medicine, Saint Louis, Missouri, U.S.A
| | - Brent Spehar
- Department of Otolaryngology, Washington University School of Medicine, Saint Louis, Missouri, U.S.A
| | - Sandra Hale
- Department of Psychological and Brain Sciences, Washington University, Saint Louis, Missouri, U.S.A
| | - Mitchell Sommers
- Department of Psychological and Brain Sciences, Washington University, Saint Louis, Missouri, U.S.A
| |
Collapse
|
33
|
Dias JW, McClaskey CM, Harris KC. Early auditory cortical processing predicts auditory speech in noise identification and lipreading. Neuropsychologia 2021; 161:108012. [PMID: 34474065 DOI: 10.1016/j.neuropsychologia.2021.108012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2020] [Revised: 08/20/2021] [Accepted: 08/26/2021] [Indexed: 10/20/2022]
Abstract
Individuals typically exhibit better cross-sensory perception following unisensory loss, demonstrating improved perception of information available from the remaining senses and increased cross-sensory use of neural resources. Even individuals with no sensory loss will exhibit such changes in cross-sensory processing following temporary sensory deprivation, suggesting that the brain's capacity for recruiting cross-sensory sources to compensate for degraded unisensory input is a general characteristic of the perceptual process. Many studies have investigated how auditory and visual neural structures respond to within- and cross-sensory input. However, little attention has been given to how general auditory and visual neural processing relates to within- and cross-sensory perception. The current investigation examines the extent to which individual differences in general auditory neural processing account for variability in auditory, visual, and audiovisual speech perception in a sample of young healthy adults. Auditory neural processing was assessed using a simple click stimulus. We found that individuals with a smaller P1 peak amplitude in their auditory-evoked potential (AEP) had more difficulty identifying speech sounds in difficult listening conditions, but were better lipreaders. The results suggest that individual differences in the auditory neural processing of healthy adults can account for variability in the perception of information available from the auditory and visual modalities, similar to the cross-sensory perceptual compensation observed in individuals with sensory loss.
Collapse
Affiliation(s)
- James W Dias
- Medical University of South Carolina, United States.
| | | | | |
Collapse
|
34
|
Banks B, Gowen E, Munro KJ, Adank P. Eye Gaze and Perceptual Adaptation to Audiovisual Degraded Speech. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2021; 64:3432-3445. [PMID: 34463528 DOI: 10.1044/2021_jslhr-21-00106] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Purpose Visual cues from a speaker's face may benefit perceptual adaptation to degraded speech, but current evidence is limited. We aimed to replicate results from previous studies to establish the extent to which visual speech cues can lead to greater adaptation over time, extending existing results to a real-time adaptation paradigm (i.e., without a separate training period). A second aim was to investigate whether eye gaze patterns toward the speaker's mouth were related to better perception, hypothesizing that listeners who looked more at the speaker's mouth would show greater adaptation. Method A group of listeners (n = 30) was presented with 90 noise-vocoded sentences in audiovisual format, whereas a control group (n = 29) was presented with the audio signal only. Recognition accuracy was measured throughout and eye tracking was used to measure fixations toward the speaker's eyes and mouth in the audiovisual group. Results Previous studies were partially replicated: The audiovisual group had better recognition throughout and adapted slightly more rapidly, but both groups showed an equal amount of improvement overall. Longer fixations on the speaker's mouth in the audiovisual group were related to better overall accuracy. An exploratory analysis further demonstrated that the duration of fixations to the speaker's mouth decreased over time. Conclusions The results suggest that visual cues may not benefit adaptation to degraded speech as much as previously thought. Longer fixations on a speaker's mouth may play a role in successfully decoding visual speech cues; however, this will need to be confirmed in future research to fully understand how patterns of eye gaze are related to audiovisual speech recognition. All materials, data, and code are available at https://osf.io/2wqkf/.
Collapse
Affiliation(s)
- Briony Banks
- Division of Neuroscience and Experimental Psychology, Faculty of Biology, Medicine and Health, The University of Manchester, United Kingdom
| | - Emma Gowen
- Division of Neuroscience and Experimental Psychology, Faculty of Biology, Medicine and Health, The University of Manchester, United Kingdom
| | - Kevin J Munro
- Manchester Centre for Audiology and Deafness, Faculty of Biology, Medicine and Health, The University of Manchester, United Kingdom
- Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, United Kingdom
| | - Patti Adank
- Speech, Hearing and Phonetic Sciences, University College London, United Kingdom
| |
Collapse
|
35
|
van de Rijt LPH, van Opstal AJ, van Wanrooij MM. Multisensory Integration-Attention Trade-Off in Cochlear-Implanted Deaf Individuals. Front Neurosci 2021; 15:683804. [PMID: 34393707 PMCID: PMC8358073 DOI: 10.3389/fnins.2021.683804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2021] [Accepted: 06/21/2021] [Indexed: 11/13/2022] Open
Abstract
The cochlear implant (CI) allows profoundly deaf individuals to partially recover hearing. Still, due to the coarse acoustic information provided by the implant, CI users have considerable difficulties in recognizing speech, especially in noisy environments. CI users therefore rely heavily on visual cues to augment speech recognition, more so than normal-hearing individuals. However, it is unknown how attention to one (focused) or both (divided) modalities plays a role in multisensory speech recognition. Here we show that unisensory speech listening and reading were negatively impacted in divided-attention tasks for CI users—but not for normal-hearing individuals. Our psychophysical experiments revealed that, as expected, listening thresholds were consistently better for the normal-hearing, while lipreading thresholds were largely similar for the two groups. Moreover, audiovisual speech recognition for normal-hearing individuals could be described well by probabilistic summation of auditory and visual speech recognition, while CI users were better integrators than expected from statistical facilitation alone. Our results suggest that this benefit in integration comes at a cost. Unisensory speech recognition is degraded for CI users when attention needs to be divided across modalities. We conjecture that CI users exhibit an integration-attention trade-off. They focus solely on a single modality during focused-attention tasks, but need to divide their limited attentional resources in situations with uncertainty about the upcoming stimulus modality. We argue that in order to determine the benefit of a CI for speech recognition, situational factors need to be discounted by presenting speech in realistic or complex audiovisual environments.
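Probabilistic summation, the benchmark against which the CI users' integration is judged above, is commonly formalised as the probability of recognising an item through at least one modality alone; observed audiovisual performance above that prediction is taken as integration beyond statistical facilitation. A minimal sketch under that standard assumption (the study's own model fitting is more involved, and the values here are illustrative):

```python
# Hedged sketch of the standard probability-summation (statistical facilitation) benchmark:
# predicted P(AV) = P(A) + P(V) - P(A) * P(V), i.e., success through at least one modality.
# Proportions correct are illustrative placeholders, not data from the study.

def probability_summation(p_auditory, p_visual):
    return p_auditory + p_visual - p_auditory * p_visual

listeners = {
    "normal-hearing": {"A": 0.80, "V": 0.30, "AV_observed": 0.85},
    "cochlear-implant": {"A": 0.45, "V": 0.35, "AV_observed": 0.75},
}

for group, p in listeners.items():
    predicted = probability_summation(p["A"], p["V"])
    verdict = "above" if p["AV_observed"] > predicted else "at or below"
    print(f"{group}: predicted AV = {predicted:.2f}, observed AV = {p['AV_observed']:.2f} "
          f"({verdict} statistical facilitation)")
```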
Collapse
Affiliation(s)
- Luuk P H van de Rijt
- Department of Otorhinolaryngology, Donders Institute for Brain, Cognition and Behaviour, Radboudumc, Nijmegen, Netherlands
| | - A John van Opstal
- Department of Biophysics, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
| | - Marc M van Wanrooij
- Department of Biophysics, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
| |
Collapse
|
36
|
Bak K, Chan GSW, Schutz M, Campos JL. Perceptions of Audio-Visual Impact Events in Younger and Older Adults. Multisens Res 2021; 34:1-30. [PMID: 34298502 DOI: 10.1163/22134808-bja10056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2020] [Accepted: 06/17/2021] [Indexed: 11/19/2022]
Abstract
Previous studies have examined whether audio-visual integration changes in older age, with some studies reporting age-related differences and others reporting no differences. Most studies have either used very basic and ambiguous stimuli (e.g., flash/beep) or highly contextualized, causally related stimuli (e.g., speech). However, few have used tasks that fall somewhere between the extremes of this continuum, such as those that include contextualized, causally related stimuli that are not speech-based; for example, audio-visual impact events. The present study used a paradigm requiring duration estimates and temporal order judgements (TOJ) of audio-visual impact events. Specifically, the Schutz-Lipscomb illusion, in which the perceived duration of a percussive tone is influenced by the length of the visual striking gesture, was examined in younger and older adults. Twenty-one younger and 21 older adult participants were presented with a visual point-light representation of a percussive impact event (i.e., a marimbist striking their instrument with a long or short gesture) combined with a percussive auditory tone. Participants completed a tone duration judgement task and a TOJ task. Five audio-visual temporal offsets (-400 to +400 ms) and five spatial offsets (from -90 to +90°) were randomly introduced. Results demonstrated that the strength of the illusion did not differ between older and younger adults and was not influenced by spatial or temporal offsets. Older adults showed an 'auditory first bias' when making TOJs. The current findings expand what is known about age-related differences in audio-visual integration by considering them in the context of impact-related events.
Collapse
Affiliation(s)
- Katherine Bak
- Department of Psychology, University of Toronto, 27 King's College Circle, Toronto, ON, Canada, M5S 1A1
- The KITE Research Institute-University Health Network, 550 University Avenue, Toronto, ON, Canada, M5G 2A2
| | - George S W Chan
- The KITE Research Institute-University Health Network, 550 University Avenue, Toronto, ON, Canada, M5G 2A2
| | - Michael Schutz
- School of the Arts, McMaster University, 1280 Main Street West, Hamilton, ON, Canada, L8S 4L8
- Department of Psychology, Neuroscience, and Behaviour, McMaster University, 1280 Main Street West, Hamilton, ON, Canada, L8S 4L8
| | - Jennifer L Campos
- Department of Psychology, University of Toronto, 27 King's College Circle, Toronto, ON, Canada, M5S 1A1
- The KITE Research Institute-University Health Network, 550 University Avenue, Toronto, ON, Canada, M5G 2A2
| |
Collapse
|
37
|
Abstract
OBJECTIVES When auditory and visual speech information is presented together, listeners obtain an audiovisual (AV) benefit, or a speech understanding improvement compared with auditory-only (AO) or visual-only (VO) presentations. Cochlear-implant (CI) listeners, who receive degraded speech input and therefore understand speech using primarily temporal information, seem to readily use visual cues and can achieve a larger AV benefit than normal-hearing (NH) listeners. It is unclear, however, whether the AV benefit remains relatively large for CI listeners when they are trying to understand foreign-accented speech compared with unaccented speech. Accented speech can introduce changes to temporal auditory cues and visual cues, which could decrease the usefulness of AV information. Furthermore, we sought to determine whether the AV benefit was larger in CI than in NH listeners for both unaccented and accented speech. DESIGN AV benefit was investigated for unaccented and Spanish-accented speech by presenting English sentences in AO, VO, and AV conditions to 15 CI and 15 age- and performance-matched NH listeners. Performance matching between NH and CI listeners was achieved by varying the number of channels of a noise vocoder for the NH listeners. Because of the differences in age and hearing history of the CI listeners, the effects of listener-related variables on speech understanding performance and AV benefit were also examined. RESULTS AV benefit was observed for both unaccented and accented conditions and for both CI and NH listeners. The two groups showed similar performance for the AO and AV conditions, and the normalized AV benefit was smaller for the accented than for the unaccented conditions. In the CI listeners, older age was associated with significantly poorer performance with the accented speaker compared with the unaccented speaker. The negative impact of age was somewhat reduced by a significant improvement in performance with access to AV information. CONCLUSIONS When auditory speech information is degraded by CI sound processing, visual cues can be used to improve speech understanding, even in the presence of a Spanish accent. The AV benefit of the CI listeners closely matched that of the NH listeners presented with vocoded speech, which was unexpected given that CI listeners appear to rely more on visual information to communicate. This result is perhaps due to the one-to-one age and performance matching of the listeners. While aging decreased CI listener performance with the accented speaker, access to visual cues boosted performance and could partially overcome the age-related speech understanding deficits for the older CI listeners.
Collapse
|
38
|
Tremblay P, Basirat A, Pinto S, Sato M. Visual prediction cues can facilitate behavioural and neural speech processing in young and older adults. Neuropsychologia 2021; 159:107949. [PMID: 34228997 DOI: 10.1016/j.neuropsychologia.2021.107949] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2020] [Revised: 06/16/2021] [Accepted: 07/01/2021] [Indexed: 02/06/2023]
Abstract
The ability to process speech evolves over the course of the lifespan. Understanding speech at low acoustic intensity and in the presence of background noise becomes harder, and the ability of older adults to benefit from audiovisual speech also appears to decline. These difficulties can have important consequences on quality of life. Yet, a consensus on the cause of these difficulties is still lacking. The objective of this study was to examine the processing of speech in young and older adults under different modalities (i.e., auditory [A], visual [V], audiovisual [AV]) and in the presence of different visual prediction cues (i.e., no predictive cue (control), temporal predictive cue, phonetic predictive cue, and combined temporal and phonetic predictive cues). We focused on recognition accuracy and four auditory evoked potential (AEP) components: P1-N1-P2 and N2. Thirty-four right-handed French-speaking adults were recruited, including 17 younger adults (28 ± 2 years; 20-42 years) and 17 older adults (67 ± 3.77 years; 60-73 years). Participants completed a forced-choice speech identification task. The main findings of the study are: (1) the facilitatory effect of visual information was reduced, but still present, in older compared to younger adults; (2) visual predictive cues facilitated speech recognition in younger and older adults alike; (3) age differences in AEPs were localized to later components (P2 and N2), suggesting that aging predominantly affects higher-order cortical processes related to speech processing rather than lower-level auditory processes; (4) specifically, AV facilitation on P2 amplitude was lower in older adults, the effect of the temporal predictive cue on N2 amplitude was reduced for older compared to younger adults, and P2 and N2 latencies were longer for older adults; and (5) behavioural performance was associated with P2 amplitude in older adults. Our results indicate that aging affects speech processing at multiple levels, including audiovisual integration (P2) and auditory attentional processes (N2). These findings have important implications for understanding barriers to communication in older age, as well as for the development of compensation strategies for those with speech processing difficulties.
Collapse
Affiliation(s)
- Pascale Tremblay
- Département de Réadaptation, Faculté de Médecine, Université Laval, Quebec City, Canada; Cervo Brain Research Centre, Quebec City, Canada.
| | - Anahita Basirat
- Univ. Lille, CNRS, UMR 9193 - SCALab - Sciences Cognitives et Sciences Affectives, Lille, France
| | - Serge Pinto
- France Aix Marseille Univ, CNRS, LPL, Aix-en-Provence, France
| | - Marc Sato
- France Aix Marseille Univ, CNRS, LPL, Aix-en-Provence, France
| |
Collapse
|
39
|
Age Differences in the Effects of Speaking Rate on Auditory, Visual, and Auditory-Visual Speech Perception. Ear Hear 2021; 41:549-560. [PMID: 31453875 DOI: 10.1097/aud.0000000000000776] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
OBJECTIVES This study was designed to examine how speaking rate affects auditory-only, visual-only, and auditory-visual speech perception across the adult lifespan. In addition, the study examined the extent to which unimodal (auditory-only and visual-only) performance predicts auditory-visual performance across a range of speaking rates. The authors hypothesized significant Age × Rate interactions in all three modalities and that unimodal performance would account for a majority of the variance in auditory-visual speech perception for speaking rates that are both slower and faster than normal. DESIGN Participants (N = 145), ranging in age from 22 to 92, were tested in conditions with auditory-only, visual-only, and auditory-visual presentations using a closed-set speech perception test. Five different speaking rates were presented in each modality: an unmodified (normal rate), two rates that were slower than normal, and two rates that were faster than normal. Signal to noise ratios were set individually to produce approximately 30% correct identification in the auditory-only condition and this signal to noise ratio was used in the auditory-only and auditory-visual conditions. RESULTS Age × Rate interactions were observed for the fastest speaking rates in both the visual-only and auditory-visual conditions. Unimodal performance accounted for at least 60% of the variance in auditory-visual performance for all five speaking rates. CONCLUSIONS The findings demonstrate that the disproportionate difficulty that older adults have with rapid speech for auditory-only presentations can also be observed with visual-only and auditory-visual presentations. Taken together, the present analyses of age and individual differences indicate a generalized age-related decline in the ability to understand speech produced at fast speaking rates. The finding that auditory-visual speech performance was almost entirely predicted by unimodal performance across all five speaking rates has important clinical implications for auditory-visual speech perception and the ability of older adults to use visual speech information to compensate for age-related hearing loss.
Collapse
|
40
|
Schubotz L, Holler J, Drijvers L, Özyürek A. Aging and working memory modulate the ability to benefit from visible speech and iconic gestures during speech-in-noise comprehension. PSYCHOLOGICAL RESEARCH 2021; 85:1997-2011. [PMID: 32627053 PMCID: PMC8289811 DOI: 10.1007/s00426-020-01363-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2019] [Accepted: 05/20/2020] [Indexed: 12/19/2022]
Abstract
When comprehending speech-in-noise (SiN), younger and older adults benefit from seeing the speaker's mouth, i.e. visible speech. Younger adults additionally benefit from manual iconic co-speech gestures. Here, we investigate to what extent younger and older adults benefit from perceiving both visual articulators while comprehending SiN, and whether this is modulated by working memory and inhibitory control. Twenty-eight younger and 28 older adults performed a word recognition task in three visual contexts: mouth blurred (speech-only), visible speech, or visible speech + iconic gesture. The speech signal was either clear or embedded in multitalker babble. Additionally, there were two visual-only conditions (visible speech, visible speech + gesture). Accuracy levels for both age groups were higher when both visual articulators were present compared to either one or none. However, older adults received a significantly smaller benefit than younger adults, although they performed equally well in speech-only and visual-only word recognition. Individual differences in verbal working memory and inhibitory control partly accounted for age-related performance differences. To conclude, perceiving iconic gestures in addition to visible speech improves younger and older adults' comprehension of SiN. Yet, the ability to benefit from this additional visual information is modulated by age and verbal working memory. Future research will have to show whether these findings extend beyond the single word level.
Collapse
Affiliation(s)
- Louise Schubotz
- Max Planck Institute for Psycholinguistics, P.O. Box 310, 6500 AH, Nijmegen, The Netherlands
| | - Judith Holler
- Max Planck Institute for Psycholinguistics, P.O. Box 310, 6500 AH, Nijmegen, The Netherlands.
- Donders Institute for Brain, Cognition, and Behaviour, P.O. Box 9010, 6500 GL, Nijmegen, The Netherlands.
| | - Linda Drijvers
- Max Planck Institute for Psycholinguistics, P.O. Box 310, 6500 AH, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition, and Behaviour, P.O. Box 9010, 6500 GL, Nijmegen, The Netherlands
| | - Aslı Özyürek
- Max Planck Institute for Psycholinguistics, P.O. Box 310, 6500 AH, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition, and Behaviour, P.O. Box 9010, 6500 GL, Nijmegen, The Netherlands
- Centre for Language Studies, Radboud University Nijmegen, P.O. Box 9103, 6500 HD, Nijmegen, The Netherlands
| |
Collapse
|
41
|
Begau A, Klatt LI, Wascher E, Schneider D, Getzmann S. Do congruent lip movements facilitate speech processing in a dynamic audiovisual multi-talker scenario? An ERP study with older and younger adults. Behav Brain Res 2021; 412:113436. [PMID: 34175355 DOI: 10.1016/j.bbr.2021.113436] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2020] [Revised: 04/26/2021] [Accepted: 06/21/2021] [Indexed: 11/26/2022]
Abstract
In natural conversations, visible mouth and lip movements play an important role in speech comprehension. There is evidence that visual speech information improves speech comprehension, especially for older adults and under difficult listening conditions. However, the neurocognitive basis is still poorly understood. The present EEG experiment investigated the benefits of audiovisual speech in a dynamic cocktail-party scenario with 22 younger (aged 20-34 years) and 20 older (aged 55-74 years) participants. We presented three simultaneously talking faces with a varying amount of visual speech input (still faces, visually unspecific, or audiovisually congruent). In a two-alternative forced-choice task, participants had to discriminate target words ("yes" or "no") among two distractors (one-digit number words). In half of the experimental blocks, the target was always presented from a central position; in the other half, occasional switches to a lateral position could occur. We investigated behavioural and electrophysiological modulations due to age, location switches, and the content of visual information, analyzing response times and accuracy as well as the P1, N1, P2, and N2 event-related potentials (ERPs) and the contingent negative variation (CNV) in the EEG. We found that audiovisually congruent speech information improved performance and modulated ERP amplitudes in both age groups, suggesting enhanced preparation and integration of the subsequent auditory input. In the older group, larger amplitude measures were found in early phases of processing (P1-N1); here, amplitudes were reduced in response to audiovisually congruent stimuli. In later processing phases (P2-N2), we found decreased amplitude measures in the older group, while an amplitude reduction for audiovisually congruent compared to visually unspecific stimuli was still observable. However, these benefits were only observed as long as no location switches occurred; switches led to enhanced amplitude measures in the later processing phases (P2-N2). To conclude, meaningful visual information in a multi-talker setting, when presented from the expected location, is beneficial for both younger and older adults.
Collapse
Affiliation(s)
- Alexandra Begau
- Leibniz Research Centre for Working Environment and Human Factors, TU Dortmund, Germany.
| | - Laura-Isabelle Klatt
- Leibniz Research Centre for Working Environment and Human Factors, TU Dortmund, Germany
| | - Edmund Wascher
- Leibniz Research Centre for Working Environment and Human Factors, TU Dortmund, Germany
| | - Daniel Schneider
- Leibniz Research Centre for Working Environment and Human Factors, TU Dortmund, Germany
| | - Stephan Getzmann
- Leibniz Research Centre for Working Environment and Human Factors, TU Dortmund, Germany
| |
Collapse
|
42
|
Zhong L, Noud BP, Pruitt H, Marcrum SC, Picou EM. Effects of text supplementation on speech intelligibility for listeners with normal and impaired hearing: a systematic review with implications for telecommunication. Int J Audiol 2021; 61:1-11. [PMID: 34154488 DOI: 10.1080/14992027.2021.1937346] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
Abstract
OBJECTIVE Telecommunication can be difficult in the presence of noise or hearing loss. The purpose of this study was to systematically review evidence regarding the effects of text supplementation (e.g., captions, subtitles) of auditory or auditory-visual signals on speech intelligibility for listeners with normal or impaired hearing. DESIGN Three databases were searched. Articles were evaluated for inclusion based on the Population, Intervention, Comparison, Outcome framework. The Effective Public Health Practice Project instrument was used to evaluate the quality of the identified articles. STUDY SAMPLE After duplicates were removed, the titles and abstracts of 2,019 articles were screened. Forty-six full texts were reviewed; ten met inclusion criteria. RESULTS The quality of all ten articles was moderate or strong. The articles demonstrated that text added to auditory (or auditory-visual) signals improved speech intelligibility and that the benefits were largest when auditory signal integrity was low, accuracy of the text was high, and the auditory signal and text were synchronous. Age and hearing loss did not affect benefits from the addition of text. CONCLUSIONS Although based on only ten studies, these data support the use of text as a supplement during telecommunication, such as while watching television or during telehealth appointments.
Collapse
Affiliation(s)
- Ling Zhong
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Brianne P Noud
- Department of Audiology, Center for Hearing and Speech, St. Louis, MO, USA
| | - Harriet Pruitt
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA.,Department of Speech-Language Pathology, Advanced Therapy Solutions, Clarksville, TN, USA
| | - Steven C Marcrum
- Department of Otolaryngology, University Hospital Regensburg, Regensburg, Germany
| | - Erin M Picou
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
| |
Collapse
|
43
|
Dias JW, McClaskey CM, Harris KC. Audiovisual speech is more than the sum of its parts: Auditory-visual superadditivity compensates for age-related declines in audible and lipread speech intelligibility. Psychol Aging 2021; 36:520-530. [PMID: 34124922 PMCID: PMC8427734 DOI: 10.1037/pag0000613] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2023]
Abstract
Multisensory input can improve perception of ambiguous unisensory information. For example, speech heard in noise can be more accurately identified when listeners see a speaker's articulating face. Importantly, these multisensory effects can be superadditive to listeners' ability to process unisensory speech, such that audiovisual speech identification is better than the sum of auditory-only and visual-only speech identification. Age-related declines in auditory and visual speech perception have been hypothesized to be concomitant with stronger cross-sensory influences on audiovisual speech identification, but little evidence exists to support this. Currently, studies do not account for the multisensory superadditive benefit of auditory-visual input in their metrics of the auditory or visual influence on audiovisual speech perception. Here we treat multisensory superadditivity as independent from unisensory auditory and visual processing. In the current investigation, older and younger adults identified auditory, visual, and audiovisual speech in noisy listening conditions. Performance across these conditions was used to compute conventional metrics of the auditory and visual influence on audiovisual speech identification and a metric of auditory-visual superadditivity. Consistent with past work, auditory and visual speech identification declined with age, audiovisual speech identification was preserved, and no age-related differences in the auditory or visual influence on audiovisual speech identification were observed. However, we found that auditory-visual superadditivity improved with age. The novel findings suggest that multisensory superadditivity is independent of unisensory processing. As auditory and visual speech identification decline with age, compensatory changes in multisensory superadditivity may preserve audiovisual speech identification in older adults. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
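The metrics referred to here can be made concrete with a small example. The following sketch shows one conventional visual-gain formula and one simple way to operationalise superadditivity from proportion-correct scores; these are common formulations offered for illustration and may differ from the exact metrics used in the study:

```python
# Illustrative formulas only; the study's exact metrics may differ.
def visual_enhancement(a, av):
    """Conventional visual-gain metric: improvement from adding vision,
    scaled by the room available above auditory-only performance."""
    return (av - a) / (1.0 - a) if a < 1.0 else 0.0

def superadditivity(a, v, av):
    """Positive values mean audiovisual accuracy exceeds the sum of the
    unisensory accuracies (one simple way to operationalise the idea)."""
    return av - (a + v)

# Example: auditory-only 30% correct, visual-only (lipreading) 10%,
# audiovisual 55% correct.
print(visual_enhancement(0.30, 0.55))     # ~0.357
print(superadditivity(0.30, 0.10, 0.55))  # 0.15
```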
Collapse
Affiliation(s)
- James W Dias
- Department of Otolaryngology-Head and Neck Surgery
| | | | | |
Collapse
|
44
|
Chauvin A, Baum S, Phillips NA. Individuals With Mild Cognitive Impairment and Alzheimer's Disease Benefit From Audiovisual Speech Cues and Supportive Sentence Context. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2021; 64:1550-1559. [PMID: 33861623 DOI: 10.1044/2021_jslhr-20-00402] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Purpose Speech perception in noise becomes difficult with age but can be facilitated by audiovisual (AV) speech cues and sentence context in healthy older adults. However, individuals with Alzheimer's disease (AD) may present with deficits in AV integration, potentially limiting the extent to which they can benefit from AV cues. This study investigated the benefit of these cues in individuals with mild cognitive impairment (MCI), individuals with AD, and healthy older adult controls. Method This study compared auditory-only and AV speech perception of sentences presented in noise. These sentences had one of two levels of context: high (e.g., "Stir your coffee with a spoon") and low (e.g., "Bob didn't think about the spoon"). Fourteen older controls (M age = 72.71 years, SD = 9.39), 13 individuals with MCI (M age = 79.92 years, SD = 5.52), and nine individuals with probable Alzheimer's-type dementia (M age = 79.38 years, SD = 3.40) completed the speech perception task and were asked to repeat the terminal word of each sentence. Results All three groups benefited (i.e., identified more terminal words) from AV and sentence context. Individuals with MCI showed a smaller AV benefit compared to controls in low-context conditions, suggesting difficulties with AV integration. Individuals with AD showed a smaller benefit in high-context conditions compared to controls, indicating difficulties with AV integration and context use in AD. Conclusions Individuals with MCI and individuals with AD do benefit from AV speech and semantic context during speech perception in noise (albeit to a lower extent than healthy older adults). This suggests that engaging in face-to-face communication and providing ample context will likely foster more effective communication between patients and caregivers, professionals, and loved ones.
Collapse
Affiliation(s)
- Alexandre Chauvin
- Department of Psychology/Centre for Research in Human Development, Concordia University, Montréal, Québec, Canada
- Centre for Research on Brain, Language and Music, McGill University, Montréal, Québec, Canada
| | - Shari Baum
- Centre for Research on Brain, Language and Music, McGill University, Montréal, Québec, Canada
- School of Communication Sciences and Disorders, Faculty of Medicine and Health Sciences, McGill University, Montréal, Québec, Canada
| | - Natalie A Phillips
- Department of Psychology/Centre for Research in Human Development, Concordia University, Montréal, Québec, Canada
- Centre for Research on Brain, Language and Music, McGill University, Montréal, Québec, Canada
- Bloomfield Centre for Research in Aging, Lady Davis Institute for Medical Research, Montréal, Québec, Canada
| |
Collapse
|
45
|
Li Y, Li Z, Deng A, Zheng H, Chen J, Ren Y, Yang W. The Modulation of Exogenous Attention on Emotional Audiovisual Integration. Iperception 2021; 12:20416695211018714. [PMID: 34104384 PMCID: PMC8167015 DOI: 10.1177/20416695211018714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2020] [Accepted: 04/29/2021] [Indexed: 11/15/2022] Open
Abstract
Although emotional audiovisual integration has been investigated previously, whether it is affected by the spatial allocation of visual attention is currently unknown. To examine this question, a variant of the exogenous spatial cueing paradigm was adopted, in which stimuli varying in facial expression and nonverbal affective prosody were used to express six basic emotions (happiness, anger, disgust, sadness, fear, surprise) via a visual, an auditory, or an audiovisual modality. The emotional stimuli were preceded by a non-predictive cue that was used to attract participants' visual attention. The results showed significantly higher accuracy and faster response times in emotional perception for bimodal audiovisual stimuli than for unimodal visual or auditory stimuli, under both valid and invalid cue conditions. The auditory facilitation effect was stronger than the visual facilitation effect under exogenous attention for all six emotions tested. Larger auditory enhancement was induced when the target was presented at the expected location than at the unexpected location. Among the six emotions, happiness showed the largest auditory enhancement. However, the influence of the exogenous cueing effect on emotional perception seemed to be absent.
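One common way to ask whether such response-time facilitation reflects genuine integration rather than mere statistical facilitation is Miller's race-model inequality, which bounds the audiovisual response-time distribution by the sum of the unisensory distributions. This is an illustrative technique that is not mentioned in the abstract itself; the sketch below is a minimal version of that test:

```python
# Assumed analysis for illustration, not stated in the abstract: Miller's
# race-model inequality applied to response-time (RT) distributions.
import numpy as np

def race_model_violation(rt_av, rt_a, rt_v, t_grid):
    """Return F_AV(t) - min(F_A(t) + F_V(t), 1) at each time point;
    positive values indicate facilitation beyond the race-model bound."""
    def cdf(rts, t):
        return np.mean(np.asarray(rts)[:, None] <= t, axis=0)
    bound = np.minimum(cdf(rt_a, t_grid) + cdf(rt_v, t_grid), 1.0)
    return cdf(rt_av, t_grid) - bound

# Example with made-up reaction times (in ms):
t = np.arange(200, 801, 50)
viol = race_model_violation([320, 350, 380], [420, 450, 500], [460, 480, 520], t)
```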
Collapse
Affiliation(s)
- Yueying Li
- Department of Psychology, Faculty of Education, Hubei University, Wuhan, China; Graduate School of Humanities, Kobe University, Japan
| | | | | | | | - Jianxin Chen
- Department of Psychology, Faculty of Education, Hubei University, Wuhan, China
| | - Yanna Ren
- Department of Psychology, Medical Humanities College, Guiyang College of Traditional Chinese Medicine, Guiyang, China
| | - Weiping Yang
- Department of Psychology, Faculty of Education, Hubei University, Wuhan, China; Brain and Cognition Research Center (BCRC), Faculty of Education, Hubei University, Wuhan, China
| |
Collapse
|
46
|
Errors on a Speech-in-Babble Sentence Recognition Test Reveal Individual Differences in Acoustic Phonetic Perception and Babble Misallocations. Ear Hear 2021; 42:673-690. [PMID: 33928926 DOI: 10.1097/aud.0000000000001020] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
Abstract
OBJECTIVES The ability to recognize words in connected speech under noisy listening conditions is critical to everyday communication. Many processing levels contribute to an individual listener's ability to recognize words correctly against background speech, and there is a clinical need for measures of individual differences at these different levels. Typical listening tests of speech recognition in noise require a list of items to obtain a single threshold score. Measures of diverse abilities could instead be obtained by mining the various open-set recognition errors made during multi-item tests. This study sought to demonstrate that an error-mining approach using open-set responses from a clinical sentence-in-babble-noise test can be used to characterize abilities beyond the signal-to-noise ratio (SNR) threshold. A stimulus-response phoneme-to-phoneme sequence alignment software system was used to obtain automatic, accurate, quantitative error scores. The method was applied to a database of responses from normal-hearing (NH) adults. Relationships between two types of response errors and words-correct scores were evaluated using mixed-models regression. DESIGN Two hundred thirty-three NH adults completed three lists of the Quick Speech in Noise test. Their individual open-set speech recognition responses were automatically phonemically transcribed and submitted to a phoneme-to-phoneme stimulus-response sequence alignment system. The computed alignments were mined for a measure of acoustic phonetic perception, a measure of response text that could not be attributed to the stimulus, and a count of words correct. The mined data were statistically analyzed to determine whether the response errors were significant factors, beyond stimulus SNR, in accounting for the number of words correct per response from each participant. This study addressed two hypotheses: (1) individuals whose perceptual errors are less severe recognize more words correctly under listening conditions made difficult by babble masking, and (2) listeners who are better able to exclude incorrect speech information, such as intrusions from the background babble or filled-in content, recognize more stimulus words correctly. RESULTS Statistical analyses showed that acoustic phonetic accuracy and exclusion of the babble background were significant factors, beyond the stimulus sentence SNR, in accounting for the number of words a participant recognized. There was also evidence that poorer acoustic phonetic accuracy could co-occur with higher words-correct scores. This paradoxical result came from a subset of listeners who had also performed subjective accuracy judgments; their results suggested that they recognized more words while also misallocating acoustic cues from the background into the stimulus, without realizing their errors. Because the Quick Speech in Noise test stimuli are locked to their own babble sample, misallocations of whole words from the babble into the responses could be investigated in detail. The high rate of common misallocation errors for some sentences supported the view that the functional stimulus was the combination of the target sentence and its babble. CONCLUSIONS Individual differences among NH listeners arise both in the words accurately identified and in the errors committed during open-set recognition of sentences in babble maskers. Error mining to characterize individual listeners can be done automatically at the levels of acoustic phonetic perception and the misallocation of background babble words into open-set responses. Error mining can increase test information and the efficiency and accuracy of characterizing individual listeners.
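The phoneme-to-phoneme alignment at the heart of this approach is, in essence, a sequence-alignment dynamic program. The sketch below shows the general technique only, not the authors' software; a production system would also return the alignment path so that substitutions, insertions, and deletions could be tallied separately:

```python
# Sketch of the general idea behind stimulus-response phoneme alignment:
# a standard edit-distance dynamic program over phoneme symbols.
def phoneme_edit_distance(stimulus, response):
    """Minimum number of substitutions, insertions, and deletions needed to
    turn the stimulus phoneme sequence into the response sequence."""
    m, n = len(stimulus), len(response)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i                              # deletions
    for j in range(1, n + 1):
        d[0][j] = j                              # insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if stimulus[i - 1] == response[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)  # match/substitution
    return d[m][n]

# Example: target /b ae d/ heard as /b ae t/ -> one substitution error.
print(phoneme_edit_distance(["b", "ae", "d"], ["b", "ae", "t"]))  # 1
```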
Collapse
|
47
|
Zendel BR, Power BV, DiDonato RM, Hutchings VMM. Memory Deficits for Health Information Provided Through a Telehealth Video Conferencing System. Front Psychol 2021; 12:604074. [PMID: 33841239 PMCID: PMC8024525 DOI: 10.3389/fpsyg.2021.604074] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2020] [Accepted: 02/26/2021] [Indexed: 11/13/2022] Open
Abstract
It is critical to remember details about meetings with healthcare providers. Forgetting could result in inadequate knowledge about one's health, non-adherence to treatments, and poorer health outcomes. Hearing the healthcare provider plays a crucial role in consolidating information for recall. The recent COVID-19 pandemic has meant a rapid transition to videoconference-based medicine, here described as telehealth. When telehealth is used, speech must be filtered and compressed, and research has shown that degraded speech is more challenging to remember. Here we present preliminary results from a study that compared memory for health information provided in person to memory for information provided through telehealth. Data collection for this study was stopped due to the pandemic, but the preliminary results are interesting because the pandemic forced a rapid transition to telehealth. To examine a potential memory deficit for health information provided through telehealth, we presented older and younger adults with instructions on how to use two medical devices. One set of instructions was presented in person, and the other through telehealth. Participants were asked to recall the instructions immediately after the session and again after a 1-week delay. Overall, the number of details recalled was significantly lower when instructions were provided by telehealth, both immediately after the session and after the 1-week delay. It is likely that a mix of technological and communication strategies on the part of the healthcare provider could reduce this telehealth memory deficit. Given the rapid transition to telehealth due to COVID-19, highlighting this deficit and providing potential solutions are timely and of utmost importance.
Collapse
Affiliation(s)
- Benjamin Rich Zendel
- Faculty of Medicine, Memorial University of Newfoundland, St. John's, NL, Canada.,Aging Research Centre-Newfoundland and Labrador, Memorial University, Corner Brook, NL, Canada
| | | | - Roberta Maria DiDonato
- Faculty of Medicine, Memorial University of Newfoundland, St. John's, NL, Canada.,Aging Research Centre-Newfoundland and Labrador, Memorial University, Corner Brook, NL, Canada
| | - Veronica Margaret Moore Hutchings
- Faculty of Medicine, Memorial University of Newfoundland, St. John's, NL, Canada.,Aging Research Centre-Newfoundland and Labrador, Memorial University, Corner Brook, NL, Canada
| |
Collapse
|
48
|
Bean NL, Stein BE, Rowland BA. Stimulus value gates multisensory integration. Eur J Neurosci 2021; 53:3142-3159. [PMID: 33667027 DOI: 10.1111/ejn.15167] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2020] [Revised: 02/18/2021] [Accepted: 02/22/2021] [Indexed: 11/28/2022]
Abstract
The brain enhances its perceptual and behavioral decisions by integrating information from its multiple senses in what are believed to be optimal ways. This phenomenon of "multisensory integration" appears to be pre-conscious, effortless, and highly efficient. The present experiments examined whether experience could modify this seemingly automatic process. Cats were trained in a localization task in which congruent pairs of auditory-visual stimuli are normally integrated to enhance detection and orientation/approach performance. Consistent with the results of previous studies, animals more reliably detected and approached cross-modal pairs than their modality-specific component stimuli, regardless of whether the pairings were novel or familiar. However, when provided with evidence that one of the modality-specific component stimuli had no value (it was not rewarded), animals ceased integrating it with other cues, and it lost its previous ability to enhance approach behaviors. Cross-modal pairings involving that stimulus failed to elicit enhanced responses even when the paired stimuli were congruent and mutually informative. However, the stimulus regained its ability to enhance responses when it was associated with reward. This suggests that experience can selectively block access of stimuli (i.e., filter inputs) to the multisensory computation. Because this filtering process results in the loss of useful information, its operation and behavioral consequences are not optimal. Nevertheless, the process can be of substantial value in natural environments, rich in dynamic stimuli, by using experience to minimize the impact of stimuli unlikely to be of biological significance, and reducing the complexity of the problem of matching signals across the senses.
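For readers unfamiliar with how such behavioural enhancement is usually expressed, a conventional index (assumed here for illustration; the paper may report a different measure) is the percentage gain of the cross-modal response over the best unisensory response:

```python
# Conventional multisensory enhancement index (illustrative assumption, not
# necessarily the exact measure reported in the paper).
def multisensory_enhancement(cross_modal, best_unisensory):
    """Percentage gain of the cross-modal response over the best unisensory
    response, e.g., approach rates of 0.9 (AV) vs 0.6 (best of A or V)."""
    return 100.0 * (cross_modal - best_unisensory) / best_unisensory

print(multisensory_enhancement(0.9, 0.6))  # 50.0
```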
Collapse
Affiliation(s)
- Naomi L Bean
- Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Barry E Stein
- Wake Forest School of Medicine, Winston-Salem, NC, USA
| | | |
Collapse
|
49
|
Jones SA, Noppeney U. Ageing and multisensory integration: A review of the evidence, and a computational perspective. Cortex 2021; 138:1-23. [PMID: 33676086 DOI: 10.1016/j.cortex.2021.02.001] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2020] [Revised: 01/23/2021] [Accepted: 02/02/2021] [Indexed: 11/29/2022]
Abstract
The processing of multisensory signals is crucial for effective interaction with the environment, but our ability to perform this vital function changes as we age. In the first part of this review, we summarise existing research into the effects of healthy ageing on multisensory integration. We note that age differences vary substantially with the paradigms and stimuli used: older adults often receive at least as much benefit (to both accuracy and response times) as younger controls from congruent multisensory stimuli, but are also consistently more negatively impacted by the presence of intersensory conflict. In the second part, we outline a normative Bayesian framework that provides a principled and computationally informed perspective on the key ingredients involved in multisensory perception, and how these are affected by ageing. Applying this framework to the existing literature, we conclude that changes to sensory reliability, prior expectations (together with attentional control), and decisional strategies all contribute to the age differences observed. However, we find no compelling evidence of any age-related changes to the basic inference mechanisms involved in multisensory perception.
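The normative Bayesian perspective discussed in the review builds on reliability-weighted cue combination. A minimal sketch under Gaussian assumptions (not the authors' model code) is:

```python
# Minimal sketch of reliability-weighted (maximum-likelihood) cue fusion,
# the textbook building block of normative Bayesian accounts of
# multisensory perception; Gaussian noise is assumed.
def fuse_estimates(s_a, var_a, s_v, var_v):
    """Combine auditory and visual estimates weighted by their reliabilities
    (inverse variances):
        s_hat = (s_a/var_a + s_v/var_v) / (1/var_a + 1/var_v)
    The fused variance is smaller than either unisensory variance."""
    w_a, w_v = 1.0 / var_a, 1.0 / var_v
    s_hat = (w_a * s_a + w_v * s_v) / (w_a + w_v)
    var_hat = 1.0 / (w_a + w_v)
    return s_hat, var_hat

# Example: a noisy auditory location estimate (10 deg, variance 4) and a
# sharper visual one (0 deg, variance 1) yield a fused estimate near vision.
print(fuse_estimates(10.0, 4.0, 0.0, 1.0))  # (2.0, 0.8)
```

In this framing, the age differences summarised in the review can be cast as changes in the unisensory variances (reliabilities), in prior expectations, or in decisional strategies, rather than in the fusion computation itself.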
Collapse
Affiliation(s)
- Samuel A Jones
- The Staffordshire Centre for Psychological Research, Staffordshire University, Stoke-on-Trent, UK.
| | - Uta Noppeney
- Donders Institute for Brain, Cognition & Behaviour, Radboud University, Nijmegen, the Netherlands.
| |
Collapse
|
50
|
Effects of stimulus intensity on audiovisual integration in aging across the temporal dynamics of processing. Int J Psychophysiol 2021; 162:95-103. [PMID: 33529642 DOI: 10.1016/j.ijpsycho.2021.01.017] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2020] [Revised: 10/26/2020] [Accepted: 01/24/2021] [Indexed: 11/24/2022]
Abstract
Previous studies have drawn different conclusions about whether older adults benefit more from audiovisual integration, and such conflicts may have been due to the stimulus features investigated in those studies, such as stimulus intensity. In the current study, using ERPs, we compared the effects of stimulus intensity on audiovisual integration between young adults and older adults. The results showed that inverse effectiveness, the phenomenon whereby lowering the effectiveness of sensory stimuli increases the benefits of multisensory integration, was observed in young adults at earlier processing stages but was absent in older adults. Moreover, at the earlier processing stages (60-90 ms and 110-140 ms), older adults exhibited significantly greater audiovisual integration than young adults (all ps < 0.05). However, at the later processing stages (220-250 ms and 340-370 ms), young adults exhibited significantly greater audiovisual integration than older adults (all ps < 0.001). The results suggest an age-related dissociation between early and late integration, indicating that different audiovisual processing mechanisms are at play in older and young adults.
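ERP studies of this kind commonly quantify audiovisual integration with an additive-model comparison, contrasting the audiovisual response with the sum of the unisensory responses within given time windows. A minimal sketch of that logic (the sampling rate is an assumption, not a value from the study) is:

```python
# Sketch of the additive-model logic often used in ERP studies of
# audiovisual integration: integration is inferred where the response to AV
# stimulation deviates from the sum of the unisensory responses.
import numpy as np

FS = 500  # sampling rate in Hz (assumed)

def av_integration_effect(erp_av, erp_a, erp_v, window_s):
    """Mean amplitude of the [AV - (A + V)] difference wave in a window,
    where each input is a 1-D post-stimulus ERP sampled at FS."""
    diff = np.asarray(erp_av) - (np.asarray(erp_a) + np.asarray(erp_v))
    i0, i1 = int(window_s[0] * FS), int(window_s[1] * FS)
    return diff[i0:i1].mean()

# e.g., for the 60-90 ms window discussed above:
# effect = av_integration_effect(erp_av, erp_a, erp_v, (0.06, 0.09))
```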
Collapse
|