1. Sato M. Audiovisual speech asynchrony asymmetrically modulates neural binding. Neuropsychologia 2024; 198:108866. [PMID: 38518889] [DOI: 10.1016/j.neuropsychologia.2024.108866]
Abstract
Previous psychophysical and neurophysiological studies in young healthy adults have provided evidence that audiovisual speech integration occurs with a large degree of temporal tolerance around true simultaneity. To further determine whether audiovisual speech asynchrony modulates auditory cortical processing and neural binding in young healthy adults, N1/P2 auditory evoked responses were compared using an additive model during a syllable categorization task, with or without an audiovisual asynchrony ranging from 240 ms visual lead to 240 ms auditory lead. Consistent with previous psychophysical findings, the observed results converge in favor of an asymmetric temporal integration window. Three main findings emerged: 1) predictive temporal and phonetic cues from pre-phonatory visual movements before the acoustic onset appeared essential for neural binding to occur; 2) audiovisual synchrony, with visual pre-phonatory movements predictive of the onset of the acoustic signal, was a prerequisite for N1 latency facilitation; and 3) P2 amplitude suppression and latency facilitation occurred even when visual pre-phonatory movements predicted only the identity of the upcoming syllable, not the acoustic onset. Taken together, these findings help clarify how audiovisual speech integration operates in part through two stages of visually based temporal and phonetic predictions.
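The additive-model logic above compares the audiovisual (AV) evoked response against the sum of the unisensory auditory (A) and visual (V) responses; sub-additivity (AV < A + V) around the N1/P2 is taken as a signature of neural binding. A minimal sketch of that comparison on simulated data follows; the sampling rate, N1 window, and all amplitudes are illustrative assumptions, not values from the study.

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
fs = 1000                                  # sampling rate in Hz (assumed)
t = np.arange(-0.1, 0.4, 1 / fs)           # epoch: -100 to 400 ms
n_subj = 20

def erp(peak_ms, amp):
    """Gaussian-shaped deflection standing in for an evoked component."""
    return amp * np.exp(-((t - peak_ms / 1000) ** 2) / (2 * 0.015 ** 2))

# Simulated trial-averaged ERPs (n_subjects x n_samples), purely illustrative.
A  = erp(100, -4.0) + rng.normal(0, 0.2, (n_subj, t.size))   # auditory-only
V  = erp(150, -1.0) + rng.normal(0, 0.2, (n_subj, t.size))   # visual-only
AV = erp(110, -4.2) + rng.normal(0, 0.2, (n_subj, t.size))   # audiovisual

# Additive model: test AV against A + V in the N1 window (70-150 ms, assumed).
win = (t >= 0.07) & (t <= 0.15)
n1_av  = AV[:, win].min(axis=1)            # N1 amplitude = most negative point
n1_sum = (A + V)[:, win].min(axis=1)

tval, pval = ttest_rel(n1_av, n1_sum)
print(f"N1 sub-additivity: t({n_subj - 1}) = {tval:.2f}, p = {pval:.3f}")
```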
Affiliation(s)
- Marc Sato
- Laboratoire Parole et Langage, Centre National de la Recherche Scientifique, Aix-Marseille Université, Aix-en-Provence, France.
2. Zou T, Li L, Huang X, Deng C, Wang X, Gao Q, Chen H, Li R. Dynamic causal modeling analysis reveals the modulation of motor cortex and integration in superior temporal gyrus during multisensory speech perception. Cogn Neurodyn 2024; 18:931-946. [PMID: 38826672] [PMCID: PMC11143173] [DOI: 10.1007/s11571-023-09945-z]
Abstract
The processing of speech information from various sensory modalities is crucial for human communication. Both the left posterior superior temporal gyrus (pSTG) and the motor cortex are importantly involved in multisensory speech perception. However, the dynamic integration of primary sensory regions with the pSTG and the motor cortex remains unclear. Here, we implemented a behavioral experiment using the classical McGurk effect paradigm and acquired task functional magnetic resonance imaging (fMRI) data during synchronized audiovisual syllabic perception from 63 normal adults. We conducted dynamic causal modeling (DCM) analysis to explore the cross-modal interactions among the left pSTG, left precentral gyrus (PrG), left middle superior temporal gyrus (mSTG), and left fusiform gyrus (FuG). Bayesian model selection favored a winning model that included modulations of connections to PrG (mSTG → PrG, FuG → PrG), from PrG (PrG → mSTG, PrG → FuG), and to pSTG (mSTG → pSTG, FuG → pSTG). Moreover, the coupling strength of these connections correlated with behavioral McGurk susceptibility. In addition, significant differences in the coupling strength of these connections were found between strong and weak McGurk perceivers. Strong perceivers modulated less inhibitory visual influence, allowed less excitatory auditory information to flow into PrG, but integrated more audiovisual information in pSTG. Taken together, our findings show that the PrG and pSTG interact dynamically with primary cortices during audiovisual speech and support the idea that the motor cortex plays a specific functional role in modulating the gain and salience of the auditory and visual modalities. Supplementary Information: The online version contains supplementary material available at 10.1007/s11571-023-09945-z.
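Dynamic causal modeling rests on a bilinear neuronal state equation, dx/dt = (A + sum_j u_j B(j)) x + C u, where A is fixed (endogenous) connectivity, B(j) the modulation of connections by experimental input j, and C the driving inputs. The toy forward simulation below illustrates that equation for the four-region network named above; all connection values, the region ordering, and the input timing are assumptions for illustration only.

```python
import numpy as np

regions = ["mSTG", "FuG", "PrG", "pSTG"]        # order assumed for illustration
n = len(regions)

A = -0.5 * np.eye(n)                            # self-decay (fixed connectivity)
A[2, 0] = 0.3   # mSTG -> PrG
A[2, 1] = 0.2   # FuG  -> PrG
A[3, 0] = 0.4   # mSTG -> pSTG
A[3, 1] = 0.3   # FuG  -> pSTG
A[0, 2] = 0.2   # PrG  -> mSTG (feedback)
A[1, 2] = 0.2   # PrG  -> FuG  (feedback)

B = np.zeros((n, n))                            # modulation by audiovisual input
B[2, 0] = -0.1                                  # e.g., dampened mSTG -> PrG flow

C = np.zeros((n, 1)); C[0, 0] = C[1, 0] = 1.0   # stimulus drives mSTG and FuG

def simulate(T=2.0, dt=1e-3):
    """Euler integration of dx/dt = (A + u*B) x + C u for a boxcar input u."""
    steps = int(T / dt)
    x = np.zeros(n)
    out = np.empty((steps, n))
    for k in range(steps):
        u = 1.0 if 0.1 <= k * dt <= 0.6 else 0.0
        x = x + dt * ((A + u * B) @ x + C[:, 0] * u)
        out[k] = x
    return out

states = simulate()
print({r: round(float(states[:, i].max()), 3) for i, r in enumerate(regions)})
```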
Affiliation(s)
- Ting Zou, Liyuan Li, Xinju Huang, Chijun Deng, Xuyang Wang, Qing Gao, Huafu Chen, Rong Li
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Laboratory for Neuroinformation, High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, People's Republic of China
3. Kausel L, Michon M, Soto-Icaza P, Aboitiz F. A multimodal interface for speech perception: the role of the left superior temporal sulcus in social cognition and autism. Cereb Cortex 2024; 34:84-93. [PMID: 38696598] [DOI: 10.1093/cercor/bhae066]
Abstract
Multimodal integration is crucial for human interaction, in particular for social communication, which relies on integrating information from various sensory modalities. Recently, a third visual pathway specialized in social perception was proposed; it includes the right superior temporal sulcus (STS), which plays a key role in processing socially relevant cues and in high-level social perception. Importantly, the left STS has also recently been proposed to contribute to the audiovisual integration of speech processing. In this article, we propose that brain areas along the right STS that support multimodal integration for social perception and cognition can be considered homologs to those in the left, language-dominant hemisphere, which sustain the multimodal integration of speech and semantic concepts fundamental for social communication. Emphasizing the significance of the left STS in multimodal integration and in associated processes such as multimodal attention to socially relevant stimuli, we underscore its potential relevance for understanding neurodevelopmental conditions characterized by challenges in social communication, such as autism spectrum disorder (ASD). Further research into this left lateral processing stream holds promise for enhancing our understanding of social communication in both typical development and ASD, which may lead to more effective interventions that improve the quality of life of individuals with atypical neurodevelopment.
Affiliation(s)
- Leonie Kausel
- Centro de Estudios en Neurociencia Humana y Neuropsicología (CENHN), Facultad de Psicología, Universidad Diego Portales, Chile, Vergara 275, 8370076 Santiago, Chile
- Maëva Michon
- Praxiling Laboratory, Joint Research Unit (UMR 5267), Centre National de la Recherche Scientifique (CNRS), Université Paul Valéry, Montpellier, France, Route de Mende, 34199 Montpellier cedex 5, France
- Centro Interdisciplinario de Neurociencia, Pontificia Universidad Católica de Chile, Chile, Marcoleta 391, 2do piso, 8330024 Santiago, Chile
- Laboratorio de Neurociencia Cognitiva y Evolutiva, Facultad de Medicina, Pontificia Universidad Católica de Chile, Chile, Marcoleta 391, 2do piso, 8330024 Santiago, Chile
- Patricia Soto-Icaza
- Centro de Investigación en Complejidad Social (CICS), Facultad de Gobierno, Universidad del Desarrollo, Chile, Av. Las Condes 12461, edificio 3, piso 3, 7590943, Las Condes Santiago, Chile
- Francisco Aboitiz
- Centro Interdisciplinario de Neurociencia, Pontificia Universidad Católica de Chile, Chile, Marcoleta 391, 2do piso, 8330024 Santiago, Chile
- Laboratorio de Neurociencia Cognitiva y Evolutiva, Facultad de Medicina, Pontificia Universidad Católica de Chile, Chile, Marcoleta 391, 2do piso, 8330024 Santiago, Chile
4. Sato M. Competing influence of visual speech on auditory neural adaptation. Brain Lang 2023; 247:105359. [PMID: 37951157] [DOI: 10.1016/j.bandl.2023.105359]
Abstract
Visual information from a speaker's face enhances auditory neural processing and speech recognition. To determine whether auditory memory can be influenced by visual speech, the degree of auditory neural adaptation to an auditory syllable preceded by an auditory, visual, or audiovisual syllable was examined using EEG. Consistent with previous findings and with additional adaptation of auditory neurons tuned to acoustic features, stronger adaptation of N1, P2, and N2 auditory evoked responses was observed when the auditory syllable was preceded by an auditory rather than a visual syllable. However, adaptation was weaker when the auditory syllable was preceded by an audiovisual rather than an auditory syllable (although still stronger than after a visual syllable), and N1 and P2 latencies were then longer. These results further demonstrate that visual speech acts on auditory memory but suggest competing visual influences in the case of audiovisual stimulation.
Affiliation(s)
- Marc Sato
- Laboratoire Parole et Langage, UMR 7309, CNRS & Aix-Marseille Université, 5 avenue Pasteur, Aix-en-Provence, France.
5. Liu Y, Wang Z, Wei T, Zhou S, Yin Y, Mi Y, Liu X, Tang Y. Alterations of Audiovisual Integration in Alzheimer's Disease. Neurosci Bull 2023; 39:1859-1872. [PMID: 37812301] [PMCID: PMC10661680] [DOI: 10.1007/s12264-023-01125-7]
Abstract
Audiovisual integration is a vital information process involved in cognition and is closely correlated with aging and Alzheimer's disease (AD). In this review, we evaluated the altered audiovisual integrative behavioral symptoms in AD. We further analyzed the bidirectional relationships between AD pathologies and audiovisual integration alterations and suggest possible mechanisms of audiovisual integration alterations underlying AD, including the imbalance between energy demand and supply, activity-dependent degeneration, disrupted brain networks, and cognitive resource overloading. Then, based on clinical characteristics, including electrophysiological and imaging data related to audiovisual integration, we emphasize the value of audiovisual integration alterations as potential biomarkers for the early diagnosis and progression of AD. We also highlight that treatments targeting audiovisual integration have contributed to widespread pathological improvements in AD animal models and to cognitive improvements in AD patients. Moreover, investigation of audiovisual integration alterations in AD provides new insights into sensory information processing.
Affiliation(s)
- Yufei Liu, Zhibin Wang, Tao Wei, Shaojiong Zhou, Yunsi Yin, Yingxin Mi, Xiaoduo Liu, Yi Tang
- Department of Neurology and Innovation Center for Neurological Disorders, Xuanwu Hospital, Capital Medical University, National Center for Neurological Disorders, Beijing 100053, China
6. Nidiffer AR, Cao CZ, O'Sullivan A, Lalor EC. A representation of abstract linguistic categories in the visual system underlies successful lipreading. Neuroimage 2023; 282:120391. [PMID: 37757989] [DOI: 10.1016/j.neuroimage.2023.120391]
Abstract
There is considerable debate over how visual speech is processed in the absence of sound and whether neural activity supporting lipreading occurs in visual brain areas. Much of the ambiguity stems from a lack of behavioral grounding and neurophysiological analyses that cannot disentangle high-level linguistic and phonetic/energetic contributions from visual speech. To address this, we recorded EEG from human observers as they watched silent videos, half of which were novel and half of which were previously rehearsed with the accompanying audio. We modeled how the EEG responses to novel and rehearsed silent speech reflected the processing of low-level visual features (motion, lip movements) and a higher-level categorical representation of linguistic units, known as visemes. The ability of these visemes to account for the EEG - beyond the motion and lip movements - was significantly enhanced for rehearsed videos in a way that correlated with participants' trial-by-trial ability to lipread that speech. Source localization of viseme processing showed clear contributions from visual cortex, with no strong evidence for the involvement of auditory areas. We interpret this as support for the idea that the visual system produces its own specialized representation of speech that is (1) well-described by categorical linguistic features, (2) dissociable from lip movements, and (3) predictive of lipreading ability. We also suggest a reinterpretation of previous findings of auditory cortical activation during silent speech that is consistent with hierarchical accounts of visual and audiovisual speech perception.
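The key analysis step is a model comparison: a time-lagged encoding model built from low-level features (motion, lip movements) versus one that additionally includes categorical viseme features, with the accuracy gain indexing viseme processing. Below is a schematic ridge-regression version of that comparison; the feature matrices, lag range, regularization, and in-sample scoring (real analyses cross-validate) are all simplifying assumptions.

```python
import numpy as np

def lagged(X, lags):
    """Stack time-lagged copies of a (time x features) matrix."""
    T, F = X.shape
    out = np.zeros((T, F * len(lags)))
    for i, L in enumerate(lags):
        out[L:, i * F:(i + 1) * F] = X[:T - L]
    return out

def ridge_r(X, y, lam=1e2):
    """Fit ridge regression; return predicted-vs-actual correlation (in-sample)."""
    Xl = np.c_[np.ones(len(X)), X]
    w = np.linalg.solve(Xl.T @ Xl + lam * np.eye(Xl.shape[1]), Xl.T @ y)
    return np.corrcoef(Xl @ w, y)[0, 1]

rng = np.random.default_rng(1)
T = 5000                                     # samples of one EEG channel
motion_lip = rng.normal(size=(T, 3))         # low-level features (assumed)
visemes = (rng.random((T, 10)) < 0.02).astype(float)   # categorical onsets
eeg = lagged(visemes, [5]) @ rng.normal(size=10) + rng.normal(size=T)

lags = range(0, 30)                          # 0 to ~230 ms at 128 Hz (assumed)
r_base = ridge_r(lagged(motion_lip, lags), eeg)
r_full = ridge_r(np.c_[lagged(motion_lip, lags), lagged(visemes, lags)], eeg)
print(f"baseline r = {r_base:.3f}, +visemes r = {r_full:.3f}")
```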
Affiliation(s)
- Aaron R Nidiffer
- Department of Biomedical Engineering, Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
- Cody Zhewei Cao
- Department of Psychology, University of Michigan, Ann Arbor, MI, USA
- Aisling O'Sullivan
- School of Engineering, Trinity College Institute of Neuroscience, Trinity Centre for Biomedical Engineering, Trinity College, Dublin, Ireland
- Edmund C Lalor
- Department of Biomedical Engineering, Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA; School of Engineering, Trinity College Institute of Neuroscience, Trinity Centre for Biomedical Engineering, Trinity College, Dublin, Ireland.
7. Jeschke L, Mathias B, von Kriegstein K. Inhibitory TMS over Visual Area V5/MT Disrupts Visual Speech Recognition. J Neurosci 2023; 43:7690-7699. [PMID: 37848284] [PMCID: PMC10634547] [DOI: 10.1523/jneurosci.0975-23.2023]
Abstract
During face-to-face communication, the perception and recognition of facial movements can facilitate individuals' understanding of what is said. Facial movements are a form of complex biological motion. Separate neural pathways are thought to process (1) simple, nonbiological motion, with an obligatory waypoint in the motion-sensitive visual middle temporal area (V5/MT), and (2) complex biological motion. Here, we present findings that challenge this dichotomy. Neuronavigated offline transcranial magnetic stimulation (TMS) over V5/MT in 24 participants (17 females and 7 males) led to increased response times in the recognition of simple, nonbiological motion as well as in visual speech recognition, compared with TMS over the vertex, an active control region. TMS of area V5/MT also reduced the practice effects on response times that are typically observed in both visual speech and motion recognition tasks over time. Our findings provide the first indication that area V5/MT causally influences the recognition of visual speech. SIGNIFICANCE STATEMENT: In everyday face-to-face communication, speech comprehension is often facilitated by viewing a speaker's facial movements. Several brain areas contribute to the recognition of visual speech. One area of interest is the motion-sensitive visual middle temporal area (V5/MT), which has been associated with the perception of simple, nonbiological motion such as moving dots, as well as of more complex, biological motion such as visual speech. Here, we demonstrate using noninvasive brain stimulation that area V5/MT is causally relevant for recognizing visual speech. This finding provides new insights into the neural mechanisms that support the perception of human communication signals, which will help guide future research in typically developed individuals and in populations with communication difficulties.
Affiliation(s)
- Lisa Jeschke
- Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, 01069 Dresden, Germany
- Brian Mathias
- School of Psychology, University of Aberdeen, Aberdeen AB24 3FX, United Kingdom
- Katharina von Kriegstein
- Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, 01069 Dresden, Germany
8. Zhang Y, Rennig J, Magnotti JF, Beauchamp MS. Multivariate fMRI responses in superior temporal cortex predict visual contributions to, and individual differences in, the intelligibility of noisy speech. Neuroimage 2023; 278:120271. [PMID: 37442310] [PMCID: PMC10460966] [DOI: 10.1016/j.neuroimage.2023.120271]
Abstract
Humans have the unique ability to decode the rapid stream of language elements that constitute speech, even when it is contaminated by noise. Two reliable observations about noisy speech perception are that seeing the face of the talker improves intelligibility and that individuals differ in their ability to perceive noisy speech. We introduce a multivariate BOLD fMRI measure that explains both observations. In two independent fMRI studies, clear and noisy speech was presented in visual, auditory and audiovisual formats to thirty-seven participants who rated intelligibility. An event-related design was used to sort noisy speech trials by their intelligibility. Individual-differences multidimensional scaling was applied to fMRI response patterns in superior temporal cortex, and the dissimilarity between responses to clear speech and noisy (but intelligible) speech was measured. Neural dissimilarity was less for audiovisual speech than for auditory-only speech, corresponding to the greater intelligibility of noisy audiovisual speech. Dissimilarity was less in participants with better noisy speech perception, corresponding to individual differences. These relationships held for both single-word and entire-sentence stimuli, suggesting that they were driven by intelligibility rather than by the specific stimuli tested. A neural measure of perceptual intelligibility may aid in the development of strategies for helping those with impaired speech perception.
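The measure at the heart of this study is the dissimilarity between multivoxel response patterns for clear versus noisy-but-intelligible speech, with smaller dissimilarity tracking better intelligibility. A simplified stand-in using correlation distance is sketched below (the study used individual-differences multidimensional scaling; the simulated patterns and noise levels here are assumptions).

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
n_vox = 500

# Simulated mean response patterns in superior temporal cortex (assumed).
clear    = rng.normal(size=n_vox)
noisy_av = clear + rng.normal(scale=0.5, size=n_vox)   # audiovisual noisy
noisy_a  = clear + rng.normal(scale=1.0, size=n_vox)   # auditory-only noisy

def dissimilarity(p, q):
    """Correlation distance between two response patterns."""
    return 1.0 - pearsonr(p, q)[0]

d_av = dissimilarity(clear, noisy_av)
d_a  = dissimilarity(clear, noisy_a)
print(f"clear vs noisy AV: {d_av:.3f} | clear vs noisy A: {d_a:.3f}")
# Expected pattern from the study: d_av < d_a, mirroring the intelligibility
# benefit of seeing the talker's face.
```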
Affiliation(s)
- Yue Zhang
- Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States; Department of Neurosurgery, Baylor College of Medicine, Houston, TX, United States
- Johannes Rennig
- Division of Neuropsychology, Center of Neurology, Hertie-Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany
- John F Magnotti
- Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Michael S Beauchamp
- Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States.
9. Banks MI, Krause BM, Berger DG, Campbell DI, Boes AD, Bruss JE, Kovach CK, Kawasaki H, Steinschneider M, Nourski KV. Functional geometry of auditory cortical resting state networks derived from intracranial electrophysiology. PLoS Biol 2023; 21:e3002239. [PMID: 37651504] [PMCID: PMC10499207] [DOI: 10.1371/journal.pbio.3002239]
Abstract
Understanding central auditory processing critically depends on defining underlying auditory cortical networks and their relationship to the rest of the brain. We addressed these questions using resting state functional connectivity derived from human intracranial electroencephalography. Mapping recording sites into a low-dimensional space where proximity represents functional similarity revealed a hierarchical organization. At a fine scale, a group of auditory cortical regions excluded several higher-order auditory areas and segregated maximally from the prefrontal cortex. At a mesoscale, the proximity of limbic structures to the auditory cortex suggested a limbic stream that parallels the classically described ventral and dorsal auditory processing streams. The identities of global hubs in anterior temporal and cingulate cortex depended on frequency band, consistent with diverse roles in semantic and cognitive processing. At a macroscale, the observed hemispheric asymmetries were not specific to speech and language networks. This approach can be applied to multivariate brain data with respect to development, behavior, and disorders.
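The mapping described above embeds a site-by-site functional-similarity matrix into a low-dimensional space where proximity means functional similarity. A compact sketch using classical multidimensional scaling on a correlation-derived distance follows; the embedding method and the simulated connectivity are assumptions, and the study's actual pipeline may differ.

```python
import numpy as np

rng = np.random.default_rng(3)
n_sites = 40

# Simulated site-by-site resting-state connectivity (symmetric, assumed).
W = rng.random((n_sites, n_sites))
W = (W + W.T) / 2
np.fill_diagonal(W, 1.0)

# Convert similarity to distance, then classical MDS (Torgerson scaling).
D = 1.0 - W
J = np.eye(n_sites) - np.ones((n_sites, n_sites)) / n_sites
G = -0.5 * J @ (D ** 2) @ J                # double-centered squared distances
vals, vecs = np.linalg.eigh(G)
order = np.argsort(vals)[::-1]
coords = vecs[:, order[:2]] * np.sqrt(np.maximum(vals[order[:2]], 0))

print("2-D functional coordinates of the first 3 sites:\n", coords[:3])
# Nearby points = functionally similar sites; hierarchical structure can then
# be read off at fine, meso, and macro scales, as in the study.
```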
Affiliation(s)
- Matthew I. Banks
- Department of Anesthesiology, University of Wisconsin, Madison, Wisconsin, United States of America
- Department of Neuroscience, University of Wisconsin, Madison, Wisconsin, United States of America
- Bryan M. Krause
- Department of Anesthesiology, University of Wisconsin, Madison, Wisconsin, United States of America
- D. Graham Berger
- Department of Anesthesiology, University of Wisconsin, Madison, Wisconsin, United States of America
- Declan I. Campbell
- Department of Anesthesiology, University of Wisconsin, Madison, Wisconsin, United States of America
- Aaron D. Boes
- Department of Neurology, The University of Iowa, Iowa City, Iowa, United States of America
- Joel E. Bruss
- Department of Neurology, The University of Iowa, Iowa City, Iowa, United States of America
- Christopher K. Kovach
- Department of Neurosurgery, The University of Iowa, Iowa City, Iowa, United States of America
- Hiroto Kawasaki
- Department of Neurosurgery, The University of Iowa, Iowa City, Iowa, United States of America
- Mitchell Steinschneider
- Department of Neurology, Albert Einstein College of Medicine, New York, New York, United States of America
- Department of Neuroscience, Albert Einstein College of Medicine, New York, New York, United States of America
- Kirill V. Nourski
- Department of Neurosurgery, The University of Iowa, Iowa City, Iowa, United States of America
- Iowa Neuroscience Institute, The University of Iowa, Iowa City, Iowa, United States of America
10. Bernstein LE, Auer ET, Eberhardt SP. Modality-Specific Perceptual Learning of Vocoded Auditory versus Lipread Speech: Different Effects of Prior Information. Brain Sci 2023; 13:1008. [PMID: 37508940] [PMCID: PMC10377548] [DOI: 10.3390/brainsci13071008]
Abstract
Traditionally, speech perception training paradigms have not adequately taken into account the possibility that there may be modality-specific requirements for perceptual learning with auditory-only (AO) versus visual-only (VO) speech stimuli. The study reported here investigated the hypothesis that there are modality-specific differences in how prior information is used by normal-hearing participants during vocoded versus VO speech training. Two experiments, one with vocoded AO speech (Experiment 1) and one with VO (lipread) speech (Experiment 2), investigated the effects of giving trainees different types of prior information on each trial during training. Training comprised four ~20-min sessions, during which participants learned to label novel visual images using novel spoken words. Participants were assigned to different types of prior information during training: Word Group trainees saw a printed version of each training word (e.g., "tethon"), and Consonant Group trainees saw only its consonants (e.g., "t_th_n"). Additional groups received no prior information (i.e., Experiment 1, AO Group; Experiment 2, VO Group) or a spoken version of the stimulus in a different modality from the training stimuli (Experiment 1, Lipread Group; Experiment 2, Vocoder Group). That is, in each experiment, there was a group that received prior information in the modality of the training stimuli of the other experiment. In both experiments, the Word Groups had difficulty retaining the novel words they attempted to learn during training. However, when the training stimuli were vocoded, the Word Group improved their phoneme identification. When the training stimuli were visual speech, the Consonant Group improved their phoneme identification and their open-set sentence lipreading. The results are considered in light of theoretical accounts of perceptual learning in relation to perceptual modality.
Affiliation(s)
- Lynne E Bernstein
- Speech, Language, and Hearing Sciences Department, George Washington University, Washington, DC 20052, USA
- Edward T Auer
- Speech, Language, and Hearing Sciences Department, George Washington University, Washington, DC 20052, USA
- Silvio P Eberhardt
- Speech, Language, and Hearing Sciences Department, George Washington University, Washington, DC 20052, USA
11. Saalasti S, Alho J, Lahnakoski JM, Bacha-Trams M, Glerean E, Jääskeläinen IP, Hasson U, Sams M. Lipreading a naturalistic narrative in a female population: Neural characteristics shared with listening and reading. Brain Behav 2023; 13:e2869. [PMID: 36579557] [PMCID: PMC9927859] [DOI: 10.1002/brb3.2869]
Abstract
INTRODUCTION: Few of us are skilled lipreaders, while most struggle with the task. The neural substrates that enable comprehension of connected natural speech via lipreading are not yet well understood.
METHODS: We used a data-driven approach to identify brain areas underlying the lipreading of an 8-min narrative with participants whose lipreading skills varied extensively (range 6-100%, mean = 50.7%). The participants also listened to and read the same narrative. The similarity between individual participants' brain activity during the whole narrative, within and between conditions, was estimated by a voxel-wise comparison of the blood oxygenation level dependent (BOLD) signal time courses.
RESULTS: Inter-subject correlation (ISC) of the time courses revealed that lipreading, listening to, and reading the narrative were largely supported by the same brain areas in the temporal, parietal and frontal cortices, precuneus, and cerebellum. Additionally, listening to and reading connected naturalistic speech activated higher-level linguistic processing in the parietal and frontal cortices more consistently than lipreading did, probably paralleling the limited understanding obtained via lipreading. Importantly, a higher lipreading test score and a higher subjective estimate of comprehension of the lipread narrative were associated with activity in the superior and middle temporal cortex.
CONCLUSIONS: Our new data illustrate that findings from prior studies using well-controlled repetitive speech stimuli and stimulus-driven data analyses are also valid for naturalistic connected speech. Our results may suggest an efficient use of brain areas dealing with phonological processing in skilled lipreaders.
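Inter-subject correlation as used here is the voxel-wise Pearson correlation of BOLD time courses between participants experiencing the same narrative, summarized over subject pairs. A minimal sketch with simulated data (shapes and noise levels are illustrative assumptions):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
n_subj, n_time, n_vox = 10, 300, 50

# Each subject's BOLD = shared stimulus-driven component + idiosyncratic noise.
shared = rng.normal(size=(n_time, n_vox))
bold = shared + rng.normal(scale=2.0, size=(n_subj, n_time, n_vox))

def isc(data):
    """Mean pairwise Pearson correlation per voxel across subjects."""
    z = (data - data.mean(1, keepdims=True)) / data.std(1, keepdims=True)
    pair_r = [(z[i] * z[j]).mean(0)            # r = mean product of z-scores
              for i, j in combinations(range(len(z)), 2)]
    return np.mean(pair_r, axis=0)             # one ISC value per voxel

print("median voxel ISC:", round(float(np.median(isc(bold))), 3))
```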
Affiliation(s)
- Satu Saalasti
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Advanced Magnetic Imaging (AMI) Centre, Aalto NeuroImaging, School of Science, Aalto University, Espoo, Finland
- Jussi Alho
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Juha M Lahnakoski
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Independent Max Planck Research Group for Social Neuroscience, Max Planck Institute of Psychiatry, Munich, Germany
- Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich, Jülich, Germany
- Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Mareike Bacha-Trams
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Enrico Glerean
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, USA
- Iiro P Jääskeläinen
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Uri Hasson
- Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, USA
- Mikko Sams
- Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Aalto Studios - MAGICS, Aalto University, Espoo, Finland
12. Evidence of visual crossmodal reorganization positively relates to speech outcomes in cochlear implant users. Sci Rep 2022; 12:17749. [PMID: 36273017] [PMCID: PMC9587996] [DOI: 10.1038/s41598-022-22117-z]
Abstract
Deaf individuals who use a cochlear implant (CI) have remarkably different outcomes for auditory speech communication ability. One factor assumed to affect CI outcomes is visual crossmodal plasticity in auditory cortex, whereby deprived auditory regions begin to support non-auditory functions such as vision. Some previous research has viewed crossmodal plasticity as harmful to speech outcomes for CI users if it interferes with sound processing, while other work has demonstrated that plasticity related to visual language may be beneficial for speech recovery. To clarify this, we used electroencephalography (EEG) to measure brain responses to a partial face speaking a silent single-syllable word (visual language) in 15 CI users and 13 age-matched typical-hearing controls. We used source analysis of the EEG activity to measure crossmodal visual responses in auditory cortex and then compared them to CI users' speech-in-noise listening ability. CI users' brain response to the onset of the video stimulus (face) was larger than that of controls in left auditory cortex, consistent with crossmodal activation after deafness. CI users also produced a mixture of alpha (8-12 Hz) synchronization and desynchronization in auditory cortex while watching lip movements, whereas controls showed desynchronization. CI users with higher speech scores had stronger crossmodal responses in auditory cortex to the onset of the video, but those with lower speech scores showed increases in alpha power in auditory areas during lip movement. Therefore, evidence of crossmodal reorganization in CI users does not necessarily predict poor speech outcomes, and differences in crossmodal activation during lipreading may instead relate to the strategies CI users adopt in audiovisual speech communication.
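Alpha synchronization versus desynchronization is conventionally quantified as the percent change in 8-12 Hz power during the period of interest relative to a pre-stimulus baseline. The bare-bones version below operates on one simulated source-level signal; the sampling rate, filter order, and window boundaries are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 250                                     # sampling rate in Hz (assumed)
rng = np.random.default_rng(5)
t = np.arange(-1.0, 3.0, 1 / fs)             # epoch around video onset
signal = rng.normal(size=t.size) + 0.5 * np.sin(2 * np.pi * 10 * t)

# Band-pass 8-12 Hz and take the analytic amplitude envelope.
b, a = butter(4, [8, 12], btype="bandpass", fs=fs)
alpha_env = np.abs(hilbert(filtfilt(b, a, signal)))
power = alpha_env ** 2

base = power[(t >= -0.8) & (t <= -0.2)].mean()   # pre-stimulus baseline
task = power[(t >= 0.5) & (t <= 2.5)].mean()     # lip-movement window (assumed)
erd_ers = 100 * (task - base) / base
print(f"alpha power change: {erd_ers:+.1f}% "
      "(negative = desynchronization, positive = synchronization)")
```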
13. Suess N, Hauswald A, Reisinger P, Rösch S, Keitel A, Weisz N. Cortical tracking of formant modulations derived from silently presented lip movements and its decline with age. Cereb Cortex 2022; 32:4818-4833. [PMID: 35062025] [PMCID: PMC9627034] [DOI: 10.1093/cercor/bhab518]
Abstract
The integration of visual and auditory cues is crucial for successful processing of speech, especially under adverse conditions. Recent reports have shown that when participants watch muted videos of speakers, the phonological information about the acoustic speech envelope, which is associated with but independent from the speakers' lip movements, is tracked by the visual cortex. However, the speech signal also carries richer acoustic details, for example, about the fundamental frequency and the resonant frequencies, whose visuo-phonological transformation could aid speech processing. Here, we investigated the neural basis of the visuo-phonological transformation of these more fine-grained acoustic details and assessed how it changes as a function of age. We recorded whole-head magnetoencephalographic (MEG) data while participants watched silent normal (i.e., natural) and reversed videos of a speaker and paid attention to their lip movements. We found that the visual cortex is able to track the unheard natural modulations of resonant frequencies (or formants) and the pitch (or fundamental frequency) linked to lip movements. Importantly, only the processing of natural unheard formants decreases significantly with age, in the visual and also in the cingulate cortex. This is not the case for the processing of the unheard speech envelope, the fundamental frequency, or the purely visual information carried by lip movements. These results show that unheard spectral fine details (along with the unheard acoustic envelope) are transformed from a mere visual to a phonological representation. Aging especially affects the ability to derive spectral dynamics at formant frequencies. As listening in noisy environments should capitalize on the ability to track spectral fine details, our results provide a novel focus on compensatory processes in such challenging situations.
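Cortical tracking of the unheard formant time course can be operationalized as spectral coherence between the MEG signal and the formant modulation extracted from the muted audio. A stripped-down estimate is sketched below; the signals are simulated, and the frequency band and coherence parameters are assumptions rather than the study's exact settings.

```python
import numpy as np
from scipy.signal import coherence

fs = 100                                    # down-sampled rate in Hz (assumed)
rng = np.random.default_rng(6)
n = 60 * fs                                 # one minute of data

formant = rng.normal(size=n)                # formant modulation time course
meg = np.roll(formant, 10) + 3 * rng.normal(size=n)   # delayed, noisy copy

f, coh = coherence(formant, meg, fs=fs, nperseg=512)
band = (f >= 1) & (f <= 7)                  # delta/theta range of speech rhythms
print(f"mean 1-7 Hz coherence: {coh[band].mean():.3f}")
# Comparing natural vs. reversed videos (and younger vs. older groups) on such
# a tracking measure is the logic behind the age-related decline reported above.
```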
Affiliation(s)
- Nina Suess
- Department of Psychology, Centre for Cognitive Neuroscience, University of Salzburg, Salzburg 5020, Austria
- Anne Hauswald
- Department of Psychology, Centre for Cognitive Neuroscience, University of Salzburg, Salzburg 5020, Austria
- Patrick Reisinger
- Department of Psychology, Centre for Cognitive Neuroscience, University of Salzburg, Salzburg 5020, Austria
- Sebastian Rösch
- Department of Otorhinolaryngology, Head and Neck Surgery, Paracelsus Medical University Salzburg, University Hospital Salzburg, Salzburg 5020, Austria
- Anne Keitel
- School of Social Sciences, University of Dundee, Dundee DD1 4HN, UK
- Nathan Weisz
- Department of Psychology, Centre for Cognitive Neuroscience, University of Salzburg, Salzburg 5020, Austria
- Department of Psychology, Neuroscience Institute, Christian Doppler University Hospital, Paracelsus Medical University, Salzburg 5020, Austria
14. Ross LA, Molholm S, Butler JS, Del Bene VA, Foxe JJ. Neural correlates of multisensory enhancement in audiovisual narrative speech perception: a fMRI investigation. Neuroimage 2022; 263:119598. [PMID: 36049699] [DOI: 10.1016/j.neuroimage.2022.119598]
Abstract
This fMRI study investigated the effect of seeing the articulatory movements of a speaker while listening to a naturalistic narrative stimulus. Its goal was to identify regions of the language network showing multisensory enhancement under synchronous audiovisual conditions. We expected this enhancement to emerge in regions known to underlie the integration of auditory and visual information, such as the posterior superior temporal gyrus, as well as in parts of the broader language network, including the semantic system. To this end, we presented 53 participants with a continuous narration of a story in auditory-alone, visual-alone, and synchronous and asynchronous audiovisual speech conditions while recording brain activity using BOLD fMRI. We found multisensory enhancement in an extensive network of regions underlying multisensory integration and in parts of the semantic network, as well as in extralinguistic regions not usually associated with multisensory integration, namely the primary visual cortex and the bilateral amygdala. The analysis also revealed involvement of thalamic regions along the visual and auditory pathways more commonly associated with early sensory processing. We conclude that under natural listening conditions, multisensory enhancement not only involves sites of multisensory integration but also many regions of the wider semantic network, and includes regions associated with extralinguistic sensory, perceptual, and cognitive processing.
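Multisensory enhancement in designs like this means a larger response to synchronous audiovisual speech than the unisensory conditions predict; one common voxel-wise test is AV > max(A, V). The sketch below applies that criterion to simulated beta estimates (the criterion choice, threshold, and data are assumptions; criteria differ across studies).

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(7)
n_subj, n_vox = 53, 200                     # 53 participants, as in the study

beta_a  = rng.normal(1.0, 0.5, (n_subj, n_vox))   # auditory-only betas
beta_v  = rng.normal(0.5, 0.5, (n_subj, n_vox))   # visual-only betas
beta_av = rng.normal(1.4, 0.5, (n_subj, n_vox))   # synchronous audiovisual

# Max criterion: AV exceeds the stronger of the two unisensory responses.
max_uni = np.maximum(beta_a, beta_v)
tvals, pvals = ttest_rel(beta_av, max_uni, axis=0)
enhanced = (tvals > 0) & (pvals < 0.001)          # uncorrected, for illustration
print(f"{enhanced.sum()} of {n_vox} simulated voxels show enhancement")
```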
Affiliation(s)
- Lars A Ross
- The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; Department of Imaging Sciences, University of Rochester Medical Center, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA.
- Sophie Molholm
- The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA
- John S Butler
- The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA; School of Mathematical Sciences, Technological University Dublin, Kevin Street Campus, Dublin, Ireland
- Victor A Del Bene
- The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA; University of Alabama at Birmingham, Heersink School of Medicine, Department of Neurology, Birmingham, Alabama, 35233, USA
- John J Foxe
- The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA.
15. Fullerton AM, Vickers DA, Luke R, Billing AN, McAlpine D, Hernandez-Perez H, Peelle JE, Monaghan JJM, McMahon CM. Cross-modal functional connectivity supports speech understanding in cochlear implant users. Cereb Cortex 2022; 33:3350-3371. [PMID: 35989307] [PMCID: PMC10068270] [DOI: 10.1093/cercor/bhac277]
Abstract
Sensory deprivation can lead to cross-modal cortical changes, whereby sensory brain regions deprived of input may be recruited to perform atypical function. Enhanced cross-modal responses to visual stimuli observed in auditory cortex of postlingually deaf cochlear implant (CI) users are hypothesized to reflect increased activation of cortical language regions, but it is unclear if this cross-modal activity is "adaptive" or "mal-adaptive" for speech understanding. To determine if increased activation of language regions is correlated with better speech understanding in CI users, we assessed task-related activation and functional connectivity of auditory and visual cortices to auditory and visual speech and non-speech stimuli in CI users (n = 14) and normal-hearing listeners (n = 17) and used functional near-infrared spectroscopy to measure hemodynamic responses. We used visually presented speech and non-speech to investigate neural processes related to linguistic content and observed that CI users show beneficial cross-modal effects. Specifically, an increase in connectivity between the left auditory and visual cortices-presumed primary sites of cortical language processing-was positively correlated with CI users' abilities to understand speech in background noise. Cross-modal activity in auditory cortex of postlingually deaf CI users may reflect adaptive activity of a distributed, multimodal speech network, recruited to enhance speech understanding.
Affiliation(s)
- Amanda M Fullerton
- Department of Linguistics and Macquarie University Hearing, Australian Hearing Hub, Macquarie University, Sydney 2109, Australia
- Deborah A Vickers
- Cambridge Hearing Group, Sound Lab, Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0SZ, United Kingdom
- Speech, Hearing and Phonetic Sciences, University College London, London WC1N 1PF, United Kingdom
- Robert Luke
- Department of Linguistics and Macquarie University Hearing, Australian Hearing Hub, Macquarie University, Sydney 2109, Australia
- Addison N Billing
- Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, United Kingdom
- DOT-HUB, Department of Medical Physics and Biomedical Engineering, University College London, London WC1E 6BT, United Kingdom
- David McAlpine
- Department of Linguistics and Macquarie University Hearing, Australian Hearing Hub, Macquarie University, Sydney 2109, Australia
- Heivet Hernandez-Perez
- Department of Linguistics and Macquarie University Hearing, Australian Hearing Hub, Macquarie University, Sydney 2109, Australia
- Jonathan E Peelle
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, MO 63110, United States
- Jessica J M Monaghan
- National Acoustic Laboratories, Australian Hearing Hub, Sydney 2109, Australia
- Department of Linguistics and Macquarie University Hearing, Australian Hearing Hub, Macquarie University, Sydney 2109, Australia
- Catherine M McMahon
- Department of Linguistics and Macquarie University Hearing, Australian Hearing Hub, Macquarie University, Sydney 2109, Australia
- HEAR Centre, Macquarie University, Sydney 2109, Australia
16. Zhang WJ, Li DN, Lian TH, Guo P, Zhang YN, Li JH, Guan HY, He MY, Zhang WJ, Zhang WJ, Luo DM, Wang XM, Zhang W. Clinical Features and Potential Mechanisms Relating Neuropathological Biomarkers and Blood-Brain Barrier in Patients With Alzheimer's Disease and Hearing Loss. Front Aging Neurosci 2022; 14:911028. [PMID: 35783139] [PMCID: PMC9245454] [DOI: 10.3389/fnagi.2022.911028]
Abstract
Background: The aim of this study was to explore clinical features and potential mechanisms relating neuropathological biomarkers and the blood-brain barrier (BBB) in Alzheimer's disease (AD) with hearing loss (HL).
Materials and Methods: A total of 65 patients with AD were recruited, and auditory function was assessed by pure tone audiometry (PTA) threshold. Patients were divided into AD with HL (AD-HL) and AD without HL (AD-nHL) groups based on the World Health Organization standard. Clinical symptoms were assessed with multiple rating scales. Levels of the neuropathological biomarkers β-amyloid 1-42 (Aβ1-42) and multiple phosphorylated tau (P-tau) species, and of the BBB factors matrix metalloproteinases (MMPs), receptor for advanced glycation end products, glial fibrillary acidic protein, and low-density lipoprotein receptor-related protein 1, were measured.
Results: (1) Compared with the AD-nHL group, the AD-HL group had significantly impaired overall cognitive function, impaired cognitive domains of memory, language, attention, and execution, and reduced activities of daily living (ADL), as reflected by rating scale scores (P < 0.05). PTA threshold was significantly correlated with impairments of overall cognitive function, the memory and language domains, and ADL in patients with AD (P < 0.05). (2) P-tau (S199) level was significantly increased in CSF from the AD-HL group (P < 0.05) and was significantly and positively correlated with PTA threshold in patients with AD. (3) MMP-3 level was significantly elevated in CSF from the AD-HL group (P < 0.05) and was significantly and positively correlated with PTA threshold in patients with AD (P < 0.05). (4) In the AD-HL group, P-tau (S199) level was significantly and positively correlated with the levels of MMP-2 and MMP-3 in CSF (P < 0.05).
Conclusion: Patients with AD-HL have severely compromised overall cognitive function, multiple cognitive domains, and ADL. The potential mechanisms of AD-HL involve elevation of the AD neuropathological biomarker P-tau (S199) and the BBB factor MMP-3, and close correlations between P-tau (S199) and MMP-2/MMP-3 in CSF. These findings strongly suggest the significance of early evaluation of HL for delaying AD progression and indicate new directions for drug development that inhibit neuropathological biomarkers of AD and protect the BBB.
Affiliation(s)
- Wei-jiao Zhang
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Dan-ning Li
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Teng-hong Lian
- Center for Cognitive Neurology, Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Peng Guo
- Center for Cognitive Neurology, Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Ya-nan Zhang
- Department of Blood Transfusion, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Jing-hui Li
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Hui-ying Guan
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Ming-yue He
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Wen-jing Zhang
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Wei-jia Zhang
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Dong-mei Luo
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Xiao-min Wang
- Department of Physiology, Capital Medical University, Beijing, China
- Wei Zhang
- Center for Cognitive Neurology, Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Center of Parkinson’s Disease, Beijing Institute for Brain Disorders, Beijing, China
- Beijing Key Laboratory on Parkinson’s Disease, Beijing, China
- *Correspondence: Wei Zhang
17. Johansson C, Folgerø PO. Is Reduced Visual Processing the Price of Language? Brain Sci 2022; 12:771. [PMID: 35741656] [PMCID: PMC9221435] [DOI: 10.3390/brainsci12060771]
Abstract
We suggest a later timeline for full language capabilities in Homo sapiens, placing the emergence of language over 200,000 years after the emergence of our species. The late Paleolithic period saw several significant changes. Homo sapiens became more gracile and gradually lost significant brain volume. Detailed realistic cave paintings disappeared completely, and iconic/symbolic ones appeared at other sites. This may indicate a shift in perceptual abilities, away from an accurate perception of the present. Language in modern humans interacts with vision. One example is the McGurk effect. Studies show that artistic abilities may improve when language-related brain areas are damaged or temporarily knocked out. Language relies on many pre-existing non-linguistic functions. We suggest that an overwhelming flow of perceptual information, vision in particular, was an obstacle to language, as is sometimes implied in autism with relative language impairment. We systematically review the recent research literature investigating the relationship between language and perception. We see homologues of language-relevant brain functions predating language. Recent findings show brain lateralization for communicative gestures in other primates without language, supporting the idea that a language-ready brain may be overwhelmed by raw perception, thus blocking overt language from evolving. We find support in converging evidence for a change in neural organization away from raw perception, pushing the emergence of language closer to the present. A recent origin of language makes it possible to investigate the genetic origins of language.
18. Bernstein LE, Jordan N, Auer ET, Eberhardt SP. Lipreading: A Review of Its Continuing Importance for Speech Recognition With an Acquired Hearing Loss and Possibilities for Effective Training. Am J Audiol 2022; 31:453-469. [PMID: 35316072] [DOI: 10.1044/2021_aja-21-00112]
Abstract
PURPOSE: The goal of this review article is to reinvigorate interest in lipreading and lipreading training for adults with acquired hearing loss. Most adults benefit from being able to see the talker when speech is degraded; however, the effect size is related to their lipreading ability, which is typically poor in adults who have experienced normal hearing through most of their lives. Lipreading training has been viewed as a possible avenue for rehabilitation of adults with an acquired hearing loss, but most training approaches have not been particularly successful. Here, we describe lipreading and theoretically motivated approaches to its training, as well as examples of successful training paradigms. We discuss some extensions to auditory-only (AO) and audiovisual (AV) speech recognition.
METHOD: Visual speech perception and word recognition are described. Traditional and contemporary views of training and perceptual learning are outlined. We focus on the roles of external and internal feedback and the training task in perceptual learning, and we describe results of lipreading training experiments.
RESULTS: Lipreading is commonly characterized as limited to viseme perception. However, evidence demonstrates subvisemic perception of visual phonetic information. Lipreading words also relies on lexical constraints, not unlike auditory spoken word recognition. Lipreading has been shown to be difficult to improve through training, but under specific feedback and task conditions, training can be successful, and learning can generalize to untrained materials, including AV sentence stimuli in noise. The results on lipreading have implications for AO and AV training and for the use of acoustically processed speech in face-to-face communication.
CONCLUSIONS: Given its importance for speech recognition with a hearing loss, we suggest that the research and clinical communities integrate lipreading in their efforts to improve speech recognition in adults with acquired hearing loss.
Collapse
Affiliation(s)
- Lynne E. Bernstein
- Department of Speech, Language & Hearing Sciences, George Washington University, Washington, DC
| | - Nicole Jordan
- Department of Speech, Language & Hearing Sciences, George Washington University, Washington, DC
| | - Edward T. Auer
- Department of Speech, Language & Hearing Sciences, George Washington University, Washington, DC
| | - Silvio P. Eberhardt
- Department of Speech, Language & Hearing Sciences, George Washington University, Washington, DC
| |
Collapse
|
19
|
Zhang L, Du Y. Lip movements enhance speech representations and effective connectivity in auditory dorsal stream. Neuroimage 2022; 257:119311. [PMID: 35589000 DOI: 10.1016/j.neuroimage.2022.119311] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2022] [Revised: 05/09/2022] [Accepted: 05/11/2022] [Indexed: 11/25/2022] Open
Abstract
Viewing a speaker's lip movements facilitates speech perception, especially under adverse listening conditions, but the neural mechanisms of this perceptual benefit at the phonemic and feature levels remain unclear. This fMRI study addressed this question by quantifying regional multivariate representation and network organization underlying audiovisual speech-in-noise perception. Behaviorally, valid lip movements improved recognition of place of articulation to aid phoneme identification. Meanwhile, lip movements enhanced neural representations of phonemes in left auditory dorsal stream regions, including frontal speech motor areas and supramarginal gyrus (SMG). Moreover, neural representations of place of articulation and voicing features were promoted differentially by lip movements in these regions, with voicing enhanced in Broca's area and place of articulation better encoded in left ventral premotor cortex and SMG. Next, dynamic causal modeling (DCM) analysis showed that such local changes were accompanied by strengthened effective connectivity along the dorsal stream. Moreover, the neurite orientation dispersion of the left arcuate fasciculus, the structural backbone of the auditory dorsal stream, predicted the visual enhancements of neural representations and effective connectivity. Our findings provide the novel insight for speech science that lip movements promote both local phonemic and feature encoding and network connectivity in the dorsal pathway, and that this functional enhancement is mediated by the microstructural architecture of the circuit.
Collapse
Affiliation(s)
- Lei Zhang
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China 100101; Department of Psychology, University of Chinese Academy of Sciences, Beijing, China 100049
| | - Yi Du
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China 100101; Department of Psychology, University of Chinese Academy of Sciences, Beijing, China 100049; CAS Center for Excellence in Brain Science and Intelligence Technology, Shanghai, China 200031; Chinese Institute for Brain Research, Beijing, China 102206.
| |
Collapse
|
20
|
Jessica Tan SH, Kalashnikova M, Di Liberto GM, Crosse MJ, Burnham D. Seeing a Talking Face Matters: The Relationship between Cortical Tracking of Continuous Auditory-Visual Speech and Gaze Behaviour in Infants, Children and Adults. Neuroimage 2022; 256:119217. [PMID: 35436614 DOI: 10.1016/j.neuroimage.2022.119217] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2021] [Revised: 04/09/2022] [Accepted: 04/14/2022] [Indexed: 11/24/2022] Open
Abstract
An auditory-visual speech benefit, the benefit that visual speech cues bring to auditory speech perception, is experienced from early on in infancy and continues to be experienced to an increasing degree with age. While there is both behavioural and neurophysiological evidence for children and adults, only behavioural evidence exists for infants, as no neurophysiological study has provided a comprehensive examination of the auditory-visual speech benefit in infants. It is also surprising that most studies on the auditory-visual speech benefit do not concurrently report looking behaviour, especially since the auditory-visual speech benefit rests on the assumption that listeners attend to a speaker's talking face and that there are meaningful individual differences in looking behaviour. To address these gaps, we simultaneously recorded electroencephalographic (EEG) and eye-tracking data of 5-month-olds, 4-year-olds and adults as they were presented with a speaker in auditory-only (AO), visual-only (VO), and auditory-visual (AV) modes. Cortical tracking analyses that involved forward encoding models of the speech envelope revealed that there was an auditory-visual speech benefit [i.e., AV > (A+V)], evident in 5-month-olds and adults but not 4-year-olds. Examination of cortical tracking accuracy in relation to looking behaviour showed that infants' relative attention to the speaker's mouth (vs. eyes) was positively correlated with cortical tracking accuracy of VO speech, whereas adults' attention to the display overall was negatively correlated with cortical tracking accuracy of VO speech. This study provides the first neurophysiological evidence of the auditory-visual speech benefit in infants, and our results suggest ways in which current models of speech processing can be fine-tuned.
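The cortical tracking measure above rests on forward encoding (temporal response function) models. As a rough illustration only — synthetic data, arbitrary parameters, not the authors' pipeline — a ridge-regression TRF and its tracking-accuracy score can be sketched in Python; the AV benefit test then compares such accuracies across AV and summed A+V responses.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
fs, n = 64, 4096                          # sampling rate (Hz) and samples (toy values)
env = rng.standard_normal(n)              # stand-in for a speech envelope
kernel = rng.standard_normal(16)          # unknown "neural" response to recover
eeg = np.convolve(env, kernel, mode="same") + rng.standard_normal(n)

def lag_matrix(x, n_lags):
    # stack time-lagged copies of the stimulus (0..n_lags-1 samples) as predictors
    return np.column_stack([np.roll(x, k) for k in range(n_lags)])

X = lag_matrix(env, int(0.25 * fs))       # lags spanning ~250 ms
half = n // 2                             # train on the first half, test on the second
trf = Ridge(alpha=1.0).fit(X[:half], eeg[:half])
r = np.corrcoef(trf.predict(X[half:]), eeg[half:])[0, 1]
print(f"envelope-tracking accuracy r = {r:.2f}")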
Collapse
Affiliation(s)
- S H Jessica Tan
- The MARCS Institute of Brain, Behaviour and Development, Western Sydney University.
| | - Marina Kalashnikova
- The Basque Center on Cognition, Brain and Language; IKERBASQUE, Basque Foundation for Science
| | | | - Michael J Crosse
- Trinity Center for Biomedical Engineering, Department of Mechanical, Manufacturing & Biomedical Engineering, Trinity College Dublin, Dublin, Ireland
| | - Denis Burnham
- The MARCS Institute of Brain, Behaviour and Development, Western Sydney University
| |
Collapse
|
21
|
Michon M, Zamorano-Abramson J, Aboitiz F. Faces and Voices Processing in Human and Primate Brains: Rhythmic and Multimodal Mechanisms Underlying the Evolution and Development of Speech. Front Psychol 2022; 13:829083. [PMID: 35432052 PMCID: PMC9007199 DOI: 10.3389/fpsyg.2022.829083] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2021] [Accepted: 03/07/2022] [Indexed: 11/24/2022] Open
Abstract
While influential works since the 1970s have widely assumed that imitation is an innate skill in both human and non-human primate neonates, recent empirical studies and meta-analyses have challenged this view, indicating other forms of reward-based learning as relevant factors in the development of social behavior. The visual input translation into matching motor output that underlies imitation abilities instead seems to develop along with social interactions and sensorimotor experience during infancy and childhood. Recently, a new visual stream has been identified in both human and non-human primate brains, updating the dual visual stream model. This third pathway is thought to be specialized for dynamic aspects of social perception, such as eye gaze and facial expression, and crucially for audio-visual integration of speech. Here, we review empirical studies addressing an understudied but crucial aspect of speech and communication, namely the processing of visual orofacial cues (i.e., the perception of a speaker's lips and tongue movements) and its integration with vocal auditory cues. Throughout this review, we offer new insights from our understanding of speech as the product of evolution and development of a rhythmic and multimodal organization of sensorimotor brain networks, supporting volitional motor control of the upper vocal tract and audio-visual voices-faces integration.
Collapse
Affiliation(s)
- Maëva Michon
- Laboratory for Cognitive and Evolutionary Neuroscience, Department of Psychiatry, Faculty of Medicine, Interdisciplinary Center for Neuroscience, Pontificia Universidad Católica de Chile, Santiago, Chile
- Centro de Estudios en Neurociencia Humana y Neuropsicología, Facultad de Psicología, Universidad Diego Portales, Santiago, Chile
| | - José Zamorano-Abramson
- Centro de Investigación en Complejidad Social, Facultad de Gobierno, Universidad del Desarrollo, Santiago, Chile
| | - Francisco Aboitiz
- Laboratory for Cognitive and Evolutionary Neuroscience, Department of Psychiatry, Faculty of Medicine, Interdisciplinary Center for Neuroscience, Pontificia Universidad Católica de Chile, Santiago, Chile
| |
Collapse
|
22
|
Bernstein LE, Auer ET, Eberhardt SP. During Lipreading Training With Sentence Stimuli, Feedback Controls Learning and Generalization to Audiovisual Speech in Noise. Am J Audiol 2022; 31:57-77. [PMID: 34965362 DOI: 10.1044/2021_aja-21-00034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022] Open
Abstract
PURPOSE This study investigated the effects of external feedback on perceptual learning of visual speech during lipreading training with sentence stimuli. The goal was to improve visual-only (VO) speech recognition and increase accuracy of audiovisual (AV) speech recognition in noise. The rationale was that spoken word recognition depends on the accuracy of sublexical (phonemic/phonetic) speech perception; effective feedback during training must support sublexical perceptual learning. METHOD Normal-hearing (NH) adults were assigned to one of three types of feedback: Sentence feedback was the entire sentence printed after responding to the stimulus. Word feedback was the correct response words and perceptually near but incorrect response words. Consonant feedback was correct response words and consonants in incorrect but perceptually near response words. Six training sessions were given. Pre- and posttraining testing included an untrained control group. Test stimuli were disyllable nonsense words for forced-choice consonant identification, and isolated words and sentences for open-set identification. Words and sentences were VO, AV, and audio-only (AO) with the audio in speech-shaped noise. RESULTS Lipreading accuracy increased during training. Pre- and posttraining tests of consonant identification showed no improvement beyond test-retest increases obtained by untrained controls. Isolated word recognition with a talker not seen during training showed that the control group improved more than the sentence group. Tests of untrained sentences showed that the consonant group significantly improved in all of the stimulus conditions (VO, AO, and AV). Its mean words-correct scores increased by 9.2 percentage points for VO, 3.4 percentage points for AO, and 9.8 percentage points for AV stimuli. CONCLUSIONS Consonant feedback during training with sentence stimuli significantly increased perceptual learning. The training generalized to untrained VO, AO, and AV sentence stimuli. Lipreading training has potential to significantly improve adults' face-to-face communication in noisy settings in which the talker can be seen.
Collapse
Affiliation(s)
- Lynne E. Bernstein
- Department of Speech, Language, and Hearing Sciences, George Washington University, DC
| | - Edward T. Auer
- Department of Speech, Language, and Hearing Sciences, George Washington University, DC
| | - Silvio P. Eberhardt
- Department of Speech, Language, and Hearing Sciences, George Washington University, DC
| |
Collapse
|
23
|
Peelle JE, Spehar B, Jones MS, McConkey S, Myerson J, Hale S, Sommers MS, Tye-Murray N. Increased Connectivity among Sensory and Motor Regions during Visual and Audiovisual Speech Perception. J Neurosci 2022; 42:435-442. [PMID: 34815317 PMCID: PMC8802926 DOI: 10.1523/jneurosci.0114-21.2021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2021] [Revised: 10/29/2021] [Accepted: 11/08/2021] [Indexed: 11/21/2022] Open
Abstract
In everyday conversation, we usually process the talker's face as well as the sound of the talker's voice. Access to visual speech information is particularly useful when the auditory signal is degraded. Here, we used fMRI to monitor brain activity while adult humans (n = 60) were presented with visual-only, auditory-only, and audiovisual words. The audiovisual words were presented in quiet and in several signal-to-noise ratios. As expected, audiovisual speech perception recruited both auditory and visual cortex, with some evidence for increased recruitment of premotor cortex in some conditions (including in substantial background noise). We then investigated neural connectivity using psychophysiological interaction analysis with seed regions in both primary auditory cortex and primary visual cortex. Connectivity between auditory and visual cortices was stronger in audiovisual conditions than in unimodal conditions, including a wide network of regions in posterior temporal cortex and prefrontal cortex. In addition to whole-brain analyses, we also conducted a region-of-interest analysis on the left posterior superior temporal sulcus (pSTS), implicated in many previous studies of audiovisual speech perception. We found evidence for both activity and effective connectivity in pSTS for visual-only and audiovisual speech, although these were not significant in whole-brain analyses. Together, our results suggest a prominent role for cross-region synchronization in understanding both visual-only and audiovisual speech that complements activity in integrative brain regions like pSTS.SIGNIFICANCE STATEMENT In everyday conversation, we usually process the talker's face as well as the sound of the talker's voice. Access to visual speech information is particularly useful when the auditory signal is hard to understand (e.g., background noise). Prior work has suggested that specialized regions of the brain may play a critical role in integrating information from visual and auditory speech. Here, we show a complementary mechanism relying on synchronized brain activity among sensory and motor regions may also play a critical role. These findings encourage reconceptualizing audiovisual integration in the context of coordinated network activity.
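The connectivity result above comes from psychophysiological interaction (PPI) analysis, which amounts to adding a seed-by-task interaction regressor to a voxelwise GLM. A minimal sketch under simplifying assumptions follows (synthetic time series; real pipelines deconvolve the seed signal to the neural level and convolve with a hemodynamic response, both omitted here for brevity).

import numpy as np

rng = np.random.default_rng(1)
n = 200                                            # volumes
seed = rng.standard_normal(n)                      # seed-region (e.g., A1) time series
task = (np.arange(n) % 40 < 20).astype(float)      # boxcar: AV blocks on/off
ppi = (seed - seed.mean()) * (task - task.mean())  # mean-centered interaction term
target = 0.5 * ppi + rng.standard_normal(n)        # synthetic target voxel

# GLM with seed, task, interaction, and intercept; the PPI beta indexes
# condition-dependent coupling between seed and target
design = np.column_stack([seed, task, ppi, np.ones(n)])
beta, *_ = np.linalg.lstsq(design, target, rcond=None)
print(f"PPI beta (condition-dependent coupling) = {beta[2]:.2f}")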
Collapse
Affiliation(s)
- Jonathan E Peelle
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, Missouri 63110
| | - Brent Spehar
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, Missouri 63110
| | - Michael S Jones
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, Missouri 63110
| | - Sarah McConkey
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, Missouri 63110
| | - Joel Myerson
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, Missouri 63130
| | - Sandra Hale
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, Missouri 63130
| | - Mitchell S Sommers
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, Missouri 63130
| | - Nancy Tye-Murray
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, Missouri 63110
| |
Collapse
|
24
|
Benetti S, Collignon O. Cross-modal integration and plasticity in the superior temporal cortex. HANDBOOK OF CLINICAL NEUROLOGY 2022; 187:127-143. [PMID: 35964967 DOI: 10.1016/b978-0-12-823493-8.00026-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
In congenitally deaf people, temporal regions typically believed to be primarily auditory enhance their response to nonauditory information. The neural mechanisms and functional principles underlying this phenomenon, as well as its impact on auditory recovery after sensory restoration, remain debated. In this chapter, we demonstrate that the cross-modal recruitment of temporal regions by visual inputs in congenitally deaf people follows organizational principles known to be present in the hearing brain. We propose that the functional and structural mechanisms allowing optimal convergence of multisensory information in the temporal cortex of hearing people also provide the neural scaffolding for feeding visual or tactile information into the deafened temporal areas. Innate in nature, such anatomo-functional links between the auditory and other sensory systems would represent the common substrate of both early multisensory integration and the expression of selective cross-modal plasticity in the superior temporal cortex.
Collapse
Affiliation(s)
- Stefania Benetti
- Center for Mind/Brain Sciences - CIMeC, University of Trento, Trento, Italy
| | - Olivier Collignon
- Center for Mind/Brain Sciences - CIMeC, University of Trento, Trento, Italy; Institute for Research in Psychology and Neuroscience, Faculty of Psychology and Educational Science, UC Louvain, Louvain-la-Neuve, Belgium.
| |
Collapse
|
25
|
Rennig J, Beauchamp MS. Intelligibility of audiovisual sentences drives multivoxel response patterns in human superior temporal cortex. Neuroimage 2021; 247:118796. [PMID: 34906712 PMCID: PMC8819942 DOI: 10.1016/j.neuroimage.2021.118796] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2021] [Revised: 11/18/2021] [Accepted: 12/08/2021] [Indexed: 11/18/2022] Open
Abstract
Regions of the human posterior superior temporal gyrus and sulcus (pSTG/S) respond to the visual mouth movements that constitute visual speech and the auditory vocalizations that constitute auditory speech, and neural responses in pSTG/S may underlie the perceptual benefit of visual speech for the comprehension of noisy auditory speech. We examined this possibility through the lens of multivoxel pattern responses in pSTG/S. BOLD fMRI data was collected from 22 participants presented with speech consisting of English sentences presented in five different formats: visual-only; auditory with and without added auditory noise; and audiovisual with and without auditory noise. Participants reported the intelligibility of each sentence with a button press and trials were sorted post-hoc into those that were more or less intelligible. Response patterns were measured in regions of the pSTG/S identified with an independent localizer. Noisy audiovisual sentences with very similar physical properties evoked very different response patterns depending on their intelligibility. When a noisy audiovisual sentence was reported as intelligible, the pattern was nearly identical to that elicited by clear audiovisual sentences. In contrast, an unintelligible noisy audiovisual sentence evoked a pattern like that of visual-only sentences. This effect was less pronounced for noisy auditory-only sentences, which evoked similar response patterns regardless of intelligibility. The successful integration of visual and auditory speech produces a characteristic neural signature in pSTG/S, highlighting the importance of this region in generating the perceptual benefit of visual speech.
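The pattern comparison at the heart of this result can be schematized as correlating a trial's multivoxel response with condition templates. A toy sketch with made-up patterns (illustration of the logic only, not the study's analysis):

import numpy as np

rng = np.random.default_rng(2)
clear_av = rng.standard_normal(300)       # template pattern: clear AV sentences
visual_only = rng.standard_normal(300)    # template pattern: visual-only sentences
# a noisy-AV trial reported as intelligible should resemble the clear-AV template
trial = clear_av + 0.5 * rng.standard_normal(300)

for name, template in [("clear AV", clear_av), ("visual-only", visual_only)]:
    r = np.corrcoef(trial, template)[0, 1]
    print(f"pattern correlation with {name} template: r = {r:.2f}")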
Collapse
Affiliation(s)
- Johannes Rennig
- Division of Neuropsychology, Center of Neurology, Hertie-Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany
| | - Michael S Beauchamp
- Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Richards Medical Research Building, A607, 3700 Hamilton Walk, Philadelphia, PA 19104-6016, United States.
| |
Collapse
|
26
|
Kaganovich N, Christ S. Event-related potentials evidence for long-term audiovisual representations of phonemes in adults. Eur J Neurosci 2021; 54:7860-7875. [PMID: 34750895 DOI: 10.1111/ejn.15519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2021] [Revised: 11/03/2021] [Accepted: 11/05/2021] [Indexed: 10/19/2022]
Abstract
The presence of long-term auditory representations for phonemes has been well-established. However, since speech perception is typically audiovisual, we hypothesized that long-term phoneme representations may also contain information on speakers' mouth shape during articulation. We used an audiovisual oddball paradigm in which, on each trial, participants saw a face and heard one of two vowels. One vowel occurred frequently (standard), while another occurred rarely (deviant). In one condition (neutral), the face had a closed, non-articulating mouth. In the other condition (audiovisual violation), the mouth shape matched the frequent vowel. Although in both conditions stimuli were audiovisual, we hypothesized that identical auditory changes would be perceived differently by participants. Namely, in the neutral condition, deviants violated only the audiovisual pattern specific to each block. By contrast, in the audiovisual violation condition, deviants additionally violated long-term representations for how a speaker's mouth looks during articulation. We compared the amplitude of mismatch negativity (MMN) and P3 components elicited by deviants in the two conditions. The MMN extended posteriorly over temporal and occipital sites even though deviants contained no visual changes, suggesting that deviants were perceived as interruptions in audiovisual, rather than auditory only, sequences. As predicted, deviants elicited larger MMN and P3 in the audiovisual violation compared to the neutral condition. The results suggest that long-term representations of phonemes are indeed audiovisual.
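The MMN itself is computed as a deviant-minus-standard difference wave, with amplitude summarized over a post-stimulus window. A bare-bones sketch (synthetic ERPs; the window and sampling rate are arbitrary choices, not the paper's):

import numpy as np

rng = np.random.default_rng(3)
fs = 500                                         # Hz
t = np.arange(-0.1, 0.6, 1 / fs)                 # epoch: -100 to 600 ms
standard = rng.standard_normal((100, t.size)).mean(axis=0)   # averaged standard ERP
# simulate a deviant with an added negativity peaking ~180 ms post-stimulus
deviant = standard - 2.0 * np.exp(-((t - 0.18) ** 2) / 0.002)

mmn = deviant - standard                         # difference wave isolates the deviance response
win = (t >= 0.1) & (t <= 0.25)                   # example MMN window, 100-250 ms
print(f"mean MMN amplitude in window: {mmn[win].mean():.2f} (a.u.)")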
Collapse
Affiliation(s)
- Natalya Kaganovich
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana, USA.,Department of Psychological Sciences, Purdue University, West Lafayette, Indiana, USA
| | - Sharon Christ
- Department of Human Development and Family Studies, Purdue University, West Lafayette, Indiana, USA.,Department of Statistics, Purdue University, West Lafayette, Indiana, USA
| |
Collapse
|
27
|
Ito T, Ohashi H, Gracco VL. Somatosensory contribution to audio-visual speech processing. Cortex 2021; 143:195-204. [PMID: 34450567 DOI: 10.1016/j.cortex.2021.07.013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2021] [Revised: 07/20/2021] [Accepted: 07/28/2021] [Indexed: 10/20/2022]
Abstract
Recent studies have demonstrated that the auditory speech perception of a listener can be modulated by somatosensory input applied to the facial skin, suggesting that perception is an embodied process. However, speech perception is a multisensory process involving both the auditory and visual modalities. It is unknown whether and to what extent somatosensory stimulation to the facial skin modulates audio-visual speech perception. If speech perception is an embodied process, then somatosensory stimulation applied to the perceiver should influence audio-visual speech processing. Using the McGurk effect (the perceptual illusion that occurs when a sound is paired with the visual representation of a different sound, resulting in the perception of a third sound), we tested the prediction using a simple behavioral paradigm and at the neural level using event-related potentials (ERPs) and their cortical sources. We recorded ERPs from 64 scalp sites in response to congruent and incongruent audio-visual speech randomly presented with and without somatosensory stimulation associated with facial skin deformation. Subjects judged whether the production was /ba/ or not under all stimulus conditions. In the congruent audio-visual condition, subjects identified the sound as /ba/, but not in the incongruent condition, consistent with the McGurk effect. Concurrent somatosensory stimulation improved participants' ability to correctly identify the production as /ba/ relative to the non-somatosensory condition in both congruent and incongruent conditions. ERPs in response to the somatosensory stimulation in the incongruent condition reliably diverged 220 msec after stimulation onset. Cortical sources were estimated around the left anterior temporal gyrus, the right middle temporal gyrus, the right posterior superior temporal lobe and the right occipital region. The results demonstrate a clear multisensory convergence of somatosensory and audio-visual processing at both the behavioral and neural levels, consistent with the perspective that speech perception is a self-referenced, sensorimotor process.
Collapse
Affiliation(s)
- Takayuki Ito
- University Grenoble-Alpes, CNRS, Grenoble-INP, GIPSA-Lab, Saint Martin D'heres Cedex, France; Haskins Laboratories, New Haven, CT, USA.
| | | | - Vincent L Gracco
- Haskins Laboratories, New Haven, CT, USA; McGill University, Montréal, QC, Canada
| |
Collapse
|
28
|
Karthik G, Plass J, Beltz AM, Liu Z, Grabowecky M, Suzuki S, Stacey WC, Wasade VS, Towle VL, Tao JX, Wu S, Issa NP, Brang D. Visual speech differentially modulates beta, theta, and high gamma bands in auditory cortex. Eur J Neurosci 2021; 54:7301-7317. [PMID: 34587350 DOI: 10.1111/ejn.15482] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2021] [Revised: 08/20/2021] [Accepted: 08/28/2021] [Indexed: 12/13/2022]
Abstract
Speech perception is a central component of social communication. Although principally an auditory process, accurate speech perception in everyday settings is supported by meaningful information extracted from visual cues. Visual speech modulates activity in cortical areas subserving auditory speech perception including the superior temporal gyrus (STG). However, it is unknown whether visual modulation of auditory processing is a unitary phenomenon or, rather, consists of multiple functionally distinct processes. To explore this question, we examined neural responses to audiovisual speech measured from intracranially implanted electrodes in 21 patients with epilepsy. We found that visual speech modulated auditory processes in the STG in multiple ways, eliciting temporally and spatially distinct patterns of activity that differed across frequency bands. In the theta band, visual speech suppressed the auditory response from before auditory speech onset to after auditory speech onset (-93 to 500 ms) most strongly in the posterior STG. In the beta band, suppression was seen in the anterior STG from -311 to -195 ms before auditory speech onset and in the middle STG from -195 to 235 ms after speech onset. In high gamma, visual speech enhanced the auditory response from -45 to 24 ms only in the posterior STG. We interpret the visual-induced changes prior to speech onset as reflecting crossmodal prediction of speech signals. In contrast, modulations after sound onset may reflect a decrease in sustained feedforward auditory activity. These results are consistent with models that posit multiple distinct mechanisms supporting audiovisual speech perception.
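Band-limited effects of this kind are conventionally quantified by band-pass filtering and taking the Hilbert amplitude envelope per band. A hedged sketch (synthetic signal; conventional theta/beta/high-gamma band edges, not necessarily those used in the study):

import numpy as np
from scipy.signal import butter, filtfilt, hilbert

rng = np.random.default_rng(4)
fs, n = 1000, 5000
lfp = rng.standard_normal(n)                     # stand-in for an STG recording

def band_envelope(x, lo, hi):
    # zero-phase band-pass, then amplitude envelope via the Hilbert transform
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return np.abs(hilbert(filtfilt(b, a, x)))

bands = {"theta": (4, 8), "beta": (13, 30), "high gamma": (70, 150)}
for name, (lo, hi) in bands.items():
    env = band_envelope(lfp, lo, hi)
    print(f"{name}: mean envelope {env.mean():.3f}")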
Collapse
Affiliation(s)
- G Karthik
- Department of Psychology, University of Michigan, Ann Arbor, Michigan, USA
| | - John Plass
- Department of Psychology, University of Michigan, Ann Arbor, Michigan, USA
| | - Adriene M Beltz
- Department of Psychology, University of Michigan, Ann Arbor, Michigan, USA
| | - Zhongming Liu
- Department of Biomedical Engineering and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan, USA
| | - Marcia Grabowecky
- Department of Psychology, Northwestern University, Evanston, Illinois, USA
| | - Satoru Suzuki
- Department of Psychology, Northwestern University, Evanston, Illinois, USA
| | - William C Stacey
- Department of Neurology and Department of Biomedical Engineering, University of Michigan, Ann Arbor, Michigan, USA
| | - Vibhangini S Wasade
- Department of Neurology, Henry Ford Hospital, Detroit, Michigan, USA.,Department of Neurology, Wayne State University School of Medicine, Detroit, Michigan, USA
| | - Vernon L Towle
- Department of Neurology, The University of Chicago, Chicago, Illinois, USA
| | - James X Tao
- Department of Neurology, The University of Chicago, Chicago, Illinois, USA
| | - Shasha Wu
- Department of Neurology, The University of Chicago, Chicago, Illinois, USA
| | - Naoum P Issa
- Department of Neurology, The University of Chicago, Chicago, Illinois, USA
| | - David Brang
- Department of Psychology, University of Michigan, Ann Arbor, Michigan, USA
| |
Collapse
|
29
|
O'Sullivan AE, Crosse MJ, Liberto GMD, de Cheveigné A, Lalor EC. Neurophysiological Indices of Audiovisual Speech Processing Reveal a Hierarchy of Multisensory Integration Effects. J Neurosci 2021; 41:4991-5003. [PMID: 33824190 PMCID: PMC8197638 DOI: 10.1523/jneurosci.0906-20.2021] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2020] [Revised: 03/16/2021] [Accepted: 03/22/2021] [Indexed: 12/27/2022] Open
Abstract
Seeing a speaker's face benefits speech comprehension, especially in challenging listening conditions. This perceptual benefit is thought to stem from the neural integration of visual and auditory speech at multiple stages of processing, whereby movement of a speaker's face provides temporal cues to auditory cortex, and articulatory information from the speaker's mouth can aid recognizing specific linguistic units (e.g., phonemes, syllables). However, it remains unclear how the integration of these cues varies as a function of listening conditions. Here, we sought to provide insight on these questions by examining EEG responses in humans (males and females) to natural audiovisual (AV), audio, and visual speech in quiet and in noise. We represented our speech stimuli in terms of their spectrograms and their phonetic features and then quantified the strength of the encoding of those features in the EEG using canonical correlation analysis (CCA). The encoding of both spectrotemporal and phonetic features was shown to be more robust in AV speech responses than what would have been expected from the summation of the audio and visual speech responses, suggesting that multisensory integration occurs at both spectrotemporal and phonetic stages of speech processing. We also found evidence to suggest that the integration effects may change with listening conditions; however, this was an exploratory analysis and future work will be required to examine this effect using a within-subject design. These findings demonstrate that integration of audio and visual speech occurs at multiple stages along the speech processing hierarchy.SIGNIFICANCE STATEMENT During conversation, visual cues impact our perception of speech. Integration of auditory and visual speech is thought to occur at multiple stages of speech processing and vary flexibly depending on the listening conditions. Here, we examine audiovisual (AV) integration at two stages of speech processing using the speech spectrogram and a phonetic representation, and test how AV integration adapts to degraded listening conditions. We find significant integration at both of these stages regardless of listening conditions. These findings reveal neural indices of multisensory interactions at different stages of processing and provide support for the multistage integration framework.
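Canonical correlation analysis, the tool used above to quantify encoding strength, finds paired projections of stimulus features and EEG that are maximally correlated. A toy sketch using scikit-learn's CCA (synthetic data; an illustration of the method, not the authors' implementation):

import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(5)
n = 1000
latent = rng.standard_normal((n, 2))              # shared stimulus-response structure
stim = latent @ rng.standard_normal((2, 8)) + 0.5 * rng.standard_normal((n, 8))
eeg = latent @ rng.standard_normal((2, 32)) + 0.5 * rng.standard_normal((n, 32))

cca = CCA(n_components=2).fit(stim, eeg)
u, v = cca.transform(stim, eeg)                   # paired canonical variates
for k in range(2):
    r = np.corrcoef(u[:, k], v[:, k])[0, 1]
    print(f"canonical correlation {k + 1}: r = {r:.2f}")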
Collapse
Affiliation(s)
- Aisling E O'Sullivan
- School of Engineering, Trinity Centre for Biomedical Engineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
| | - Michael J Crosse
- X, The Moonshot Factory, Mountain View, CA and Department of Neuroscience, Albert Einstein College of Medicine, Bronx, New York 10461
| | - Giovanni M Di Liberto
- Laboratoire des Systèmes Perceptifs, Département d'Études Cognitives, École Normale Supérieure, Paris Sciences et Lettres University, Centre National de la Recherche Scientifique, Paris 75005, France
| | - Alain de Cheveigné
- Laboratoire des Systèmes Perceptifs, Département d'Études Cognitives, École Normale Supérieure, Paris Sciences et Lettres University, Centre National de la Recherche Scientifique, Paris 75005, France
- University College London Ear Institute, University College London, London WC1X 8EE, United Kingdom
| | - Edmund C Lalor
- School of Engineering, Trinity Centre for Biomedical Engineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Department of Biomedical Engineering and Department of Neuroscience, University of Rochester, Rochester, New York 14627
| |
Collapse
|
30
|
Johnston A, Brown BB, Elson R. Synchronous facial action binds dynamic facial features. Sci Rep 2021; 11:7191. [PMID: 33785856 PMCID: PMC8010062 DOI: 10.1038/s41598-021-86725-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2020] [Accepted: 02/22/2021] [Indexed: 11/09/2022] Open
Abstract
We asked how dynamic facial features are perceptually grouped. To address this question, we varied the timing of mouth movements relative to eyebrow movements, while measuring the detectability of a small temporal misalignment between a pair of oscillating eyebrows (an eyebrow wave). We found eyebrow wave detection performance was worse for synchronous movements of the eyebrows and mouth. Subsequently, we found this effect was specific to stimuli presented to the right visual field, implicating the involvement of left lateralised visual speech areas. Adaptation has been used as a tool in low-level vision to establish the presence of separable visual channels. Adaptation to moving eyebrows and mouths with various relative timings reduced eyebrow wave detection but only when the adapting mouth and eyebrows moved asynchronously. Inverting the face led to a greater reduction in detection after adaptation particularly for asynchronous facial motion at test. We conclude that synchronous motion binds dynamic facial features whereas asynchronous motion releases them, allowing adaptation to impair eyebrow wave detection.
Collapse
Affiliation(s)
- Alan Johnston
- School of Psychology, University Park, The University of Nottingham, Nottingham, NG7 2RD, UK.
| | - Ben B Brown
- School of Psychology, University Park, The University of Nottingham, Nottingham, NG7 2RD, UK
| | - Ryan Elson
- School of Psychology, University Park, The University of Nottingham, Nottingham, NG7 2RD, UK
| |
Collapse
|
31
|
Vannuscorps G, Andres M, Carneiro SP, Rombaux E, Caramazza A. Typically Efficient Lipreading without Motor Simulation. J Cogn Neurosci 2021; 33:611-621. [PMID: 33416443 DOI: 10.1162/jocn_a_01666] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
Abstract
All it takes is a face-to-face conversation in a noisy environment to realize that viewing a speaker's lip movements contributes to speech comprehension. What are the processes underlying the perception and interpretation of visual speech? Brain areas that control speech production are also recruited during lipreading. This finding raises the possibility that lipreading may be supported, at least to some extent, by a covert unconscious imitation of the observed speech movements in the observer's own speech motor system-a motor simulation. However, whether, and if so to what extent, motor simulation contributes to visual speech interpretation remains unclear. In two experiments, we found that several participants with congenital facial paralysis were as good at lipreading as the control population and performed these tasks in a way that is qualitatively similar to the controls despite severely reduced or even completely absent lip motor representations. Although it remains an open question whether this conclusion generalizes to other experimental conditions and to typically developed participants, these findings considerably narrow the space of hypotheses for a role of motor simulation in lipreading. Beyond its theoretical significance in the field of speech perception, this finding also calls for a re-examination of the more general hypothesis that motor simulation underlies action perception and interpretation developed in the frameworks of motor simulation and mirror neuron hypotheses.
Collapse
|
32
|
The interrelationship between the face and vocal tract configuration during audiovisual speech. Proc Natl Acad Sci U S A 2020; 117:32791-32798. [PMID: 33293422 PMCID: PMC7768679 DOI: 10.1073/pnas.2006192117] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/05/2022] Open
Abstract
Speech perception is improved when we are able to see the person who is speaking, but how visual speech cues are used to improve speech perception is currently unclear. Brain imaging has revealed that regions responsible for motor control are active during the perception of speech, opening up the possibility that visual cues are mapped onto an internal representation of the vocal tract. Here, we show that there is sufficient information in the configuration of the face to recover the vocal tract configuration and that the key areas responsible for driving the correspondence vary in accordance with the articulation required to form the acoustic signal at the appropriate point in a sentence. It is well established that speech perception is improved when we are able to see the speaker talking along with hearing their voice, especially when the speech is noisy. While we have a good understanding of where speech integration occurs in the brain, it is unclear how visual and auditory cues are combined to improve speech perception. One suggestion is that integration can occur as both visual and auditory cues arise from a common generator: the vocal tract. Here, we investigate whether facial and vocal tract movements are linked during speech production by comparing videos of the face and fast magnetic resonance (MR) image sequences of the vocal tract. The joint variation in the face and vocal tract was extracted using an application of principal components analysis (PCA), and we demonstrate that MR image sequences can be reconstructed with high fidelity using only the facial video and PCA. Reconstruction fidelity was significantly higher when images from the two sequences corresponded in time, and including implicit temporal information by combining contiguous frames also led to a significant increase in fidelity. A “Bubbles” technique was used to identify which areas of the face were important for recovering information about the vocal tract, and vice versa, on a frame-by-frame basis. Our data reveal that there is sufficient information in the face to recover vocal tract shape during speech. In addition, the facial and vocal tract regions that are important for reconstruction are those that are used to generate the acoustic speech signal.
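The reconstruction logic — recovering vocal-tract frames from facial video through a joint principal components model — can be sketched as follows (synthetic features; an illustration of the principle, not the published pipeline). PCA is fit on concatenated face and vocal-tract features, component scores are then estimated from the face block alone, and the vocal-tract block is read off the reconstruction.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
n, d_face, d_vt = 400, 50, 40
latent = rng.standard_normal((n, 5))                    # shared articulation factors
face = latent @ rng.standard_normal((5, d_face))        # facial-video features
vt = latent @ rng.standard_normal((5, d_vt))            # vocal-tract (MR) features

pca = PCA(n_components=5).fit(np.hstack([face, vt]))    # joint PCA over both blocks
W_face = pca.components_[:, :d_face]                    # face block of the loadings
# estimate component scores from the face alone, then reconstruct the vocal-tract block
scores, *_ = np.linalg.lstsq(W_face.T, (face - pca.mean_[:d_face]).T, rcond=None)
vt_hat = scores.T @ pca.components_[:, d_face:] + pca.mean_[d_face:]
r = np.corrcoef(vt_hat.ravel(), vt.ravel())[0, 1]
print(f"reconstruction fidelity r = {r:.2f}")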
Collapse
|
33
|
Audio-visual integration in noise: Influence of auditory and visual stimulus degradation on eye movements and perception of the McGurk effect. Atten Percept Psychophys 2020; 82:3544-3557. [PMID: 32533526 PMCID: PMC7788022 DOI: 10.3758/s13414-020-02042-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Seeing a talker’s face can aid audiovisual (AV) integration when speech is presented in noise. However, few studies have simultaneously manipulated auditory and visual degradation. We aimed to establish how degrading the auditory and visual signal affected AV integration. Where people look on the face in this context is also of interest; Buchan, Paré and Munhall (Brain Research, 1242, 162–171, 2008) found fixations on the mouth increased in the presence of auditory noise whilst Wilson, Alsius, Paré and Munhall (Journal of Speech, Language, and Hearing Research, 59(4), 601–615, 2016) found mouth fixations decreased with decreasing visual resolution. In Condition 1, participants listened to clear speech, and in Condition 2, participants listened to vocoded speech designed to simulate the information provided by a cochlear implant. Speech was presented in three levels of auditory noise and three levels of visual blurring. Adding noise to the auditory signal increased McGurk responses, while blurring the visual signal decreased McGurk responses. Participants fixated the mouth more on trials when the McGurk effect was perceived. Adding auditory noise led to people fixating the mouth more, while visual degradation led to people fixating the mouth less. Combined, the results suggest that modality preference and where people look during AV integration of incongruent syllables vary according to the quality of information available.
Collapse
|
34
|
Thézé R, Giraud AL, Mégevand P. The phase of cortical oscillations determines the perceptual fate of visual cues in naturalistic audiovisual speech. SCIENCE ADVANCES 2020; 6:6/45/eabc6348. [PMID: 33148648 PMCID: PMC7673697 DOI: 10.1126/sciadv.abc6348] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/05/2020] [Accepted: 09/17/2020] [Indexed: 06/11/2023]
Abstract
When we see our interlocutor, our brain seamlessly extracts visual cues from their face and processes them along with the sound of their voice, making speech an intrinsically multimodal signal. Visual cues are especially important in noisy environments, when the auditory signal is less reliable. Neuronal oscillations might be involved in the cortical processing of audiovisual speech by selecting which sensory channel contributes more to perception. To test this, we designed computer-generated naturalistic audiovisual speech stimuli where one mismatched phoneme-viseme pair in a key word of sentences created bistable perception. Neurophysiological recordings (high-density scalp and intracranial electroencephalography) revealed that the precise phase angle of theta-band oscillations in posterior temporal and occipital cortex of the right hemisphere was crucial to select whether the auditory or the visual speech cue drove perception. We demonstrate that the phase of cortical oscillations acts as an instrument for sensory selection in audiovisual speech processing.
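A standard way to test whether oscillatory phase predicts a binary percept is to extract band-limited phase with the Hilbert transform and regress the report on the sine and cosine of that phase. A minimal sketch with simulated trials (a conceptual stand-in, not the paper's intracranial pipeline):

import numpy as np
from scipy.signal import butter, filtfilt, hilbert
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
fs, n_trials, n_samp = 250, 300, 250
trials = rng.standard_normal((n_trials, n_samp))        # synthetic single-trial EEG

b, a = butter(4, [4 / (fs / 2), 8 / (fs / 2)], btype="band")      # theta band
theta = filtfilt(b, a, trials, axis=1)
phase = np.angle(hilbert(theta))[:, n_samp // 2]        # phase at the key-word onset
# simulate percepts whose probability depends on that phase angle
p_auditory = 1 / (1 + np.exp(-2 * np.cos(phase)))
percept = (rng.random(n_trials) < p_auditory).astype(int)

X = np.column_stack([np.cos(phase), np.sin(phase)])     # circular predictor
acc = LogisticRegression().fit(X, percept).score(X, percept)
print(f"phase predicts percept with accuracy {acc:.2f}")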
Collapse
Affiliation(s)
- Raphaël Thézé
- Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, 1202 Geneva, Switzerland
| | - Anne-Lise Giraud
- Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, 1202 Geneva, Switzerland
| | - Pierre Mégevand
- Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, 1202 Geneva, Switzerland.
- Division of Neurology, Department of Clinical Neurosciences, Geneva University Hospitals, 1205 Geneva, Switzerland
| |
Collapse
|
35
|
Michon M, Boncompte G, López V. Electrophysiological Dynamics of Visual Speech Processing and the Role of Orofacial Effectors for Cross-Modal Predictions. Front Hum Neurosci 2020; 14:538619. [PMID: 33192386 PMCID: PMC7653187 DOI: 10.3389/fnhum.2020.538619] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2020] [Accepted: 09/29/2020] [Indexed: 11/13/2022] Open
Abstract
The human brain generates predictions about future events. During face-to-face conversations, visemic information is used to predict upcoming auditory input. Recent studies suggest that the speech motor system plays a role in these cross-modal predictions, however, usually only audio-visual paradigms are employed. Here we tested whether speech sounds can be predicted on the basis of visemic information only, and to what extent interfering with orofacial articulatory effectors can affect these predictions. We registered EEG and employed N400 as an index of such predictions. Our results show that N400's amplitude was strongly modulated by visemic salience, consistent with cross-modal speech predictions. Additionally, N400 ceased to be evoked when syllables' visemes were presented backwards, suggesting that predictions occur only when the observed viseme matches an existing articuleme in the observer's speech motor system (i.e., the articulatory neural sequence required to produce a particular phoneme/viseme). Importantly, we found that interfering with the motor articulatory system strongly disrupted cross-modal predictions. We also observed a late P1000 that was evoked only for syllable-related visual stimuli, but whose amplitude was not modulated by interfering with the motor system. The present study provides further evidence of the importance of the speech production system for speech sound predictions based on visemic information at the pre-lexical level. The implications of these results are discussed in the context of a hypothesized trimodal repertoire for speech, in which speech perception is conceived as a highly interactive process that involves not only your ears but also your eyes, lips and tongue.
Collapse
Affiliation(s)
- Maëva Michon
- Laboratorio de Neurociencia Cognitiva y Evolutiva, Escuela de Medicina, Pontificia Universidad Católica de Chile, Santiago, Chile
- Laboratorio de Neurociencia Cognitiva y Social, Facultad de Psicología, Universidad Diego Portales, Santiago, Chile
| | - Gonzalo Boncompte
- Laboratorio de Neurodinámicas de la Cognición, Escuela de Medicina, Pontificia Universidad Católica de Chile, Santiago, Chile
| | - Vladimir López
- Laboratorio de Psicología Experimental, Escuela de Psicología, Pontificia Universidad Católica de Chile, Santiago, Chile
| |
Collapse
|
36
|
Audio-visual combination of syllables involves time-sensitive dynamics following from fusion failure. Sci Rep 2020; 10:18009. [PMID: 33093570 PMCID: PMC7583249 DOI: 10.1038/s41598-020-75201-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2020] [Accepted: 10/05/2020] [Indexed: 11/08/2022] Open
Abstract
In face-to-face communication, audio-visual (AV) stimuli can be fused, combined or perceived as mismatching. While the left superior temporal sulcus (STS) is presumably the locus of AV integration, the process leading to combination is unknown. Based on previous modelling work, we hypothesize that combination results from a complex dynamic originating in a failure to integrate AV inputs, followed by a reconstruction of the most plausible AV sequence. In two different behavioural tasks and one MEG experiment, we observed that combination is more time demanding than fusion. Using time-/source-resolved human MEG analyses with linear and dynamic causal models, we show that both fusion and combination involve early detection of AV incongruence in the STS, whereas combination is further associated with enhanced activity of AV asynchrony-sensitive regions (auditory and inferior frontal cortices). Based on neural signal decoding, we finally show that only combination can be decoded from the IFG activity and that combination is decoded later than fusion in the STS. These results indicate that the AV speech integration outcome primarily depends on whether the STS converges or not onto an existing multimodal syllable representation, and that combination results from subsequent temporal processing, presumably the off-line re-ordering of incongruent AV stimuli.
Collapse
|
37
|
Reduced resting state functional connectivity with increasing age-related hearing loss and McGurk susceptibility. Sci Rep 2020; 10:16987. [PMID: 33046800 PMCID: PMC7550565 DOI: 10.1038/s41598-020-74012-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2020] [Accepted: 09/15/2020] [Indexed: 11/21/2022] Open
Abstract
Age-related hearing loss has been related to a compensatory increase in audio-visual integration and neural reorganization including alterations in functional resting state connectivity. How these two changes are linked in elderly listeners is unclear. The current study explored modulatory effects of hearing thresholds and audio-visual integration on resting state functional connectivity. We analysed a large set of resting state data of 65 elderly participants with a widely varying degree of untreated hearing loss. Audio-visual integration, as gauged with the McGurk effect, increased with progressing hearing thresholds. On the neural level, McGurk illusions were negatively related to functional coupling between motor and auditory regions. Similarly, connectivity of the dorsal attention network to sensorimotor and primary motor cortices was reduced with increasing hearing loss. The same effect was obtained for connectivity between the salience network and visual cortex. Our findings suggest that with progressing untreated age-related hearing loss, functional coupling at rest declines, affecting connectivity of brain networks and areas associated with attentional, visual, sensorimotor and motor processes. Especially connectivity reductions between auditory and motor areas were related to stronger audio-visual integration found with increasing hearing loss.
Collapse
|
38
|
Dash D, Wisler A, Ferrari P, Davenport EM, Maldjian J, Wang J. MEG Sensor Selection for Neural Speech Decoding. IEEE ACCESS : PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2020; 8:182320-182337. [PMID: 33204579 PMCID: PMC7668411 DOI: 10.1109/access.2020.3028831] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]
Abstract
Direct decoding of speech from the brain is a faster alternative to current electroencephalography (EEG) speller-based brain-computer interfaces (BCI) in providing communication assistance to locked-in patients. Magnetoencephalography (MEG) has recently shown great potential as a non-invasive neuroimaging modality for neural speech decoding, owing in part to its spatial selectivity over other high-temporal resolution devices. Standard MEG systems have a large number of cryogenically cooled channels/sensors (200-300) encapsulated within a fixed liquid helium dewar, precluding their use as wearable BCI devices. Fortunately, recently developed optically pumped magnetometers (OPM) do not require cryogens, and have the potential to be wearable and movable, making them more suitable for BCI applications. This design is also modular, allowing for customized montages that include only the sensors necessary for a particular task. As the number of sensors heavily influences the cost, size, and weight of MEG systems, minimizing the number of sensors is critical for designing practical MEG-based BCIs in the future. In this study, we sought to identify an optimal set of MEG channels to decode imagined and spoken phrases from the MEG signals. Using a forward selection algorithm with a support vector machine classifier, we found that nine optimally located MEG gradiometers provided higher decoding accuracy compared to using all channels. Additionally, the forward selection algorithm achieved similar performance to dimensionality reduction using a stacked-sparse-autoencoder. Analysis of spatial dynamics of speech decoding suggested that both left and right hemisphere sensors contribute to speech decoding. Sensors approximately located near Broca's area were found to be commonly contributing among the higher-ranked sensors across all subjects.
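Greedy forward sensor selection of the kind described — at each step, add the channel that most improves cross-validated SVM accuracy — fits in a few lines. A sketch with synthetic features (the nine-sensor target mirrors the abstract; everything else is a simplification):

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(8)
n_trials, n_chan = 200, 30
y = rng.integers(0, 2, n_trials)                  # two phrases to decode
X = rng.standard_normal((n_trials, n_chan))       # per-channel features
X[:, :5] += y[:, None] * 1.5                      # five genuinely informative channels

selected, remaining = [], list(range(n_chan))
while len(selected) < 9:                          # target set size from the study
    # evaluate each candidate channel added to the current set
    scores = [cross_val_score(SVC(), X[:, selected + [c]], y, cv=5).mean()
              for c in remaining]
    best = remaining[int(np.argmax(scores))]
    selected.append(best)
    remaining.remove(best)
print("selected channels:", selected)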
Collapse
Affiliation(s)
- Debadatta Dash
- Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712, USA
- Department of Neurology, Dell Medical School, The University of Texas at Austin, Austin, TX 78712, USA
| | - Alan Wisler
- Department of Speech, Language, and Hearing Sciences, University of Texas at Austin, Austin, TX 78712, USA
| | - Paul Ferrari
- MEG Laboratory, Dell Children's Medical Center, Austin, TX 78723, USA
- Department of Psychology, The University of Texas at Austin, Austin, TX 78712, USA
| | | | - Joseph Maldjian
- Department of Radiology, University of Texas at Southwestern, Dallas, TX 75390, USA
| | - Jun Wang
- Department of Neurology, Dell Medical School, The University of Texas at Austin, Austin, TX 78712, USA
- Department of Speech, Language, and Hearing Sciences, University of Texas at Austin, Austin, TX 78712, USA
| |
Collapse
|
39
|
Abstract
Visual information from the face of an interlocutor complements auditory information from their voice, enhancing intelligibility. However, there are large individual differences in the ability to comprehend noisy audiovisual speech. Another axis of individual variability is the extent to which humans fixate the mouth or the eyes of a viewed face. We speculated that across a lifetime of face viewing, individuals who prefer to fixate the mouth of a viewed face might accumulate stronger associations between visual and auditory speech, resulting in improved comprehension of noisy audiovisual speech. To test this idea, we assessed interindividual variability in two tasks. Participants (n = 102) varied greatly in their ability to understand noisy audiovisual sentences (accuracy from 2-58%) and in the time they spent fixating the mouth of a talker enunciating clear audiovisual syllables (3-98% of total time). These two variables were positively correlated: a 10% increase in time spent fixating the mouth equated to a 5.6% increase in multisensory gain. This finding demonstrates an unexpected link, mediated by histories of visual exposure, between two fundamental human abilities: processing faces and understanding speech.
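The reported relationship (a 10-point increase in mouth-looking time corresponding to a 5.6-point increase in multisensory gain, i.e., a slope near 0.56) is an ordinary least-squares fit across participants. Schematically, with simulated participants matched to the ranges in the abstract:

import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(9)
mouth_pct = rng.uniform(3, 98, 102)               # % of time fixating the mouth
gain = 0.56 * mouth_pct + rng.normal(0, 8, 102)   # slope ~0.56, as reported

res = linregress(mouth_pct, gain)
print(f"slope = {res.slope:.2f}, r = {res.rvalue:.2f}")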
Collapse
|
40
|
Keitel A, Gross J, Kayser C. Shared and modality-specific brain regions that mediate auditory and visual word comprehension. eLife 2020; 9:e56972. [PMID: 32831168 PMCID: PMC7470824 DOI: 10.7554/elife.56972] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2020] [Accepted: 08/18/2020] [Indexed: 12/22/2022] Open
Abstract
Visual speech carried by lip movements is an integral part of communication. Yet, it remains unclear to what extent visual and acoustic speech comprehension are mediated by the same brain regions. Using multivariate classification of full-brain MEG data, we first probed where the brain represents acoustically and visually conveyed word identities. We then tested where these sensory-driven representations are predictive of participants' trial-wise comprehension. The comprehension-relevant representations of auditory and visual speech converged only in anterior angular and inferior frontal regions and were spatially dissociated from those representations that best reflected the sensory-driven word identity. These results provide a neural explanation for the behavioural dissociation of acoustic and visual speech comprehension and suggest that cerebral representations encoding word identities may be more modality-specific than often upheld.
Collapse
Affiliation(s)
- Anne Keitel
- Psychology, University of Dundee, Dundee, United Kingdom
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
| | - Joachim Gross
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
| | - Christoph Kayser
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Bielefeld, Germany
| |
Collapse
|
41
|
Destoky F, Bertels J, Niesen M, Wens V, Vander Ghinst M, Leybaert J, Lallier M, Ince RAA, Gross J, De Tiège X, Bourguignon M. Cortical tracking of speech in noise accounts for reading strategies in children. PLoS Biol 2020; 18:e3000840. [PMID: 32845876 PMCID: PMC7478533 DOI: 10.1371/journal.pbio.3000840] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2020] [Revised: 09/08/2020] [Accepted: 08/12/2020] [Indexed: 11/29/2022] Open
Abstract
Humans' propensity to acquire literacy relates to several factors, including the ability to understand speech in noise (SiN). Still, the nature of the relation between reading and SiN perception abilities remains poorly understood. Here, we dissect the interplay between (1) reading abilities, (2) classical behavioral predictors of reading (phonological awareness, phonological memory, and rapid automatized naming), and (3) electrophysiological markers of SiN perception in 99 elementary school children (26 with dyslexia). We demonstrate that, in typical readers, cortical representation of the phrasal content of SiN relates to the degree of development of the lexical (but not sublexical) reading strategy. In contrast, classical behavioral predictors of reading abilities and the ability to benefit from visual speech to represent the syllabic content of SiN account for global reading performance (i.e., speed and accuracy of lexical and sublexical reading). In individuals with dyslexia, we found preserved integration of visual speech information to optimize processing of syntactic information but not to sustain acoustic/phonemic processing. Finally, within children with dyslexia, measures of cortical representation of the phrasal content of SiN were negatively related to reading speed and positively related to the compromise between reading precision and reading speed, potentially owing to compensatory attentional mechanisms. These results clarify the nature of the relation between SiN perception and reading abilities in typical child readers and children with dyslexia and identify novel electrophysiological markers of emergent literacy.
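Cortical tracking of this kind is typically quantified as coherence between the speech temporal envelope and the neural recording. A minimal sketch with synthetic signals follows; the sampling rate, band limits, and signal construction are illustrative assumptions, not the paper's parameters:

```python
import numpy as np
from scipy.signal import coherence

fs = 200                          # Hz; hypothetical common sampling rate
t = np.arange(0, 120, 1 / fs)     # two minutes of signal
rng = np.random.default_rng(2)

# Synthetic speech envelope with a slow phrasal and a faster syllabic rhythm
envelope = np.sin(2 * np.pi * 0.5 * t) + 0.5 * np.sin(2 * np.pi * 5.0 * t)

# Synthetic "EEG": a delayed, attenuated copy of the envelope buried in noise
eeg = 0.3 * np.roll(envelope, int(0.1 * fs)) + rng.normal(0.0, 1.0, t.size)

f, coh = coherence(envelope, eeg, fs=fs, nperseg=4 * fs)
print(f"phrasal-band coherence {coh[(f > 0.2) & (f < 1.5)].mean():.3f}, "
      f"syllabic-band {coh[(f > 4) & (f < 8)].mean():.3f}")
```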
Collapse
Affiliation(s)
- Florian Destoky
- Laboratoire de Cartographie fonctionnelle du Cerveau, UNI–ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels, Belgium
| | - Julie Bertels
- Laboratoire de Cartographie fonctionnelle du Cerveau, UNI–ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels, Belgium
- Consciousness, Cognition and Computation group, UNI–ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels, Belgium
| | - Maxime Niesen
- Laboratoire de Cartographie fonctionnelle du Cerveau, UNI–ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels, Belgium
- Service d'ORL et de chirurgie cervico-faciale, ULB-Hôpital Erasme, Université libre de Bruxelles (ULB), Brussels, Belgium
| | - Vincent Wens
- Laboratoire de Cartographie fonctionnelle du Cerveau, UNI–ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels, Belgium
- Department of Functional Neuroimaging, Service of Nuclear Medicine, CUB Hôpital Erasme, Université libre de Bruxelles (ULB), Brussels, Belgium
| | - Marc Vander Ghinst
- Laboratoire de Cartographie fonctionnelle du Cerveau, UNI–ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels, Belgium
| | - Jacqueline Leybaert
- Laboratoire Cognition Langage et Développement, UNI–ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels, Belgium
| | - Marie Lallier
- BCBL, Basque Center on Cognition, Brain and Language, San Sebastian, Spain
| | - Robin A. A. Ince
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
| | - Joachim Gross
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Institute for Biomagnetism and Biosignal analysis, University of Muenster, Muenster, Germany
| | - Xavier De Tiège
- Laboratoire de Cartographie fonctionnelle du Cerveau, UNI–ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels, Belgium
- Department of Functional Neuroimaging, Service of Nuclear Medicine, CUB Hôpital Erasme, Université libre de Bruxelles (ULB), Brussels, Belgium
| | - Mathieu Bourguignon
- Laboratoire de Cartographie fonctionnelle du Cerveau, UNI–ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels, Belgium
- Laboratoire Cognition Langage et Développement, UNI–ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels, Belgium
- BCBL, Basque Center on Cognition, Brain and Language, San Sebastian, Spain
| |
Collapse
|
42
|
Ullas S, Hausfeld L, Cutler A, Eisner F, Formisano E. Neural Correlates of Phonetic Adaptation as Induced by Lexical and Audiovisual Context. J Cogn Neurosci 2020; 32:2145-2158. [PMID: 32662723 DOI: 10.1162/jocn_a_01608] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
When speech perception is difficult, one way listeners adjust is by reconfiguring phoneme category boundaries, drawing on contextual information. Both lexical knowledge and lipreading cues are used in this way, but it remains unknown whether these two differing forms of perceptual learning are similar at a neural level. This study compared phoneme boundary adjustments driven by lexical or audiovisual cues, using ultra-high-field 7-T fMRI. During imaging, participants heard exposure stimuli and test stimuli. Exposure stimuli for lexical retuning were audio recordings of words, and those for audiovisual recalibration were audio-video recordings of lip movements during utterances of pseudowords. Test stimuli were ambiguous phonetic strings presented without context, and listeners reported what phoneme they heard. Reports reflected phoneme biases in preceding exposure blocks (e.g., more reported /p/ after /p/-biased exposure). Analysis of corresponding brain responses indicated that both forms of cue use were associated with a network of activity across the temporal cortex, plus parietal, insula, and motor areas. Audiovisual recalibration also elicited significant occipital cortex activity despite the lack of visual stimuli. Activity levels in several ROIs also covaried with strength of audiovisual recalibration, with greater activity accompanying larger recalibration shifts. Similar activation patterns appeared for lexical retuning, but here, no significant ROIs were identified. Audiovisual and lexical forms of perceptual learning thus induce largely similar brain response patterns. However, audiovisual recalibration involves additional visual cortex contributions, suggesting that previously acquired visual information (on lip movements) is retrieved and deployed to disambiguate auditory perception.
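The behavioral recalibration effect described here reduces to a simple per-participant index: the shift in the proportion of /p/ reports between exposure biases. A minimal synthetic sketch (subject counts, trial counts, and response rates are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n_subjects, n_test = 20, 40        # hypothetical counts, not from the study

# Simulated binary reports (1 = heard /p/) on ambiguous test stimuli,
# following /p/-biased vs alternative-biased exposure blocks
p_after_p = rng.binomial(1, 0.65, (n_subjects, n_test)).mean(axis=1)
p_after_alt = rng.binomial(1, 0.45, (n_subjects, n_test)).mean(axis=1)

# Recalibration strength: exposure-induced shift in /p/ reports
shift = p_after_p - p_after_alt
print(f"mean recalibration shift {shift.mean():.2f} "
      f"(positive = retuned toward the exposure bias)")
```

An index of this kind is what the brain responses were covaried against, with larger activity accompanying larger shifts.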
Collapse
Affiliation(s)
- Shruti Ullas
- Maastricht University
- Maastricht Brain Imaging Centre
| | - Lars Hausfeld
- Maastricht University
- Maastricht Brain Imaging Centre
| | | | | | - Elia Formisano
- Maastricht University
- Maastricht Brain Imaging Centre
- Maastricht Centre for Systems Biology
| |
Collapse
|
43
|
Maffei V, Indovina I, Mazzarella E, Giusti MA, Macaluso E, Lacquaniti F, Viviani P. Sensitivity of occipito-temporal cortex, premotor and Broca's areas to visible speech gestures in a familiar language. PLoS One 2020; 15:e0234695. [PMID: 32559213 PMCID: PMC7304574 DOI: 10.1371/journal.pone.0234695] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2019] [Accepted: 06/01/2020] [Indexed: 11/18/2022] Open
Abstract
When looking at a speaking person, the analysis of facial kinematics contributes to language discrimination and to the decoding of the time flow of visual speech. To disentangle these two factors, we investigated behavioural and fMRI responses to familiar and unfamiliar languages when observing speech gestures with natural or reversed kinematics. Twenty Italian volunteers viewed silent video-clips of speech shown as recorded (Forward, biological motion) or reversed in time (Backward, non-biological motion), in Italian (familiar language) or Arabic (unfamiliar language). fMRI revealed that language (Italian/Arabic) and time-rendering (Forward/Backward) modulated distinct areas in the ventral occipito-temporal cortex, suggesting that visual speech analysis begins in this region, earlier than previously thought. Left premotor ventral (superior subdivision) and dorsal areas were preferentially activated with the familiar language independently of time-rendering, challenging the view that the role of these regions in speech processing is purely articulatory. The left premotor ventral region in the frontal operculum, thought to include part of Broca's area, responded to the natural familiar language, consistent with the hypothesis of motor simulation of speech gestures.
Collapse
Affiliation(s)
- Vincenzo Maffei
- Laboratory of Neuromotor Physiology, IRCCS Santa Lucia Foundation, Rome, Italy
- Centre of Space BioMedicine and Department of Systems Medicine, University of Rome Tor Vergata, Rome, Italy
- Data Lake & BI, DOT - Technology, Poste Italiane, Rome, Italy
| | - Iole Indovina
- Laboratory of Neuromotor Physiology, IRCCS Santa Lucia Foundation, Rome, Italy
- Departmental Faculty of Medicine and Surgery, Saint Camillus International University of Health and Medical Sciences, Rome, Italy
| | | | - Maria Assunta Giusti
- Centre of Space BioMedicine and Department of Systems Medicine, University of Rome Tor Vergata, Rome, Italy
| | - Emiliano Macaluso
- ImpAct Team, Lyon Neuroscience Research Center, Lyon, France
- Laboratory of Neuroimaging, IRCCS Santa Lucia Foundation, Rome, Italy
| | - Francesco Lacquaniti
- Laboratory of Neuromotor Physiology, IRCCS Santa Lucia Foundation, Rome, Italy
- Centre of Space BioMedicine and Department of Systems Medicine, University of Rome Tor Vergata, Rome, Italy
| | - Paolo Viviani
- Laboratory of Neuromotor Physiology, IRCCS Santa Lucia Foundation, Rome, Italy
- Centre of Space BioMedicine and Department of Systems Medicine, University of Rome Tor Vergata, Rome, Italy
| |
Collapse
|
44
|
Proverbio AM, Camporeale E, Brusa A. Multimodal Recognition of Emotions in Music and Facial Expressions. Front Hum Neurosci 2020; 14:32. [PMID: 32116613 PMCID: PMC7027335 DOI: 10.3389/fnhum.2020.00032] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2019] [Accepted: 01/23/2020] [Indexed: 01/24/2023] Open
Abstract
The aim of the study was to investigate the neural processing of congruent vs. incongruent affective audiovisual information (facial expressions and music) by means of ERPs (Event Related Potentials) recordings. Stimuli were 200 infant faces displaying Happiness, Relaxation, Sadness, Distress and 32 piano musical pieces conveying the same emotional states (as specifically assessed). Music and faces were presented simultaneously, and paired so that in half cases they were emotionally congruent or incongruent. Twenty subjects were told to pay attention and respond to infrequent targets (adult neutral faces) while their EEG was recorded from 128 channels. The face-related N170 (160-180 ms) component was the earliest response affected by the emotional content of faces (particularly by distress), while visual P300 (250-450 ms) and auditory N400 (350-550 ms) responses were specifically modulated by the emotional content of both facial expressions and musical pieces. Face/music emotional incongruence elicited a wide N400 negativity indicating the detection of a mismatch in the expressed emotion. A swLORETA inverse solution applied to the N400 (difference wave: Incongruent - Congruent) showed the crucial role of Inferior and Superior Temporal Gyri in the multimodal representation of emotional information extracted from faces and music. Furthermore, the prefrontal cortex (superior and medial, BA 10) was also strongly active, possibly supporting working memory. The data hint at a common system for representing emotional information derived from social cognition and music processing, including the uncus and cuneus.
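The difference-wave logic behind the N400 analysis can be illustrated in a few lines. The sketch below simulates one channel's grand averages and measures the mean amplitude in the study's 350-550 ms auditory N400 window; the sampling rate and effect shape are assumptions, not the study's data:

```python
import numpy as np

fs = 500                                    # Hz (hypothetical sampling rate)
times = np.arange(-0.1, 0.8, 1 / fs)        # epoch, seconds relative to onset
rng = np.random.default_rng(4)

# Simulated grand-average ERPs (microvolts) at one channel; the incongruent
# pairing adds a negativity peaking near 450 ms, an N400-like effect
congruent = rng.normal(0.0, 0.2, times.size)
incongruent = congruent - 2.0 * np.exp(-((times - 0.45) ** 2) / (2 * 0.05**2))

diff = incongruent - congruent              # difference wave: Incong. - Cong.
window = (times >= 0.35) & (times <= 0.55)  # the study's auditory N400 window
print(f"mean N400 amplitude {diff[window].mean():.2f} microvolts")
```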
Collapse
|
45
|
Lip-Reading Enables the Brain to Synthesize Auditory Features of Unknown Silent Speech. J Neurosci 2019; 40:1053-1065. [PMID: 31889007 DOI: 10.1523/jneurosci.1101-19.2019] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2019] [Revised: 11/28/2019] [Accepted: 12/04/2019] [Indexed: 11/21/2022] Open
Abstract
Lip-reading is crucial for understanding speech in challenging conditions. But how the brain extracts meaning from silent visual speech is still under debate. Lip-reading in silence activates the auditory cortices, but it is not known whether such activation reflects immediate synthesis of the corresponding auditory stimulus or imagery of unrelated sounds. To disentangle these possibilities, we used magnetoencephalography to evaluate how cortical activity in 28 healthy adult humans (17 females) entrained to the auditory speech envelope and lip movements (mouth opening) when listening to a spoken story without visual input (audio-only), and when seeing a silent video of a speaker articulating another story (video-only). In video-only, auditory cortical activity entrained to the absent auditory signal at frequencies <1 Hz more than to the seen lip movements. This entrainment process was characterized by an auditory-speech-to-brain delay of ∼70 ms in the left hemisphere, compared with ∼20 ms in audio-only. Entrainment to mouth opening was found in the right angular gyrus at <1 Hz, and in early visual cortices at 1-8 Hz. These findings demonstrate that the brain can use a silent lip-read signal to synthesize a coarse-grained auditory speech representation in early auditory cortices. Our data indicate the following underlying oscillatory mechanism: seeing lip movements first modulates neuronal activity in early visual cortices at frequencies that match articulatory lip movements; the right angular gyrus then extracts slower features of lip movements, mapping them onto the corresponding speech sound features; this information is fed to auditory cortices, most likely facilitating speech parsing. SIGNIFICANCE STATEMENT: Lip-reading consists of decoding speech based on visual information derived from observation of a speaker's articulatory facial gestures. Lip-reading is known to improve auditory speech understanding, especially when speech is degraded. Interestingly, lip-reading in silence still activates the auditory cortices, even when participants do not know what the absent auditory signal should be. However, it was uncertain what such activation reflected. Here, using magnetoencephalographic recordings, we demonstrate that it reflects fast synthesis of the auditory stimulus rather than mental imagery of unrelated speech or non-speech sounds. Our results also shed light on the oscillatory dynamics underlying lip-reading.
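The delay estimates reported here come from entrainment analyses; a crude stand-in for that logic is to locate the peak of the cross-correlation between a slow envelope and a delayed cortical signal. A minimal synthetic sketch (the 70 ms delay is the value reported for video-only; everything else is an assumption):

```python
import numpy as np
from scipy.ndimage import uniform_filter1d
from scipy.signal import correlate, correlation_lags

fs = 100                             # Hz (hypothetical sampling rate)
rng = np.random.default_rng(5)
n = 60 * fs                          # one minute of signal

# Envelope-like slow signal obtained by smoothing white noise over ~1 s
envelope = uniform_filter1d(rng.normal(size=n), size=fs)

# Synthetic cortical signal: the envelope delayed by 70 ms, plus noise
brain = np.roll(envelope, int(0.07 * fs)) + rng.normal(0.0, 0.05, n)

# The lag of the cross-correlation peak estimates the speech-to-brain delay
xc = correlate(brain - brain.mean(), envelope - envelope.mean(), mode="full")
lags_s = correlation_lags(n, n, mode="full") / fs
print(f"estimated delay: {1000 * lags_s[np.argmax(xc)]:.0f} ms")
```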
Collapse
|
46
|
Visual inputs decrease brain activity in frontal areas during silent lipreading. PLoS One 2019; 14:e0223782. [PMID: 31600311 PMCID: PMC6786756 DOI: 10.1371/journal.pone.0223782] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2019] [Accepted: 09/27/2019] [Indexed: 11/19/2022] Open
Abstract
Aim: To analyze the modulation of brain activity within the areas involved in lipreading when an additional visual stimulus is included.
Methods: The experiment consisted of two fMRI runs (lipreading_only and lipreading+picture), each with two conditions (oral speech sentences condition [OSS] and oral speech syllables condition [OSSY]).
Results: During lipreading-only, higher activity in the left middle temporal gyrus (MTG) was identified for OSS than for OSSY; during lipreading+picture, apart from the left MTG, higher activity was also present in the supplementary motor area (SMA), the left precentral gyrus (PreCG) and the left inferior frontal gyrus (IFG). The comparison between the two runs revealed higher activity for lipreading-only in the SMA and the left IFG.
Conclusion: The presence of a visual reference during a lipreading task leads to a decrease in activity in frontal areas.
Collapse
|
47
|
Karas PJ, Magnotti JF, Metzger BA, Zhu LL, Smith KB, Yoshor D, Beauchamp MS. The visual speech head start improves perception and reduces superior temporal cortex responses to auditory speech. eLife 2019; 8:e48116. [PMID: 31393261 PMCID: PMC6687434 DOI: 10.7554/elife.48116] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2019] [Accepted: 07/17/2019] [Indexed: 12/30/2022] Open
Abstract
Visual information about speech content from the talker's mouth is often available before auditory information from the talker's voice. Here we examined perceptual and neural responses to words with and without this visual head start. For both types of words, perception was enhanced by viewing the talker's face, but the enhancement was significantly greater for words with a head start. Neural responses were measured from electrodes implanted over auditory association cortex in the posterior superior temporal gyrus (pSTG) of epileptic patients. The presence of visual speech suppressed responses to auditory speech, more so for words with a visual head start. We suggest that the head start inhibits representations of incompatible auditory phonemes, increasing perceptual accuracy and decreasing total neural responses. Together with previous work showing visual cortex modulation (Ozker et al., 2018b), these results from pSTG demonstrate that multisensory interactions are a powerful modulator of activity throughout the speech perception network.
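The core measure here, suppression of auditory responses by visual speech, can be expressed as a simple percent-reduction index. A minimal sketch on synthetic trial data (response magnitudes and trial counts are hypothetical, not the paper's electrode recordings):

```python
import numpy as np

rng = np.random.default_rng(6)
n_trials = 50                        # hypothetical trial count

# Hypothetical response magnitudes (a.u.) from a pSTG electrode
auditory_only = rng.normal(1.00, 0.15, n_trials)
av_head_start = rng.normal(0.70, 0.15, n_trials)     # mouth moves before voice
av_no_head_start = rng.normal(0.90, 0.15, n_trials)

def suppression(av, a):
    """Percent reduction of the auditory response when the face is visible."""
    return 100.0 * (a.mean() - av.mean()) / a.mean()

print(f"suppression with head start {suppression(av_head_start, auditory_only):.0f}%"
      f" vs without {suppression(av_no_head_start, auditory_only):.0f}%")
```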
Collapse
Affiliation(s)
- Patrick J Karas
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
| | - John F Magnotti
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
| | - Brian A Metzger
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
| | - Lin L Zhu
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
| | - Kristen B Smith
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
| | - Daniel Yoshor
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
| | | |
Collapse
|
48
|
Kolozsvári OB, Xu W, Leppänen PHT, Hämäläinen JA. Top-Down Predictions of Familiarity and Congruency in Audio-Visual Speech Perception at Neural Level. Front Hum Neurosci 2019; 13:243. [PMID: 31354459 PMCID: PMC6639789 DOI: 10.3389/fnhum.2019.00243] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2019] [Accepted: 06/28/2019] [Indexed: 11/13/2022] Open
Abstract
During speech perception, listeners rely on multimodal input and make use of both auditory and visual information. When presented with speech, for example syllables, the differences in brain responses to distinct stimuli are not, however, caused merely by the acoustic or visual features of the stimuli. The congruency of the auditory and visual information and the familiarity of a syllable, that is, whether it appears in the listener's native language or not, also modulate brain responses. We investigated how the congruency and familiarity of the presented stimuli affect brain responses to audio-visual (AV) speech in 12 adult Finnish native speakers and 12 adult Chinese native speakers. They watched videos of a Chinese speaker pronouncing syllables (/pa/, /pha/, /ta/, /tha/, /fa/) during a magnetoencephalography (MEG) measurement; only /pa/ and /ta/ are part of Finnish phonology, whereas all five syllables are part of Chinese phonology. The stimuli were presented in audio-visual (congruent or incongruent), audio only, or visual only conditions. The brain responses were examined in five time-windows: 75-125, 150-200, 200-300, 300-400, and 400-600 ms. We found significant differences for the congruency comparison in the fourth time-window (300-400 ms) in both sensor and source level analysis. Larger responses were observed for the incongruent stimuli than for the congruent stimuli. For the familiarity comparisons, no significant differences were found. The results are in line with earlier studies reporting on the modulation of brain responses for audio-visual congruency around 250-500 ms. This suggests a much stronger process for the general detection of a mismatch between predictions based on lip movements and the auditory signal than for the top-down modulation of brain responses based on phonological information.
Collapse
Affiliation(s)
- Orsolya B Kolozsvári
- Department of Psychology, University of Jyväskylä, Jyväskylä, Finland
- Jyväskylä Centre for Interdisciplinary Brain Research (CIBR), University of Jyväskylä, Jyväskylä, Finland
| | - Weiyong Xu
- Department of Psychology, University of Jyväskylä, Jyväskylä, Finland
- Jyväskylä Centre for Interdisciplinary Brain Research (CIBR), University of Jyväskylä, Jyväskylä, Finland
| | - Paavo H T Leppänen
- Department of Psychology, University of Jyväskylä, Jyväskylä, Finland
- Jyväskylä Centre for Interdisciplinary Brain Research (CIBR), University of Jyväskylä, Jyväskylä, Finland
| | - Jarmo A Hämäläinen
- Department of Psychology, University of Jyväskylä, Jyväskylä, Finland
- Jyväskylä Centre for Interdisciplinary Brain Research (CIBR), University of Jyväskylä, Jyväskylä, Finland
| |
Collapse
|
49
|
Bidelman GM, Sigley L, Lewis GA. Acoustic noise and vision differentially warp the auditory categorization of speech. J Acoust Soc Am 2019; 146:60. [PMID: 31370660 PMCID: PMC6786888 DOI: 10.1121/1.5114822] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/18/2019] [Revised: 06/05/2019] [Accepted: 06/07/2019] [Indexed: 06/10/2023]
Abstract
Speech perception requires grouping acoustic information into meaningful linguistic-phonetic units via categorical perception (CP). Beyond shrinking observers' perceptual space, CP might aid degraded speech perception if categories are more resistant to noise than surface acoustic features. Combining audiovisual (AV) cues also enhances speech recognition, particularly in noisy environments. This study investigated the degree to which visual cues from a talker (i.e., mouth movements) aid speech categorization amidst noise interference by measuring participants' identification of clear and noisy speech (0 dB signal-to-noise ratio) presented in auditory-only or combined AV modalities (i.e., A, A+noise, AV, AV+noise conditions). Auditory noise expectedly weakened (i.e., shallower identification slopes) and slowed speech categorization. Interestingly, additional viseme cues largely counteracted noise-related decrements in performance and stabilized classification speeds in both clear and noise conditions suggesting more precise acoustic-phonetic representations with multisensory information. Results are parsimoniously described under a signal detection theory framework and by a reduction (visual cues) and increase (noise) in the precision of perceptual object representation, which were not due to lapses of attention or guessing. Collectively, findings show that (i) mapping sounds to categories aids speech perception in "cocktail party" environments; (ii) visual cues help lattice formation of auditory-phonetic categories to enhance and refine speech identification.
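The identification slope, the abstract's index of categorization strength, is usually the steepness parameter of a logistic psychometric function fitted to responses along a stimulus continuum. A minimal sketch with made-up identification proportions (the continuum, values, and starting guesses are illustrative, not the study's data):

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    """Psychometric function: P(category A) along the stimulus continuum."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

steps = np.linspace(0.0, 1.0, 7)   # morph continuum between two endpoints
clear = np.array([0.02, 0.05, 0.15, 0.50, 0.85, 0.95, 0.98])
noisy = np.array([0.10, 0.18, 0.30, 0.50, 0.70, 0.82, 0.90])  # shallower

(x0_c, k_c), _ = curve_fit(logistic, steps, clear, p0=[0.5, 10.0])
(x0_n, k_n), _ = curve_fit(logistic, steps, noisy, p0=[0.5, 10.0])
print(f"slope k: clear {k_c:.1f} vs noise {k_n:.1f} "
      f"(lower = weaker categorization)")
```

A lower fitted slope in noise captures the "shallower identification functions" the abstract describes, while the counteracting effect of visemes would show up as the slope recovering in the AV+noise condition.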
Collapse
Affiliation(s)
- Gavin M Bidelman
- School of Communication Sciences & Disorders, University of Memphis, 4055 North Park Loop, Memphis, Tennessee 38152, USA
| | - Lauren Sigley
- School of Communication Sciences & Disorders, University of Memphis, 4055 North Park Loop, Memphis, Tennessee 38152, USA
| | - Gwyneth A Lewis
- School of Communication Sciences & Disorders, University of Memphis, 4055 North Park Loop, Memphis, Tennessee 38152, USA
| |
Collapse
|
50
|
Imafuku M, Kanakogi Y, Butler D, Myowa M. Demystifying infant vocal imitation: The roles of mouth looking and speaker's gaze. Dev Sci 2019; 22:e12825. [PMID: 30980494 DOI: 10.1111/desc.12825] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2017] [Revised: 01/08/2019] [Accepted: 03/01/2019] [Indexed: 12/20/2022]
Abstract
Vocal imitation plays a fundamental role in human language acquisition from infancy. Little is known, however, about how infants imitate others' sounds. We focused on three factors: (a) whether infants receive information from upright faces, (b) the infant's observation of the speaker's mouth and (c) the speaker directing their gaze towards the infant. We recorded the eye movements of 6-month-olds who participated in experiments watching videos of a speaker producing vowel sounds. We found that an infant's tendency to vocally imitate such videos increased as a function of (a) seeing upright rather than inverted faces, (b) their increased looking towards the speaker's mouth and (c) whether the speaker directed their gaze towards, rather than away from, the infant. These latter findings are consistent with theories of motor resonance and natural pedagogy respectively. New light has been shed on the cues and underlying mechanisms linking infant speech perception and production.
Collapse
Affiliation(s)
- Masahiro Imafuku
- Graduate School of Education, Kyoto University, Kyoto, Japan
- Faculty of Education, Musashino University, Tokyo, Japan
| | | | - David Butler
- Graduate School of Education, Kyoto University, Kyoto, Japan
- The Institute for Social Neuroscience Psychology, Heidelberg, Victoria, Australia
| | - Masako Myowa
- Graduate School of Education, Kyoto University, Kyoto, Japan
| |
Collapse
|