1
Dong C, Noppeney U, Wang S. Perceptual uncertainty explains activation differences between audiovisual congruent speech and McGurk stimuli. Hum Brain Mapp 2024; 45:e26653. [PMID: 38488460 DOI: 10.1002/hbm.26653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 02/20/2024] [Accepted: 02/26/2024] [Indexed: 03/19/2024] Open
Abstract
Face-to-face communication relies on the integration of acoustic speech signals with the corresponding facial articulations. In the McGurk illusion, an auditory /ba/ phoneme presented simultaneously with a facial articulation of a /ga/ (i.e., viseme), is typically fused into an illusory 'da' percept. Despite its widespread use as an index of audiovisual speech integration, critics argue that it arises from perceptual processes that differ categorically from natural speech recognition. Conversely, Bayesian theoretical frameworks suggest that both the illusory McGurk and the veridical audiovisual congruent speech percepts result from probabilistic inference based on noisy sensory signals. According to these models, the inter-sensory conflict in McGurk stimuli may only increase observers' perceptual uncertainty. This functional magnetic resonance imaging (fMRI) study presented participants (20 male and 24 female) with audiovisual congruent, McGurk (i.e., auditory /ba/ + visual /ga/), and incongruent (i.e., auditory /ga/ + visual /ba/) stimuli along with their unisensory counterparts in a syllable categorization task. Behaviorally, observers' response entropy was greater for McGurk compared to congruent audiovisual stimuli. At the neural level, McGurk stimuli increased activations in a widespread neural system, extending from the inferior frontal sulci (IFS) to the pre-supplementary motor area (pre-SMA) and insulae, typically involved in cognitive control processes. Crucially, in line with Bayesian theories these activation increases were fully accounted for by observers' perceptual uncertainty as measured by their response entropy. Our findings suggest that McGurk and congruent speech processing rely on shared neural mechanisms, thereby supporting the McGurk illusion as a valid measure of natural audiovisual speech perception.
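The abstract above measures perceptual uncertainty through response entropy. As a minimal sketch, assuming the measure is the Shannon entropy (in bits) of each observer's empirical distribution of syllable responses within a condition, it could be computed as follows. The function name and the example response labels are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def response_entropy(responses):
    """Shannon entropy (bits) of the empirical distribution of responses.

    Assumption: each element of `responses` is the category label an
    observer reported on one trial (e.g. "ba", "da", "ga").
    """
    counts = Counter(responses)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one response is required")
    # H = sum_i p_i * log2(1 / p_i); identical responses give H = 0.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    # Hypothetical trials: consistent responses versus split responses.
    consistent = ["ba"] * 10
    split = ["ba", "da"] * 5
    print(response_entropy(consistent), response_entropy(split))
```

Under this definition, an observer who always gives the same answer has zero entropy, and entropy grows as responses spread over more categories.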
Affiliation(s)
- Chenjie Dong
  - Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, Guangzhou, China
  - Donders Institute for Brain, Cognition, and Behavior, Radboud University, Nijmegen, the Netherlands
- Uta Noppeney
  - Donders Institute for Brain, Cognition, and Behavior, Radboud University, Nijmegen, the Netherlands
- Suiping Wang
  - Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, Guangzhou, China
2
Teramoto W, Ernst MO. Effects of invisible lip movements on phonetic perception. Sci Rep 2023; 13:6478. [PMID: 37081084 PMCID: PMC10119180 DOI: 10.1038/s41598-023-33791-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2022] [Accepted: 04/19/2023] [Indexed: 04/22/2023] Open
Abstract
We investigated whether 'invisible' visual information, i.e., visual information that is not consciously perceived, could affect auditory speech perception. Repeated exposure to McGurk stimuli (auditory /ba/ with visual [ga]) temporarily changes the perception of the auditory /ba/ into a 'da' or 'ga'. This altered auditory percept persists even after the presentation of the McGurk stimuli when the auditory stimulus is presented alone (McGurk aftereffect). We used this and presented the auditory /ba/ either with or without (No Face) a masked face articulating a visual [ba] (Congruent Invisible) or a visual [ga] (Incongruent Invisible). Thus, we measured the extent to which the invisible faces could undo or prolong the McGurk aftereffects. In a further control condition, the incongruent faces remained unmasked and thus visible, resulting in four conditions in total. Visibility was defined by the participants' subjective dichotomous reports ('visible' or 'invisible'). The results showed that the Congruent Invisible condition reduced the McGurk aftereffects compared with the other conditions, while the Incongruent Invisible condition showed no difference with the No Face condition. These results suggest that 'invisible' visual information that is not consciously perceived can affect phonetic perception, but only when visual information is congruent with auditory information.
Affiliation(s)
- W Teramoto
  - Faculty of Humanities and Cultural Sciences (Psychology), Kumamoto University, 2-40-1 Kurokami, Kumamoto, 860-8555, Japan
- M O Ernst
  - Applied Cognitive Psychology, Ulm University, Albert-Einstein-Allee 43, 89081, Ulm, Germany
3
Iqbal ZJ, Shahin AJ, Bortfeld H, Backer KC. The McGurk Illusion: A Default Mechanism of the Auditory System. Brain Sci 2023; 13:brainsci13030510. [PMID: 36979322 PMCID: PMC10046462 DOI: 10.3390/brainsci13030510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Revised: 03/10/2023] [Accepted: 03/18/2023] [Indexed: 03/22/2023] Open
Abstract
Recent studies have questioned past conclusions regarding the mechanisms of the McGurk illusion, especially how McGurk susceptibility might inform our understanding of audiovisual (AV) integration. We previously proposed that the McGurk illusion is likely attributable to a default mechanism, whereby either the visual system, auditory system, or both default to specific phonemes—those implicated in the McGurk illusion. We hypothesized that the default mechanism occurs because visual stimuli with an indiscernible place of articulation (like those traditionally used in the McGurk illusion) lead to an ambiguous perceptual environment and thus a failure in AV integration. In the current study, we tested the default hypothesis as it pertains to the auditory system. Participants performed two tasks. One task was a typical McGurk illusion task, in which individuals listened to auditory-/ba/ paired with visual-/ga/ and judged what they heard. The second task was an auditory-only task, in which individuals transcribed trisyllabic words with a phoneme replaced by silence. We found that individuals’ transcription of missing phonemes often defaulted to ‘/d/t/th/’, the same phonemes often experienced during the McGurk illusion. Importantly, individuals’ default rate was positively correlated with their McGurk rate. We conclude that the McGurk illusion arises when people fail to integrate visual percepts with auditory percepts, due to visual ambiguity, thus leading the auditory system to default to phonemes often implicated in the McGurk illusion.
Affiliation(s)
- Zunaira J. Iqbal
  - Department of Cognitive and Information Sciences, University of California, Merced, CA 95343, USA
- Antoine J. Shahin
  - Department of Cognitive and Information Sciences, University of California, Merced, CA 95343, USA
  - Health Sciences Research Institute, University of California, Merced, CA 95343, USA
- Heather Bortfeld
  - Department of Cognitive and Information Sciences, University of California, Merced, CA 95343, USA
  - Health Sciences Research Institute, University of California, Merced, CA 95343, USA
  - Department of Psychological Sciences, University of California, Merced, CA 95353, USA
- Kristina C. Backer
  - Department of Cognitive and Information Sciences, University of California, Merced, CA 95343, USA
  - Health Sciences Research Institute, University of California, Merced, CA 95343, USA
4
Scheliga S, Kellermann T, Lampert A, Rolke R, Spehr M, Habel U. Neural correlates of multisensory integration in the human brain: an ALE meta-analysis. Rev Neurosci 2023; 34:223-245. [PMID: 36084305 DOI: 10.1515/revneuro-2022-0065] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/01/2022] [Accepted: 07/22/2022] [Indexed: 02/07/2023]
Abstract
Previous fMRI research identified the superior temporal sulcus as a central integration area for audiovisual stimuli. However, less is known about a general multisensory integration network across senses. Therefore, we conducted an activation likelihood estimation meta-analysis with multiple sensory modalities to identify a common brain network. We included 49 studies covering all Aristotelian senses, i.e., auditory, visual, tactile, gustatory, and olfactory stimuli. Analysis revealed significant activation in bilateral superior temporal gyrus, middle temporal gyrus, thalamus, right insula, and left inferior frontal gyrus. We assume these regions to be part of a general multisensory integration network comprising different functional roles. Here, the thalamus operates as a first subcortical relay projecting sensory information to higher cortical integration centers in the superior temporal gyrus/sulcus, while conflict-processing brain regions such as the insula and inferior frontal gyrus facilitate integration of incongruent information. We additionally performed meta-analytic connectivity modelling and found that each brain region showed co-activations within the identified multisensory integration network. Therefore, by including multiple sensory modalities in our meta-analysis, the results may provide evidence for a common brain network that supports different functional roles for multisensory integration.
Affiliation(s)
- Sebastian Scheliga
  - Department of Psychiatry, Psychotherapy and Psychosomatics, Medical Faculty RWTH Aachen University, Pauwelsstraße 30, 52074 Aachen, Germany
- Thilo Kellermann
  - Department of Psychiatry, Psychotherapy and Psychosomatics, Medical Faculty RWTH Aachen University, Pauwelsstraße 30, 52074 Aachen, Germany
  - JARA-Institute Brain Structure Function Relationship, Pauwelsstraße 30, 52074 Aachen, Germany
- Angelika Lampert
  - Institute of Physiology, Medical Faculty RWTH Aachen University, Pauwelsstraße 30, 52074 Aachen, Germany
- Roman Rolke
  - Department of Palliative Medicine, Medical Faculty RWTH Aachen University, Pauwelsstraße 30, 52074 Aachen, Germany
- Marc Spehr
  - Department of Chemosensation, RWTH Aachen University, Institute for Biology, Worringerweg 3, 52074 Aachen, Germany
- Ute Habel
  - Department of Psychiatry, Psychotherapy and Psychosomatics, Medical Faculty RWTH Aachen University, Pauwelsstraße 30, 52074 Aachen, Germany
  - JARA-Institute Brain Structure Function Relationship, Pauwelsstraße 30, 52074 Aachen, Germany
5
Butera IM, Stevenson RA, Gifford RH, Wallace MT. Visually biased Perception in Cochlear Implant Users: A Study of the McGurk and Sound-Induced Flash Illusions. Trends Hear 2023; 27:23312165221076681. [PMID: 37377212 PMCID: PMC10334005 DOI: 10.1177/23312165221076681] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2021] [Revised: 12/08/2021] [Accepted: 01/10/2021] [Indexed: 06/29/2023] Open
Abstract
The reduction in spectral resolution by cochlear implants oftentimes requires complementary visual speech cues to facilitate understanding. Despite substantial clinical characterization of auditory-only speech measures, relatively little is known about the audiovisual (AV) integrative abilities that most cochlear implant (CI) users rely on for daily speech comprehension. In this study, we tested AV integration in 63 CI users and 69 normal-hearing (NH) controls using the McGurk and sound-induced flash illusions. To our knowledge, this study is the largest to date measuring the McGurk effect in this population and the first that tests the sound-induced flash illusion (SIFI). When presented with conflicting AV speech stimuli (i.e., the phoneme "ba" dubbed onto the viseme "ga"), we found that 55 CI users (87%) reported a fused percept of "da" or "tha" on at least one trial. After applying an error correction based on unisensory responses, we found that among those susceptible to the illusion, CI users experienced lower fusion than controls, a result that was concordant with results from the SIFI, where the pairing of a single circle flashing on the screen with multiple beeps resulted in fewer illusory flashes for CI users. While illusion perception in these two tasks appears to be uncorrelated among CI users, we identified a negative correlation in the NH group. Because neither illusion appears to provide further explanation of variability in CI outcome measures, further research is needed to determine how these findings relate to CI users' speech understanding, particularly in ecological listening conditions that are naturally multisensory.
Affiliation(s)
- Iliza M. Butera
  - Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN, USA
- Ryan A. Stevenson
  - Department of Psychology, University of Western Ontario, London, ON, Canada
  - Brain and Mind Institute, University of Western Ontario, London, ON, Canada
- René H. Gifford
  - Department of Hearing and Speech Sciences, Vanderbilt University, Nashville, TN, USA
- Mark T. Wallace
  - Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN, USA
  - Department of Hearing and Speech Sciences, Vanderbilt University, Nashville, TN, USA
  - Vanderbilt Kennedy Center, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Psychology, Vanderbilt University, Nashville, TN, USA
6
Van Engen KJ, Dey A, Sommers MS, Peelle JE. Audiovisual speech perception: Moving beyond McGurk. The Journal of the Acoustical Society of America 2022; 152:3216. [PMID: 36586857 PMCID: PMC9894660 DOI: 10.1121/10.0015262] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/28/2022] [Revised: 10/26/2022] [Accepted: 11/05/2022] [Indexed: 05/29/2023]
Abstract
Although it is clear that sighted listeners use both auditory and visual cues during speech perception, the manner in which multisensory information is combined is a matter of debate. One approach to measuring multisensory integration is to use variants of the McGurk illusion, in which discrepant auditory and visual cues produce auditory percepts that differ from those based on unimodal input. Not all listeners show the same degree of susceptibility to the McGurk illusion, and these individual differences are frequently used as a measure of audiovisual integration ability. However, despite their popularity, we join the voices of others in the field to argue that McGurk tasks are ill-suited for studying real-life multisensory speech perception: McGurk stimuli are often based on isolated syllables (which are rare in conversations) and necessarily rely on audiovisual incongruence that does not occur naturally. Furthermore, recent data show that susceptibility to McGurk tasks does not correlate with performance during natural audiovisual speech perception. Although the McGurk effect is a fascinating illusion, truly understanding the combined use of auditory and visual information during speech perception requires tasks that more closely resemble everyday communication: namely, words, sentences, and narratives with congruent auditory and visual speech cues.
Affiliation(s)
- Kristin J Van Engen
  - Department of Psychological and Brain Sciences, Washington University, St. Louis, Missouri 63130, USA
- Avanti Dey
  - PLOS ONE, 1265 Battery Street, San Francisco, California 94111, USA
- Mitchell S Sommers
  - Department of Psychological and Brain Sciences, Washington University, St. Louis, Missouri 63130, USA
- Jonathan E Peelle
  - Department of Otolaryngology, Washington University, St. Louis, Missouri 63130, USA
7
Possible Neural Mechanisms Underlying Sensory Over-Responsivity in Individuals with ASD. Current Developmental Disorders Reports 2022. [DOI: 10.1007/s40474-022-00257-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
8
Erdener D, Evren Erdener Ş. Speechreading as a secondary diagnostic tool in bipolar disorder. Med Hypotheses 2022. [DOI: 10.1016/j.mehy.2021.110744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
9
Butera IM, Larson ED, DeFreese AJ, Lee AKC, Gifford RH, Wallace MT. Functional localization of audiovisual speech using near infrared spectroscopy. Brain Topogr 2022; 35:416-430. [PMID: 35821542 PMCID: PMC9334437 DOI: 10.1007/s10548-022-00904-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2021] [Accepted: 05/19/2022] [Indexed: 11/21/2022]
Abstract
Visual cues are especially vital for hearing impaired individuals such as cochlear implant (CI) users to understand speech in noise. Functional Near Infrared Spectroscopy (fNIRS) is a light-based imaging technology that is ideally suited for measuring the brain activity of CI users due to its compatibility with both the ferromagnetic and electrical components of these implants. In a preliminary step toward better elucidating the behavioral and neural correlates of audiovisual (AV) speech integration in CI users, we designed a speech-in-noise task and measured the extent to which 24 normal hearing individuals could integrate the audio of spoken monosyllabic words with the corresponding visual signals of a female speaker. In our behavioral task, we found that audiovisual pairings provided average improvements of 103% and 197% over auditory-alone listening conditions in -6 and -9 dB signal-to-noise ratios consisting of multi-talker background noise. In an fNIRS task using similar stimuli, we measured activity during auditory-only listening, visual-only lipreading, and AV listening conditions. We identified cortical activity in all three conditions over regions of middle and superior temporal cortex typically associated with speech processing and audiovisual integration. In addition, three channels active during the lipreading condition showed uncorrected correlations associated with behavioral measures of audiovisual gain as well as with the McGurk effect. Further work focusing primarily on the regions of interest identified in this study could test how AV speech integration may differ for CI users who rely on this mechanism for daily communication.
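The abstract above reports audiovisual improvements as percentages over auditory-alone performance. As a minimal sketch, assuming these figures are relative gains of the form 100 × (AV − A) / A, the arithmetic could look like the following. The function name and the example scores are hypothetical, and the study may have computed gain differently.

```python
def relative_av_gain(av_score, a_score):
    """Percent improvement of audiovisual over auditory-alone performance.

    Assumption: both scores are on the same scale (e.g. proportion of
    words correct) and gain is expressed relative to the auditory-alone score.
    """
    if a_score <= 0:
        raise ValueError("auditory-alone score must be positive")
    return 100.0 * (av_score - a_score) / a_score


if __name__ == "__main__":
    # Hypothetical scores: 25% correct auditory-alone, 50% correct audiovisual.
    print(relative_av_gain(0.5, 0.25))
```

On this reading, a gain above 100% means that audiovisual performance was more than double the auditory-alone score.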
Affiliation(s)
- Iliza M. Butera
  - Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN, USA
- Eric D. Larson
  - Institute for Learning & Brain Sciences, University of Washington, Seattle, Washington, USA
- Andrea J. DeFreese
  - Department of Hearing and Speech Sciences, Vanderbilt University, Nashville, TN, USA
- Adrian KC Lee
  - Institute for Learning & Brain Sciences, University of Washington, Seattle, Washington, USA
  - Department of Speech and Hearing Sciences, University of Washington, Seattle, Washington, USA
- René H. Gifford
  - Department of Hearing and Speech Sciences, Vanderbilt University, Nashville, TN, USA
- Mark T. Wallace
  - Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN, USA
  - Department of Hearing and Speech Sciences, Vanderbilt University, Nashville, TN, USA
  - Vanderbilt Kennedy Center, Vanderbilt University Medical Center, Nashville, TN, USA
10
Ekert JO, Gajardo-Vidal A, Lorca-Puls DL, Hope TMH, Dick F, Crinion JT, Green DW, Price CJ. Dissociating the functions of three left posterior superior temporal regions that contribute to speech perception and production. Neuroimage 2021; 245:118764. [PMID: 34848301 PMCID: PMC9125162 DOI: 10.1016/j.neuroimage.2021.118764] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/15/2021] [Revised: 11/15/2021] [Accepted: 11/24/2021] [Indexed: 11/28/2022] Open
Abstract
Prior studies have shown that the left posterior superior temporal sulcus (pSTS) and left temporo-parietal junction (TPJ) both contribute to phonological short-term memory, speech perception and speech production. Here, by conducting a within-subjects multi-factorial fMRI study, we dissociate the response profiles of these regions and a third region – the anterior ascending terminal branch of the left superior temporal sulcus (atSTS), which lies dorsal to pSTS and ventral to TPJ. First, we show that each region was more activated by (i) 1-back matching on visually presented verbal stimuli (words or pseudowords) compared to 1-back matching on visually presented non-verbal stimuli (pictures of objects or non-objects), and (ii) overt speech production than 1-back matching, across 8 types of stimuli (visually presented words, pseudowords, objects and non-objects and aurally presented words, pseudowords, object sounds and meaningless hums). The response properties of the three regions dissociated within the auditory modality. In left TPJ, activation was higher for auditory stimuli that were non-verbal (sounds of objects or meaningless hums) compared to verbal (words and pseudowords), irrespective of task (speech production or 1-back matching). In left pSTS, activation was higher for non-semantic stimuli (pseudowords and hums) than semantic stimuli (words and object sounds) on the dorsal pSTS surface (dpSTS), irrespective of task. In left atSTS, activation was not sensitive to either semantic or verbal content. The contrasting response properties of left TPJ, dpSTS and atSTS were cross-validated in an independent sample of 59 participants, using region-by-condition interactions. We also show that each region participates in non-overlapping networks of frontal, parietal and cerebellar regions. Our results challenge previous claims about functional specialisation in the left posterior superior temporal lobe and motivate future studies to determine the timing and directionality of information flow in the brain networks involved in speech perception and production.
Affiliation(s)
- Justyna O Ekert
  - Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, 12 Queen Square, London WC1N 3AR, United Kingdom
- Andrea Gajardo-Vidal
  - Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, 12 Queen Square, London WC1N 3AR, United Kingdom
  - Faculty of Health Sciences, Universidad del Desarrollo, Concepcion, Chile
- Diego L Lorca-Puls
  - Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, 12 Queen Square, London WC1N 3AR, United Kingdom
- Thomas M H Hope
  - Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, 12 Queen Square, London WC1N 3AR, United Kingdom
- Fred Dick
  - Department of Experimental Psychology, University College London, London, United Kingdom
  - Department of Psychological Sciences, Birkbeck University of London, London, United Kingdom
- Jennifer T Crinion
  - Institute of Cognitive Neuroscience, University College London, London, United Kingdom
- David W Green
  - Department of Experimental Psychology, University College London, London, United Kingdom
- Cathy J Price
  - Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, 12 Queen Square, London WC1N 3AR, United Kingdom
11
Wahn B, Schmitz L, Kingstone A, Böckler-Raettig A. When eyes beat lips: speaker gaze affects audiovisual integration in the McGurk illusion. Psychological Research 2021; 86:1930-1943. [PMID: 34854983 PMCID: PMC9363401 DOI: 10.1007/s00426-021-01618-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2020] [Accepted: 11/10/2021] [Indexed: 11/26/2022]
Abstract
Eye contact is a dynamic social signal that captures attention and plays a critical role in human communication. In particular, direct gaze often accompanies communicative acts in an ostensive function: a speaker directs her gaze towards the addressee to highlight the fact that this message is being intentionally communicated to her. The addressee, in turn, integrates the speaker’s auditory and visual speech signals (i.e., her vocal sounds and lip movements) into a unitary percept. It is an open question whether the speaker’s gaze affects how the addressee integrates the speaker’s multisensory speech signals. We investigated this question using the classic McGurk illusion, an illusory percept created by presenting mismatching auditory (vocal sounds) and visual information (speaker’s lip movements). Specifically, we manipulated whether the speaker (a) moved his eyelids up/down (i.e., open/closed his eyes) prior to speaking or did not show any eye motion, and (b) spoke with open or closed eyes. When the speaker’s eyes moved (i.e., opened or closed) before an utterance, and when the speaker spoke with closed eyes, the McGurk illusion was weakened (i.e., addressees reported significantly fewer illusory percepts). In line with previous research, this suggests that motion (opening or closing), as well as the closed state of the speaker’s eyes, captured addressees’ attention, thereby reducing the influence of the speaker’s lip movements on the addressees’ audiovisual integration process. Our findings reaffirm the power of speaker gaze to guide attention, showing that its dynamics can modulate low-level processes such as the integration of multisensory speech signals.
Affiliation(s)
- Basil Wahn
  - Department of Psychology, Leibniz Universität Hannover, Hannover, Germany
- Laura Schmitz
  - Institute of Sports Science, Leibniz Universität Hannover, Hannover, Germany
- Alan Kingstone
  - Department of Psychology, University of British Columbia, Vancouver, BC, Canada
12
Jenson D. Audiovisual incongruence differentially impacts left and right hemisphere sensorimotor oscillations: Potential applications to production. PLoS One 2021; 16:e0258335. [PMID: 34618866 PMCID: PMC8496780 DOI: 10.1371/journal.pone.0258335] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/26/2020] [Accepted: 09/26/2021] [Indexed: 11/21/2022] Open
Abstract
Speech production gives rise to distinct auditory and somatosensory feedback signals which are dynamically integrated to enable online monitoring and error correction, though it remains unclear how the sensorimotor system supports the integration of these multimodal signals. Capitalizing on the parity of sensorimotor processes supporting perception and production, the current study employed the McGurk paradigm to induce multimodal sensory congruence/incongruence. EEG data from a cohort of 39 typical speakers were decomposed with independent component analysis to identify bilateral mu rhythms, which are indices of sensorimotor activity. Subsequent time-frequency analyses revealed bilateral patterns of event-related desynchronization (ERD) across alpha and beta frequency ranges over the time course of perceptual events. Right mu activity was characterized by reduced ERD during all cases of audiovisual incongruence, while left mu activity was attenuated and protracted in McGurk trials eliciting sensory fusion. Results were interpreted to suggest distinct hemispheric contributions, with right hemisphere mu activity supporting a coarse incongruence detection process and left hemisphere mu activity reflecting a more granular level of analysis including phonological identification and incongruence resolution. Findings are also considered in regard to incongruence detection and resolution processes during production.
Affiliation(s)
- David Jenson
  - Department of Speech and Hearing Sciences, Washington State University, Spokane, Washington, United States of America
13
Kaganovich N, Schumaker J, Christ S. Impaired Audiovisual Representation of Phonemes in Children with Developmental Language Disorder. Brain Sci 2021; 11:brainsci11040507. [PMID: 33923647 PMCID: PMC8073635 DOI: 10.3390/brainsci11040507] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/20/2021] [Revised: 04/06/2021] [Accepted: 04/10/2021] [Indexed: 11/25/2022] Open
Abstract
We examined whether children with developmental language disorder (DLD) differed from their peers with typical development (TD) in the degree to which they encode information about a talker’s mouth shape into long-term phonemic representations. Children watched a talker’s face and listened to rare changes from [i] to [u] or the reverse. In the neutral condition, the talker’s face had a closed mouth throughout. In the audiovisual violation condition, the mouth shape always matched the frequent vowel, even when the rare vowel was played. We hypothesized that in the neutral condition no long-term audiovisual memory traces for speech sounds would be activated. Therefore, the neural response elicited by deviants would reflect only a violation of the observed audiovisual sequence. In contrast, we expected that in the audiovisual violation condition, a long-term memory trace for the speech sound/lip configuration typical for the frequent vowel would be activated. In this condition then, the neural response elicited by rare sound changes would reflect a violation of not only observed audiovisual patterns but also of a long-term memory representation for how a given vowel looks when articulated. Children pressed a response button whenever they saw a talker’s face assume a silly expression. We found that in children with TD, rare auditory changes produced a significant mismatch negativity (MMN) event-related potential (ERP) component over the posterior scalp in the audiovisual violation condition but not in the neutral condition. In children with DLD, no MMN was present in either condition. Rare vowel changes elicited a significant P3 in both groups and conditions, indicating that all children noticed auditory changes. 
Our results suggest that children with TD, but not children with DLD, incorporate visual information into long-term phonemic representations and detect violations in audiovisual phonemic congruency even when they perform a task that is unrelated to phonemic processing.
Affiliation(s)
- Natalya Kaganovich
  - Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907-2038, USA
  - Department of Psychological Sciences, Purdue University, 703 Third Street, West Lafayette, IN 47907-2038, USA
  - Correspondence: Tel.: +1-(765)-494-4233; Fax: +1-(765)-494-0771
- Jennifer Schumaker
  - Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907-2038, USA
- Sharon Christ
  - Department of Statistics, Purdue University, 250 N. University Street, West Lafayette, IN 47907-2066, USA
  - Department of Human Development and Family Studies, Purdue University, 1202 West State Street, West Lafayette, IN 47907-2055, USA
14
|
Gonzales MG, Backer KC, Mandujano B, Shahin AJ. Rethinking the Mechanisms Underlying the McGurk Illusion. Front Hum Neurosci 2021; 15:616049. [PMID: 33867954 PMCID: PMC8046930 DOI: 10.3389/fnhum.2021.616049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2020] [Accepted: 03/12/2021] [Indexed: 11/13/2022]
Abstract
The McGurk illusion occurs when listeners hear an illusory percept (i.e., "da"), resulting from mismatched pairings of audiovisual (AV) speech stimuli (i.e., auditory /ba/ paired with visual /ga/). Hearing a third percept, distinct from both the auditory and visual input, has been used as evidence of AV fusion. We examined whether the McGurk illusion is instead driven by visual dominance, whereby the third percept, e.g., "da," represents a default percept for visemes with an ambiguous place of articulation (POA), like /ga/. Participants watched videos of a talker uttering various consonant vowels (CVs) with (AV) and without (V-only) audio of /ba/. Individuals transcribed the CV they saw (V-only) or heard (AV). In the V-only condition, individuals predominantly saw "da"/"ta" when viewing CVs with indiscernible POAs. Likewise, in the AV condition, upon perceiving an illusion, they predominantly heard "da"/"ta" for CVs with indiscernible POAs. The illusion was stronger in individuals who exhibited weak /ba/ auditory encoding (examined using a control auditory-only task). In Experiment 2, we attempted to replicate these findings using stimuli recorded from a different talker. The V-only results were not replicated, but again individuals predominantly heard "da"/"ta"/"tha" as an illusory percept for various AV combinations, and the illusion was stronger in individuals who exhibited weak /ba/ auditory encoding. These results demonstrate that when visual CVs with indiscernible POAs are paired with a weakly encoded auditory /ba/, listeners default to hearing "da"/"ta"/"tha", thus tempering the AV fusion account and favoring a default mechanism triggered when both AV stimuli are ambiguous.
Affiliation(s)
- Mariel G. Gonzales
- Department of Cognitive and Information Sciences, University of California, Merced, Merced, CA, United States
- Kristina C. Backer
- Department of Cognitive and Information Sciences, University of California, Merced, Merced, CA, United States
- Brenna Mandujano
- Department of Psychology, California State University, Fresno, Fresno, CA, United States
- Antoine J. Shahin
- Department of Cognitive and Information Sciences, University of California, Merced, Merced, CA, United States
15
Narayan A, Rowe MA, Palacios EM, Wren-Jarvis J, Bourla I, Gerdes M, Brandes-Aitken A, Desai SS, Marco EJ, Mukherjee P. Altered Cerebellar White Matter in Sensory Processing Dysfunction Is Associated With Impaired Multisensory Integration and Attention. Front Psychol 2021; 11:618436. [PMID: 33613368 PMCID: PMC7888341 DOI: 10.3389/fpsyg.2020.618436] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/16/2020] [Accepted: 12/28/2020] [Indexed: 01/04/2023]
Abstract
Sensory processing dysfunction (SPD) is characterized by a behaviorally observed difference in the response to sensory information from the environment. While the cerebellum is involved in normal sensory processing, it has not yet been examined in SPD. Diffusion tensor imaging scans of children with SPD (n = 42) and typically developing controls (TDC; n = 39) were compared for fractional anisotropy (FA), mean diffusivity (MD), radial diffusivity (RD), and axial diffusivity (AD) across the following cerebellar tracts: the middle cerebellar peduncles (MCP), superior cerebellar peduncles (SCP), and cerebral peduncles (CP). Compared to TDC, children with SPD show reduced microstructural integrity of the SCP and MCP, characterized by reduced FA and increased MD and RD, which correlates with abnormal auditory behavior, multisensory integration, and attention, but not tactile behavior or direct measures of auditory discrimination. In contradistinction, decreased CP microstructural integrity in SPD correlates with abnormal tactile and auditory behavior and direct measures of auditory discrimination, but not multisensory integration or attention. Hence, altered cerebellar white matter organization is associated with complex sensory behavior and attention in SPD, which prompts further consideration of diagnostic measures and treatments to better serve affected individuals.
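For reference, the four diffusion metrics named in the abstract above are conventionally defined from the three eigenvalues of the diffusion tensor. The block below gives these standard textbook definitions. The symbols \lambda_1 \ge \lambda_2 \ge \lambda_3 are notation assumed here for illustration and do not appear in the abstract.

```latex
\begin{aligned}
\mathrm{AD} &= \lambda_1, \\
\mathrm{RD} &= \tfrac{1}{2}\,(\lambda_2 + \lambda_3), \\
\mathrm{MD} &= \tfrac{1}{3}\,(\lambda_1 + \lambda_2 + \lambda_3), \\
\mathrm{FA} &= \sqrt{\tfrac{3}{2}}\;
  \frac{\sqrt{(\lambda_1-\mathrm{MD})^2 + (\lambda_2-\mathrm{MD})^2 + (\lambda_3-\mathrm{MD})^2}}
       {\sqrt{\lambda_1^2 + \lambda_2^2 + \lambda_3^2}}
\end{aligned}
```

Under these definitions FA ranges from 0 (fully isotropic diffusion) to 1 (diffusion along a single axis).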
Affiliation(s)
- Anisha Narayan
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, CA, United States; Department of Medicine, Tulane University School of Medicine, New Orleans, LA, United States
- Mikaela A Rowe
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, CA, United States; Cortica Healthcare, San Rafael, CA, United States
- Eva M Palacios
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, CA, United States
- Jamie Wren-Jarvis
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, CA, United States
- Ioanna Bourla
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, CA, United States
- Molly Gerdes
- Cortica Healthcare, San Rafael, CA, United States
- Annie Brandes-Aitken
- Department of Neurology, University of California, San Francisco, San Francisco, CA, United States
- Shivani S Desai
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, CA, United States
- Elysa J Marco
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, CA, United States; Cortica Healthcare, San Rafael, CA, United States
- Pratik Mukherjee
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, CA, United States; Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, United States
16
Csonka M, Mardmomen N, Webster PJ, Brefczynski-Lewis JA, Frum C, Lewis JW. Meta-Analyses Support a Taxonomic Model for Representations of Different Categories of Audio-Visual Interaction Events in the Human Brain. Cereb Cortex Commun 2021; 2:tgab002. [PMID: 33718874 PMCID: PMC7941256 DOI: 10.1093/texcom/tgab002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/21/2020] [Revised: 12/31/2020] [Accepted: 01/06/2021] [Indexed: 01/23/2023]
Abstract
Our ability to perceive meaningful action events involving objects, people, and other animate agents is characterized in part by an interplay of visual and auditory sensory processing and their cross-modal interactions. However, this multisensory ability can be altered or dysfunctional in some hearing and sighted individuals, and in some clinical populations. The present meta-analysis sought to test current hypotheses regarding neurobiological architectures that may mediate audio-visual multisensory processing. Reported coordinates from 82 neuroimaging studies (137 experiments) that revealed some form of audio-visual interaction in discrete brain regions were compiled, converted to a common coordinate space, and then organized along specific categorical dimensions to generate activation likelihood estimate (ALE) brain maps and various contrasts of those derived maps. The results revealed brain regions (cortical "hubs") preferentially involved in multisensory processing along different stimulus category dimensions, including 1) living versus nonliving audio-visual events, 2) audio-visual events involving vocalizations versus actions by living sources, 3) emotionally valent events, and 4) dynamic-visual versus static-visual audio-visual stimuli. These meta-analysis results are discussed in the context of neurocomputational theories of semantic knowledge representations and perception, and the brain volumes of interest are available for download to facilitate data interpretation for future neuroimaging studies.
Affiliation(s)
- Matt Csonka
- Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown, WV 26506, USA
- Nadia Mardmomen
- Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown, WV 26506, USA
- Paula J Webster
- Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown, WV 26506, USA
- Julie A Brefczynski-Lewis
- Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown, WV 26506, USA
- Chris Frum
- Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown, WV 26506, USA
- James W Lewis
- Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown, WV 26506, USA
17
Magnotti JF, Dzeda KB, Wegner-Clemens K, Rennig J, Beauchamp MS. Weak observer-level correlation and strong stimulus-level correlation between the McGurk effect and audiovisual speech-in-noise: A causal inference explanation. Cortex 2020; 133:371-383. [PMID: 33221701 DOI: 10.1016/j.cortex.2020.10.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/03/2020] [Revised: 08/05/2020] [Accepted: 10/05/2020] [Indexed: 11/25/2022]
Abstract
The McGurk effect is a widely used measure of multisensory integration during speech perception. Two observations have raised questions about the validity of the effect as a tool for understanding speech perception. First, there is high variability in perception of the McGurk effect across different stimuli and observers. Second, across observers there is low correlation between McGurk susceptibility and recognition of visual speech paired with auditory speech-in-noise, another common measure of multisensory integration. Using the framework of the causal inference of multisensory speech (CIMS) model, we explored the relationship between the McGurk effect, syllable perception, and sentence perception in seven experiments with a total of 296 different participants. Perceptual reports revealed a relationship between the efficacy of different McGurk stimuli created from the same talker and perception of the auditory component of the McGurk stimuli presented in isolation, both with and without added noise. The CIMS model explained this strong stimulus-level correlation using the principles of noisy sensory encoding followed by optimal cue combination within a common representational space across speech types. Because the McGurk effect (but not speech-in-noise) requires the resolution of conflicting cues between modalities, there is an additional source of individual variability that can explain the weak observer-level correlation between McGurk and noisy speech. Power calculations show that detecting this weak correlation requires studies with many more participants than those conducted to date. Perception of the McGurk effect and other types of speech can be explained by a common theoretical framework that includes causal inference, suggesting that the McGurk effect is a valid and useful experimental tool.
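The point about power in the abstract above can be illustrated with a minimal sketch. It assumes the standard Fisher z approximation for a two-sided test of a Pearson correlation, which is not necessarily the method the authors used. The function name `required_n`, the default alpha and power, and the example correlations are hypothetical choices for illustration and are not taken from the study.

```python
from math import atanh, ceil
from statistics import NormalDist


def required_n(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size needed to detect a true correlation r.

    Uses the Fisher z approximation for a two-sided test:
    n = ((z_{1-alpha/2} + z_{power}) / atanh(|r|))^2 + 3
    """
    if not 0 < abs(r) < 1:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return ceil(((z_alpha + z_power) / atanh(abs(r))) ** 2 + 3)


if __name__ == "__main__":
    # Illustrative effect sizes only; none of these come from the paper.
    for r in (0.5, 0.3, 0.1):
        print(f"r = {r}: n of about {required_n(r)}")
```

Under this approximation the required sample size grows roughly with the inverse square of the correlation, so a weak observer-level correlation needs a far larger sample than a strong one.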
18
Reduced resting state functional connectivity with increasing age-related hearing loss and McGurk susceptibility. Sci Rep 2020; 10:16987. [PMID: 33046800 PMCID: PMC7550565 DOI: 10.1038/s41598-020-74012-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/09/2020] [Accepted: 09/15/2020] [Indexed: 11/21/2022]
Abstract
Age-related hearing loss has been related to a compensatory increase in audio-visual integration and neural reorganization including alterations in functional resting state connectivity. How these two changes are linked in elderly listeners is unclear. The current study explored modulatory effects of hearing thresholds and audio-visual integration on resting state functional connectivity. We analysed a large set of resting state data of 65 elderly participants with a widely varying degree of untreated hearing loss. Audio-visual integration, as gauged with the McGurk effect, increased with progressing hearing thresholds. On the neural level, McGurk illusions were negatively related to functional coupling between motor and auditory regions. Similarly, connectivity of the dorsal attention network to sensorimotor and primary motor cortices was reduced with increasing hearing loss. The same effect was obtained for connectivity between the salience network and visual cortex. Our findings suggest that with progressing untreated age-related hearing loss, functional coupling at rest declines, affecting connectivity of brain networks and areas associated with attentional, visual, sensorimotor and motor processes. In particular, connectivity reductions between auditory and motor areas were related to the stronger audio-visual integration found with increasing hearing loss.
19
Michaelis K, Erickson LC, Fama ME, Skipper-Kallal LM, Xing S, Lacey EH, Anbari Z, Norato G, Rauschecker JP, Turkeltaub PE. Effects of age and left hemisphere lesions on audiovisual integration of speech. Brain and Language 2020; 206:104812. [PMID: 32447050 PMCID: PMC7379161 DOI: 10.1016/j.bandl.2020.104812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2019] [Revised: 04/02/2020] [Accepted: 05/04/2020] [Indexed: 06/11/2023]
Abstract
Neuroimaging studies have implicated left temporal lobe regions in audiovisual integration of speech and inferior parietal regions in temporal binding of incoming signals. However, it remains unclear which regions are necessary for audiovisual integration, especially when the auditory and visual signals are offset in time. Aging also influences integration, but the nature of this influence is unresolved. We used a McGurk task to test audiovisual integration and sensitivity to the timing of audiovisual signals in two older adult groups: left hemisphere stroke survivors and controls. We observed a positive relationship between age and audiovisual speech integration in both groups, and an interaction indicating that lesions reduce sensitivity to timing offsets between signals. Lesion-symptom mapping demonstrated that damage to the left supramarginal gyrus and planum temporale reduces temporal acuity in audiovisual speech perception. This suggests that a process mediated by these structures identifies asynchronous audiovisual signals that should not be integrated.
Affiliation(s)
- Kelly Michaelis
- Neurology Department and Center for Brain Plasticity and Recovery, Georgetown University Medical Center, Washington DC, USA
- Laura C Erickson
- Neurology Department and Center for Brain Plasticity and Recovery, Georgetown University Medical Center, Washington DC, USA; Neuroscience Department, Georgetown University Medical Center, Washington DC, USA
- Mackenzie E Fama
- Neurology Department and Center for Brain Plasticity and Recovery, Georgetown University Medical Center, Washington DC, USA; Department of Speech-Language Pathology & Audiology, Towson University, Towson, MD, USA
- Laura M Skipper-Kallal
- Neurology Department and Center for Brain Plasticity and Recovery, Georgetown University Medical Center, Washington DC, USA
- Shihui Xing
- Neurology Department and Center for Brain Plasticity and Recovery, Georgetown University Medical Center, Washington DC, USA; Department of Neurology, First Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Elizabeth H Lacey
- Neurology Department and Center for Brain Plasticity and Recovery, Georgetown University Medical Center, Washington DC, USA; Research Division, MedStar National Rehabilitation Hospital, Washington DC, USA
- Zainab Anbari
- Neurology Department and Center for Brain Plasticity and Recovery, Georgetown University Medical Center, Washington DC, USA
- Gina Norato
- Clinical Trials Unit, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Josef P Rauschecker
- Neuroscience Department, Georgetown University Medical Center, Washington DC, USA
- Peter E Turkeltaub
- Neurology Department and Center for Brain Plasticity and Recovery, Georgetown University Medical Center, Washington DC, USA; Research Division, MedStar National Rehabilitation Hospital, Washington DC, USA
20
Age-related hearing loss influences functional connectivity of auditory cortex for the McGurk illusion. Cortex 2020; 129:266-280. [PMID: 32535378 DOI: 10.1016/j.cortex.2020.04.022] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/25/2019] [Revised: 03/30/2020] [Accepted: 04/09/2020] [Indexed: 01/23/2023]
Abstract
Age-related hearing loss affects hearing at high frequencies and is associated with difficulties in understanding speech. Increased audio-visual integration has recently been found in age-related hearing impairment; however, the brain mechanisms that contribute to this effect are unclear. We used functional magnetic resonance imaging in elderly subjects with normal hearing and mild to moderate uncompensated hearing loss. Audio-visual integration was studied using the McGurk task. In this task, an illusionary fused percept can occur if incongruent auditory and visual syllables are presented. The paradigm included unisensory stimuli (auditory only, visual only), congruent audio-visual and incongruent (McGurk) audio-visual stimuli. An illusionary percept was reported in over 60% of incongruent trials. These McGurk illusion rates were equal in both groups of elderly subjects and correlated positively with speech-in-noise perception and daily listening effort. Normal-hearing participants showed an increased neural response in left pre- and postcentral gyri and right middle frontal gyrus for incongruent stimuli (McGurk) compared to congruent audio-visual stimuli. However, activation patterns did not differ between groups. Task-modulated functional connectivity differed between groups, showing increased connectivity from auditory cortex to visual, parietal and frontal areas in hard-of-hearing participants as compared to normal-hearing participants when comparing incongruent stimuli (McGurk) with congruent audio-visual stimuli. These results suggest that changes in functional connectivity of auditory cortex rather than activation strength during processing of audio-visual McGurk stimuli accompany age-related hearing loss.
21
Dollack F, Perusquía-Hernández M, Kadone H, Suzuki K. Head Anticipation During Locomotion With Auditory Instruction in the Presence and Absence of Visual Input. Front Hum Neurosci 2019; 13:293. [PMID: 31555112 PMCID: PMC6724718 DOI: 10.3389/fnhum.2019.00293] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/18/2019] [Accepted: 08/12/2019] [Indexed: 11/13/2022]
Abstract
Head direction has been shown to anticipate trajectory direction during human locomotion. Head anticipation has also been shown to persist in darkness. Arguably, the purpose of this anticipatory behavior is related to motor control and trajectory planning, independently of the visual condition. This implies that anticipation remains in the absence of visual input. However, experiments so far have only explored this phenomenon with visual instructions, which intrinsically prime a visual representation to follow. The primary objective of this study is to describe head anticipation in auditory instructed locomotion, in the presence and absence of visual input. Auditory instructed locomotion trajectories were performed in two visual conditions: eyes open and eyes closed. First, 10 sighted participants localized static sound sources to ensure they could understand the sound cues provided. Afterwards, they listened to a moving sound source while actively following it. Later, participants were asked to reproduce the trajectory of the moving sound source without sound. Anticipatory head behavior was observed during trajectory reproduction in both eyes open and closed conditions. The results suggest that head anticipation is related to motor anticipation rather than mental simulation of the trajectory.
Affiliation(s)
- Felix Dollack
- School of Integrative and Global Majors, University of Tsukuba, Tsukuba, Japan; Artificial Intelligence Laboratory, University of Tsukuba, Tsukuba, Japan
- Hideki Kadone
- Artificial Intelligence Laboratory, University of Tsukuba, Tsukuba, Japan; Center for Innovative Medicine and Engineering, University of Tsukuba Hospital, Tsukuba, Japan; Center for Cybernics Research, University of Tsukuba, Tsukuba, Japan
- Kenji Suzuki
- Artificial Intelligence Laboratory, University of Tsukuba, Tsukuba, Japan; Center for Cybernics Research, University of Tsukuba, Tsukuba, Japan; Faculty of Engineering, Information and Systems, University of Tsukuba, Tsukuba, Japan
22
Kaganovich N, Ancel E. Different neural processes underlie visual speech perception in school-age children and adults: An event-related potentials study. J Exp Child Psychol 2019; 184:98-122. [PMID: 31015101 PMCID: PMC6857813 DOI: 10.1016/j.jecp.2019.03.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/23/2018] [Revised: 03/15/2019] [Accepted: 03/26/2019] [Indexed: 11/18/2022]
Abstract
The ability to use visual speech cues does not fully develop until late adolescence. The cognitive and neural processes underlying this slow maturation are not yet understood. We examined electrophysiological responses of younger (8-9 years) and older (11-12 years) children as well as adults elicited by visually perceived articulations in an audiovisual word matching task and related them to the amount of benefit gained during a speech-in-noise (SIN) perception task when seeing the talker's face. On each trial, participants first heard a word and, after a short pause, saw a speaker silently articulate a word. In half of the trials the articulated word matched the auditory word (congruent trials), whereas in the other half it did not (incongruent trials). In all three age groups, incongruent articulations elicited the N400 component and congruent articulations elicited the late positive complex (LPC). Groups did not differ in the mean amplitude of N400. The mean amplitude of LPC was larger in younger children compared with older children and adults. Importantly, the relationship between event-related potential measures and SIN performance varied by group. In 8- and 9-year-olds, neither component was predictive of SIN gain. The LPC amplitude predicted the SIN gain in older children but not in adults. Conversely, the N400 amplitude predicted the SIN gain in adults. We argue that although all groups were able to detect correspondences between auditory and visual word onsets at the phonemic/syllabic level, only adults could use this information for lexical access.
Affiliation(s)
- Natalya Kaganovich
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN 47907, USA; Department of Psychological Sciences, Purdue University, West Lafayette, IN 47907, USA
- Elizabeth Ancel
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN 47907, USA
23
"Paying" attention to audiovisual speech: Do incongruent stimuli incur greater costs? Atten Percept Psychophys 2019; 81:1743-1756. [PMID: 31197661 DOI: 10.3758/s13414-019-01772-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/08/2022]
Abstract
The McGurk effect is a multisensory phenomenon in which discrepant auditory and visual speech signals typically result in an illusory percept. McGurk stimuli are often used in studies assessing the attentional requirements of audiovisual integration, but no study has directly compared the costs associated with integrating congruent versus incongruent audiovisual speech. Some evidence suggests that the McGurk effect may not be representative of naturalistic audiovisual speech processing: susceptibility to the McGurk effect is not associated with the ability to derive benefit from the addition of the visual signal, and distinct cortical regions are recruited when processing congruent versus incongruent speech. In two experiments, one using response times to identify congruent and incongruent syllables and one using a dual-task paradigm, we assessed whether congruent and incongruent audiovisual speech incur different attentional costs. We demonstrated that response times to both the speech task (Experiment 1) and a secondary vibrotactile task (Experiment 2) were indistinguishable for congruent compared to incongruent syllables, but McGurk fusions were responded to more quickly than McGurk non-fusions. These results suggest that despite documented differences in how congruent and incongruent stimuli are processed, they do not appear to differ in terms of processing time or effort, at least in the open-set speech task used here. However, responses that result in McGurk fusions are processed more quickly than those that result in non-fusions, though attentional cost is comparable for the two response types.
24
Abstract
Speech research during recent years has moved progressively away from its traditional focus on audition toward a more multisensory approach. In addition to audition and vision, many somatosenses including proprioception, pressure, vibration and aerotactile sensation are all highly relevant modalities for experiencing and/or conveying speech. In this article, we review both long-standing cross-modal effects stemming from decades of audiovisual speech research as well as new findings related to somatosensory effects. Cross-modal effects in speech perception to date are found to be constrained by temporal congruence and signal relevance, but appear to be unconstrained by spatial congruence. Far from taking place in a one-, two- or even three-dimensional space, the literature reveals that speech occupies a highly multidimensional sensory space. We argue that future research in cross-modal effects should expand to consider each of these modalities both separately and in combination with other modalities in speech.
Affiliation(s)
- Megan Keough
- Interdisciplinary Speech Research Lab, Department of Linguistics, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Donald Derrick
- New Zealand Institute of Brain and Behaviour, University of Canterbury, Christchurch 8140, New Zealand
- MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, New South Wales 2751, Australia
- Bryan Gick
- Interdisciplinary Speech Research Lab, Department of Linguistics, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Haskins Laboratories, Yale University, New Haven, CT 06511, USA
25
Brown VA, Hedayati M, Zanger A, Mayn S, Ray L, Dillman-Hasso N, Strand JF. What accounts for individual differences in susceptibility to the McGurk effect? PLoS One 2018; 13:e0207160. [PMID: 30418995 PMCID: PMC6231656 DOI: 10.1371/journal.pone.0207160] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 08/08/2018] [Accepted: 10/25/2018] [Indexed: 11/29/2022]
Abstract
The McGurk effect is a classic audiovisual speech illusion in which discrepant auditory and visual syllables can lead to a fused percept (e.g., an auditory /bɑ/ paired with a visual /gɑ/ often leads to the perception of /dɑ/). The McGurk effect is robust and easily replicated in pooled group data, but there is tremendous variability in the extent to which individual participants are susceptible to it. In some studies, the rate at which individuals report fusion responses ranges from 0% to 100%. Despite its widespread use in the audiovisual speech perception literature, the roots of the wide variability in McGurk susceptibility are largely unknown. This study evaluated whether several perceptual and cognitive traits are related to McGurk susceptibility through correlational analyses and mixed effects modeling. We found that an individual's susceptibility to the McGurk effect was related to their ability to extract place of articulation information from the visual signal (i.e., a more fine-grained analysis of lipreading ability), but not to scores on tasks measuring attentional control, processing speed, working memory capacity, or auditory perceptual gradiency. These results provide support for the claim that a small amount of the variability in susceptibility to the McGurk effect is attributable to lipreading skill. In contrast, cognitive and perceptual abilities that are commonly used predictors in individual differences studies do not appear to underlie susceptibility to the McGurk effect.
Affiliation(s)
- Violet A. Brown
- Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
- Maryam Hedayati
- Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
- Annie Zanger
- Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
- Sasha Mayn
- Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
- Lucia Ray
- Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
- Naseem Dillman-Hasso
- Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
- Julia F. Strand
- Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
26
Abbott NT, Shahin AJ. Cross-modal phonetic encoding facilitates the McGurk illusion and phonemic restoration. J Neurophysiol 2018; 120:2988-3000. [PMID: 30303762 DOI: 10.1152/jn.00262.2018] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/22/2022] Open
Abstract
In spoken language, audiovisual (AV) perception occurs when the visual modality influences encoding of acoustic features (e.g., phonetic representations) at the auditory cortex. We examined how visual speech (mouth movements) transforms phonetic representations, indexed by changes to the N1 auditory evoked potential (AEP). EEG was acquired while human subjects watched and listened to videos of a speaker uttering consonant vowel (CV) syllables, /ba/ and /wa/, presented in auditory-only or AV congruent or incongruent contexts or in a context in which the consonants were replaced by white noise (noise replaced). Subjects reported whether they heard "ba" or "wa." We hypothesized that the auditory N1 amplitude during illusory perception (caused by incongruent AV input, as in the McGurk illusion, or white noise-replaced consonants in CV utterances) should shift to reflect the auditory N1 characteristics of the phonemes conveyed visually (by mouth movements) as opposed to acoustically. Indeed, the N1 AEP became larger and occurred earlier when listeners experienced illusory "ba" (video /ba/, audio /wa/, heard as "ba") and vice versa when they experienced illusory "wa" (video /wa/, audio /ba/, heard as "wa"), mirroring the N1 AEP characteristics for /ba/ and /wa/ observed in natural acoustic situations (e.g., auditory-only setting). This visually mediated N1 behavior was also observed for noise-replaced CVs. Taken together, the findings suggest that information relayed by the visual modality modifies phonetic representations at the auditory cortex and that similar neural mechanisms support the McGurk illusion and visually mediated phonemic restoration.
NEW & NOTEWORTHY Using a variant of the McGurk illusion experimental design (using the syllables /ba/ and /wa/), we demonstrate that lipreading influences phonetic encoding at the auditory cortex. We show that the N1 auditory evoked potential morphology shifts to resemble the N1 morphology of the syllable conveyed visually. We also show similar N1 shifts when the consonants are replaced by white noise, suggesting that the McGurk illusion and the visually mediated phonemic restoration rely on common mechanisms.
Affiliation(s)
- Noelle T Abbott
- Center for Mind and Brain, University of California, Davis, California; San Diego State University-University of California, San Diego Joint Doctoral Program in Language and Communicative Disorders, San Diego, California
- Antoine J Shahin
- Center for Mind and Brain, University of California, Davis, California; Department of Cognitive and Information Sciences, University of California, Merced, California
27
Rauschecker JP. Where did language come from? Precursor mechanisms in nonhuman primates. Curr Opin Behav Sci 2018; 21:195-204. [PMID: 30778394 PMCID: PMC6377164 DOI: 10.1016/j.cobeha.2018.06.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 02/04/2023]
Abstract
At first glance, the monkey brain looks like a smaller version of the human brain. Indeed, the anatomical and functional architecture of the cortical auditory system in monkeys is very similar to that of humans, with dual pathways segregated into a ventral and a dorsal processing stream. Yet, monkeys do not speak. Repeated attempts to pin this inability on one particular cause have failed. A closer look at the necessary components of language, according to Darwin, reveals that all of them got a significant boost during evolution from nonhuman to human primates. The vocal-articulatory system, in particular, has developed into the most sophisticated of all human sensorimotor systems with about a dozen effectors that, in combination with each other, result in an auditory communication system like no other. This sensorimotor network possesses all the ingredients of an internal model system that permits the emergence of sequence processing, as required for phonology and syntax in modern languages.
Affiliation(s)
- Josef P Rauschecker
- Department of Neuroscience, Georgetown University, Washington, DC 20057, USA
28
Alsius A, Paré M, Munhall KG. Forty Years After Hearing Lips and Seeing Voices: the McGurk Effect Revisited. Multisens Res 2018; 31:111-144. [PMID: 31264597 DOI: 10.1163/22134808-00002565] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Received: 10/04/2016] [Accepted: 03/09/2017] [Indexed: 11/19/2022]
Abstract
Since its discovery 40 years ago, the McGurk illusion has been usually cited as a prototypical paradigmatic case of multisensory binding in humans, and has been extensively used in speech perception studies as a proxy measure for audiovisual integration mechanisms. Despite the well-established practice of using the McGurk illusion as a tool for studying the mechanisms underlying audiovisual speech integration, the magnitude of the illusion varies enormously across studies. Furthermore, the processing of McGurk stimuli differs from congruent audiovisual processing at both phenomenological and neural levels. This questions the suitability of this illusion as a tool to quantify the necessary and sufficient conditions under which audiovisual integration occurs in natural conditions. In this paper, we review some of the practical and theoretical issues related to the use of the McGurk illusion as an experimental paradigm. We believe that, without a richer understanding of the mechanisms involved in the processing of the McGurk effect, experimenters should be really cautious when generalizing data generated by McGurk stimuli to matching audiovisual speech events.
Affiliation(s)
- Agnès Alsius
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
- Martin Paré
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
- Kevin G Munhall
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
29
Neural Mechanisms Underlying Cross-Modal Phonetic Encoding. J Neurosci 2017; 38:1835-1849. [PMID: 29263241 DOI: 10.1523/jneurosci.1566-17.2017] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 06/06/2017] [Revised: 11/17/2017] [Accepted: 12/08/2017] [Indexed: 11/21/2022] Open
Abstract
Audiovisual (AV) integration is essential for speech comprehension, especially in adverse listening situations. Divergent, but not mutually exclusive, theories have been proposed to explain the neural mechanisms underlying AV integration. One theory advocates that this process occurs via interactions between the auditory and visual cortices, as opposed to fusion of AV percepts in a multisensory integrator. Building upon this idea, we proposed that AV integration in spoken language reflects visually induced weighting of phonetic representations at the auditory cortex. EEG was recorded while male and female human subjects watched and listened to videos of a speaker uttering consonant vowel (CV) syllables /ba/ and /fa/, presented in Auditory-only, AV congruent or incongruent contexts. Subjects reported whether they heard /ba/ or /fa/. We hypothesized that vision alters phonetic encoding by dynamically weighting which phonetic representation in the auditory cortex is strengthened or weakened. That is, when subjects are presented with visual /fa/ and acoustic /ba/ and hear /fa/ (illusion-fa), the visual input strengthens the weighting of the phone /f/ representation. When subjects are presented with visual /ba/ and acoustic /fa/ and hear /ba/ (illusion-ba), the visual input weakens the weighting of the phone /f/ representation. Indeed, we found an enlarged N1 auditory evoked potential when subjects perceived illusion-ba, and a reduced N1 when they perceived illusion-fa, mirroring the N1 behavior for /ba/ and /fa/ in Auditory-only settings. These effects were especially pronounced in individuals with more robust illusory perception. These findings provide evidence that visual speech modifies phonetic encoding at the auditory cortex.
SIGNIFICANCE STATEMENT The current study presents evidence that audiovisual integration in spoken language occurs when one modality (vision) acts on representations of a second modality (audition). Using the McGurk illusion, we show that visual context primes phonetic representations at the auditory cortex, altering the auditory percept, evidenced by changes in the N1 auditory evoked potential. This finding reinforces the theory that audiovisual integration occurs via visual networks influencing phonetic representations in the auditory cortex. We believe that this will lead to the generation of new hypotheses regarding cross-modal mapping, particularly whether it occurs via direct or indirect routes (e.g., via a multisensory mediator).
30
Cardon GJ, Hepburn S, Rojas DC. Structural Covariance of Sensory Networks, the Cerebellum, and Amygdala in Autism Spectrum Disorder. Front Neurol 2017; 8:615. [PMID: 29230189 PMCID: PMC5712069 DOI: 10.3389/fneur.2017.00615] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 06/07/2017] [Accepted: 11/03/2017] [Indexed: 11/13/2022] Open
Abstract
Sensory dysfunction is a core symptom of autism spectrum disorder (ASD), and abnormalities with sensory responsivity and processing can be extremely debilitating to ASD patients and their families. However, relatively little is known about the underlying neuroanatomical and neurophysiological factors that lead to sensory abnormalities in ASD. Investigation into these aspects of ASD could lead to significant advancements in our general knowledge about ASD, as well as provide targets for treatment and inform diagnostic procedures. Thus, the current study aimed to measure the covariation of volumes of brain structures (i.e., structural magnetic resonance imaging) that may be involved in abnormal sensory processing, in order to infer connectivity of these brain regions. Specifically, we quantified the structural covariation of sensory-related cerebral cortical structures, in addition to the cerebellum and amygdala by computing partial correlations between the structural volumes of these structures. These analyses were performed in participants with ASD (n = 36), as well as typically developing peers (n = 32). Results showed decreased structural covariation between sensory-related cortical structures, especially between the left and right cerebral hemispheres, in participants with ASD. In contrast, these same participants presented with increased structural covariation of structures in the right cerebral hemisphere. Additionally, sensory-related cerebral structures exhibited decreased structural covariation with functionally identified cerebellar networks. Also, the left amygdala showed significantly increased structural covariation with cerebral structures related to visual processing. Taken together, these results may suggest several patterns of altered connectivity both within and between cerebral cortices and other brain structures that may be related to sensory processing.
Affiliation(s)
- Garrett J Cardon
- Department of Psychology, Colorado State University, Fort Collins, CO, United States
- Susan Hepburn
- Department of Human Development and Family Studies, Colorado State University, Fort Collins, CO, United States
- Donald C Rojas
- Department of Psychology, Colorado State University, Fort Collins, CO, United States
31
Cuppini C, Ursino M, Magosso E, Ross LA, Foxe JJ, Molholm S. A Computational Analysis of Neural Mechanisms Underlying the Maturation of Multisensory Speech Integration in Neurotypical Children and Those on the Autism Spectrum. Front Hum Neurosci 2017; 11:518. [PMID: 29163099 PMCID: PMC5670153 DOI: 10.3389/fnhum.2017.00518] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 06/12/2017] [Accepted: 10/11/2017] [Indexed: 11/13/2022] Open
Abstract
Failure to appropriately develop multisensory integration (MSI) of audiovisual speech may affect a child's ability to attain optimal communication. Studies have shown protracted development of MSI into late-childhood and identified deficits in MSI in children with an autism spectrum disorder (ASD). Currently, the neural basis of acquisition of this ability is not well understood. Here, we developed a computational model informed by neurophysiology to analyze possible mechanisms underlying MSI maturation, and its delayed development in ASD. The model posits that strengthening of feedforward and cross-sensory connections, responsible for the alignment of auditory and visual speech sound representations in posterior superior temporal gyrus/sulcus, can explain behavioral data on the acquisition of MSI. This was simulated by a training phase during which the network was exposed to unisensory and multisensory stimuli, and projections were crafted by Hebbian rules of potentiation and depression. In its mature architecture, the network also reproduced the well-known multisensory McGurk speech effect. Deficits in audiovisual speech perception in ASD were well accounted for by fewer multisensory exposures, compatible with a lack of attention, but not by reduced synaptic connectivity or synaptic plasticity.
Affiliation(s)
- Cristiano Cuppini
- Department of Electric, Electronic and Information Engineering, University of Bologna, Bologna, Italy
- Mauro Ursino
- Department of Electric, Electronic and Information Engineering, University of Bologna, Bologna, Italy
- Elisa Magosso
- Department of Electric, Electronic and Information Engineering, University of Bologna, Bologna, Italy
- Lars A. Ross
- Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine, Bronx, NY, United States
- John J. Foxe
- Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine, Bronx, NY, United States
- Department of Neuroscience and The Del Monte Institute for Neuroscience, University of Rochester School of Medicine, Rochester, NY, United States
- Sophie Molholm
- Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine, Bronx, NY, United States
32
Audiovisual sentence recognition not predicted by susceptibility to the McGurk effect. Atten Percept Psychophys 2017; 79:396-403. [PMID: 27921268 DOI: 10.3758/s13414-016-1238-9] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Indexed: 11/08/2022]
Abstract
In noisy situations, visual information plays a critical role in the success of speech communication: listeners are better able to understand speech when they can see the speaker. Visual influence on auditory speech perception is also observed in the McGurk effect, in which discrepant visual information alters listeners' auditory perception of a spoken syllable. When hearing /ba/ while seeing a person saying /ga/, for example, listeners may report hearing /da/. Because these two phenomena have been assumed to arise from a common integration mechanism, the McGurk effect has often been used as a measure of audiovisual integration in speech perception. In this study, we test whether this assumed relationship exists within individual listeners. We measured participants' susceptibility to the McGurk illusion as well as their ability to identify sentences in noise across a range of signal-to-noise ratios in audio-only and audiovisual modalities. Our results do not show a relationship between listeners' McGurk susceptibility and their ability to use visual cues to understand spoken sentences in noise, suggesting that McGurk susceptibility may not be a valid measure of audiovisual integration in everyday speech processing.
33
Sight and sound persistently out of synch: stable individual differences in audiovisual synchronisation revealed by implicit measures of lip-voice integration. Sci Rep 2017; 7:46413. [PMID: 28429784 PMCID: PMC5399466 DOI: 10.1038/srep46413] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 10/13/2016] [Accepted: 03/17/2017] [Indexed: 11/08/2022] Open
Abstract
Are sight and sound out of synch? Signs that they are have been dismissed for over two centuries as an artefact of attentional and response bias, to which traditional subjective methods are prone. To avoid such biases, we measured performance on objective tasks that depend implicitly on achieving good lip-synch. We measured the McGurk effect (in which incongruent lip-voice pairs evoke illusory phonemes), and also identification of degraded speech, while manipulating audiovisual asynchrony. Peak performance was found at an average auditory lag of ~100 ms, but this varied widely between individuals. Participants’ individual optimal asynchronies showed trait-like stability when the same task was re-tested one week later, but measures based on different tasks did not correlate. This discounts the possible influence of common biasing factors, suggesting instead that our different tasks probe different brain networks, each subject to their own intrinsic auditory and visual processing latencies. Our findings call for renewed interest in the biological causes and cognitive consequences of individual sensory asynchronies, leading potentially to fresh insights into the neural representation of sensory timing. A concrete implication is that speech comprehension might be enhanced, by first measuring each individual’s optimal asynchrony and then applying a compensatory auditory delay.
34
Ozker M, Schepers IM, Magnotti JF, Yoshor D, Beauchamp MS. A Double Dissociation between Anterior and Posterior Superior Temporal Gyrus for Processing Audiovisual Speech Demonstrated by Electrocorticography. J Cogn Neurosci 2017; 29:1044-1060. [PMID: 28253074 DOI: 10.1162/jocn_a_01110] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Indexed: 11/04/2022]
Abstract
Human speech can be comprehended using only auditory information from the talker's voice. However, comprehension is improved if the talker's face is visible, especially if the auditory information is degraded as occurs in noisy environments or with hearing loss. We explored the neural substrates of audiovisual speech perception using electrocorticography, direct recording of neural activity using electrodes implanted on the cortical surface. We observed a double dissociation in the responses to audiovisual speech with clear and noisy auditory component within the superior temporal gyrus (STG), a region long known to be important for speech perception. Anterior STG showed greater neural activity to audiovisual speech with clear auditory component, whereas posterior STG showed similar or greater neural activity to audiovisual speech in which the speech was replaced with speech-like noise. A distinct border between the two response patterns was observed, demarcated by a landmark corresponding to the posterior margin of Heschl's gyrus. To further investigate the computational roles of both regions, we considered Bayesian models of multisensory integration, which predict that combining the independent sources of information available from different modalities should reduce variability in the neural responses. We tested this prediction by measuring the variability of the neural responses to single audiovisual words. Posterior STG showed smaller variability than anterior STG during presentation of audiovisual speech with noisy auditory component. Taken together, these results suggest that posterior STG but not anterior STG is important for multisensory integration of noisy auditory and visual speech.
Affiliation(s)
- Muge Ozker
- University of Texas Graduate School of Biomedical Sciences at Houston; Baylor College of Medicine
35
Komeilipoor N, Cesari P, Daffertshofer A. Involvement of superior temporal areas in audiovisual and audiomotor speech integration. Neuroscience 2017; 343:276-283. [PMID: 27019129 DOI: 10.1016/j.neuroscience.2016.03.047] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 12/08/2015] [Revised: 03/16/2016] [Accepted: 03/16/2016] [Indexed: 11/25/2022]
Abstract
Perception of speech sounds is affected by observing facial motion. Incongruence between speech sounds and watching somebody articulating may influence the perception of auditory syllable, referred to as the McGurk effect. We tested the degree to which silent articulation of a syllable also affects speech perception and searched for its neural correlates. Listeners were instructed to identify the auditory syllables /pa/ and /ta/ while silently articulating congruent/incongruent syllables or observing videos of a speaker's face articulating them. As a baseline, we included an auditory-only condition without competing visual or sensorimotor input. As expected, perception of sounds degraded when incongruent syllables were observed, and also when they were silently articulated, albeit to a lesser extent. This degrading was accompanied by significant amplitude modulations in the beta frequency band in right superior temporal areas. In these areas, the event-related beta activity during congruent conditions was phase-locked to responses evoked during the auditory-only condition. We conclude that proper temporal alignment of different input streams in right superior temporal areas is mandatory for both audiovisual and audiomotor speech integration.
Affiliation(s)
- N Komeilipoor
- MOVE Research Institute Amsterdam, Faculty of Behavioural and Movement Sciences, Vrije Universiteit, Van der Boechorststraat 9, 1081BT Amsterdam, The Netherlands; Department of Neurological, Biomedical and Movement Sciences, University of Verona, 37131 Verona, Italy
- P Cesari
- Department of Neurological, Biomedical and Movement Sciences, University of Verona, 37131 Verona, Italy
- A Daffertshofer
- MOVE Research Institute Amsterdam, Faculty of Behavioural and Movement Sciences, Vrije Universiteit, Van der Boechorststraat 9, 1081BT Amsterdam, The Netherlands
36
Atypical audiovisual word processing in school-age children with a history of specific language impairment: an event-related potential study. J Neurodev Disord 2016; 8:33. [PMID: 27597881 PMCID: PMC5011345 DOI: 10.1186/s11689-016-9168-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 02/24/2016] [Accepted: 08/17/2016] [Indexed: 11/12/2022] Open
Abstract
Background: Visual speech cues influence different aspects of language acquisition. However, whether developmental language disorders may be associated with atypical processing of visual speech is unknown. In this study, we used behavioral and ERP measures to determine whether children with a history of SLI (H-SLI) differ from their age-matched typically developing (TD) peers in the ability to match auditory words with corresponding silent visual articulations.
Methods: Nineteen 7–13-year-old H-SLI children and 19 age-matched TD children participated in the study. Children first heard a word and then saw a speaker silently articulating a word. In half of trials, the articulated word matched the auditory word (congruent trials), while in another half, it did not (incongruent trials). Children specified whether the auditory and the articulated words matched. We examined ERPs elicited by the onset of visual stimuli (visual P1, N1, and P2) as well as ERPs elicited by the articulatory movements themselves, namely N400 to incongruent articulations and late positive complex (LPC) to congruent articulations. We also examined whether ERP measures of visual speech processing could predict (1) children’s linguistic skills and (2) the use of visual speech cues when listening to speech-in-noise (SIN).
Results: H-SLI children were less accurate in matching auditory words with visual articulations. They had a significantly reduced P1 to the talker’s face and a smaller N400 to incongruent articulations. In contrast, congruent articulations elicited LPCs of similar amplitude in both groups of children. The P1 and N400 amplitude was significantly correlated with accuracy enhancement on the SIN task when seeing the talker’s face.
Conclusions: H-SLI children have poorly defined correspondences between speech sounds and visually observed articulatory movements that produce them.
37
Araneda R, Renier L, Ebner-Karestinos D, Dricot L, De Volder AG. Hearing, feeling or seeing a beat recruits a supramodal network in the auditory dorsal stream. Eur J Neurosci 2016; 45:1439-1450. [PMID: 27471102 DOI: 10.1111/ejn.13349] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 12/21/2015] [Revised: 06/13/2016] [Accepted: 07/23/2016] [Indexed: 10/21/2022]
Abstract
Hearing a beat recruits a wide neural network that involves the auditory cortex and motor planning regions. Perceiving a beat can potentially be achieved via vision or even touch, but it is currently not clear whether a common neural network underlies beat processing. Here, we used functional magnetic resonance imaging (fMRI) to test to what extent the neural network involved in beat processing is supramodal, that is, is the same in the different sensory modalities. Brain activity changes in 27 healthy volunteers were monitored while they were attending to the same rhythmic sequences (with and without a beat) in audition, vision and the vibrotactile modality. We found a common neural network for beat detection in the three modalities that involved parts of the auditory dorsal pathway. Within this network, only the putamen and the supplementary motor area (SMA) showed specificity to the beat, while the brain activity in the putamen covariated with the beat detection speed. These results highlighted the implication of the auditory dorsal stream in beat detection, confirmed the important role played by the putamen in beat detection and indicated that the neural network for beat detection is mostly supramodal. This constitutes a new example of convergence of the same functional attributes into one centralized representation in the brain.
Affiliation(s)
- Rodrigo Araneda
- Université catholique de Louvain, 54 Avenue Hippocrate UCL B1.54.09, 1200, Brussels, Belgium
- Laurent Renier
- Université catholique de Louvain, 54 Avenue Hippocrate UCL B1.54.09, 1200, Brussels, Belgium
- Laurence Dricot
- Université catholique de Louvain, 54 Avenue Hippocrate UCL B1.54.09, 1200, Brussels, Belgium
- Anne G De Volder
- Université catholique de Louvain, 54 Avenue Hippocrate UCL B1.54.09, 1200, Brussels, Belgium
38
Kaganovich N, Schumaker J, Rowland C. Matching heard and seen speech: An ERP study of audiovisual word recognition. Brain and Language 2016; 157-158:14-24. [PMID: 27155219 PMCID: PMC4915735 DOI: 10.1016/j.bandl.2016.04.010] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 06/27/2015] [Revised: 03/23/2016] [Accepted: 04/10/2016] [Indexed: 06/05/2023]
Abstract
Seeing articulatory gestures while listening to speech-in-noise (SIN) significantly improves speech understanding. However, the degree of this improvement varies greatly among individuals. We examined a relationship between two distinct stages of visual articulatory processing and the SIN accuracy by combining a cross-modal repetition priming task with ERP recordings. Participants first heard a word referring to a common object (e.g., pumpkin) and then decided whether the subsequently presented visual silent articulation matched the word they had just heard. Incongruent articulations elicited a significantly enhanced N400, indicative of a mismatch detection at the pre-lexical level. Congruent articulations elicited a significantly larger LPC, indexing articulatory word recognition. Only the N400 difference between incongruent and congruent trials was significantly correlated with individuals' SIN accuracy improvement in the presence of the talker's face.
Affiliation(s)
- Natalya Kaganovich
- Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907-2038, United States; Department of Psychological Sciences, Purdue University, 703 Third Street, West Lafayette, IN 47907-2038, United States
- Jennifer Schumaker
- Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907-2038, United States
- Courtney Rowland
- Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907-2038, United States
39
Kaganovich N, Schumaker J, Macias D, Gustafson D. Processing of audiovisually congruent and incongruent speech in school-age children with a history of specific language impairment: a behavioral and event-related potentials study. Dev Sci 2015; 18:751-70. [PMID: 25440407 PMCID: PMC4449323 DOI: 10.1111/desc.12263] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/15/2013] [Accepted: 09/07/2014] [Indexed: 11/30/2022]
Abstract
Previous studies indicate that at least some aspects of audiovisual speech perception are impaired in children with specific language impairment (SLI). However, whether audiovisual processing difficulties are also present in older children with a history of this disorder is unknown. By combining electrophysiological and behavioral measures, we examined perception of both audiovisually congruent and audiovisually incongruent speech in school-age children with a history of SLI (H-SLI), their typically developing (TD) peers, and adults. In the first experiment, all participants watched videos of a talker articulating syllables 'ba', 'da', and 'ga' under three conditions - audiovisual (AV), auditory only (A), and visual only (V). The amplitude of the N1 (but not of the P2) event-related component elicited in the AV condition was significantly reduced compared to the N1 amplitude measured from the sum of the A and V conditions in all groups of participants. Because N1 attenuation to AV speech is thought to index the degree to which facial movements predict the onset of the auditory signal, our findings suggest that this aspect of audiovisual speech perception is mature by mid-childhood and is normal in the H-SLI children. In the second experiment, participants watched videos of audivisually incongruent syllables created to elicit the so-called McGurk illusion (with an auditory 'pa' dubbed onto a visual articulation of 'ka', and the expectant perception being that of 'ta' if audiovisual integration took place). As a group, H-SLI children were significantly more likely than either TD children or adults to hear the McGurk syllable as 'pa' (in agreement with its auditory component) than as 'ka' (in agreement with its visual component), suggesting that susceptibility to the McGurk illusion is reduced in at least some children with a history of SLI. 
Taken together, the results of the two experiments argue against global audiovisual integration impairment in children with a history of SLI and suggest that, when present, audiovisual integration difficulties in this population likely stem from a later (non-sensory) stage of processing.
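The N1 comparison described in this abstract (AV response versus the sum of the A and V responses) can be sketched as follows. This is a minimal illustration, not the authors' analysis pipeline: the function names, the 50-150 ms search window, and the toy waveforms are all assumptions made for the example.

```python
def n1_peak(waveform, times_ms, window=(50, 150)):
    """Return the most negative amplitude inside the search window.

    The N1 is a negative-going deflection, so its peak is a minimum.
    The 50-150 ms window is a hypothetical default, not taken from the study.
    """
    return min(amp for t, amp in zip(times_ms, waveform)
               if window[0] <= t <= window[1])


def n1_attenuation(av, a, v, times_ms, window=(50, 150)):
    """Compare the AV N1 peak with the N1 peak of the summed A + V waveform.

    A positive result means the AV peak is less negative (attenuated)
    relative to the A + V sum.
    """
    summed = [x + y for x, y in zip(a, v)]
    return n1_peak(av, times_ms, window) - n1_peak(summed, times_ms, window)


# Toy waveforms in microvolts (hypothetical values, for illustration only).
times_ms = [0, 50, 100, 150, 200]
a_only = [0.0, -1.0, -4.0, -1.0, 0.0]
v_only = [0.0, 0.0, -1.0, 0.0, 0.0]
av = [0.0, -1.0, -3.5, -1.0, 0.0]

attenuation = n1_attenuation(av, a_only, v_only, times_ms)
```

With these toy values the summed A + V peak is more negative than the AV peak, so `attenuation` is positive, which is the direction of the effect the abstract reports.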
Affiliation(s)
- Natalya Kaganovich
  - Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907-2038
  - Department of Psychological Sciences, Purdue University, 703 Third Street, West Lafayette, IN 47907-2038
- Jennifer Schumaker
  - Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907-2038
- Danielle Macias
  - Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907-2038
- Dana Gustafson
  - Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907-2038
|
40
|
Tiippana K, Möttönen R, Schwartz JL. Multisensory and sensorimotor interactions in speech perception. Front Psychol 2015; 6:458. [PMID: 25941506 PMCID: PMC4403297 DOI: 10.3389/fpsyg.2015.00458] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/27/2015] [Accepted: 03/30/2015] [Indexed: 11/29/2022] Open
Affiliation(s)
- Kaisa Tiippana
  - Institute of Behavioural Sciences, University of Helsinki, Helsinki, Finland
- Riikka Möttönen
  - Department of Experimental Psychology, University of Oxford, Oxford, UK
- Jean-Luc Schwartz
  - Grenoble Images Parole Signal Automatique-Lab, Speech and Cognition Department, Centre National de la Recherche Scientifique, Grenoble University, Grenoble, France
|
41
|
Olasagasti I, Bouton S, Giraud AL. Prediction across sensory modalities: A neurocomputational model of the McGurk effect. Cortex 2015; 68:61-75. [PMID: 26009260 DOI: 10.1016/j.cortex.2015.04.008] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 10/30/2014] [Revised: 02/17/2015] [Accepted: 04/14/2015] [Indexed: 01/22/2023]
Abstract
The McGurk effect is a textbook illustration of the automaticity with which the human brain integrates audio-visual speech. It shows that even incongruent audiovisual (AV) speech stimuli can be combined into percepts that correspond neither to the auditory nor to the visual input, but to a mix of both. Typically, when presented with, e.g., visual /aga/ and acoustic /aba/ we perceive an illusory /ada/. In the inverse situation, however, when acoustic /aga/ is paired with visual /aba/, we perceive a combination of both stimuli, i.e., /abga/ or /agba/. Here we assessed the role of dynamic cross-modal predictions in the outcome of AV speech integration using a computational model that processes continuous audiovisual speech sensory inputs in a predictive coding framework. The model involves three processing levels: sensory units, units that encode the dynamics of stimuli, and multimodal recognition/identity units. The model exhibits a dynamic prediction behavior because evidence about speech tokens can be asynchronous across sensory modalities, allowing for updating the activity of the recognition units from one modality while sending top-down predictions to the other modality. We explored the model's response to congruent and incongruent AV stimuli and found that, in the two-dimensional feature space spanned by the speech second formant and lip aperture, fusion stimuli are located in the neighborhood of congruent /ada/, which therefore provides a valid match. Conversely, stimuli that lead to combination percepts do not have a unique valid neighbor. In that case, acoustic and visual cues are both highly salient and generate conflicting predictions in the other modality that cannot be fused, forcing the elaboration of a combinatorial solution. We propose that dynamic predictive mechanisms play a decisive role in the dichotomous perception of incongruent audiovisual inputs.
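The geometric intuition in this abstract (fusion stimuli fall near congruent /ada/, while combination stimuli have no valid neighbor) can be illustrated with a toy nearest-prototype lookup. This is only a static sketch under stated assumptions and is not the authors' predictive coding model: the normalized prototype coordinates, the `MATCH_RADIUS` threshold, and the `classify` function are all hypothetical.

```python
import math

# Hypothetical, normalized prototypes: (second formant F2, lip aperture).
# These coordinates are illustrative assumptions, not values from the paper.
PROTOTYPES = {
    "aba": (0.0, 0.0),  # low F2, closed lips
    "ada": (0.5, 0.9),  # intermediate F2, open lips
    "aga": (1.0, 1.0),  # high F2, open lips
}

# Hypothetical radius within which a prototype counts as a valid match.
MATCH_RADIUS = 0.6


def classify(f2, lip_aperture):
    """Return the nearest prototype label, or "combination" if none is close."""
    nearest, dist = min(
        ((label, math.dist((f2, lip_aperture), proto))
         for label, proto in PROTOTYPES.items()),
        key=lambda pair: pair[1],
    )
    return nearest if dist <= MATCH_RADIUS else "combination"


# McGurk stimulus: acoustic /aba/ (its F2) with visual /aga/ (its lip aperture).
mcgurk = (PROTOTYPES["aba"][0], PROTOTYPES["aga"][1])
# Inverse stimulus: acoustic /aga/ (its F2) with visual /aba/ (its lip aperture).
inverse = (PROTOTYPES["aga"][0], PROTOTYPES["aba"][1])

mcgurk_percept = classify(*mcgurk)
inverse_percept = classify(*inverse)
```

Under these assumed coordinates the McGurk stimulus lands within the match radius of /ada/, whereas the inverse stimulus is far from every prototype and is labeled a combination. The sketch omits the temporal dynamics and top-down predictions that the abstract identifies as central to the actual model.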
|