1. Dong C, Noppeney U, Wang S. Perceptual uncertainty explains activation differences between audiovisual congruent speech and McGurk stimuli. Hum Brain Mapp 2024;45:e26653. PMID: 38488460. DOI: 10.1002/hbm.26653.
Abstract
Face-to-face communication relies on the integration of acoustic speech signals with the corresponding facial articulations. In the McGurk illusion, an auditory /ba/ phoneme presented simultaneously with a facial articulation of /ga/ (i.e., a viseme) is typically fused into an illusory 'da' percept. Despite its widespread use as an index of audiovisual speech integration, critics argue that it arises from perceptual processes that differ categorically from natural speech recognition. Conversely, Bayesian theoretical frameworks suggest that both the illusory McGurk and the veridical audiovisual congruent speech percepts result from probabilistic inference based on noisy sensory signals. According to these models, the inter-sensory conflict in McGurk stimuli may only increase observers' perceptual uncertainty. This functional magnetic resonance imaging (fMRI) study presented participants (20 male and 24 female) with audiovisual congruent, McGurk (i.e., auditory /ba/ + visual /ga/), and incongruent (i.e., auditory /ga/ + visual /ba/) stimuli along with their unisensory counterparts in a syllable categorization task. Behaviorally, observers' response entropy was greater for McGurk than for congruent audiovisual stimuli. At the neural level, McGurk stimuli increased activations in a widespread neural system, extending from the inferior frontal sulci (IFS) to the pre-supplementary motor area (pre-SMA) and insulae, regions typically involved in cognitive control processes. Crucially, in line with Bayesian theories, these activation increases were fully accounted for by observers' perceptual uncertainty as measured by their response entropy. Our findings suggest that McGurk and congruent speech processing rely on shared neural mechanisms, thereby supporting the McGurk illusion as a valid measure of natural audiovisual speech perception.
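For readers unfamiliar with the measure, response entropy in a syllable categorization task of this kind is standardly the Shannon entropy of the distribution of responses a stimulus elicits; the notation below is generic rather than the authors' own:

H(s) = -\sum_{k} p_k(s) \log_2 p_k(s)

where p_k(s) is the proportion of trials on which stimulus s elicited response category k. H(s) is 0 when every trial yields the same response and maximal when responses are spread evenly across categories, so higher entropy indexes greater perceptual uncertainty.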
Affiliation(s)
- Chenjie Dong: Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, Guangzhou, China; Donders Institute for Brain, Cognition, and Behavior, Radboud University, Nijmegen, the Netherlands
- Uta Noppeney: Donders Institute for Brain, Cognition, and Behavior, Radboud University, Nijmegen, the Netherlands
- Suiping Wang: Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, Guangzhou, China
2. Saito H, Tiede M, Whalen DH, Ménard L. The effect of native language and bilingualism on multimodal perception in speech: A study of audio-aerotactile integration. J Acoust Soc Am 2024;155:2209-2220. PMID: 38526052. PMCID: PMC10965246. DOI: 10.1121/10.0025381.
Abstract
Previous studies of speech perception revealed that tactile sensation can be integrated into the perception of stop consonants. It remains uncertain whether such multisensory integration can be shaped by linguistic experience, such as the listener's native language(s). This study investigates audio-aerotactile integration in phoneme perception for English and French monolinguals as well as English-French bilingual listeners. Six-step voice onset time (VOT) continua of alveolar (/da/-/ta/) and labial (/ba/-/pa/) stops, constructed from both English and French endpoints, were presented to listeners who performed a forced-choice identification task. Air puffs were synchronized to syllable onset and randomly applied to the back of the hand. Results show that stimuli with an air puff elicited more "voiceless" responses for the /da/-/ta/ continuum in both English and French listeners. This suggests that audio-aerotactile integration can occur even though French listeners do not have an aspiration/non-aspiration contrast in their native language. Furthermore, bilingual speakers showed larger air-puff effects than monolinguals in both languages, perhaps due to bilinguals' heightened receptiveness to multimodal information in speech.
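A common way to quantify such an air-puff effect, sketched here as a generic analysis rather than the authors' reported model, is to fit a logistic psychometric function to the identification responses along the VOT continuum and compare category boundaries between puff and no-puff trials:

P(\text{voiceless} \mid x) = \frac{1}{1 + e^{-(x - \mu)/\sigma}}

where x is the VOT step, \mu is the category boundary, and \sigma governs the slope; a shift of \mu toward shorter VOTs on air-puff trials corresponds to the reported increase in "voiceless" responses.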
Affiliation(s)
- Haruka Saito: Département de Linguistique, Université du Québec à Montréal, Montréal, Québec H2L 2C5, Canada
- Mark Tiede: Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut 06520, USA
- D. H. Whalen: The Graduate Center, City University of New York (CUNY), New York, New York 10016, USA; Yale Child Study Center, New Haven, Connecticut 06520, USA
- Lucie Ménard: Département de Linguistique, Université du Québec à Montréal, Montréal, Québec H2L 2C5, Canada
3. Choi I, Demir I, Oh S, Lee SH. Multisensory integration in the mammalian brain: diversity and flexibility in health and disease. Philos Trans R Soc Lond B Biol Sci 2023;378:20220338. PMID: 37545309. PMCID: PMC10404930. DOI: 10.1098/rstb.2022.0338.
Abstract
Multisensory integration (MSI) occurs in a variety of brain areas, spanning cortical and subcortical regions. Traditional studies of sensory processing treated the sensory cortices as processing sensory information in a modality-specific manner. The sensory cortices, however, send this information to other cortical and subcortical areas, including the higher association cortices and the other sensory cortices, where inputs from multiple modalities converge and are integrated to generate a meaningful percept. This integration process is neither simple nor fixed: these brain areas interact with each other via complicated circuits that can be modulated by numerous internal and external conditions. As a result, dynamic MSI makes multisensory decisions flexible and adaptive in behaving animals. Impairments in MSI occur in many psychiatric disorders and may result in altered perception of multisensory stimuli and abnormal reactions to them. This review discusses the diversity and flexibility of MSI in mammals, including humans, primates, and rodents, as well as the brain areas involved. It further explains how such flexibility influences perceptual experiences in behaving animals in both health and disease. This article is part of the theme issue 'Decision and control processes in multisensory perception'.
Affiliation(s)
- Ilsong Choi: Center for Synaptic Brain Dysfunctions, Institute for Basic Science (IBS), Daejeon 34141, Republic of Korea
- Ilayda Demir: Department of Biological Sciences, KAIST, Daejeon 34141, Republic of Korea
- Seungmi Oh: Department of Biological Sciences, KAIST, Daejeon 34141, Republic of Korea
- Seung-Hee Lee: Center for Synaptic Brain Dysfunctions, Institute for Basic Science (IBS), Daejeon 34141, Republic of Korea; Department of Biological Sciences, KAIST, Daejeon 34141, Republic of Korea
4. Tiippana K, Ujiie Y, Peromaa T, Takahashi K. Investigation of Cross-Language and Stimulus-Dependent Effects on the McGurk Effect with Finnish and Japanese Speakers and Listeners. Brain Sci 2023;13:1198. PMID: 37626554. PMCID: PMC10452414. DOI: 10.3390/brainsci13081198.
Abstract
In the McGurk effect, perception of a spoken consonant is altered when an auditory (A) syllable is presented with an incongruent visual (V) syllable (e.g., A /pa/ + V /ka/ is often heard as /ka/ or /ta/). The McGurk effect provides a measure of visual influence on speech perception: the lower the proportion of correct auditory responses, the stronger the effect. Cross-language effects are studied to understand processing differences between one's own and foreign languages. The McGurk effect has sometimes been found to be stronger with foreign speakers, but other studies have shown the opposite, or no difference between languages, and most studies have compared English with other languages. We investigated cross-language effects with native Finnish and Japanese speakers and listeners, with 49 participants in each listener group. The stimuli (/ka/, /pa/, /ta/) were uttered by female and male Finnish and Japanese speakers and presented in A, V, and AV modalities, including a McGurk stimulus (A /pa/ + V /ka/). The McGurk effect was stronger with Japanese stimuli in both listener groups. Differences in speech perception were prominent between individual speakers but less so between native languages, and unisensory perception correlated with McGurk perception. These findings suggest that stimulus-dependent features contribute to the McGurk effect and may influence syllable perception more strongly than cross-language factors.
Affiliation(s)
- Kaisa Tiippana: Department of Psychology and Logopedics, University of Helsinki, 00014 Helsinki, Finland
- Yuta Ujiie: Department of Psychology, College of Contemporary Psychology, Rikkyo University, Saitama 352-8558, Japan; Research Organization of Open Innovation and Collaboration, Ritsumeikan University, Osaka 567-8570, Japan
- Tarja Peromaa: Department of Psychology and Logopedics, University of Helsinki, 00014 Helsinki, Finland
- Kohske Takahashi: College of Comprehensive Psychology, Ritsumeikan University, Osaka 567-8570, Japan
5. Fisher VL, Dean CL, Nave CS, Parkins EV, Kerkhoff WG, Kwakye LD. Increases in sensory noise predict attentional disruptions to audiovisual speech perception. Front Hum Neurosci 2023;16:1027335. PMID: 36684833. PMCID: PMC9846366. DOI: 10.3389/fnhum.2022.1027335.
Abstract
We receive information about the world around us from multiple senses, which combine in a process known as multisensory integration. Multisensory integration has been shown to depend on attention; however, the neural mechanisms underlying this effect are poorly understood. The current study investigates whether changes in sensory noise explain the effect of attention on multisensory integration and whether attentional modulations to multisensory integration occur via modality-specific mechanisms. A task based on the McGurk illusion was used to measure multisensory integration while attention was manipulated via a concurrent auditory or visual task. Sensory noise was measured within modality based on variability in unisensory performance and was used to predict attentional changes to McGurk perception. Consistent with previous studies, reports of the McGurk illusion decreased when accompanied by a secondary task; however, this effect was stronger for the secondary visual (as opposed to auditory) task. While auditory noise was not influenced by either secondary task, visual noise increased with the addition of the secondary visual task specifically. Interestingly, visual noise accounted for significant variability in attentional disruptions to the McGurk illusion. Overall, these results strongly suggest that sensory noise may underlie attentional alterations to multisensory integration in a modality-specific manner. Future studies are needed to determine whether this finding generalizes to other types of multisensory integration and attentional manipulations. This line of research may inform future studies of attentional alterations to sensory processing in neurological disorders such as schizophrenia, autism, and ADHD.
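The abstract's key claim, that sensory noise predicts attentional disruption, amounts to a simple regression; the following is an illustrative sketch in generic notation, not the authors' reported model:

\Delta \text{McGurk}_i = \beta_0 + \beta_1 \, \Delta\sigma_{V,i} + \varepsilon_i

where \Delta \text{McGurk}_i is the change in participant i's illusion rate from single- to dual-task conditions and \Delta\sigma_{V,i} is the corresponding change in visual noise (e.g., the variability of unisensory visual identification); a reliably negative \beta_1 would indicate that larger noise increases predict larger disruptions to the illusion.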
Affiliation(s)
- Victoria L. Fisher: Department of Neuroscience, Oberlin College, Oberlin, OH, United States; Yale University School of Medicine and the Connecticut Mental Health Center, New Haven, CT, United States
- Cassandra L. Dean: Department of Neuroscience, Oberlin College, Oberlin, OH, United States; Roche/Genentech Neurodevelopment & Psychiatry Teams Product Development, Neuroscience, South San Francisco, CA, United States
- Claire S. Nave: Department of Neuroscience, Oberlin College, Oberlin, OH, United States
- Emma V. Parkins: Department of Neuroscience, Oberlin College, Oberlin, OH, United States; Neuroscience Graduate Program, University of Cincinnati, Cincinnati, OH, United States
- Willa G. Kerkhoff: Department of Neuroscience, Oberlin College, Oberlin, OH, United States; Department of Neurobiology, University of Pittsburgh, Pittsburgh, PA, United States
- Leslie D. Kwakye (corresponding author): Department of Neuroscience, Oberlin College, Oberlin, OH, United States
6. Butera IM, Stevenson RA, Gifford RH, Wallace MT. Visually Biased Perception in Cochlear Implant Users: A Study of the McGurk and Sound-Induced Flash Illusions. Trends Hear 2023;27:23312165221076681. PMID: 37377212. PMCID: PMC10334005. DOI: 10.1177/23312165221076681.
Abstract
The reduction in spectral resolution by cochlear implants often requires complementary visual speech cues to facilitate understanding. Despite substantial clinical characterization of auditory-only speech measures, relatively little is known about the audiovisual (AV) integrative abilities that most cochlear implant (CI) users rely on for daily speech comprehension. In this study, we tested AV integration in 63 CI users and 69 normal-hearing (NH) controls using the McGurk and sound-induced flash illusions. To our knowledge, this study is the largest to date measuring the McGurk effect in this population and the first to test the sound-induced flash illusion (SIFI). When presented with conflicting AV speech stimuli (i.e., the phoneme "ba" dubbed onto the viseme "ga"), 55 CI users (87%) reported a fused percept of "da" or "tha" on at least one trial. After applying an error correction based on unisensory responses, we found that among those susceptible to the illusion, CI users experienced less fusion than controls, a result concordant with the SIFI findings, where pairing a single circle flashing on the screen with multiple beeps resulted in fewer illusory flashes for CI users. While illusion perception in these two tasks appears to be uncorrelated among CI users, we identified a negative correlation in the NH group. Because neither illusion appears to further explain variability in CI outcome measures, further research is needed to determine how these findings relate to CI users' speech understanding, particularly in ecological listening conditions that are naturally multisensory.
Affiliation(s)
- Iliza M. Butera: Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN, USA
- Ryan A. Stevenson: Department of Psychology, University of Western Ontario, London, ON, Canada; Brain and Mind Institute, University of Western Ontario, London, ON, Canada
- René H. Gifford: Department of Hearing and Speech Sciences, Vanderbilt University, Nashville, TN, USA
- Mark T. Wallace: Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN, USA; Department of Hearing and Speech Sciences, Vanderbilt University, Nashville, TN, USA; Vanderbilt Kennedy Center, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Psychology, Vanderbilt University, Nashville, TN, USA
7. Van Engen KJ, Dey A, Sommers MS, Peelle JE. Audiovisual speech perception: Moving beyond McGurk. J Acoust Soc Am 2022;152:3216. PMID: 36586857. PMCID: PMC9894660. DOI: 10.1121/10.0015262.
Abstract
Although it is clear that sighted listeners use both auditory and visual cues during speech perception, the manner in which multisensory information is combined is a matter of debate. One approach to measuring multisensory integration is to use variants of the McGurk illusion, in which discrepant auditory and visual cues produce auditory percepts that differ from those based on unimodal input. Not all listeners show the same degree of susceptibility to the McGurk illusion, and these individual differences are frequently used as a measure of audiovisual integration ability. However, despite their popularity, we join the voices of others in the field to argue that McGurk tasks are ill-suited for studying real-life multisensory speech perception: McGurk stimuli are often based on isolated syllables (which are rare in conversations) and necessarily rely on audiovisual incongruence that does not occur naturally. Furthermore, recent data show that susceptibility to McGurk tasks does not correlate with performance during natural audiovisual speech perception. Although the McGurk effect is a fascinating illusion, truly understanding the combined use of auditory and visual information during speech perception requires tasks that more closely resemble everyday communication: namely, words, sentences, and narratives with congruent auditory and visual speech cues.
Affiliation(s)
- Kristin J. Van Engen: Department of Psychological and Brain Sciences, Washington University, St. Louis, Missouri 63130, USA
- Avanti Dey: PLOS ONE, 1265 Battery Street, San Francisco, California 94111, USA
- Mitchell S. Sommers: Department of Psychological and Brain Sciences, Washington University, St. Louis, Missouri 63130, USA
- Jonathan E. Peelle: Department of Otolaryngology, Washington University, St. Louis, Missouri 63130, USA
8. Wilbiks JMP, Brown VA, Strand JF. Speech and non-speech measures of audiovisual integration are not correlated. Atten Percept Psychophys 2022;84:1809-1819. PMID: 35610409. PMCID: PMC10699539. DOI: 10.3758/s13414-022-02517-z.
Abstract
Many natural events generate both visual and auditory signals, and humans are remarkably adept at integrating information from those sources. However, individuals appear to differ markedly in their ability or propensity to combine what they hear with what they see. Individual differences in audiovisual integration have been established using a range of materials, including speech stimuli (seeing and hearing a talker) and simpler audiovisual stimuli (seeing flashes of light combined with tones). Although there are multiple tasks in the literature that are referred to as "measures of audiovisual integration," the tasks differ widely with respect to both the type of stimuli used (speech versus non-speech) and their nature (e.g., some tasks use conflicting auditory and visual stimuli whereas others use congruent stimuli). It is not clear whether these varied tasks actually measure the same underlying construct: audiovisual integration. This study tested the relationships among four commonly used measures of audiovisual integration, two of which use speech stimuli (susceptibility to the McGurk effect and a measure of audiovisual benefit) and two of which use non-speech stimuli (the sound-induced flash illusion and audiovisual integration capacity). We replicated previous work showing large individual differences in each measure but found no significant correlations among any of the measures. These results suggest that tasks commonly referred to as measures of audiovisual integration may be tapping into different parts of the same process, or into different constructs entirely.
Affiliation(s)
- Violet A. Brown: Department of Psychological & Brain Sciences, Washington University in St. Louis, Saint Louis, MO, USA
- Julia F. Strand: Department of Psychology, Carleton College, Northfield, MN, USA
9. Shams L, Beierholm U. Bayesian causal inference: A unifying neuroscience theory. Neurosci Biobehav Rev 2022;137:104619. PMID: 35331819. DOI: 10.1016/j.neubiorev.2022.104619.
Abstract
Understanding the brain and the principles governing neural processing requires theories that are parsimonious, can account for a diverse set of phenomena, and can make testable predictions. Here, we review the theory of Bayesian causal inference, which has been tested, refined, and extended in a variety of tasks in humans and other primates by several research groups. Bayesian causal inference is normative and has explained human behavior in a vast number of tasks, including unisensory and multisensory perceptual tasks as well as sensorimotor and motor tasks, and has accounted for counter-intuitive findings. The theory has made novel predictions that have been tested and confirmed empirically, and recent studies have started to map its algorithms and neural implementation in the human brain. Its parsimony, the diversity of phenomena it explains, and its ability to illuminate brain function at all three of Marr's levels of analysis make Bayesian causal inference a strong neuroscience theory. This also highlights the importance of collaborative and multi-disciplinary research for the development of new theories in neuroscience.
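For orientation, the core computation of Bayesian causal inference, in the standard formulation of Körding et al. (2007) and written here in generic notation, is the posterior probability that two sensory signals x_A and x_V share a common cause (C = 1):

P(C=1 \mid x_A, x_V) = \frac{p(x_A, x_V \mid C=1)\, p(C=1)}{p(x_A, x_V \mid C=1)\, p(C=1) + p(x_A, x_V \mid C=2)\,\big(1 - p(C=1)\big)}

The final percept then weights the fused (common-cause) and segregated (separate-cause) estimates by this posterior; under model averaging, for example, \hat{s}_A = P(C=1 \mid x_A, x_V)\, \hat{s}_{A,C=1} + \big(1 - P(C=1 \mid x_A, x_V)\big)\, \hat{s}_{A,C=2}.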
Affiliation(s)
- Ladan Shams: Departments of Psychology, BioEngineering, and Neuroscience Interdepartmental Program, University of California, Los Angeles, USA
10. Butera IM, Larson ED, DeFreese AJ, Lee AKC, Gifford RH, Wallace MT. Functional localization of audiovisual speech using near infrared spectroscopy. Brain Topogr 2022;35:416-430. PMID: 35821542. PMCID: PMC9334437. DOI: 10.1007/s10548-022-00904-1.
Abstract
Visual cues are especially vital for hearing-impaired individuals, such as cochlear implant (CI) users, to understand speech in noise. Functional near-infrared spectroscopy (fNIRS) is a light-based imaging technology that is ideally suited for measuring the brain activity of CI users due to its compatibility with both the ferromagnetic and electrical components of these implants. In a preliminary step toward better elucidating the behavioral and neural correlates of audiovisual (AV) speech integration in CI users, we designed a speech-in-noise task and measured the extent to which 24 normal-hearing individuals could integrate the audio of spoken monosyllabic words with the corresponding visual signals of a female speaker. In our behavioral task, audiovisual pairings provided average improvements of 103% and 197% over auditory-alone listening conditions at -6 and -9 dB signal-to-noise ratios in multi-talker background noise. In an fNIRS task using similar stimuli, we measured activity during auditory-only listening, visual-only lipreading, and AV listening conditions. We identified cortical activity in all three conditions over regions of middle and superior temporal cortex typically associated with speech processing and audiovisual integration. In addition, three channels active during the lipreading condition showed uncorrected correlations with behavioral measures of audiovisual gain as well as with the McGurk effect. Further work focusing on the regions of interest identified in this study could test how AV speech integration may differ for CI users who rely on this mechanism for daily communication.
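As a point of reference, relative audiovisual gain of the kind quoted above is commonly computed as follows; this is a generic definition that may differ from the authors' exact metric:

\text{gain} = \frac{AV - A}{A} \times 100\%

where A and AV are the proportions of words correctly recognized in the auditory-alone and audiovisual conditions; a 103% gain thus means AV performance was roughly double the auditory-alone score.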
Affiliation(s)
- Iliza M. Butera
- grid.152326.10000 0001 2264 7217Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN USA
| | - Eric D. Larson
- grid.34477.330000000122986657Institute for Learning & Brain Sciences, University of Washington, Seattle Washington, USA
| | - Andrea J. DeFreese
- grid.152326.10000 0001 2264 7217Department of Hearing and Speech Sciences, Vanderbilt University, Nashville, TN USA
| | - Adrian KC Lee
- grid.34477.330000000122986657Institute for Learning & Brain Sciences, University of Washington, Seattle Washington, USA ,grid.34477.330000000122986657Department of Speech and Hearing Sciences, University of Washington, Seattle, Washington USA
| | - René H. Gifford
- grid.152326.10000 0001 2264 7217Department of Hearing and Speech Sciences, Vanderbilt University, Nashville, TN USA
| | - Mark T. Wallace
- grid.152326.10000 0001 2264 7217Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN USA ,grid.152326.10000 0001 2264 7217Department of Hearing and Speech Sciences, Vanderbilt University, Nashville, TN USA ,grid.412807.80000 0004 1936 9916Vanderbilt Kennedy Center, Vanderbilt University Medical Center, Nashville, TN USA
| |
11. Wahn B, Schmitz L, Kingstone A, Böckler-Raettig A. When eyes beat lips: speaker gaze affects audiovisual integration in the McGurk illusion. Psychol Res 2021;86:1930-1943. PMID: 34854983. PMCID: PMC9363401. DOI: 10.1007/s00426-021-01618-y.
Abstract
Eye contact is a dynamic social signal that captures attention and plays a critical role in human communication. In particular, direct gaze often accompanies communicative acts in an ostensive function: a speaker directs her gaze towards the addressee to highlight the fact that this message is being intentionally communicated to her. The addressee, in turn, integrates the speaker’s auditory and visual speech signals (i.e., her vocal sounds and lip movements) into a unitary percept. It is an open question whether the speaker’s gaze affects how the addressee integrates the speaker’s multisensory speech signals. We investigated this question using the classic McGurk illusion, an illusory percept created by presenting mismatching auditory (vocal sounds) and visual information (speaker’s lip movements). Specifically, we manipulated whether the speaker (a) moved his eyelids up/down (i.e., open/closed his eyes) prior to speaking or did not show any eye motion, and (b) spoke with open or closed eyes. When the speaker’s eyes moved (i.e., opened or closed) before an utterance, and when the speaker spoke with closed eyes, the McGurk illusion was weakened (i.e., addressees reported significantly fewer illusory percepts). In line with previous research, this suggests that motion (opening or closing), as well as the closed state of the speaker’s eyes, captured addressees’ attention, thereby reducing the influence of the speaker’s lip movements on the addressees’ audiovisual integration process. Our findings reaffirm the power of speaker gaze to guide attention, showing that its dynamics can modulate low-level processes such as the integration of multisensory speech signals.
Affiliation(s)
- Basil Wahn: Department of Psychology, Leibniz Universität Hannover, Hannover, Germany
- Laura Schmitz: Institute of Sports Science, Leibniz Universität Hannover, Hannover, Germany
- Alan Kingstone: Department of Psychology, University of British Columbia, Vancouver, BC, Canada
12. Kitada R, Sadato N. Multisensory integration and its plasticity - How do innate and postnatal factors contribute to forming individual differences? Cortex 2021;145:A1-A4. PMID: 34844700. DOI: 10.1016/j.cortex.2021.11.003.
Affiliation(s)
- Ryo Kitada: School of Social Sciences, Nanyang Technological University, Singapore; Graduate School of Intercultural Studies, Kobe University, Japan
13.
Abstract
Visual speech cues play an important role in speech recognition, and the McGurk effect is a classic demonstration of this. In the original McGurk and Macdonald (Nature, 264, 746-748, 1976) experiment, 98% of participants reported an illusory "fusion" percept of /d/ when listening to the spoken syllable /b/ and watching the visual speech movements for /g/. However, more recent work shows that subject and task differences influence the proportion of fusion responses. In the current study, we varied task (forced-choice vs. open-ended), stimulus set (including /d/ exemplars vs. not), and data collection environment (lab vs. Mechanical Turk) to investigate the robustness of the McGurk effect. Across experiments using the same stimuli to elicit the McGurk effect, we found fusion responses ranging from 10% to 60%, showing large variability in the likelihood of experiencing the McGurk effect across factors that are unrelated to the perceptual information provided by the stimuli. Rather than a robust perceptual illusion, we therefore argue that the McGurk effect exists only for some individuals under specific task situations.
Significance: This series of studies re-evaluates the classic McGurk effect, which shows the relevance of visual cues for speech perception. We highlight the importance of taking into account subject variables and task differences, and challenge future researchers to think carefully about the perceptual basis of the McGurk effect, how it is defined, and what it can tell us about audiovisual integration in speech.
14. Gonzales MG, Backer KC, Mandujano B, Shahin AJ. Rethinking the Mechanisms Underlying the McGurk Illusion. Front Hum Neurosci 2021;15:616049. PMID: 33867954. PMCID: PMC8046930. DOI: 10.3389/fnhum.2021.616049.
Abstract
The McGurk illusion occurs when listeners hear an illusory percept (i.e., "da"), resulting from mismatched pairings of audiovisual (AV) speech stimuli (i.e., auditory /ba/ paired with visual /ga/). Hearing a third percept, distinct from both the auditory and visual input, has been used as evidence of AV fusion. We examined whether the McGurk illusion is instead driven by visual dominance, whereby the third percept, e.g., "da," represents a default percept for visemes with an ambiguous place of articulation (POA), like /ga/. Participants watched videos of a talker uttering various consonant-vowels (CVs) with (AV) and without (V-only) audio of /ba/. Individuals transcribed the CV they saw (V-only) or heard (AV). In the V-only condition, individuals predominantly saw "da"/"ta" when viewing CVs with indiscernible POAs. Likewise, in the AV condition, upon perceiving an illusion, they predominantly heard "da"/"ta" for CVs with indiscernible POAs. The illusion was stronger in individuals who exhibited weak /ba/ auditory encoding (examined using a control auditory-only task). In Experiment 2, we attempted to replicate these findings using stimuli recorded from a different talker. The V-only results were not replicated, but again individuals predominantly heard "da"/"ta"/"tha" as an illusory percept for various AV combinations, and the illusion was stronger in individuals who exhibited weak /ba/ auditory encoding. These results demonstrate that when visual CVs with indiscernible POAs are paired with a weakly encoded auditory /ba/, listeners default to hearing "da"/"ta"/"tha", thus tempering the AV fusion account and favoring a default mechanism triggered when both AV stimuli are ambiguous.
Affiliation(s)
- Mariel G. Gonzales: Department of Cognitive and Information Sciences, University of California, Merced, Merced, CA, United States
- Kristina C. Backer: Department of Cognitive and Information Sciences, University of California, Merced, Merced, CA, United States
- Brenna Mandujano: Department of Psychology, California State University, Fresno, Fresno, CA, United States
- Antoine J. Shahin: Department of Cognitive and Information Sciences, University of California, Merced, Merced, CA, United States