1. Dong C, Noppeney U, Wang S. Perceptual uncertainty explains activation differences between audiovisual congruent speech and McGurk stimuli. Hum Brain Mapp 2024; 45:e26653. PMID: 38488460; DOI: 10.1002/hbm.26653.
Abstract
Face-to-face communication relies on the integration of acoustic speech signals with the corresponding facial articulations. In the McGurk illusion, an auditory /ba/ phoneme presented simultaneously with the facial articulation of /ga/ (i.e., a viseme) is typically fused into an illusory /da/ percept. Despite its widespread use as an index of audiovisual speech integration, critics argue that it arises from perceptual processes that differ categorically from natural speech recognition. Conversely, Bayesian theoretical frameworks suggest that both the illusory McGurk and the veridical audiovisual congruent speech percepts result from probabilistic inference based on noisy sensory signals. According to these models, the inter-sensory conflict in McGurk stimuli may only increase observers' perceptual uncertainty. This functional magnetic resonance imaging (fMRI) study presented participants (20 male and 24 female) with audiovisual congruent, McGurk (i.e., auditory /ba/ + visual /ga/), and incongruent (i.e., auditory /ga/ + visual /ba/) stimuli along with their unisensory counterparts in a syllable categorization task. Behaviorally, observers' response entropy was greater for McGurk than for congruent audiovisual stimuli. At the neural level, McGurk stimuli increased activations in a widespread neural system, extending from the inferior frontal sulci (IFS) to the pre-supplementary motor area (pre-SMA) and insulae, regions typically involved in cognitive control processes. Crucially, in line with Bayesian theories, these activation increases were fully accounted for by observers' perceptual uncertainty as measured by their response entropy. Our findings suggest that McGurk and congruent speech processing rely on shared neural mechanisms, thereby supporting the McGurk illusion as a valid measure of natural audiovisual speech perception.
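The response-entropy measure used in this study is easy to make concrete. Below is a minimal illustrative sketch (not the authors' code): Shannon entropy over a participant's distribution of syllable responses, which is zero when every trial yields the same percept and grows as responses split across categories.

```python
from collections import Counter
from math import log2

def response_entropy(responses):
    """Shannon entropy (bits) of a list of categorical responses.

    Identical responses give zero entropy; responses spread evenly across
    categories give maximal entropy (high perceptual uncertainty).
    """
    counts = Counter(responses)
    n = len(responses)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# Congruent stimuli: unanimous /ba/ percepts -> zero entropy
low = response_entropy(["ba"] * 10)
# McGurk stimuli: percepts split across /da/, /ba/, /ga/ -> ~1.49 bits
high = response_entropy(["da"] * 5 + ["ba"] * 3 + ["ga"] * 2)
```

On this toy data the split McGurk-style responses yield about 1.49 bits against 0 bits for the unanimous case, mirroring the higher response entropy the study reports for McGurk stimuli.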
Affiliation(s)
- Chenjie Dong: Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, Guangzhou, China; Donders Institute for Brain, Cognition, and Behavior, Radboud University, Nijmegen, the Netherlands
- Uta Noppeney: Donders Institute for Brain, Cognition, and Behavior, Radboud University, Nijmegen, the Netherlands
- Suiping Wang: Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, Guangzhou, China
2. Drew A, Soto-Faraco S. Perceptual oddities: assessing the relationship between film editing and prediction processes. Philos Trans R Soc Lond B Biol Sci 2024; 379:20220426. PMID: 38104604; PMCID: PMC10725757; DOI: 10.1098/rstb.2022.0426.
Abstract
During film viewing, humans parse sequences of individual shots into larger narrative structures, often weaving transitions at edit points into an apparently seamless and continuous flow. Editing helps filmmakers manipulate visual transitions to induce feelings of fluency/disfluency, tension/relief, curiosity, expectation, and a range of emotional responses. We propose that the perceptual dynamics induced by film editing can be captured by a predictive processing (PP) framework. We hypothesise that visual discontinuities at edit points produce discrepancies between anticipated and actual sensory input, leading to prediction error. Further, we propose that the magnitude of prediction error depends on the predictability of each shot within the narrative flow, and lay out an account based on conflict monitoring. We test this hypothesis in two empirical studies measuring electroencephalography (EEG) during passive viewing of film excerpts, as well as behavioural responses during an active edit detection task. We report the neural and behavioural modulations at editing boundaries across three levels of narrative depth, showing greater modulations for edits spanning less predictable, deeper narrative transitions. Overall, our contribution lays the groundwork for understanding film editing from a PP perspective. This article is part of the theme issue 'Art, aesthetics and predictive processing: theoretical and empirical perspectives'.
Affiliation(s)
- Alice Drew: Multisensory Research Group, Centre for Brain and Cognition, Universitat Pompeu Fabra, Carrer de Ramon Trias Fargas, 25-27, 08005 Barcelona, Spain
- Salvador Soto-Faraco: Multisensory Research Group, Centre for Brain and Cognition, Universitat Pompeu Fabra, Carrer de Ramon Trias Fargas, 25-27, 08005 Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain
3. Marly A, Yazdjian A, Soto-Faraco S. The role of conflict processing in multisensory perception: behavioural and electroencephalography evidence. Philos Trans R Soc Lond B Biol Sci 2023; 378:20220346. PMID: 37545310; PMCID: PMC10404919; DOI: 10.1098/rstb.2022.0346.
Abstract
To form coherent multisensory perceptual representations, the brain must solve a causal inference problem: to decide if two sensory cues originated from the same event and should be combined, or if they came from different events and should be processed independently. According to current models of multisensory integration, during this process, the integrated (common cause) and segregated (different causes) internal perceptual models are entertained. In the present study, we propose that the causal inference process involves competition between these alternative perceptual models that engages the brain mechanisms of conflict processing. To test this hypothesis, we conducted two experiments, measuring reaction times (RTs) and electroencephalography, using an audiovisual ventriloquist illusion paradigm with varying degrees of intersensory disparities. Consistent with our hypotheses, incongruent trials led to slower RTs and higher fronto-medial theta power, both indicative of conflict. We also predicted that intermediate disparities would yield slower RTs and higher theta power when compared to congruent stimuli and to large disparities, owing to the steeper competition between causal models. Although this prediction was only validated in the RT study, both experiments displayed the anticipated trend. In conclusion, our findings suggest a potential involvement of the conflict mechanisms in multisensory integration of spatial information. This article is part of the theme issue 'Decision and control processes in multisensory perception'.
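The causal-inference computation described above can be sketched in a few lines. The snippet below is a hedged illustration of the standard Bayesian causal-inference model for spatial cues (the noise parameters, the zero-centred spatial prior, and all variable names are assumptions for the example, not the authors' fitted model): it returns the posterior probability that the auditory and visual cues share a common cause, which falls as the intersensory disparity grows.

```python
import math

def posterior_common_cause(x_a, x_v, sigma_a=2.0, sigma_v=1.0,
                           sigma_p=10.0, p_common=0.5):
    """Posterior probability that an auditory cue (x_a) and a visual cue
    (x_v) arose from one common source, following standard Bayesian
    causal-inference models of spatial perception.

    sigma_a / sigma_v: auditory / visual sensory noise (illustrative)
    sigma_p: width of a zero-centred spatial prior
    p_common: prior probability of a common cause
    """
    var_a, var_v, var_p = sigma_a**2, sigma_v**2, sigma_p**2
    # Likelihood of the cue pair under one shared source (C = 1)
    var1 = var_a * var_v + var_a * var_p + var_v * var_p
    like_c1 = math.exp(-0.5 * ((x_a - x_v) ** 2 * var_p
                               + x_a ** 2 * var_v
                               + x_v ** 2 * var_a) / var1)
    like_c1 /= 2 * math.pi * math.sqrt(var1)
    # Likelihood under two independent sources (C = 2)
    like_c2 = math.exp(-0.5 * (x_a ** 2 / (var_a + var_p)
                               + x_v ** 2 / (var_v + var_p)))
    like_c2 /= 2 * math.pi * math.sqrt((var_a + var_p) * (var_v + var_p))
    return (like_c1 * p_common
            / (like_c1 * p_common + like_c2 * (1 - p_common)))

# Small disparities favour integration; large disparities favour segregation
p_near = posterior_common_cause(0.0, 1.0)   # ~0.8: likely one source
p_far = posterior_common_cause(0.0, 15.0)   # near 0: likely two sources
```

Intermediate disparities land this posterior near 0.5, where neither causal model clearly wins, which is exactly the regime in which the study predicts the steepest model competition and hence the strongest conflict signatures.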
Affiliation(s)
- Adrià Marly: Center for Brain and Cognition, Universitat Pompeu Fabra, 08005 Barcelona, Spain
- Arek Yazdjian: Center for Brain and Cognition, Universitat Pompeu Fabra, 08005 Barcelona, Spain
- Salvador Soto-Faraco: Center for Brain and Cognition, Universitat Pompeu Fabra, 08005 Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats, 08010 Barcelona, Spain
4. Di Pietro SV, Karipidis II, Pleisch G, Brem S. Neurodevelopmental trajectories of letter and speech sound processing from preschool to the end of elementary school. Dev Cogn Neurosci 2023; 61:101255. PMID: 37196374; DOI: 10.1016/j.dcn.2023.101255.
Abstract
Learning to read alphabetic languages starts with learning letter-speech-sound associations. How this process changes brain function during development is still largely unknown. We followed 102 children with varying reading skills in a mixed-longitudinal/cross-sectional design from the prereading stage to the end of elementary school over five time points (n = 46 with two or more time points, of which n = 16 fully longitudinal) to investigate the neural trajectories of letter and speech sound processing using fMRI. Children were presented with letters and speech sounds visually, auditorily, and audiovisually in kindergarten (6.7yo), at the middle (7.3yo) and end of first grade (7.6yo), and in second (8.4yo) and fifth grades (11.5yo). Activation of the ventral occipitotemporal cortex for visual and audiovisual processing followed a complex trajectory, with two peaks in first and fifth grades. The superior temporal gyrus (STG) showed an inverted U-shaped trajectory for audiovisual letter processing, a development that in poor readers was attenuated in the middle STG and absent in the posterior STG. Finally, the trajectories for letter-speech-sound integration were modulated by reading skills and showed differing directionality in the congruency effect depending on the time point. This unprecedented study captures the development of letter processing across elementary school and its neural trajectories in children with varying reading skills.
Affiliation(s)
- S V Di Pietro: Department of Child and Adolescent Psychiatry and Psychotherapy, University Hospital of Psychiatry Zurich, University of Zurich, Switzerland; Neuroscience Center Zurich, University of Zurich and ETH Zurich, Switzerland; URPP Adaptive Brain Circuits in Development and Learning (AdaBD), University of Zurich, Zurich, Switzerland
- I I Karipidis: Department of Child and Adolescent Psychiatry and Psychotherapy, University Hospital of Psychiatry Zurich, University of Zurich, Switzerland; Neuroscience Center Zurich, University of Zurich and ETH Zurich, Switzerland
- G Pleisch: Department of Child and Adolescent Psychiatry and Psychotherapy, University Hospital of Psychiatry Zurich, University of Zurich, Switzerland
- S Brem: Department of Child and Adolescent Psychiatry and Psychotherapy, University Hospital of Psychiatry Zurich, University of Zurich, Switzerland; Neuroscience Center Zurich, University of Zurich and ETH Zurich, Switzerland; URPP Adaptive Brain Circuits in Development and Learning (AdaBD), University of Zurich, Zurich, Switzerland
5. Zhaoping L. Peripheral and central sensation: multisensory orienting and recognition across species. Trends Cogn Sci 2023; 27:539-552. PMID: 37095006; DOI: 10.1016/j.tics.2023.03.001.
Abstract
Attentional bottlenecks force animals to deeply process only a selected fraction of sensory inputs. This motivates a unifying central-peripheral dichotomy (CPD), which separates multisensory processing into functionally defined central and peripheral senses. Peripheral senses (e.g., human audition and peripheral vision) select a fraction of the sensory inputs by orienting animals' attention; central senses (e.g., human foveal vision) allow animals to recognize the selected inputs. Originally used to understand human vision, CPD can be applied to multisensory processes across species. I first describe key characteristics of central and peripheral senses, such as the degree of top-down feedback and density of sensory receptors, and then show CPD as a framework to link ecological, behavioral, neurophysiological, and anatomical data and produce falsifiable predictions.
Affiliation(s)
- Li Zhaoping: University of Tübingen, Max Planck Institute for Biological Cybernetics, Tübingen, Germany
6. Arias Sarah P, Hall L, Saitovitch A, Aucouturier JJ, Zilbovicius M, Johansson P. Pupil dilation reflects the dynamic integration of audiovisual emotional speech. Sci Rep 2023; 13:5507. PMID: 37016041; PMCID: PMC10073148; DOI: 10.1038/s41598-023-32133-2.
Abstract
Emotional speech perception is a multisensory process. When speaking with an individual, we concurrently integrate the information from their voice and face to decode, e.g., their feelings, moods, and emotions. However, the physiological reactions associated with these processes, such as the reflexive dilation of the pupil, remain mostly unknown. The aim of the current article is to investigate whether pupillary reactions can index the processes underlying the audiovisual integration of emotional signals. To investigate this question, we used an algorithm able to increase or decrease the smiles seen in a person's face or heard in their voice, while preserving the temporal synchrony between visual and auditory channels. Using this algorithm, we created congruent and incongruent audiovisual smiles, and investigated participants' gaze and pupillary reactions to the manipulated stimuli. We found that pupil reactions can reflect emotional information mismatch in audiovisual speech. In our data, when participants were explicitly asked to extract emotional information from the stimuli, the first fixation within emotionally mismatching areas (i.e., the mouth) triggered pupil dilation. These results reveal that pupil dilation can reflect the dynamic integration of audiovisual emotional speech and provide insights into how these reactions are triggered during stimulus perception.
Affiliation(s)
- Pablo Arias Sarah: Lund University Cognitive Science, Lund University, Lund, Sweden; STMS Lab, UMR 9912 (IRCAM/CNRS/SU), Paris, France; School of Neuroscience and Psychology, Glasgow University, Glasgow, UK
- Lars Hall: STMS Lab, UMR 9912 (IRCAM/CNRS/SU), Paris, France
- Ana Saitovitch: U1000 Brain Imaging in Psychiatry, INSERM-CEA, Pediatric Radiology Service, Necker Enfants Malades Hospital, Paris V René Descartes University, Paris, France
- Jean-Julien Aucouturier: Department of Robotics and Automation, FEMTO-ST Institute (CNRS/Université de Bourgogne Franche Comté), Besançon, France
- Monica Zilbovicius: U1000 Brain Imaging in Psychiatry, INSERM-CEA, Pediatric Radiology Service, Necker Enfants Malades Hospital, Paris V René Descartes University, Paris, France
7. Iqbal ZJ, Shahin AJ, Bortfeld H, Backer KC. The McGurk Illusion: A Default Mechanism of the Auditory System. Brain Sci 2023; 13:510. PMID: 36979322; PMCID: PMC10046462; DOI: 10.3390/brainsci13030510.
Abstract
Recent studies have questioned past conclusions regarding the mechanisms of the McGurk illusion, especially how McGurk susceptibility might inform our understanding of audiovisual (AV) integration. We previously proposed that the McGurk illusion is likely attributable to a default mechanism, whereby either the visual system, auditory system, or both default to specific phonemes—those implicated in the McGurk illusion. We hypothesized that the default mechanism occurs because visual stimuli with an indiscernible place of articulation (like those traditionally used in the McGurk illusion) lead to an ambiguous perceptual environment and thus a failure in AV integration. In the current study, we tested the default hypothesis as it pertains to the auditory system. Participants performed two tasks. One task was a typical McGurk illusion task, in which individuals listened to auditory-/ba/ paired with visual-/ga/ and judged what they heard. The second task was an auditory-only task, in which individuals transcribed trisyllabic words with a phoneme replaced by silence. We found that individuals’ transcription of missing phonemes often defaulted to ‘/d/t/th/’, the same phonemes often experienced during the McGurk illusion. Importantly, individuals’ default rate was positively correlated with their McGurk rate. We conclude that the McGurk illusion arises when people fail to integrate visual percepts with auditory percepts, due to visual ambiguity, thus leading the auditory system to default to phonemes often implicated in the McGurk illusion.
Affiliation(s)
- Zunaira J. Iqbal: Department of Cognitive and Information Sciences, University of California, Merced, CA 95343, USA
- Antoine J. Shahin: Department of Cognitive and Information Sciences, University of California, Merced, CA 95343, USA; Health Sciences Research Institute, University of California, Merced, CA 95343, USA
- Heather Bortfeld: Department of Cognitive and Information Sciences, University of California, Merced, CA 95343, USA; Health Sciences Research Institute, University of California, Merced, CA 95343, USA; Department of Psychological Sciences, University of California, Merced, CA 95353, USA
- Kristina C. Backer (corresponding author): Department of Cognitive and Information Sciences, University of California, Merced, CA 95343, USA; Health Sciences Research Institute, University of California, Merced, CA 95343, USA
8. Scheliga S, Kellermann T, Lampert A, Rolke R, Spehr M, Habel U. Neural correlates of multisensory integration in the human brain: an ALE meta-analysis. Rev Neurosci 2023; 34:223-245. PMID: 36084305; DOI: 10.1515/revneuro-2022-0065.
Abstract
Previous fMRI research identified the superior temporal sulcus as a central integration area for audiovisual stimuli. However, less is known about a general multisensory integration network across the senses. We therefore conducted an activation likelihood estimation (ALE) meta-analysis spanning multiple sensory modalities to identify a common brain network. We included 49 studies covering all Aristotelian senses, i.e., auditory, visual, tactile, gustatory, and olfactory stimuli. The analysis revealed significant activation in the bilateral superior temporal gyrus, middle temporal gyrus, thalamus, right insula, and left inferior frontal gyrus. We assume these regions to be part of a general multisensory integration network comprising different functional roles: the thalamus operates as a first subcortical relay projecting sensory information to higher cortical integration centers in the superior temporal gyrus/sulcus, while conflict-processing brain regions such as the insula and inferior frontal gyrus facilitate the integration of incongruent information. We additionally performed meta-analytic connectivity modelling and found that each brain region showed co-activations within the identified multisensory integration network. Therefore, by including multiple sensory modalities in our meta-analysis, the results may provide evidence for a common brain network that supports different functional roles in multisensory integration.
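The core of an ALE analysis can be illustrated compactly. The sketch below is a one-dimensional, pure-Python caricature of the procedure (real ALE operates on 3-D brain volumes with sample-size-dependent Gaussian widths; the FWHM value and the peak-normalised Gaussians here are simplifying assumptions): each study's reported foci are blurred into a modeled-activation (MA) map, and the ALE score at a voxel is the probabilistic union of the MA maps across studies.

```python
import math

def modeled_activation(voxel, foci, fwhm=10.0):
    """Per-study modeled activation (MA): the probabilistic union of
    Gaussian blobs centred on each reported focus. 1-D caricature of the
    3-D procedure, with peak-normalised Gaussians for readability.
    """
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))  # FWHM -> sigma
    p_none = 1.0
    for focus in foci:
        p = math.exp(-((voxel - focus) ** 2) / (2.0 * sigma ** 2))
        p_none *= 1.0 - p
    return 1.0 - p_none

def ale(voxel, studies):
    """ALE score at a voxel: probability that at least one study shows
    modeled activation there (union across the per-study MA maps)."""
    p_none = 1.0
    for foci in studies:
        p_none *= 1.0 - modeled_activation(voxel, foci)
    return 1.0 - p_none

# Three toy "studies" reporting 1-D peak coordinates; foci converge near 50
studies = [[48, 52], [50], [51, 90]]
# The convergent location scores higher than a non-convergent one
high, low = ale(50, studies), ale(70, studies)
```

In the full method, observed ALE scores are then compared against a null distribution built from randomly relocated foci to decide which convergent clusters are significant.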
Affiliation(s)
- Sebastian Scheliga: Department of Psychiatry, Psychotherapy and Psychosomatics, Medical Faculty RWTH Aachen University, Pauwelsstraße 30, 52074 Aachen, Germany
- Thilo Kellermann: Department of Psychiatry, Psychotherapy and Psychosomatics, Medical Faculty RWTH Aachen University, Pauwelsstraße 30, 52074 Aachen, Germany; JARA-Institute Brain Structure Function Relationship, Pauwelsstraße 30, 52074 Aachen, Germany
- Angelika Lampert: Institute of Physiology, Medical Faculty RWTH Aachen University, Pauwelsstraße 30, 52074 Aachen, Germany
- Roman Rolke: Department of Palliative Medicine, Medical Faculty RWTH Aachen University, Pauwelsstraße 30, 52074 Aachen, Germany
- Marc Spehr: Department of Chemosensation, RWTH Aachen University, Institute for Biology, Worringerweg 3, 52074 Aachen, Germany
- Ute Habel: Department of Psychiatry, Psychotherapy and Psychosomatics, Medical Faculty RWTH Aachen University, Pauwelsstraße 30, 52074 Aachen, Germany; JARA-Institute Brain Structure Function Relationship, Pauwelsstraße 30, 52074 Aachen, Germany
9. Lucia S, Aydin M, Bianco V, Fiorini L, Mussini E, Di Russo F. Effect of anticipatory multisensory integration on sensory-motor performance. Brain Struct Funct 2023. PMID: 36808005; DOI: 10.1007/s00429-023-02620-3.
Abstract
Multisensory integration (MSI) is a phenomenon that occurs in sensory areas after the presentation of multimodal stimuli. However, little is known about the anticipatory top-down processes taking place in the preparation stage, before stimulus onset. Considering that top-down modulation of modality-specific inputs might affect the MSI process, this study attempts to understand whether direct modulation of the MSI process, beyond the well-known sensory effects, may lead to additional changes in multisensory processing in non-sensory areas (i.e., those related to task preparation and anticipation). To this aim, event-related potentials (ERPs) were analyzed both before and after auditory and visual unisensory and multisensory stimuli during a discriminative response task (Go/No-go type). Results showed that MSI did not affect motor preparation in premotor areas, while cognitive preparation in the prefrontal cortex was increased and correlated with response accuracy. Early post-stimulus ERP activities were also affected by MSI and correlated with response time. Collectively, the present results point to the plasticity-accommodating nature of MSI processes, which are not limited to perception but extend to anticipatory cognitive preparation for task execution. Further, the enhanced cognitive control emerging during MSI is discussed in the context of Bayesian accounts of augmented predictive processing related to increased perceptual uncertainty.
Affiliation(s)
- Stefania Lucia: Department of Movement, Human and Health Sciences, "Foro Italico" University of Rome, Rome, Italy
- Merve Aydin: Department of Movement, Human and Health Sciences, "Foro Italico" University of Rome, Rome, Italy
- Valentina Bianco: Department of Movement, Human and Health Sciences, "Foro Italico" University of Rome, Rome, Italy
- Linda Fiorini: Department of Movement, Human and Health Sciences, "Foro Italico" University of Rome, Rome, Italy; IMT School for Advanced Studies, Lucca, Italy
- Elena Mussini: Department of Movement, Human and Health Sciences, "Foro Italico" University of Rome, Rome, Italy; Department of Neuroscience, Imaging and Clinical Sciences, "G. d'Annunzio" University of Chieti-Pescara, Chieti, Italy
- Francesco Di Russo: Department of Movement, Human and Health Sciences, "Foro Italico" University of Rome, Rome, Italy; IRCCS Fondazione Santa Lucia, Rome, Italy
10. Ronconi L, Vitale A, Federici A, Mazzoni N, Battaglini L, Molteni M, Casartelli L. Neural dynamics driving audio-visual integration in autism. Cereb Cortex 2023; 33:543-556. PMID: 35266994; DOI: 10.1093/cercor/bhac083.
Abstract
Audio-visual (AV) integration plays a crucial role in supporting social functions and communication in autism spectrum disorder (ASD). However, behavioral findings remain mixed and, importantly, little is known about the underlying neurophysiological bases. Studies in neurotypical adults indicate that oscillatory brain activity in different frequencies subserves AV integration, pointing to a central role of (i) individual alpha frequency (IAF), which would determine the width of the cross-modal binding window; (ii) pre-/peri-stimulus theta oscillations, which would reflect the expectation of AV co-occurrence; (iii) post-stimulus oscillatory phase reset, which would temporally align the different unisensory signals. Here, we investigate the neural correlates of AV integration in children with ASD and typically developing (TD) peers, measuring electroencephalography during resting state and in an AV integration paradigm. As for neurotypical adults, AV integration dynamics in TD children could be predicted by the IAF measured at rest and by a modulation of anticipatory theta oscillations at single-trial level. Conversely, in ASD participants, AV integration/segregation was driven exclusively by the neural processing of the auditory stimulus and the consequent auditory-induced phase reset in visual regions, suggesting that a disproportionate elaboration of the auditory input could be the main factor characterizing atypical AV integration in autism.
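Of the oscillatory markers listed above, the individual alpha frequency is the simplest to make concrete. Below is a minimal peak-picking sketch (an illustration only; the band limits, the toy 1/f spectrum, and the function name are assumptions, and real pipelines estimate resting-state spectra with methods such as Welch's before locating the alpha peak).

```python
def individual_alpha_frequency(freqs, power, band=(8.0, 13.0)):
    """Estimate the individual alpha frequency (IAF) as the frequency of
    peak power within the alpha band (simple peak-picking sketch).

    freqs: frequency bins in Hz; power: spectral power per bin.
    """
    in_band = [(f, p) for f, p in zip(freqs, power)
               if band[0] <= f <= band[1]]
    if not in_band:
        raise ValueError("no frequency bins fall inside the alpha band")
    return max(in_band, key=lambda fp: fp[1])[0]

# Toy resting-state spectrum: 1/f background plus a spectral peak at 10.5 Hz
freqs = [f / 2.0 for f in range(2, 41)]                  # 1.0 ... 20.0 Hz
power = [1.0 / f + (2.0 if f == 10.5 else 0.0) for f in freqs]
iaf = individual_alpha_frequency(freqs, power)           # 10.5
```

In the framework the abstract describes, a participant's IAF obtained this way would then be related to the width of their cross-modal temporal binding window.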
Affiliation(s)
- Luca Ronconi: School of Psychology, Vita-Salute San Raffaele University, 20132 Milan, Italy; Division of Neuroscience, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy
- Andrea Vitale: Theoretical and Cognitive Neuroscience Unit, Child Psychopathology Department, Scientific Institute IRCCS Eugenio Medea, 23842 Bosisio Parini, Italy
- Alessandra Federici: Theoretical and Cognitive Neuroscience Unit, Child Psychopathology Department, Scientific Institute IRCCS Eugenio Medea, 23842 Bosisio Parini, Italy; Sensory Experience Dependent (SEED) group, IMT School for Advanced Studies Lucca, 55100 Lucca, Italy
- Noemi Mazzoni: Theoretical and Cognitive Neuroscience Unit, Child Psychopathology Department, Scientific Institute IRCCS Eugenio Medea, 23842 Bosisio Parini, Italy; Laboratory for Autism and Neurodevelopmental Disorders, Center for Neuroscience and Cognitive Systems, Istituto Italiano di Tecnologia, 38068 Rovereto, Italy; Department of Psychology and Cognitive Science, University of Trento, 38068 Rovereto, Italy
- Luca Battaglini: Department of General Psychology, University of Padova, 35131 Padova, Italy; Department of Physics and Astronomy "Galileo Galilei", University of Padova, 35131 Padova, Italy
- Massimo Molteni: Child Psychopathology Department, Scientific Institute IRCCS Eugenio Medea, 23842 Bosisio Parini, Italy
- Luca Casartelli: Theoretical and Cognitive Neuroscience Unit, Child Psychopathology Department, Scientific Institute IRCCS Eugenio Medea, 23842 Bosisio Parini, Italy
11. Van Engen KJ, Dey A, Sommers MS, Peelle JE. Audiovisual speech perception: Moving beyond McGurk. J Acoust Soc Am 2022; 152:3216. PMID: 36586857; PMCID: PMC9894660; DOI: 10.1121/10.0015262.
Abstract
Although it is clear that sighted listeners use both auditory and visual cues during speech perception, the manner in which multisensory information is combined is a matter of debate. One approach to measuring multisensory integration is to use variants of the McGurk illusion, in which discrepant auditory and visual cues produce auditory percepts that differ from those based on unimodal input. Not all listeners show the same degree of susceptibility to the McGurk illusion, and these individual differences are frequently used as a measure of audiovisual integration ability. However, despite their popularity, we join the voices of others in the field to argue that McGurk tasks are ill-suited for studying real-life multisensory speech perception: McGurk stimuli are often based on isolated syllables (which are rare in conversations) and necessarily rely on audiovisual incongruence that does not occur naturally. Furthermore, recent data show that susceptibility to McGurk tasks does not correlate with performance during natural audiovisual speech perception. Although the McGurk effect is a fascinating illusion, truly understanding the combined use of auditory and visual information during speech perception requires tasks that more closely resemble everyday communication: namely, words, sentences, and narratives with congruent auditory and visual speech cues.
Affiliation(s)
- Kristin J Van Engen: Department of Psychological and Brain Sciences, Washington University, St. Louis, Missouri 63130, USA
- Avanti Dey: PLOS ONE, 1265 Battery Street, San Francisco, California 94111, USA
- Mitchell S Sommers: Department of Psychological and Brain Sciences, Washington University, St. Louis, Missouri 63130, USA
- Jonathan E Peelle: Department of Otolaryngology, Washington University, St. Louis, Missouri 63130, USA
12. Erdener D, Evren Erdener Ş. Speechreading as a secondary diagnostic tool in bipolar disorder. Med Hypotheses 2022. DOI: 10.1016/j.mehy.2021.110744.
13. Drew A, Torralba M, Ruzzoli M, Morís Fernández L, Sabaté A, Pápai MS, Soto-Faraco S. Conflict monitoring and attentional adjustment during binocular rivalry. Eur J Neurosci 2021; 55:138-153. PMID: 34872157; DOI: 10.1111/ejn.15554.
Abstract
To make sense of ambiguous and, at times, fragmentary sensory input, the brain must rely on a process of active interpretation. At any given moment, only one of several possible perceptual representations prevails in our conscious experience. Our hypothesis is that the competition between alternative representations induces a pattern of neural activation resembling cognitive conflict, eventually leading to fluctuations between different perceptual outcomes in the case of steep competition. To test this hypothesis, we probed changes in perceptual awareness between competing images using binocular rivalry. We drew our predictions from the conflict monitoring theory, which holds that cognitive control is invoked by the detection of conflict during information processing. Our results show that fronto-medial theta oscillations (5-7 Hz), an established electroencephalography (EEG) marker of conflict, increase right before perceptual alternations and decrease thereafter, suggesting that conflict monitoring occurs during perceptual competition. Furthermore, to investigate conflict resolution via attentional engagement, we looked for a neural marker of perceptual switches, as indexed by parieto-occipital alpha oscillations (8-12 Hz). The power of parieto-occipital alpha displayed an inverse pattern to that of fronto-medial theta, reflecting periods of high interocular inhibition during stable perception and low inhibition around moments of perceptual change. Our findings help elucidate the relationship between conflict monitoring mechanisms and perceptual awareness.
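Fronto-medial theta power, the conflict marker used here, reduces to band-limited spectral power. The snippet below is a deliberately plain sketch (a direct DFT over only the 5-7 Hz bins; the sampling rate is an assumption, and actual EEG pipelines use FFTs, tapering, and time-frequency decompositions rather than this naive loop).

```python
import math

def band_power(signal, fs, f_lo, f_hi):
    """Mean spectral power inside [f_lo, f_hi] Hz, computed with a plain
    DFT evaluated only at the frequency bins that fall in the band."""
    n = len(signal)
    bins = []
    for k in range(n // 2 + 1):
        if f_lo <= k * fs / n <= f_hi:
            re = sum(x * math.cos(2 * math.pi * k * i / n)
                     for i, x in enumerate(signal))
            im = sum(x * math.sin(2 * math.pi * k * i / n)
                     for i, x in enumerate(signal))
            bins.append((re * re + im * im) / n ** 2)
    return sum(bins) / len(bins)

fs = 250.0                                   # assumed sampling rate (Hz)
t = [i / fs for i in range(500)]             # 2 s of samples
theta = [math.sin(2 * math.pi * 6 * ti) for ti in t]   # 6 Hz oscillation
alpha = [math.sin(2 * math.pi * 10 * ti) for ti in t]  # 10 Hz oscillation
# The 6 Hz signal dominates the 5-7 Hz (theta) band; the 10 Hz one does not
theta_in_band = band_power(theta, fs, 5.0, 7.0)
alpha_in_band = band_power(alpha, fs, 5.0, 7.0)
```

In a study like this one, such band power would be computed over fronto-medial channels in short windows time-locked to perceptual alternations, then compared before versus after the switch.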
Affiliation(s)
- Alice Drew: Multisensory Research Group, Centre for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain
- Mireia Torralba: Multisensory Research Group, Centre for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain
- Manuela Ruzzoli: Multisensory Research Group, Centre for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain; BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastian, Spain; Ikerbasque, Basque Foundation for Science, Bilbao, Spain
- Luis Morís Fernández: Multisensory Research Group, Centre for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain; Departamento de Psicología Básica, Universidad Autónoma de Madrid, Madrid, Spain
- Alba Sabaté: Multisensory Research Group, Centre for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain
- Márta Szabina Pápai: Multisensory Research Group, Centre for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain
- Salvador Soto-Faraco: Multisensory Research Group, Centre for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
14
Audio-visual combination of syllables involves time-sensitive dynamics following from fusion failure. Sci Rep 2020; 10:18009. [PMID: 33093570] [PMCID: PMC7583249] [DOI: 10.1038/s41598-020-75201-7]
Abstract
In face-to-face communication, audio-visual (AV) stimuli can be fused, combined or perceived as mismatching. While the left superior temporal sulcus (STS) is presumably the locus of AV integration, the process leading to combination is unknown. Based on previous modelling work, we hypothesize that combination results from a complex dynamic originating in a failure to integrate AV inputs, followed by a reconstruction of the most plausible AV sequence. In two different behavioural tasks and one MEG experiment, we observed that combination is more time demanding than fusion. Using time-/source-resolved human MEG analyses with linear and dynamic causal models, we show that both fusion and combination involve early detection of AV incongruence in the STS, whereas combination is further associated with enhanced activity of AV asynchrony-sensitive regions (auditory and inferior frontal cortices). Based on neural signal decoding, we finally show that only combination can be decoded from the IFG activity and that combination is decoded later than fusion in the STS. These results indicate that the AV speech integration outcome primarily depends on whether the STS converges or not onto an existing multimodal syllable representation, and that combination results from subsequent temporal processing, presumably the off-line re-ordering of incongruent AV stimuli.
15
A value-driven McGurk effect: Value-associated faces enhance the influence of visual information on audiovisual speech perception and its eye movement pattern. Atten Percept Psychophys 2020; 82:1928-1941. [PMID: 31898072] [DOI: 10.3758/s13414-019-01918-x]
Abstract
This study investigates whether and how value-associated faces affect audiovisual speech perception and its eye movement pattern. Participants were asked to learn to associate particular faces with or without monetary reward in the training phase, and, in the subsequent test phase, to identify syllables that the talkers had said in video clips in which the talkers' faces had or had not been associated with reward. The syllables were either congruent or incongruent with the talkers' mouth movements. Crucially, in some cases, the incongruent syllables could elicit the McGurk effect. Results showed that the McGurk effect occurred more often for reward-associated faces than for non-reward-associated faces. Moreover, the signal detection analysis revealed that participants had a lower criterion and higher discriminability for reward-associated faces than for non-reward-associated faces. Surprisingly, eye movement data showed that participants spent more time looking at, and fixated more often on, the extraoral (nose/cheek) area for reward-associated faces than for non-reward-associated faces, while the opposite pattern was observed for the oral (mouth) area. The correlation analysis demonstrated that, over participants, the more they looked at the extraoral area in the training phase because of reward, the larger the increase in the proportion of McGurk responses (and the less they looked at the oral area) in the test phase. These findings not only demonstrate that value-associated faces enhance the influence of visual information on audiovisual speech perception but also highlight the importance of the extraoral facial area in the value-driven McGurk effect.
16
Magnotti JF, Dzeda KB, Wegner-Clemens K, Rennig J, Beauchamp MS. Weak observer-level correlation and strong stimulus-level correlation between the McGurk effect and audiovisual speech-in-noise: A causal inference explanation. Cortex 2020; 133:371-383. [PMID: 33221701] [DOI: 10.1016/j.cortex.2020.10.002]
Abstract
The McGurk effect is a widely used measure of multisensory integration during speech perception. Two observations have raised questions about the validity of the effect as a tool for understanding speech perception. First, there is high variability in perception of the McGurk effect across different stimuli and observers. Second, across observers there is low correlation between McGurk susceptibility and recognition of visual speech paired with auditory speech-in-noise, another common measure of multisensory integration. Using the framework of the causal inference of multisensory speech (CIMS) model, we explored the relationship between the McGurk effect, syllable perception, and sentence perception in seven experiments with a total of 296 different participants. Perceptual reports revealed a relationship between the efficacy of different McGurk stimuli created from the same talker and perception of the auditory component of the McGurk stimuli presented in isolation, both with and without added noise. The CIMS model explained this strong stimulus-level correlation using the principles of noisy sensory encoding followed by optimal cue combination within a common representational space across speech types. Because the McGurk effect (but not speech-in-noise) requires the resolution of conflicting cues between modalities, there is an additional source of individual variability that can explain the weak observer-level correlation between McGurk and noisy speech. Power calculations show that detecting this weak correlation requires studies with many more participants than those conducted to date. Perception of the McGurk effect and other types of speech can be explained by a common theoretical framework that includes causal inference, suggesting that the McGurk effect is a valid and useful experimental tool.
17
Soto-Faraco S. Reply to C. Spence: Multisensory Interactions in the Real World. Multisens Res 2020; 33:693-699. [PMID: 33706261] [DOI: 10.1163/22134808-bja10005]
Affiliation(s)
- Salvador Soto-Faraco: Center for Brain and Cognition, ICREA and Edifici Merce Rodoreda, Universitat Pompeu Fabra, Room 24.327, Carrer de Ramon Trias Fargas 25-27, 08005 Barcelona, Spain
18
Randazzo M, Priefer R, Smith PJ, Nagler A, Avery T, Froud K. Neural Correlates of Modality-Sensitive Deviance Detection in the Audiovisual Oddball Paradigm. Brain Sci 2020; 10:328. [PMID: 32481538] [PMCID: PMC7348766] [DOI: 10.3390/brainsci10060328]
Abstract
The McGurk effect, in which an incongruent pairing of visual /ga/ and acoustic /ba/ creates the fusion illusion /da/, is a cornerstone of research in audiovisual speech perception. Combination illusions occur when the input modalities are reversed (auditory /ga/, visual /ba/), yielding the percept /bga/. A robust literature shows that fusion illusions in an oddball paradigm evoke a mismatch negativity (MMN) in the auditory cortex, in the absence of changes to the acoustic stimuli. We compared fusion and combination illusions in a passive oddball paradigm to further examine the influence of visual and auditory aspects of incongruent speech stimuli on the audiovisual MMN. Participants viewed videos under two audiovisual illusion conditions (fusion, with the visual aspect of the stimulus changing, and combination, with the auditory aspect of the stimulus changing) as well as two unimodal auditory-only and visual-only conditions. Fusion and combination deviants exerted similar influence in generating congruency predictions, with significant differences between standards and deviants in the N100 time window. The presence of the MMN in early and late time windows differentiated fusion from combination deviants. When the visual signal changes, a new percept is created; but when the visual signal is held constant and the auditory signal changes, the response is suppressed, evoking a later MMN. In alignment with models of predictive processing in audiovisual speech perception, we interpreted our results to indicate that visual information can both predict and suppress auditory speech perception.
Affiliation(s)
- Melissa Randazzo: Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY 11530, USA (Correspondence; Tel.: +1-516-877-4769)
- Ryan Priefer: Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY 11530, USA
- Paul J. Smith: Neuroscience and Education, Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY 10027, USA
- Amanda Nagler: Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY 11530, USA
- Trey Avery: Neuroscience and Education, Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY 10027, USA
- Karen Froud: Neuroscience and Education, Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY 10027, USA
19
Zhang S, Xu W, Zhu Y, Tian E, Kong W. Impaired Multisensory Integration Predisposes the Elderly People to Fall: A Systematic Review. Front Neurosci 2020; 14:411. [PMID: 32410958] [PMCID: PMC7198912] [DOI: 10.3389/fnins.2020.00411]
Abstract
Background: This systematic review pooled the latest data and reviewed the relevant studies to examine the effect of multisensory integration on balance function in the elderly. Methods: PubMed, Web of Science and Scopus were searched for eligible studies published prior to May 2019. The search was limited to studies published in Chinese or English. The quality of the included studies was assessed against the Newcastle-Ottawa Scale or an 11-item checklist recommended by the Agency for Healthcare Research and Quality (AHRQ). Any disagreement among reviewers was resolved by comparing notes and reaching a consensus. Results: Eight hundred thirty-nine records were identified, and 17 of them were included for systematic review. The results supported the assumption that multisensory integration affects balance function in the elderly. All 17 included studies were judged to be of high or moderate quality. Conclusions: The systematic review found that impaired multisensory integration could predispose elderly people to falls. Accurate assessment of multisensory integration can help identify impaired balance function in the elderly and minimize the risk of falls. These results provide a new basis for further understanding of balance maintenance mechanisms. Further research is warranted to explore changes in brain areas related to multisensory integration in the elderly.
Affiliation(s)
- Sulin Zhang: Department of Otorhinolaryngology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China; Institute of Otorhinolaryngology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Wenchao Xu: Department of Otorhinolaryngology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Yuting Zhu: Department of Otorhinolaryngology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- E Tian: Department of Otorhinolaryngology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Weijia Kong: Department of Otorhinolaryngology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China; Institute of Otorhinolaryngology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China; Key Laboratory of Neurological Disorders of Education Ministry, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
20
Li Y, Seger C, Chen Q, Mo L. Left Inferior Frontal Gyrus Integrates Multisensory Information in Category Learning. Cereb Cortex 2020; 30:4410-4423. [DOI: 10.1093/cercor/bhaa029]
Abstract
Humans are able to categorize things they encounter in the world (e.g., a cat) by integrating multisensory information from the auditory and visual modalities with ease and speed. However, how the brain learns multisensory categories remains elusive. The present study used functional magnetic resonance imaging to investigate, for the first time, the neural mechanisms underpinning multisensory information-integration (II) category learning. A sensory-modality-general network, including the left insula, right inferior frontal gyrus (IFG), supplementary motor area, left precentral gyrus, bilateral parietal cortex, and right caudate and globus pallidus, was recruited for II categorization, regardless of whether the information came from a single modality or from multiple modalities. Putamen activity was higher in correct categorization than incorrect categorization. Critically, the left IFG and left body and tail of the caudate were activated in multisensory II categorization but not in unisensory II categorization, which suggests this network plays a specific role in integrating multisensory information during category learning. The present results extend our understanding of the role of the left IFG in multisensory processing from the linguistic domain to a broader role in audiovisual learning.
Affiliation(s)
- You Li: School of Psychology and Center for Studies of Psychological Application, South China Normal University, Guangzhou 510631, Guangdong, China; Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, Guangdong, China
- Carol Seger: School of Psychology and Center for Studies of Psychological Application, South China Normal University, Guangzhou 510631, Guangdong, China; Department of Psychology, Colorado State University, Fort Collins, CO 80521, USA
- Qi Chen: School of Psychology and Center for Studies of Psychological Application, South China Normal University, Guangzhou 510631, Guangdong, China
- Lei Mo: School of Psychology and Center for Studies of Psychological Application, South China Normal University, Guangzhou 510631, Guangdong, China
21
Spagna A, Kim TH, Wu T, Fan J. Right hemisphere superiority for executive control of attention. Cortex 2020; 122:263-276. [DOI: 10.1016/j.cortex.2018.12.012]
22
Lindborg A, Baart M, Stekelenburg JJ, Vroomen J, Andersen TS. Speech-specific audiovisual integration modulates induced theta-band oscillations. PLoS One 2019; 14:e0219744. [PMID: 31310616] [PMCID: PMC6634411] [DOI: 10.1371/journal.pone.0219744]
Abstract
Speech perception is influenced by vision through a process of audiovisual integration. This is demonstrated by the McGurk illusion, where visual speech (for example /ga/) dubbed with incongruent auditory speech (such as /ba/) leads to a modified auditory percept (/da/). Recent studies have indicated that perception of the incongruent speech stimuli used in McGurk paradigms involves mechanisms of both general and audiovisual-speech-specific mismatch processing, and that general mismatch processing modulates induced theta-band (4–8 Hz) oscillations. Here, we investigated whether the theta modulation merely reflects mismatch processing or, alternatively, audiovisual integration of speech. We used electroencephalographic recordings from two previously published studies using audiovisual sine-wave speech (SWS), a spectrally degraded speech signal sounding nonsensical to naïve perceivers but perceived as speech by informed subjects. Earlier studies have shown that informed, but not naïve, subjects integrate SWS phonetically with visual speech. In an N1/P2 event-related potential paradigm, we found a significant difference in theta-band activity between informed and naïve perceivers of audiovisual speech, suggesting that audiovisual integration modulates induced theta-band oscillations. In a McGurk mismatch negativity (MMN) paradigm, where infrequent McGurk stimuli were embedded in a sequence of frequent audio-visually congruent stimuli, we found no difference between congruent and McGurk stimuli. The infrequent stimuli in this paradigm violated both the general prediction of stimulus content and that of audiovisual congruence. Hence, we found no support for the hypothesis that audiovisual mismatch modulates induced theta-band oscillations. We also did not find any effects of audiovisual integration in the MMN paradigm, possibly due to the experimental design.
Affiliation(s)
- Alma Lindborg: Section for Cognitive Systems, DTU Compute, Technical University of Denmark, Lyngby, Denmark
- Martijn Baart: Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands; BCBL, Basque Center on Cognition, Brain and Language, Donostia, Spain
- Jeroen J Stekelenburg: Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Jean Vroomen: Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Tobias S Andersen: Section for Cognitive Systems, DTU Compute, Technical University of Denmark, Lyngby, Denmark
23
"Paying" attention to audiovisual speech: Do incongruent stimuli incur greater costs? Atten Percept Psychophys 2019; 81:1743-1756. [PMID: 31197661 DOI: 10.3758/s13414-019-01772-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
The McGurk effect is a multisensory phenomenon in which discrepant auditory and visual speech signals typically result in an illusory percept. McGurk stimuli are often used in studies assessing the attentional requirements of audiovisual integration, but no study has directly compared the costs associated with integrating congruent versus incongruent audiovisual speech. Some evidence suggests that the McGurk effect may not be representative of naturalistic audiovisual speech processing: susceptibility to the McGurk effect is not associated with the ability to derive benefit from the addition of the visual signal, and distinct cortical regions are recruited when processing congruent versus incongruent speech. In two experiments, one using response times to identify congruent and incongruent syllables and one using a dual-task paradigm, we assessed whether congruent and incongruent audiovisual speech incur different attentional costs. We demonstrated that response times to both the speech task (Experiment 1) and a secondary vibrotactile task (Experiment 2) were indistinguishable for congruent compared to incongruent syllables, but McGurk fusions were responded to more quickly than McGurk non-fusions. These results suggest that despite documented differences in how congruent and incongruent stimuli are processed, they do not appear to differ in terms of processing time or effort, at least in the open-set speech task used here. However, responses that result in McGurk fusions are processed more quickly than those that result in non-fusions, though attentional cost is comparable for the two response types.
24
Abstract
At any given moment, we receive input through our different sensory systems, and this information needs to be processed and integrated. Multisensory processing requires the coordinated activity of distinct cortical areas. Key mechanisms implicated in these processes include local neural oscillations and functional connectivity between distant cortical areas. Evidence is now emerging that neural oscillations in distinct frequency bands reflect different mechanisms of multisensory processing. Moreover, studies suggest that aberrant neural oscillations contribute to multisensory processing deficits in clinical populations, such as schizophrenia. In this article, we review recent literature on the neural mechanisms underlying multisensory processing, focusing on neural oscillations. We derive a framework that summarizes findings on (1) stimulus-driven multisensory processing, (2) the influence of top-down information on multisensory processing, and (3) the role of predictions for the formation of multisensory perception. We propose that different frequency band oscillations subserve complementary mechanisms of multisensory processing. These processes can act in parallel and are essential for multisensory processing.
Affiliation(s)
- Julian Keil: Biological Psychology, Christian-Albrechts-University Kiel, Kiel, Germany; Department of Psychiatry and Psychotherapy, St. Hedwig Hospital, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Daniel Senkowski: Department of Psychiatry and Psychotherapy, St. Hedwig Hospital, Charité-Universitätsmedizin Berlin, Berlin, Germany
25
Brown VA, Hedayati M, Zanger A, Mayn S, Ray L, Dillman-Hasso N, Strand JF. What accounts for individual differences in susceptibility to the McGurk effect? PLoS One 2018; 13:e0207160. [PMID: 30418995] [PMCID: PMC6231656] [DOI: 10.1371/journal.pone.0207160]
Abstract
The McGurk effect is a classic audiovisual speech illusion in which discrepant auditory and visual syllables can lead to a fused percept (e.g., an auditory /bɑ/ paired with a visual /gɑ/ often leads to the perception of /dɑ/). The McGurk effect is robust and easily replicated in pooled group data, but there is tremendous variability in the extent to which individual participants are susceptible to it. In some studies, the rate at which individuals report fusion responses ranges from 0% to 100%. Despite its widespread use in the audiovisual speech perception literature, the roots of the wide variability in McGurk susceptibility are largely unknown. This study evaluated whether several perceptual and cognitive traits are related to McGurk susceptibility through correlational analyses and mixed effects modeling. We found that an individual's susceptibility to the McGurk effect was related to their ability to extract place of articulation information from the visual signal (i.e., a more fine-grained analysis of lipreading ability), but not to scores on tasks measuring attentional control, processing speed, working memory capacity, or auditory perceptual gradiency. These results provide support for the claim that a small amount of the variability in susceptibility to the McGurk effect is attributable to lipreading skill. In contrast, cognitive and perceptual abilities that are commonly used predictors in individual differences studies do not appear to underlie susceptibility to the McGurk effect.
Affiliation(s)
- Violet A. Brown: Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
- Maryam Hedayati: Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
- Annie Zanger: Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
- Sasha Mayn: Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
- Lucia Ray: Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
- Naseem Dillman-Hasso: Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
- Julia F. Strand: Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
26
Davies-Thompson J, Elli GV, Rezk M, Benetti S, van Ackeren M, Collignon O. Hierarchical Brain Network for Face and Voice Integration of Emotion Expression. Cereb Cortex 2018; 29:3590-3605. [DOI: 10.1093/cercor/bhy240]
Abstract
The brain has separate specialized computational units to process faces and voices, located in occipital and temporal cortices. However, humans seamlessly integrate signals from the faces and voices of others for optimal social interaction. How are emotional expressions, when delivered by different sensory modalities (faces and voices), integrated in the brain? In this study, we characterized the brain's response to faces, voices, and combined face–voice information (congruent, incongruent), which varied in expression (neutral, fearful). Using a whole-brain approach, we found that only the right posterior superior temporal sulcus (rpSTS) responded more to bimodal stimuli than to face or voice alone, but only when the stimuli contained emotional expression. Face- and voice-selective regions of interest, extracted from independent functional localizers, similarly revealed multisensory integration in the face-selective rpSTS only; further, this was the only face-selective region that also responded significantly to voices. Dynamic causal modeling revealed that the rpSTS receives unidirectional information from the face-selective fusiform face area and the voice-selective temporal voice area, with emotional expression affecting the connection strength. Our study supports a hierarchical model of face and voice integration, with convergence in the rpSTS, in which integration depends on the (emotional) salience of the stimuli.
Affiliation(s)
- Jodie Davies-Thompson: Crossmodal Perception and Plasticity Laboratory, Center of Mind/Brain Sciences, University of Trento, via delle Regole, 38123 Mattarello (TN), Italy; Face Research, Swansea (FaReS), Department of Psychology, College of Human and Health Sciences, Swansea University, Singleton Park, Swansea, UK
- Giulia V Elli: Department of Psychological & Brain Sciences, Johns Hopkins University, Baltimore, MD, USA
- Mohamed Rezk: Crossmodal Perception and Plasticity Laboratory, Center of Mind/Brain Sciences, University of Trento, via delle Regole, 38123 Mattarello (TN), Italy; Institute of Research in Psychology (IPSY), Institute of Neuroscience (IoNS), University of Louvain (UCL), 10 Place du Cardinal Mercier, 1348 Louvain-la-Neuve, Belgium
- Stefania Benetti: Crossmodal Perception and Plasticity Laboratory, Center of Mind/Brain Sciences, University of Trento, via delle Regole, 38123 Mattarello (TN), Italy
- Markus van Ackeren: Crossmodal Perception and Plasticity Laboratory, Center of Mind/Brain Sciences, University of Trento, via delle Regole, 38123 Mattarello (TN), Italy
- Olivier Collignon: Crossmodal Perception and Plasticity Laboratory, Center of Mind/Brain Sciences, University of Trento, via delle Regole, 38123 Mattarello (TN), Italy; Institute of Research in Psychology (IPSY), Institute of Neuroscience (IoNS), University of Louvain (UCL), 10 Place du Cardinal Mercier, 1348 Louvain-la-Neuve, Belgium
27
Proverbio AM, Raso G, Zani A. Electrophysiological Indexes of Incongruent Audiovisual Phonemic Processing: Unraveling the McGurk Effect. Neuroscience 2018; 385:215-226. [PMID: 29932985] [DOI: 10.1016/j.neuroscience.2018.06.021]
Abstract
In this study the timing of electromagnetic signals recorded during incongruent and congruent audiovisual (AV) stimulation in 14 Italian healthy volunteers was examined. In a previous study (Proverbio et al., 2016) we investigated the McGurk effect in the Italian language and found out which visual and auditory inputs provided the most compelling illusory effects (e.g., bilabial phonemes presented acoustically and paired with non-labials, especially alveolar-nasal and velar-occlusive phonemes). In this study EEG was recorded from 128 scalp sites while participants observed a female and a male actor uttering 288 syllables selected on the basis of the previous investigation (lasting approximately 600 ms) and responded to rare targets (/re/, /ri/, /ro/, /ru/). In half of the cases the AV information was incongruent, except for targets that were always congruent. A pMMN (phonological Mismatch Negativity) to incongruent AV stimuli was identified 500 ms after voice onset time. This automatic response indexed the detection of an incongruity between the labial and phonetic information. SwLORETA (Low-Resolution Electromagnetic Tomography) analysis applied to the difference voltage incongruent-congruent in the same time window revealed that the strongest sources of this activity were the right superior temporal (STG) and superior frontal gyri, which supports their involvement in AV integration.
Affiliation(s)
- Alice Mado Proverbio: Neuro-Mi Center for Neuroscience, Dept. of Psychology, University of Milano-Bicocca, Italy
- Giulia Raso: Neuro-Mi Center for Neuroscience, Dept. of Psychology, University of Milano-Bicocca, Italy
28
Morís Fernández L, Torralba M, Soto-Faraco S. Theta oscillations reflect conflict processing in the perception of the McGurk illusion. Eur J Neurosci 2018; 48:2630-2641. [DOI: 10.1111/ejn.13804]
Affiliation(s)
- Luis Morís Fernández: Multisensory Research Group, Center for Brain and Cognition, Dept. de Tecnologies de la Informació i les Comunicacions, Universitat Pompeu Fabra, Office 55.128, Roc Boronat 138, 08018 Barcelona, Spain
- Mireia Torralba: Multisensory Research Group, Center for Brain and Cognition, Dept. de Tecnologies de la Informació i les Comunicacions, Universitat Pompeu Fabra, Office 55.128, Roc Boronat 138, 08018 Barcelona, Spain
- Salvador Soto-Faraco: Multisensory Research Group, Center for Brain and Cognition, Dept. de Tecnologies de la Informació i les Comunicacions, Universitat Pompeu Fabra, Office 55.128, Roc Boronat 138, 08018 Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain