1. Bernal-Berdun E, Vallejo M, Sun Q, Serrano A, Gutierrez D. Modeling the Impact of Head-Body Rotations on Audio-Visual Spatial Perception for Virtual Reality Applications. IEEE Trans Vis Comput Graph 2024; 30:2624-2632. [PMID: 38446650] [DOI: 10.1109/tvcg.2024.3372112]
Abstract
Humans perceive the world by integrating multimodal sensory feedback, including visual and auditory stimuli, which holds true in virtual reality (VR) environments. Proper synchronization of these stimuli is crucial for perceiving a coherent and immersive VR experience. In this work, we focus on the interplay between audio and vision during localization tasks involving natural head-body rotations. We explore the impact of audio-visual offsets and rotation velocities on users' directional localization acuity for various viewing modes. Using psychometric functions, we model perceptual disparities between visual and auditory cues and determine offset detection thresholds. Our findings reveal that target localization accuracy is affected by perceptual audio-visual disparities during head-body rotations, but remains consistent in the absence of stimuli-head relative motion. We then showcase the effectiveness of our approach in predicting and enhancing users' localization accuracy within realistic VR gaming applications. To provide additional support for our findings, we implement a natural VR game wherein we apply a compensatory audio-visual offset derived from our measured psychometric functions. As a result, we demonstrate a substantial improvement of up to 40% in participants' target localization accuracy. We additionally provide guidelines for content creation to ensure coherent and seamless VR experiences.
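As a rough illustration of the psychometric-function modeling described in this abstract, a detection threshold could be estimated by fitting a cumulative Gaussian to offset-detection rates. The functional form, the example data, and the 75% criterion below are assumptions for illustration, not values from the paper.

```python
# Minimal sketch: fit a cumulative-Gaussian psychometric function to
# audio-visual offset-detection data and read off a detection threshold.
# Assumptions (not from the paper): example offsets/detection rates,
# the cumulative-Gaussian form, and a 75% threshold criterion.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

LAPSE = 0.02  # assumed lapse rate

def psychometric(offset_deg, mu, sigma):
    """Probability of reporting an audio-visual offset, given its size in degrees."""
    return LAPSE + (1 - 2 * LAPSE) * norm.cdf(offset_deg, loc=mu, scale=sigma)

# Hypothetical detection rates at several audio-visual offsets (degrees)
offsets = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 12.0])
p_detect = np.array([0.05, 0.15, 0.40, 0.70, 0.88, 0.97])

(mu, sigma), _ = curve_fit(psychometric, offsets, p_detect, p0=[5.0, 2.0])

# Offset at which detection reaches 75% (one common threshold criterion)
threshold_75 = norm.ppf((0.75 - LAPSE) / (1 - 2 * LAPSE), loc=mu, scale=sigma)
print(f"PSE ~ {mu:.2f} deg, sigma ~ {sigma:.2f}, 75% detection threshold ~ {threshold_75:.2f} deg")
```

In this framing, the fitted threshold is the largest audio-visual offset users tolerate before the disparity becomes reliably noticeable, which is the quantity a compensatory offset scheme would need to stay below.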
2. Ahmed F, Nidiffer AR, Lalor EC. The effect of gaze on EEG measures of multisensory integration in a cocktail party scenario. Front Hum Neurosci 2023; 17:1283206. [PMID: 38162285] [PMCID: PMC10754997] [DOI: 10.3389/fnhum.2023.1283206]
Abstract
Seeing the speaker's face greatly improves our speech comprehension in noisy environments. This is due to the brain's ability to combine the auditory and the visual information around us, a process known as multisensory integration. Selective attention also strongly influences what we comprehend in scenarios with multiple speakers-an effect known as the cocktail-party phenomenon. However, the interaction between attention and multisensory integration is not fully understood, especially when it comes to natural, continuous speech. In a recent electroencephalography (EEG) study, we explored this issue and showed that multisensory integration is enhanced when an audiovisual speaker is attended compared to when that speaker is unattended. Here, we extend that work to investigate how this interaction varies depending on a person's gaze behavior, which affects the quality of the visual information they have access to. To do so, we recorded EEG from 31 healthy adults as they performed selective attention tasks in several paradigms involving two concurrently presented audiovisual speakers. We then modeled how the recorded EEG related to the audio speech (envelope) of the presented speakers. Crucially, we compared two classes of model - one that assumed underlying multisensory integration (AV) versus another that assumed two independent unisensory audio and visual processes (A+V). This comparison revealed evidence of strong attentional effects on multisensory integration when participants were looking directly at the face of an audiovisual speaker. This effect was not apparent when the speaker's face was in the peripheral vision of the participants. Overall, our findings suggest a strong influence of attention on multisensory integration when high fidelity visual (articulatory) speech information is available. More generally, this suggests that the interplay between attention and multisensory integration during natural audiovisual speech is dynamic and is adaptable based on the specific task and environment.
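The central analysis described here, comparing a joint audiovisual (AV) encoding model against the sum of two independently fit unisensory models (A+V), could be sketched roughly as below. The ridge-regression implementation, the 0-400 ms lag range, and the synthetic signals are illustrative assumptions rather than the authors' pipeline; in practice such models are fit and evaluated with cross-validation across trials.

```python
# Minimal sketch of comparing a joint audiovisual (AV) encoding model against
# the sum of two independently fit unisensory models (A+V).
# Lag range, ridge parameter, and the synthetic signals are assumptions.
import numpy as np

rng = np.random.default_rng(0)
fs, n = 64, 64 * 60                      # 1 minute of data at 64 Hz
audio_env = rng.standard_normal(n)       # stand-in for the attended speech envelope
visual_feat = rng.standard_normal(n)     # stand-in for a visual (lip movement) feature
eeg = rng.standard_normal(n)             # stand-in for one EEG channel

def lagged(x, max_lag):
    """Time-lagged design matrix, lags 0..max_lag samples (circular shift for brevity)."""
    return np.column_stack([np.roll(x, k) for k in range(max_lag + 1)])

def ridge_predict(X, y, lam=1.0):
    """Fit ridge regression weights and return the predicted response."""
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    return X @ w

max_lag = int(0.4 * fs)                  # lags from 0 to ~400 ms
XA, XV = lagged(audio_env, max_lag), lagged(visual_feat, max_lag)

pred_av = ridge_predict(np.hstack([XA, XV]), eeg)                # joint AV model
pred_a_plus_v = ridge_predict(XA, eeg) + ridge_predict(XV, eeg)  # independent A + V

r_av = np.corrcoef(pred_av, eeg)[0, 1]
r_apv = np.corrcoef(pred_a_plus_v, eeg)[0, 1]
print(f"AV model r = {r_av:.3f}, A+V model r = {r_apv:.3f}")
```

Evidence for multisensory integration corresponds to the AV model predicting held-out EEG better than the A+V combination; with the random stand-in data above, neither model should predict anything.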
Affiliation(s)
- Edmund C. Lalor: Department of Biomedical Engineering, Department of Neuroscience, Del Monte Institute for Neuroscience, and Center for Visual Science, University of Rochester, Rochester, NY, United States
3. Nidiffer AR, Cao CZ, O'Sullivan A, Lalor EC. A representation of abstract linguistic categories in the visual system underlies successful lipreading. Neuroimage 2023; 282:120391. [PMID: 37757989] [DOI: 10.1016/j.neuroimage.2023.120391]
Abstract
There is considerable debate over how visual speech is processed in the absence of sound and whether neural activity supporting lipreading occurs in visual brain areas. Much of the ambiguity stems from a lack of behavioral grounding and neurophysiological analyses that cannot disentangle high-level linguistic and phonetic/energetic contributions from visual speech. To address this, we recorded EEG from human observers as they watched silent videos, half of which were novel and half of which were previously rehearsed with the accompanying audio. We modeled how the EEG responses to novel and rehearsed silent speech reflected the processing of low-level visual features (motion, lip movements) and a higher-level categorical representation of linguistic units, known as visemes. The ability of these visemes to account for the EEG - beyond the motion and lip movements - was significantly enhanced for rehearsed videos in a way that correlated with participants' trial-by-trial ability to lipread that speech. Source localization of viseme processing showed clear contributions from visual cortex, with no strong evidence for the involvement of auditory areas. We interpret this as support for the idea that the visual system produces its own specialized representation of speech that is (1) well-described by categorical linguistic features, (2) dissociable from lip movements, and (3) predictive of lipreading ability. We also suggest a reinterpretation of previous findings of auditory cortical activation during silent speech that is consistent with hierarchical accounts of visual and audiovisual speech perception.
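The key comparison here, whether categorical viseme features explain EEG variance beyond low-level visual features, amounts to a nested regression test. A minimal sketch follows; the phoneme-to-viseme grouping, the features, and the data are hypothetical, and the paper's actual models are time-lagged rather than instantaneous.

```python
# Minimal sketch: test whether categorical viseme features add predictive power
# beyond low-level visual features (motion, lip aperture) in an encoding model.
# The phoneme-to-viseme mapping and all data below are illustrative assumptions.
import numpy as np

# A toy many-to-one phoneme -> viseme grouping (hypothetical, not the paper's set)
PHONEME_TO_VISEME = {"p": "bilabial", "b": "bilabial", "m": "bilabial",
                     "f": "labiodental", "v": "labiodental",
                     "a": "open", "o": "rounded", "u": "rounded"}

def viseme_indicators(phoneme_seq):
    """One-hot viseme matrix (time x viseme categories)."""
    visemes = sorted(set(PHONEME_TO_VISEME.values()))
    X = np.zeros((len(phoneme_seq), len(visemes)))
    for t, ph in enumerate(phoneme_seq):
        X[t, visemes.index(PHONEME_TO_VISEME[ph])] = 1.0
    return X

rng = np.random.default_rng(1)
n = 500
low_level = rng.standard_normal((n, 2))            # motion + lip-aperture stand-ins
phonemes = rng.choice(list(PHONEME_TO_VISEME), size=n)
X_vis = viseme_indicators(phonemes)
eeg = rng.standard_normal(n)                       # stand-in EEG channel

def r_squared(X, y):
    """Least-squares fit and in-sample R^2 (cross-validation omitted for brevity)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ w
    return 1 - resid.var() / y.var()

print("base model R^2:", round(r_squared(low_level, eeg), 4))
print("base + visemes R^2:", round(r_squared(np.hstack([low_level, X_vis]), eeg), 4))
```

The paper's claim corresponds to the viseme-augmented model outperforming the base model on held-out data for rehearsed videos, in proportion to each participant's lipreading ability.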
Affiliation(s)
- Aaron R Nidiffer: Department of Biomedical Engineering, Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
- Cody Zhewei Cao: Department of Psychology, University of Michigan, Ann Arbor, MI, USA
- Aisling O'Sullivan: School of Engineering, Trinity College Institute of Neuroscience, Trinity Centre for Biomedical Engineering, Trinity College, Dublin, Ireland
- Edmund C Lalor: Department of Biomedical Engineering, Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA; School of Engineering, Trinity College Institute of Neuroscience, Trinity Centre for Biomedical Engineering, Trinity College, Dublin, Ireland
4. Ahmed F, Nidiffer AR, Lalor EC. The effect of gaze on EEG measures of multisensory integration in a cocktail party scenario. bioRxiv 2023:2023.08.23.554451. [PMID: 37662393] [PMCID: PMC10473711] [DOI: 10.1101/2023.08.23.554451]
Abstract
Seeing the speaker's face greatly improves our speech comprehension in noisy environments. This is due to the brain's ability to combine the auditory and the visual information around us, a process known as multisensory integration. Selective attention also strongly influences what we comprehend in scenarios with multiple speakers - an effect known as the cocktail-party phenomenon. However, the interaction between attention and multisensory integration is not fully understood, especially when it comes to natural, continuous speech. In a recent electroencephalography (EEG) study, we explored this issue and showed that multisensory integration is enhanced when an audiovisual speaker is attended compared to when that speaker is unattended. Here, we extend that work to investigate how this interaction varies depending on a person's gaze behavior, which affects the quality of the visual information they have access to. To do so, we recorded EEG from 31 healthy adults as they performed selective attention tasks in several paradigms involving two concurrently presented audiovisual speakers. We then modeled how the recorded EEG related to the audio speech (envelope) of the presented speakers. Crucially, we compared two classes of model - one that assumed underlying multisensory integration (AV) versus another that assumed two independent unisensory audio and visual processes (A+V). This comparison revealed evidence of strong attentional effects on multisensory integration when participants were looking directly at the face of an audiovisual speaker. This effect was not apparent when the speaker's face was in the peripheral vision of the participants. Overall, our findings suggest a strong influence of attention on multisensory integration when high fidelity visual (articulatory) speech information is available. More generally, this suggests that the interplay between attention and multisensory integration during natural audiovisual speech is dynamic and is adaptable based on the specific task and environment.
Affiliation(s)
- Farhin Ahmed: Department of Biomedical Engineering, Department of Neuroscience, Del Monte Institute for Neuroscience, and Center for Visual Science, University of Rochester, Rochester, NY 14627, USA
- Aaron R. Nidiffer: Department of Biomedical Engineering, Department of Neuroscience, Del Monte Institute for Neuroscience, and Center for Visual Science, University of Rochester, Rochester, NY 14627, USA
- Edmund C. Lalor: Department of Biomedical Engineering, Department of Neuroscience, Del Monte Institute for Neuroscience, and Center for Visual Science, University of Rochester, Rochester, NY 14627, USA
5. Pesnot Lerousseau J, Parise CV, Ernst MO, van Wassenhove V. Multisensory correlation computations in the human brain identified by a time-resolved encoding model. Nat Commun 2022; 13:2489. [PMID: 35513362] [PMCID: PMC9072402] [DOI: 10.1038/s41467-022-29687-6]
Abstract
Neural mechanisms that arbitrate between integrating and segregating multisensory information are essential for complex scene analysis and for the resolution of the multisensory correspondence problem. However, these mechanisms and their dynamics remain largely unknown, partly because classical models of multisensory integration are static. Here, we used the Multisensory Correlation Detector, a model that provides good explanatory power for human behavior while incorporating dynamic computations. Participants judged whether sequences of auditory and visual signals originated from the same source (causal inference) or whether one modality was leading the other (temporal order), while being recorded with magnetoencephalography. First, we confirm that the Multisensory Correlation Detector explains causal inference and temporal order behavioral judgments well. Second, we found strong fits of brain activity to the two outputs of the Multisensory Correlation Detector in temporo-parietal cortices. Finally, we report an asymmetry in the goodness of the fits, which were more reliable during the causal inference task than during the temporal order judgment task. Overall, our results suggest the existence of multisensory correlation detectors in the human brain, which explain why and how causal inference is strongly driven by the temporal correlation of multisensory signals.
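A heavily simplified sketch of the Multisensory Correlation Detector idea is given below: low-pass-filtered auditory and visual signals are multiplied to yield a correlation-like output (relevant to causal inference) and a signed lag-like output (relevant to temporal order). The filter time constants and input sequences are assumptions, and the published model has additional structure not reproduced here.

```python
# Rough, simplified sketch of a Multisensory Correlation Detector (MCD)-style
# computation. Each modality is low-pass filtered, the filtered signals are
# multiplied, and the model reports (i) a correlation-like output and
# (ii) a signed lag-like output. Filter constants and inputs are assumptions.
import numpy as np
from scipy.signal import lfilter

def lowpass(x, tau, fs):
    """First-order exponential low-pass filter with time constant tau (seconds)."""
    a = np.exp(-1.0 / (tau * fs))
    return lfilter([1 - a], [1, -a], x)

fs = 100
t = np.arange(0, 2, 1 / fs)
audio = (np.sin(2 * np.pi * 2 * t) > 0).astype(float)   # 2 Hz pulse train
visual = np.roll(audio, int(0.05 * fs))                  # same train, shifted 50 ms later

fa_fast, fv_fast = lowpass(audio, 0.05, fs), lowpass(visual, 0.05, fs)
fa_slow, fv_slow = lowpass(audio, 0.25, fs), lowpass(visual, 0.25, fs)

u_av = fa_fast * fv_slow          # "audio leads" subunit
u_va = fv_fast * fa_slow          # "visual leads" subunit

mcd_corr = np.mean(u_av * u_va)   # high when the streams are temporally correlated
mcd_lag = np.mean(u_av - u_va)    # sign indicates which modality tends to lead
print(f"correlation output: {mcd_corr:.4f}, lag output: {mcd_lag:+.4f}")
```

In the study, time courses of the model's two outputs were regressed against MEG activity, which is what links this computation to the temporo-parietal fits reported above.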
Affiliation(s)
- Jacques Pesnot Lerousseau: Aix Marseille Univ, Inserm, INS, Inst Neurosci Syst, Marseille, France; Applied Cognitive Psychology, Ulm University, Ulm, Germany; Cognitive Neuroimaging Unit, CEA DRF/Joliot, INSERM, CNRS, Université Paris-Saclay, NeuroSpin, 91191, Gif/Yvette, France
- Marc O Ernst: Applied Cognitive Psychology, Ulm University, Ulm, Germany
- Virginie van Wassenhove: Cognitive Neuroimaging Unit, CEA DRF/Joliot, INSERM, CNRS, Université Paris-Saclay, NeuroSpin, 91191, Gif/Yvette, France
6. Denfeld QE, Lee CS, Habecker BA. A Primer on Incorporating Sex as a Biological Variable into the Conduct and Reporting of Basic and Clinical Research Studies. Am J Physiol Heart Circ Physiol 2022; 322:H350-H354. [PMID: 35030071] [DOI: 10.1152/ajpheart.00605.2021]
Abstract
The recent move to require that sex as a biological variable (SABV), which includes gender, be incorporated into the reporting of research published by the American Journal of Physiology - Heart and Circulatory Physiology follows a growing, and much-needed, trend by journals. Understandably, there is concern over how to do this without adding considerable work, especially if one's primary research focus is not on elucidating sex/gender differences. The purpose of this article is to provide additional guidance and examples on how to incorporate SABV into the conduct and reporting of basic and clinical research. Using examples from our research, which includes studies both focused and not focused on sex/gender differences, we offer suggestions for how to incorporate SABV into basic and clinical research studies.
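As one hedged illustration of how such guidance often translates into analysis code, outcomes can be reported disaggregated by sex and a sex-by-treatment interaction tested explicitly; the variable names and simulated data below are hypothetical and are not taken from the article.

```python
# Minimal sketch: incorporate sex as a biological variable in an analysis by
# reporting sex-disaggregated summaries and testing a sex-by-treatment interaction.
# Column names and the simulated data are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 200
df = pd.DataFrame({
    "sex": rng.choice(["female", "male"], size=n),
    "treatment": rng.choice(["control", "drug"], size=n),
    "outcome": rng.normal(size=n),
})

# Sex-disaggregated descriptive statistics
print(df.groupby(["sex", "treatment"])["outcome"].agg(["mean", "std", "count"]))

# Test whether the treatment effect differs by sex
model = smf.ols("outcome ~ treatment * sex", data=df).fit()
print(model.summary().tables[1])
```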
Affiliation(s)
- Quin E Denfeld: Oregon Health & Science University School of Nursing, Portland, OR, United States; Oregon Health & Science University Knight Cardiovascular Institute, Portland, OR, United States
- Christopher S Lee: William F. Connell School of Nursing, Boston College, Chestnut Hill, MA, United States; Australian Catholic University, Melbourne, Victoria, Australia
- Beth A Habecker: Oregon Health & Science University Knight Cardiovascular Institute, Portland, OR, United States; Oregon Health & Science University Department of Chemical Physiology and Biochemistry, Portland, OR, United States
7. Chow HM, Leviyah X, Ciaramitaro VM. Individual Differences in Multisensory Interactions: The Influence of Temporal Phase Coherence and Auditory Salience on Visual Contrast Sensitivity. Vision (Basel) 2020; 4:E12. [PMID: 32033350] [DOI: 10.3390/vision4010012]
Abstract
While previous research has investigated key factors contributing to multisensory integration in isolation, relatively little is known regarding how these factors interact, especially when considering the enhancement of visual contrast sensitivity by a task-irrelevant sound. Here we explored how auditory stimulus properties, namely salience and temporal phase coherence in relation to the visual target, jointly affect the extent to which a sound can enhance visual contrast sensitivity. Visual contrast sensitivity was measured with a psychophysical task in which human adult participants reported the location of a visual Gabor pattern presented at various contrast levels. We expected the greatest enhancement of contrast sensitivity (i.e., the lowest contrast threshold) when the visual stimulus was accompanied by a task-irrelevant sound that was weak in auditory salience and modulated in phase with the visual stimulus (strong temporal phase coherence). Our expectations were confirmed, but only once we accounted for individual differences in the optimal auditory salience level for inducing maximal multisensory enhancement effects. Our findings highlight the importance of interactions between temporal phase coherence and stimulus effectiveness in determining the strength of multisensory enhancement of visual contrast, as well as the importance of accounting for individual differences.
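The threshold measurement described, fitting a psychometric function to location-report accuracy across contrast levels and comparing the fitted thresholds between sound conditions, might be sketched as follows; the Weibull form, the 50% guess rate, and the example data are assumptions rather than values from the study.

```python
# Minimal sketch: estimate a visual contrast threshold by fitting a Weibull
# psychometric function to proportion-correct data from a 2-location task,
# then compare thresholds across sound conditions.
# The Weibull form, 50% guess rate, and example data are illustrative assumptions.
import numpy as np
from scipy.optimize import curve_fit

def weibull(contrast, threshold, slope, guess=0.5, lapse=0.02):
    """Proportion correct as a function of stimulus contrast."""
    return guess + (1 - guess - lapse) * (1 - np.exp(-(contrast / threshold) ** slope))

contrasts = np.array([0.01, 0.02, 0.04, 0.08, 0.16, 0.32])
pc_in_phase = np.array([0.52, 0.60, 0.78, 0.92, 0.97, 0.99])   # hypothetical data
pc_no_sound = np.array([0.50, 0.55, 0.68, 0.85, 0.95, 0.99])   # hypothetical data

for label, pc in [("in-phase sound", pc_in_phase), ("no sound", pc_no_sound)]:
    (thr, slope), _ = curve_fit(lambda c, t, s: weibull(c, t, s),
                                contrasts, pc, p0=[0.05, 2.0])
    print(f"{label}: contrast threshold ~ {thr:.3f}")
```

A lower fitted threshold in the sound condition than in the no-sound condition is the sketch's analogue of the multisensory enhancement effect reported above; the study's additional step is allowing the most effective auditory salience level to differ across individuals.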