1. Wang X, Tang X, Wang A, Zhang M. Non-spatial inhibition of return attenuates audiovisual integration owing to modality disparities. Atten Percept Psychophys 2023. PMID: 38127253; DOI: 10.3758/s13414-023-02825-y.
Abstract
Although previous studies have investigated the relationship between inhibition of return (IOR) and multisensory integration, the influence of non-spatial IOR has not been explored. The present study aimed to investigate the influence of non-spatial IOR on audiovisual integration by using a "prime-neutral cue-target" paradigm. In Experiment 1, which manipulated prime validity and target modality, the targets were positioned centrally, revealing significant non-spatial IOR effects in the visual, auditory, and audiovisual modalities. Analysis of relative multisensory response enhancement (rMRE) indicated substantial audiovisual integration in both the valid and invalid target conditions, with weaker enhancement for valid targets than for invalid targets. In Experiment 2, the targets were positioned above and below the central location to rule out repetition blindness (RB); this experiment replicated the results of Experiment 1. Notably, both experiments found a correlation between modality differences and rMRE for valid targets, indicating that differences in signal strength between the visual and auditory modalities contributed to the reduction in audiovisual integration. The absence of such a correlation for invalid targets suggests that attention may play a key role in this process. The present study highlights how non-spatial IOR reduces audiovisual integration and sheds light on the complex interaction between attention and multisensory integration.
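For context, the relative multisensory response enhancement (rMRE) index mentioned here is commonly computed in the multisensory reaction-time literature as the gain of the audiovisual response over the faster of the two unisensory responses. A minimal sketch of that standard formula (not necessarily the exact computation used in the paper; the sample reaction times are illustrative):

```python
import numpy as np

def rmre(rt_auditory, rt_visual, rt_audiovisual):
    """Relative multisensory response enhancement (%), computed from mean RTs.

    rMRE = (min(mean RT_A, mean RT_V) - mean RT_AV) / min(mean RT_A, mean RT_V) * 100
    Positive values indicate faster audiovisual than best unisensory responses.
    """
    best_unisensory = min(np.mean(rt_auditory), np.mean(rt_visual))
    return (best_unisensory - np.mean(rt_audiovisual)) / best_unisensory * 100.0

# Example with made-up reaction times (ms)
rt_a = np.array([420, 435, 410, 450])
rt_v = np.array([390, 405, 400, 415])
rt_av = np.array([360, 370, 365, 380])
print(f"rMRE = {rmre(rt_a, rt_v, rt_av):.1f}%")
```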
Affiliation(s)
- Xiaoxue Wang: Department of Psychology, Research Center for Psychology and Behavioral Sciences, Soochow University, Suzhou, People's Republic of China
- Xiaoyu Tang: School of Psychology, Liaoning Normal University, Dalian, China
- Aijun Wang: Department of Psychology, Research Center for Psychology and Behavioral Sciences, Soochow University, Suzhou, People's Republic of China
- Ming Zhang: Department of Psychology, Research Center for Psychology and Behavioral Sciences, Soochow University, Suzhou, People's Republic of China; Department of Psychology, Suzhou University of Science and Technology, Suzhou, China; Cognitive Neuroscience Laboratory, Graduate School of Interdisciplinary Science and Engineering in Health Systems, Okayama University, Okayama, Japan
2. Hansmann D, Derrick D, Theys C. Hearing, seeing, and feeling speech: the neurophysiological correlates of trimodal speech perception. Front Hum Neurosci 2023; 17:1225976. PMID: 37706173; PMCID: PMC10495990; DOI: 10.3389/fnhum.2023.1225976.
Abstract
Introduction: To perceive speech, our brains process information from different sensory modalities. Previous electroencephalography (EEG) research has established that audio-visual information provides an advantage over auditory-only information during early auditory processing. In addition, behavioral research has shown that auditory speech perception is enhanced not only by visual information but also by tactile information, transmitted by puffs of air arriving at the skin in alignment with the speech. The current EEG study aimed to investigate whether the behavioral benefits of bimodal audio-aerotactile and trimodal audio-visual-aerotactile speech presentation are reflected in cortical auditory event-related neurophysiological responses. Methods: To examine the influence of multimodal information on speech perception, 20 listeners performed a two-alternative forced-choice syllable identification task at three different signal-to-noise levels. Results: Behavioral results showed increased syllable identification accuracy when auditory information was complemented with visual information, but no corresponding effect for the addition of tactile information. Similarly, EEG results showed an amplitude suppression of the auditory N1 and P2 event-related potentials for the audio-visual and audio-visual-aerotactile modalities compared to auditory and audio-aerotactile presentations of the syllable /pa/. No statistically significant difference was present between the audio-aerotactile and auditory-only modalities. Discussion: The current findings are consistent with past EEG research showing a visually induced amplitude suppression during early auditory processing. In addition, the significant neurophysiological effect of audio-visual but not audio-aerotactile presentation is in line with the large benefit of visual information, but comparatively much smaller effect of aerotactile information, on auditory speech perception previously identified in behavioral research.
Affiliation(s)
- Doreen Hansmann: School of Psychology, Speech and Hearing, University of Canterbury, Christchurch, New Zealand
- Donald Derrick: New Zealand Institute of Language, Brain and Behaviour, University of Canterbury, Christchurch, New Zealand
- Catherine Theys: School of Psychology, Speech and Hearing, University of Canterbury, Christchurch, New Zealand; New Zealand Institute of Language, Brain and Behaviour, University of Canterbury, Christchurch, New Zealand
3. Dunham-Carr K, Feldman JI, Simon DM, Edmunds SR, Tu A, Kuang W, Conrad JG, Santapuram P, Wallace MT, Woynaroski TG. The Processing of Audiovisual Speech Is Linked with Vocabulary in Autistic and Nonautistic Children: An ERP Study. Brain Sci 2023; 13:1043. PMID: 37508976; PMCID: PMC10377472; DOI: 10.3390/brainsci13071043.
Abstract
Explaining individual differences in vocabulary in autism is critical, as understanding and using words to communicate are key predictors of long-term outcomes for autistic individuals. Differences in audiovisual speech processing may explain variability in vocabulary in autism. The efficiency of audiovisual speech processing can be indexed via amplitude suppression, wherein the amplitude of the event-related potential (ERP) is reduced at the P2 component in response to audiovisual speech compared to auditory-only speech. This study used electroencephalography (EEG) to measure P2 amplitudes in response to auditory-only and audiovisual speech and norm-referenced, standardized assessments to measure vocabulary in 25 autistic and 25 nonautistic children to determine whether amplitude suppression (a) differs or (b) explains variability in vocabulary in autistic and nonautistic children. A series of regression analyses evaluated associations between amplitude suppression and vocabulary scores. Both groups demonstrated P2 amplitude suppression, on average, in response to audiovisual speech relative to auditory-only speech. Between-group differences in mean amplitude suppression were nonsignificant. Individual differences in amplitude suppression were positively associated with expressive vocabulary through receptive vocabulary, as evidenced by a significant indirect effect observed across groups. The results suggest that efficiency of audiovisual speech processing may explain variance in vocabulary in autism.
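The indirect effect reported here (amplitude suppression predicting expressive vocabulary through receptive vocabulary) is the kind of mediation estimate that is often obtained with a product-of-coefficients approach and a bootstrapped confidence interval. A minimal sketch of that generic approach, not the authors' actual analysis; the variable names and data are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

def indirect_effect(x, m, y):
    """Product-of-coefficients indirect effect a*b for the path x -> m -> y."""
    a = np.polyfit(x, m, 1)[0]                     # path a: predictor -> mediator
    X = np.column_stack([np.ones_like(x), x, m])   # path b: mediator -> outcome, controlling for x
    b = np.linalg.lstsq(X, y, rcond=None)[0][2]
    return a * b

# Illustrative data: suppression -> receptive vocabulary -> expressive vocabulary
n = 50
suppression = rng.normal(size=n)
receptive = 0.5 * suppression + rng.normal(size=n)
expressive = 0.6 * receptive + rng.normal(size=n)

point = indirect_effect(suppression, receptive, expressive)
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, n)                    # resample participants with replacement
    boot.append(indirect_effect(suppression[idx], receptive[idx], expressive[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"indirect effect = {point:.3f}, 95% bootstrap CI [{lo:.3f}, {hi:.3f}]")
```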
Affiliation(s)
- Kacie Dunham-Carr: Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN 37232, USA; Department of Hearing and Speech Sciences, Vanderbilt University, Nashville, TN 37232, USA
- Jacob I Feldman: Frist Center for Autism and Innovation, Vanderbilt University, Nashville, TN 37232, USA; Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- David M Simon: Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN 37232, USA
- Sarah R Edmunds: Department of Psychology, University of Washington, Seattle, WA 98195, USA; Department of Psychology, University of South Carolina, Columbia, SC 29208, USA; Department of Educational Studies, University of South Carolina, Columbia, SC 29208, USA
- Alexander Tu: Neuroscience Undergraduate Program, Vanderbilt University, Nashville, TN 37232, USA; Department of Otolaryngology and Communication Sciences, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Wayne Kuang: Neuroscience Undergraduate Program, Vanderbilt University, Nashville, TN 37232, USA; Department of Pediatrics, Los Angeles General Medical Center, Keck School of Medicine of University of Southern California, Los Angeles, CA 90033, USA
- Julie G Conrad: Neuroscience Undergraduate Program, Vanderbilt University, Nashville, TN 37232, USA; College of Medicine, University of Illinois Hospital, Chicago, IL 60612, USA
- Pooja Santapuram: Neuroscience Undergraduate Program, Vanderbilt University, Nashville, TN 37232, USA; Department of Anesthesiology, Columbia University Irving Medical Center, New York City, NY 10032, USA
- Mark T Wallace: Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN 37232, USA; Department of Hearing and Speech Sciences, Vanderbilt University, Nashville, TN 37232, USA; Frist Center for Autism and Innovation, Vanderbilt University, Nashville, TN 37232, USA; Vanderbilt Kennedy Center, Vanderbilt University Medical Center, Nashville, TN 37232, USA; Department of Psychology, Vanderbilt University, Nashville, TN 37232, USA; Department of Pharmacology, Vanderbilt University, Nashville, TN 37232, USA; Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Tiffany G Woynaroski: Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN 37232, USA; Frist Center for Autism and Innovation, Vanderbilt University, Nashville, TN 37232, USA; Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN 37232, USA; Vanderbilt Kennedy Center, Vanderbilt University Medical Center, Nashville, TN 37232, USA; Department of Communication Sciences and Disorders, John A. Burns School of Medicine, University of Hawaii at Manoa, Honolulu, HI 96813, USA
4. Caron CJ, Vilain C, Schwartz JL, Bayard C, Calcus A, Leybaert J, Colin C. The Effect of Cued-Speech (CS) Perception on Auditory Processing in Typically Hearing (TH) Individuals Who Are Either Naïve or Experienced CS Producers. Brain Sci 2023; 13:1036. PMID: 37508968; PMCID: PMC10377728; DOI: 10.3390/brainsci13071036.
Abstract
Cued Speech (CS) is a communication system that uses manual gestures to facilitate lipreading. In this study, we investigated how CS information interacts with natural speech using Event-Related Potential (ERP) analyses in French-speaking, typically hearing adults (TH) who were either naïve or experienced CS producers. The audiovisual (AV) presentation of lipreading information elicited an amplitude attenuation of the entire N1 and P2 complex in both groups, accompanied by N1 latency facilitation in the group of CS producers. Adding CS gestures to lipread information increased the magnitude of effects observed at the N1 time window, but did not enhance P2 amplitude attenuation. Interestingly, presenting CS gestures without lipreading information yielded distinct response patterns depending on participants' experience with the system. In the group of CS producers, AV perception of CS gestures facilitated the early stage of speech processing, while in the group of naïve participants, it elicited a latency delay at the P2 time window. These results suggest that, for experienced CS users, the perception of gestures facilitates early stages of speech processing, but when people are not familiar with the system, the perception of gestures impacts the efficiency of phonological decoding.
Affiliation(s)
- Cora Jirschik Caron: Center for Research Cognition and Neuroscience, Université Libre de Bruxelles, 1050 Bruxelles, Belgium
- Coriandre Vilain: GIPSA-Lab, Université Grenoble Alpes, CNRS, Grenoble INP, 38402 Saint-Martin-d'Hères, France
- Jean-Luc Schwartz: GIPSA-Lab, Université Grenoble Alpes, CNRS, Grenoble INP, 38402 Saint-Martin-d'Hères, France
- Clémence Bayard: GIPSA-Lab, Université Grenoble Alpes, CNRS, Grenoble INP, 38402 Saint-Martin-d'Hères, France
- Axelle Calcus: Center for Research Cognition and Neuroscience, Université Libre de Bruxelles, 1050 Bruxelles, Belgium
- Jacqueline Leybaert: Center for Research Cognition and Neuroscience, Université Libre de Bruxelles, 1050 Bruxelles, Belgium
- Cécile Colin: Center for Research Cognition and Neuroscience, Université Libre de Bruxelles, 1050 Bruxelles, Belgium
5. Guo A, Yang W, Yang X, Lin J, Li Z, Ren Y, Yang J, Wu J. Audiovisual n-Back Training Alters the Neural Processes of Working Memory and Audiovisual Integration: Evidence of Changes in ERPs. Brain Sci 2023; 13:992. PMID: 37508924; PMCID: PMC10377064; DOI: 10.3390/brainsci13070992.
Abstract
(1) Background: This study investigated whether audiovisual n-back training produces training effects on working memory and transfer effects on perceptual processing. (2) Methods: Before and after training, participants were tested with the audiovisual n-back task (1-, 2-, or 3-back) to detect training effects, and with an audiovisual discrimination task to detect transfer effects. (3) Results: For the training effect, the behavioral results show that training led to greater accuracy and faster response times. Stronger training gains in accuracy and response time were observed in the training group for the 3- and 2-back tasks compared to the 1-back task. Event-related potential (ERP) data revealed an enhancement of P300 in the frontal and central regions across all working memory loads after training. Training also led to an enhancement of N200 in the central region in the 3-back condition. For the transfer effect, greater audiovisual integration in the frontal and central regions was observed at an early stage (80-120 ms) during the post-test relative to the pre-test in the training group. (4) Conclusion: Our findings provide evidence that audiovisual n-back training enhances the neural processes underlying working memory and demonstrate a positive influence of higher cognitive functions on lower cognitive functions.
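As background, the scoring logic of an n-back task is simple: a trial counts as a target whenever the current item matches the item presented n positions earlier. A minimal sketch of sequence generation and accuracy scoring, purely illustrative and not the authors' implementation (item labels and target probability are made up):

```python
import random

def make_nback_sequence(n_back, length, items=("A", "B", "C", "D"), p_target=0.3):
    """Generate a stimulus sequence; a trial is a target if it repeats the item n_back back."""
    seq = [random.choice(items) for _ in range(n_back)]
    for _ in range(length - n_back):
        if random.random() < p_target:
            seq.append(seq[-n_back])                                   # force a target
        else:
            seq.append(random.choice([i for i in items if i != seq[-n_back]]))
    return seq

def score(seq, responses, n_back):
    """Accuracy = proportion of trials (from trial n_back onward) answered correctly."""
    correct = 0
    for i in range(n_back, len(seq)):
        is_target = seq[i] == seq[i - n_back]
        correct += int(responses[i] == is_target)
    return correct / (len(seq) - n_back)

seq = make_nback_sequence(n_back=2, length=20)
perfect_responses = [s == seq[max(i - 2, 0)] for i, s in enumerate(seq)]  # an ideal responder
print(seq)
print("accuracy:", score(seq, perfect_responses, n_back=2))
```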
Affiliation(s)
- Ao Guo: Cognitive Neuroscience Laboratory, Graduate School of Interdisciplinary Science and Engineering in Health Systems, Okayama University, Okayama 700-8530, Japan
- Weiping Yang: Department of Psychology, Faculty of Education, Hubei University, Wuhan 430062, China; Brain and Cognition Research Center (BCRC), Faculty of Education, Hubei University, Wuhan 430062, China
- Xiangfu Yang: Department of Psychology, Faculty of Education, Hubei University, Wuhan 430062, China
- Jinfei Lin: Department of Psychology, Faculty of Education, Hubei University, Wuhan 430062, China
- Zimo Li: Cognitive Neuroscience Laboratory, Graduate School of Interdisciplinary Science and Engineering in Health Systems, Okayama University, Okayama 700-8530, Japan
- Yanna Ren: Department of Psychology, College of Humanities and Management, Guizhou University of Traditional Chinese Medicine, Guiyang 550003, China
- Jiajia Yang: Cognitive Neuroscience Laboratory, Graduate School of Interdisciplinary Science and Engineering in Health Systems, Okayama University, Okayama 700-8530, Japan; Applied Brain Science Lab, Graduate School of Interdisciplinary Science and Engineering in Health Systems, Okayama University, Okayama 700-8530, Japan
- Jinglong Wu: School of Medical Technology, Beijing Institute of Technology, Beijing 100811, China; Cognitive Neuroscience Laboratory, Graduate School of Interdisciplinary Science and Engineering in Health Systems, Okayama University, Okayama 700-8530, Japan
6. Ghaneirad E, Saenger E, Szycik GR, Čuš A, Möde L, Sinke C, Wiswede D, Bleich S, Borgolte A. Deficient Audiovisual Speech Perception in Schizophrenia: An ERP Study. Brain Sci 2023; 13:970. PMID: 37371448; DOI: 10.3390/brainsci13060970.
Abstract
In everyday verbal communication, auditory speech perception is often disturbed by background noise. Especially under disadvantageous hearing conditions, additional visual articulatory information (e.g., lip movements) can contribute positively to speech comprehension. Patients with schizophrenia (SZs) demonstrate an aberrant ability to integrate visual and auditory sensory input during speech perception, and current findings about the underlying neural mechanisms of this deficit are inconsistent. In particular, despite the importance of early sensory processing in speech perception, very few studies have addressed these processes in SZs. Thus, in the present study, we examined 20 adult subjects with SZ and 21 healthy controls (HCs) while presenting audiovisual spoken words (disyllabic nouns), either superimposed with white noise (-12 dB signal-to-noise ratio) or not. In addition to behavioral data, event-related brain potentials (ERPs) were recorded. Our results demonstrate reduced speech comprehension for SZs compared to HCs under noisy conditions. Moreover, we found altered N1 amplitudes in SZs during speech perception, while P2 amplitudes and the N1-P2 complex were similar to those of HCs, indicating that multimodal speech perception may be disturbed at an early stage of processing, possibly owing to deficits in auditory speech perception. Moreover, a positive relationship between fronto-central N1 amplitudes and the positive subscale of the Positive and Negative Syndrome Scale (PANSS) was observed.
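The -12 dB signal-to-noise ratio described here is a property of how the masking noise is scaled before being added to the speech signal. A minimal sketch of mixing a waveform with white noise at a target SNR; the signals and sampling rate below are stand-ins, not the study's materials:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) equals `snr_db`, then add it."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    target_p_noise = p_speech / (10 ** (snr_db / 10.0))
    return speech + noise * np.sqrt(target_p_noise / p_noise)

rng = np.random.default_rng(1)
fs = 16000
speech = rng.normal(size=fs)          # stand-in for a 1-s spoken word
noise = rng.normal(size=fs)           # white-noise masker
mixed = mix_at_snr(speech, noise, snr_db=-12.0)
snr_check = 10 * np.log10(np.mean(speech ** 2) / np.mean((mixed - speech) ** 2))
print(f"achieved SNR: {snr_check:.1f} dB")
```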
Affiliation(s)
- Erfan Ghaneirad: Department of Psychiatry, Social Psychiatry and Psychotherapy, Hannover Medical School, 30635 Hanover, Germany
- Ellyn Saenger: Department of Psychiatry, Social Psychiatry and Psychotherapy, Hannover Medical School, 30635 Hanover, Germany
- Gregor R Szycik: Department of Psychiatry, Social Psychiatry and Psychotherapy, Hannover Medical School, 30635 Hanover, Germany
- Anja Čuš: Department of Psychiatry, Social Psychiatry and Psychotherapy, Hannover Medical School, 30635 Hanover, Germany
- Laura Möde: Department of Psychiatry, Social Psychiatry and Psychotherapy, Hannover Medical School, 30635 Hanover, Germany
- Christopher Sinke: Department of Psychiatry, Social Psychiatry and Psychotherapy, Hannover Medical School, 30635 Hanover, Germany
- Daniel Wiswede: Department of Neurology, University of Lübeck, 23562 Lübeck, Germany
- Stefan Bleich: Department of Psychiatry, Social Psychiatry and Psychotherapy, Hannover Medical School, 30635 Hanover, Germany; Center for Systems Neuroscience, University of Veterinary Medicine, 30559 Hanover, Germany
- Anna Borgolte: Department of Psychiatry, Social Psychiatry and Psychotherapy, Hannover Medical School, 30635 Hanover, Germany
7. Chalas N, Omigie D, Poeppel D, van Wassenhove V. Hierarchically nested networks optimize the analysis of audiovisual speech. iScience 2023; 26:106257. PMID: 36909667; PMCID: PMC9993032; DOI: 10.1016/j.isci.2023.106257.
Abstract
In conversational settings, seeing the speaker's face elicits internal predictions about the upcoming acoustic utterance. Understanding how the listener's cortical dynamics tune to the temporal statistics of audiovisual (AV) speech is thus essential. Using magnetoencephalography, we explored how large-scale frequency-specific dynamics of human brain activity adapt to AV speech delays. First, we show that the amplitude of phase-locked responses parametrically decreases with natural AV speech synchrony, a pattern that is consistent with predictive coding. Second, we show that the temporal statistics of AV speech affect large-scale oscillatory networks at multiple spatial and temporal resolutions. We demonstrate a spatial nestedness of oscillatory networks during the processing of AV speech: these oscillatory hierarchies are such that high-frequency activity (beta, gamma) is contingent on the phase response of low-frequency (delta, theta) networks. Our findings suggest that the endogenous temporal multiplexing of speech processing confers adaptability within the temporal regimes that are essential for speech comprehension.
Affiliation(s)
- Nikos Chalas (corresponding author): Institute for Biomagnetism and Biosignal Analysis, University of Münster, 48149 Münster, Germany; CEA, DRF/Joliot, NeuroSpin, INSERM, Cognitive Neuroimaging Unit, CNRS, Université Paris-Saclay, 91191 Gif/Yvette, France; School of Biology, Faculty of Sciences, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
- Diana Omigie: Department of Psychology, Goldsmiths University London, London, UK
- David Poeppel: Department of Psychology, New York University, New York, NY 10003, USA; Ernst Struengmann Institute for Neuroscience, 60528 Frankfurt am Main, Germany
- Virginie van Wassenhove (corresponding author): CEA, DRF/Joliot, NeuroSpin, INSERM, Cognitive Neuroimaging Unit, CNRS, Université Paris-Saclay, 91191 Gif/Yvette, France
8. Hong S, Wang R, Zeng B. Incongruent visual cues affect the perception of Mandarin vowel but not tone. Front Psychol 2023; 13:971979. PMID: 36687891; PMCID: PMC9846355; DOI: 10.3389/fpsyg.2022.971979.
Abstract
Over the recent few decades, a large number of audiovisual speech studies have focused on the visual cues of consonants and vowels while neglecting those relating to lexical tones. In this study, we investigated whether incongruent audiovisual information interfered with the perception of lexical tones. We found that, for both Chinese and English speakers, incongruence between the auditory signal and the visemic mouth shape (i.e., visual form information) significantly slowed reaction times and reduced identification accuracy for vowels. However, incongruent lip movements (i.e., visual timing information) did not interfere with the perception of auditory lexical tone. We conclude that, in contrast to vowel perception, auditory tone perception seems relatively impervious to visual congruency cues, at least under these restricted laboratory conditions. The salience of visual form and timing information is discussed in light of this finding.
Affiliation(s)
- Shanhu Hong: Institute of Foreign Language and Tourism, Quanzhou Preschool Education College, Quanzhou, China; Department of Psychology, Bournemouth University, Poole, United Kingdom
- Rui Wang: School of Foreign Languages, Guangdong Pharmaceutical University, Guangzhou, China
- Biao Zeng (corresponding author): Department of Psychology, Bournemouth University, Poole, United Kingdom; EEG Lab, Department of Psychology, University of South Wales, Newport, United Kingdom
9. Pourhashemi F, Baart M, van Laarhoven T, Vroomen J. Want to quickly adapt to distorted speech and become a better listener? Read lips, not text. PLoS One 2022; 17:e0278986. PMID: 36580461; PMCID: PMC9799298; DOI: 10.1371/journal.pone.0278986.
Abstract
When listening to distorted speech, does one become a better listener by looking at the face of the speaker or by reading subtitles that are presented along with the speech signal? We examined this question in two experiments in which we presented participants with spectrally distorted speech (4-channel noise-vocoded speech). During short training sessions, listeners received auditorily distorted words or pseudowords that were partially disambiguated by concurrently presented lipread information or text. After each training session, listeners were tested with new degraded auditory words. Learning effects (based on proportions of correctly identified words) were stronger if listeners had trained with words rather than with pseudowords (a lexical boost), and adding lipread information during training was more effective than adding text (a lipread boost). Moreover, the advantage of lipread speech over text training was also found when participants were tested more than a month later. The current results thus suggest that lipread speech may have surprisingly long-lasting effects on adaptation to distorted speech.
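Noise-vocoded speech of the kind used here is typically created by splitting the signal into a small number of frequency bands, extracting each band's amplitude envelope, and using it to modulate band-limited noise. A rough four-channel sketch with scipy; the band edges and filter settings are illustrative assumptions, not the study's parameters:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(signal, fs, band_edges=(100, 500, 1200, 2500, 5000)):
    """Crude n-channel noise vocoder: each band's envelope modulates band-limited noise."""
    rng = np.random.default_rng(0)
    out = np.zeros_like(signal, dtype=float)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(sos, signal)
        envelope = np.abs(hilbert(band))                          # amplitude envelope of the band
        carrier = sosfiltfilt(sos, rng.normal(size=len(signal)))  # band-limited noise carrier
        out += envelope * carrier
    return out

fs = 16000
t = np.arange(fs) / fs
toy_signal = np.sin(2 * np.pi * 220 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))
vocoded = noise_vocode(toy_signal, fs)
print(vocoded.shape)
```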
Affiliation(s)
- Faezeh Pourhashemi: Dept. of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Martijn Baart: Dept. of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands; BCBL, Basque Center on Cognition, Brain, and Language, Donostia, Spain
- Thijs van Laarhoven: Dept. of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Jean Vroomen: Dept. of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
10. Begau A, Arnau S, Klatt LI, Wascher E, Getzmann S. Using visual speech at the cocktail-party: CNV evidence for early speech extraction in younger and older adults. Hear Res 2022; 426:108636. DOI: 10.1016/j.heares.2022.108636.
11. Begau A, Klatt LI, Schneider D, Wascher E, Getzmann S. The role of informational content of visual speech in an audiovisual cocktail party: Evidence from cortical oscillations in young and old participants. Eur J Neurosci 2022; 56:5215-5234. PMID: 36017762; DOI: 10.1111/ejn.15811.
Abstract
Age-related differences in the processing of audiovisual speech in a multi-talker environment were investigated by analysing event-related spectral perturbations (ERSPs), focusing on theta, alpha, and beta oscillations, which are assumed to reflect conflict processing, multisensory integration, and attentional mechanisms, respectively. Eighteen older and 21 younger healthy adults completed a two-alternative forced-choice word discrimination task, responding to audiovisual speech stimuli. In a cocktail-party scenario with two competing talkers (located at -15° and 15° azimuth), target words ("yes" or "no") appeared at a pre-defined (attended) position, and distractor words at the other position. In two audiovisual conditions, acoustic speech was combined with either informative or uninformative visual speech. While a behavioural benefit for informative visual speech occurred for both age groups, differences between audiovisual conditions in the theta and beta bands were only present for older adults. A stronger increase in theta perturbations for stimuli containing uninformative visual speech could be associated with early conflict processing, while a stronger suppression in beta perturbations for informative visual speech could be associated with audiovisual integration. Compared to the younger group, the older group showed generally stronger beta perturbations. No condition differences in the alpha band were found. Overall, the findings suggest age-related differences in audiovisual speech integration in a multi-talker environment. While the behavioural benefit of informative visual speech was unaffected by age, older adults had a stronger need for cognitive control when processing conflicting audiovisual speech input. Furthermore, mechanisms of audiovisual integration are activated differently depending on the informational content of the visual input.
Affiliation(s)
- Alexandra Begau: Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
- Laura-Isabelle Klatt: Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
- Daniel Schneider: Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
- Edmund Wascher: Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
- Stephan Getzmann: Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
12. Sato M. Motor and visual influences on auditory neural processing during speaking and listening. Cortex 2022; 152:21-35. DOI: 10.1016/j.cortex.2022.03.013.
13. Senkowski D, Moran JK. Early evoked brain activity underlies auditory and audiovisual speech recognition deficits in schizophrenia. Neuroimage Clin 2022; 33:102909. PMID: 34915330; PMCID: PMC8683777; DOI: 10.1016/j.nicl.2021.102909.
Highlights
- Reduced N1 amplitudes reflect speech processing deficits in schizophrenia (SZ).
- Crossmodal N1 amplitude suppression in audiovisual speech is preserved in SZ.
- N1 amplitudes correlate with speech recognition performance in controls but not in SZ.
Abstract
Objectives: People with schizophrenia (SZ) show deficits in auditory and audiovisual speech recognition. It is possible that these deficits are related to aberrant early sensory processing, combined with an impaired ability to utilize visual cues to improve speech recognition. In this electroencephalography study, we tested this by having SZ and healthy controls (HC) identify different unisensory auditory and bisensory audiovisual syllables at different auditory noise levels. Methods: SZ (N = 24) and HC (N = 21) identified one of three different syllables (/da/, /ga/, /ta/) at three different noise levels (no, low, high). Half of the trials were unisensory auditory, and the other half provided additional visual input of moving lips. Task-evoked mediofrontal N1 and P2 brain potentials time-locked to the onset of the auditory syllables were derived and related to behavioral performance. Results: In comparison to HC, SZ showed speech recognition deficits for unisensory and bisensory stimuli. These deficits were primarily found in the no-noise condition. Paralleling these observations, reduced N1 amplitudes to unisensory and bisensory stimuli in SZ were found in the no-noise condition. In HC, N1 amplitudes were positively related to speech recognition performance, whereas no such relationship was found in SZ. Moreover, no group differences in multisensory speech recognition benefits or in N1 suppression effects for bisensory stimuli were observed. Conclusion: Our study suggests that reduced N1 amplitudes reflect early auditory and audiovisual speech processing deficits in SZ. The findings that the amplitude effects were confined to salient speech stimuli and that the relationship with behavioral performance was attenuated in patients compared to HC indicate a diminished decoding of auditory speech signals in SZ. Our study also revealed relatively intact multisensory benefits in SZ, which implies that the observed auditory and audiovisual speech recognition deficits were primarily related to aberrant processing of the auditory syllables.
Affiliation(s)
- Daniel Senkowski: Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Department of Psychiatry and Psychotherapy, Charitéplatz 1, 10117 Berlin, Germany
- James K Moran: Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Department of Psychiatry and Psychotherapy, Charitéplatz 1, 10117 Berlin, Germany
14. Tang X, Yuan M, Shi Z, Gao M, Ren R, Wei M, Gao Y. Multisensory integration attenuates visually induced oculomotor inhibition of return. J Vis 2022; 22:7. PMID: 35297999; PMCID: PMC8944392; DOI: 10.1167/jov.22.4.7.
Abstract
Inhibition of return (IOR) is a mechanism of the attention system involving bias toward novel stimuli and delayed generation of responses to targets at previously attended locations. According to the two-component theory, IOR consists of a perceptual component and an oculomotor component (oculomotor IOR [O-IOR]) depending on whether the eye movement system is activated. Previous studies have shown that multisensory integration weakens IOR when paying attention to both visual and auditory modalities. However, it remains unclear whether the O-IOR effect attenuated by multisensory integration also occurs when the oculomotor system is activated. Here, using two eye movement experiments, we investigated the effect of multisensory integration on O-IOR using the exogenous spatial cueing paradigm. In Experiment 1, we found a greater visual O-IOR effect compared with audiovisual and auditory O-IOR in divided modality attention. The relative multisensory response enhancement (rMRE) and violations of Miller's bound showed a greater magnitude of multisensory integration in the cued location compared with the uncued location. In Experiment 2, the magnitude of the audiovisual O-IOR effect was significantly less than that of the visual O-IOR in single visual modality selective attention. Implications for the effect of multisensory integration on O-IOR were discussed under conditions of oculomotor system activation, shedding new light on the two-component theory of IOR.
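The violations of Miller's bound mentioned here are conventionally assessed by comparing the cumulative reaction-time distribution of the audiovisual condition with the sum of the two unisensory cumulative distributions: integration is inferred wherever P(RT_AV <= t) exceeds P(RT_A <= t) + P(RT_V <= t). A minimal sketch of that test on made-up data, not the authors' pipeline:

```python
import numpy as np

def cdf_at(rts, t_values):
    """Empirical cumulative distribution of reaction times at the given time points."""
    rts = np.sort(np.asarray(rts))
    return np.searchsorted(rts, t_values, side="right") / len(rts)

def miller_violations(rt_a, rt_v, rt_av, t_values):
    """Amount by which the AV CDF exceeds Miller's bound (clipped at 0 = no violation)."""
    bound = np.minimum(cdf_at(rt_a, t_values) + cdf_at(rt_v, t_values), 1.0)
    return np.clip(cdf_at(rt_av, t_values) - bound, 0.0, None)

rng = np.random.default_rng(2)
rt_a = rng.normal(420, 40, 200)
rt_v = rng.normal(400, 40, 200)
rt_av = rng.normal(340, 35, 200)      # markedly faster audiovisual responses
t = np.arange(250, 601, 10)
print("max violation of Miller's bound:", miller_violations(rt_a, rt_v, rt_av, t).max())
```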
Affiliation(s)
- Xiaoyu Tang: School of Psychology, Liaoning Collaborative Innovation Center of Children and Adolescents Healthy Personality Assessment and Cultivation, Liaoning Normal University, Dalian, China
- Mengying Yuan: School of Psychology, Liaoning Collaborative Innovation Center of Children and Adolescents Healthy Personality Assessment and Cultivation, Liaoning Normal University, Dalian, China
- Zhongyu Shi: School of Psychology, Liaoning Collaborative Innovation Center of Children and Adolescents Healthy Personality Assessment and Cultivation, Liaoning Normal University, Dalian, China
- Min Gao: School of Psychology, Liaoning Collaborative Innovation Center of Children and Adolescents Healthy Personality Assessment and Cultivation, Liaoning Normal University, Dalian, China
- Rongxia Ren: Interdisciplinary Science and Engineering in Health Systems, Okayama University, Okayama, Japan
- Ming Wei: School of Psychology, Liaoning Collaborative Innovation Center of Children and Adolescents Healthy Personality Assessment and Cultivation, Liaoning Normal University, Dalian, China
- Yulin Gao: Department of Psychology, Jilin University, Changchun, China
15. Pattamadilok C, Sato M. How are visemes and graphemes integrated with speech sounds during spoken word recognition? ERP evidence for supra-additive responses during audiovisual compared to auditory speech processing. Brain Lang 2022; 225:105058. PMID: 34929531; DOI: 10.1016/j.bandl.2021.105058.
Abstract
Both visual articulatory gestures and orthography provide information on the phonological content of speech. This EEG study investigated the integration between speech and these two visual inputs. A comparison of skilled readers' brain responses elicited by a spoken word presented alone versus synchronously with a static image of a viseme or a grapheme of the spoken word's onset showed that, while neither visual input induced audiovisual integration on the N1 acoustic component, both led to a supra-additive integration on P2, with a stronger integration between speech and graphemes at left-anterior electrodes. This pattern persisted in the P350 time window and generalized to all electrodes. The finding suggests a strong impact of spelling knowledge on phonetic processing and lexical access. It also indirectly indicates that the dynamic and predictive value present in natural lip movements, but not in static visemes, is particularly critical to the contribution of visual articulatory gestures to speech processing.
Affiliation(s)
- Marc Sato: Aix Marseille Univ, CNRS, LPL, Aix-en-Provence, France
16. López Zunini RA, Baart M, Samuel AG, Armstrong BC. Lexico-semantic access and audiovisual integration in the aging brain: Insights from mixed-effects regression analyses of event-related potentials. Neuropsychologia 2021; 165:108107. PMID: 34921819; DOI: 10.1016/j.neuropsychologia.2021.108107.
Abstract
We investigated how aging modulates lexico-semantic processes in the visual (seeing written items), auditory (hearing spoken items) and audiovisual (seeing written items while hearing congruent spoken items) modalities. Participants were young and older adults who performed a delayed lexical decision task (LDT) presented in blocks of visual, auditory, and audiovisual stimuli. Event-related potentials (ERPs) revealed differences between young and older adults despite older adults' ability to identify words and pseudowords as accurately as young adults. The observed differences included more focalized lexico-semantic access in the N400 time window in older relative to young adults, stronger re-instantiation and/or more widespread activity of the lexicality effect at the time of responding, and stronger multimodal integration for older relative to young adults. Our results offer new insights into how functional neural differences in older adults can result in efficient access to lexico-semantic representations across the lifespan.
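Mixed-effects regression of ERP amplitudes of the kind used here can be fit, for example, with statsmodels in Python; the formula, factor names, and data frame below are hypothetical stand-ins for illustration only, not the authors' actual model or data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one N400 amplitude per subject x modality cell
rng = np.random.default_rng(3)
subjects = np.repeat(np.arange(40), 6)
modality = np.tile(["auditory", "visual", "audiovisual"], 80)
age_group = np.where(subjects < 20, "young", "older")
df = pd.DataFrame({
    "subject": subjects,
    "modality": modality,
    "age_group": age_group,
    "n400": rng.normal(-2.0, 1.0, size=len(subjects)),
})

# Random intercept per subject; fixed effects of modality, age group, and their interaction
model = smf.mixedlm("n400 ~ modality * age_group", df, groups=df["subject"])
result = model.fit()
print(result.summary())
```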
Affiliation(s)
- Martijn Baart: BCBL, Basque Center on Cognition, Brain and Language, San Sebastian, Spain; Tilburg University, Dept. of Cognitive Neuropsychology, Tilburg, the Netherlands
- Arthur G Samuel: BCBL, Basque Center on Cognition, Brain and Language, San Sebastian, Spain; IKERBASQUE, Basque Foundation for Science, Spain; Stony Brook University, Dept. of Psychology, Stony Brook, NY, United States
- Blair C Armstrong: BCBL, Basque Center on Cognition, Brain and Language, San Sebastian, Spain; University of Toronto, Dept. of Psychology and Department of Language Studies, Toronto, ON, Canada
17. Sun J, Wang Z, Tian X. Manual Gestures Modulate Early Neural Responses in Loudness Perception. Front Neurosci 2021; 15:634967. PMID: 34539324; PMCID: PMC8440995; DOI: 10.3389/fnins.2021.634967.
Abstract
How different sensory modalities interact to shape perception is a fundamental question in cognitive neuroscience. Previous studies in audiovisual interaction have focused on abstract levels such as categorical representation (e.g., McGurk effect). It is unclear whether the cross-modal modulation can extend to low-level perceptual attributes. This study used motional manual gestures to test whether and how the loudness perception can be modulated by visual-motion information. Specifically, we implemented a novel paradigm in which participants compared the loudness of two consecutive sounds whose intensity changes around the just noticeable difference (JND), with manual gestures concurrently presented with the second sound. In two behavioral experiments and two EEG experiments, we investigated our hypothesis that the visual-motor information in gestures would modulate loudness perception. Behavioral results showed that the gestural information biased the judgment of loudness. More importantly, the EEG results demonstrated that early auditory responses around 100 ms after sound onset (N100) were modulated by the gestures. These consistent results in four behavioral and EEG experiments suggest that visual-motor processing can integrate with auditory processing at an early perceptual stage to shape the perception of a low-level perceptual attribute such as loudness, at least under challenging listening conditions.
Affiliation(s)
- Jiaqiu Sun: Division of Arts and Sciences, New York University Shanghai, Shanghai, China; NYU-ECNU Institute of Brain and Cognitive Science, New York University Shanghai, Shanghai, China
- Ziqing Wang: NYU-ECNU Institute of Brain and Cognitive Science, New York University Shanghai, Shanghai, China; Shanghai Key Laboratory of Brain Functional Genomics, Ministry of Education, School of Psychology and Cognitive Science, East China Normal University, Shanghai, China
- Xing Tian: Division of Arts and Sciences, New York University Shanghai, Shanghai, China; NYU-ECNU Institute of Brain and Cognitive Science, New York University Shanghai, Shanghai, China; Shanghai Key Laboratory of Brain Functional Genomics, Ministry of Education, School of Psychology and Cognitive Science, East China Normal University, Shanghai, China
18. Tremblay P, Basirat A, Pinto S, Sato M. Visual prediction cues can facilitate behavioural and neural speech processing in young and older adults. Neuropsychologia 2021; 159:107949. PMID: 34228997; DOI: 10.1016/j.neuropsychologia.2021.107949.
Abstract
The ability to process speech evolves over the course of the lifespan. Understanding speech at low acoustic intensity and in the presence of background noise becomes harder, and the ability of older adults to benefit from audiovisual speech also appears to decline. These difficulties can have important consequences on quality of life. Yet, a consensus on the cause of these difficulties is still lacking. The objective of this study was to examine the processing of speech in young and older adults under different modalities (i.e., auditory [A], visual [V], audiovisual [AV]) and in the presence of different visual prediction cues (i.e., no predictive cue (control), temporal predictive cue, phonetic predictive cue, and combined temporal and phonetic predictive cues). We focused on recognition accuracy and four auditory evoked potential (AEP) components: P1-N1-P2 and N2. Thirty-four right-handed French-speaking adults were recruited, including 17 younger adults (28 ± 2 years; 20-42 years) and 17 older adults (67 ± 3.77 years; 60-73 years). Participants completed a forced-choice speech identification task. The main findings of the study are: (1) the facilitatory effect of visual information was reduced, but still present, in older compared to younger adults; (2) visual predictive cues facilitated speech recognition in younger and older adults alike; (3) age differences in AEPs were localized to later components (P2 and N2), suggesting that aging predominantly affects higher-order cortical processes related to speech processing rather than lower-level auditory processes; (4) specifically, AV facilitation of P2 amplitude was lower in older adults, the effect of the temporal predictive cue on N2 amplitude was reduced in older compared to younger adults, and P2 and N2 latencies were longer for older adults; and (5) behavioural performance was associated with P2 amplitude in older adults. Our results indicate that aging affects speech processing at multiple levels, including audiovisual integration (P2) and auditory attentional processes (N2). These findings have important implications for understanding barriers to communication in older age, as well as for the development of compensation strategies for those with speech processing difficulties.
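The P1-N1-P2 and N2 components analysed here are typically quantified as peak (or mean) amplitudes and latencies within fixed post-stimulus windows of the averaged waveform. A minimal sketch of windowed peak extraction on a toy waveform; the window boundaries are illustrative assumptions, not those of the study:

```python
import numpy as np

def peak_in_window(erp, times, t_min, t_max, polarity):
    """Return (latency, amplitude) of the most positive/negative point in [t_min, t_max]."""
    mask = (times >= t_min) & (times <= t_max)
    segment, seg_times = erp[mask], times[mask]
    idx = np.argmax(segment) if polarity == "positive" else np.argmin(segment)
    return seg_times[idx], segment[idx]

# Toy averaged waveform sampled at 1000 Hz, -100 to 600 ms
times = np.arange(-0.1, 0.6, 0.001)
erp = (1.5 * np.exp(-((times - 0.05) ** 2) / 0.0004)    # P1-like bump
       - 3.0 * np.exp(-((times - 0.10) ** 2) / 0.0006)  # N1-like trough
       + 2.0 * np.exp(-((times - 0.20) ** 2) / 0.002))  # P2-like bump

windows = {"P1": (0.03, 0.08, "positive"),
           "N1": (0.08, 0.15, "negative"),
           "P2": (0.15, 0.27, "positive")}
for name, (lo, hi, pol) in windows.items():
    lat, amp = peak_in_window(erp, times, lo, hi, pol)
    print(f"{name}: {amp:.2f} µV at {lat * 1000:.0f} ms")
```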
Affiliation(s)
- Pascale Tremblay: Département de Réadaptation, Faculté de Médecine, Université Laval, Quebec City, Canada; Cervo Brain Research Centre, Quebec City, Canada
- Anahita Basirat: Univ. Lille, CNRS, UMR 9193 - SCALab - Sciences Cognitives et Sciences Affectives, Lille, France
- Serge Pinto: Aix Marseille Univ, CNRS, LPL, Aix-en-Provence, France
- Marc Sato: Aix Marseille Univ, CNRS, LPL, Aix-en-Provence, France
19. Begau A, Klatt LI, Wascher E, Schneider D, Getzmann S. Do congruent lip movements facilitate speech processing in a dynamic audiovisual multi-talker scenario? An ERP study with older and younger adults. Behav Brain Res 2021; 412:113436. PMID: 34175355; DOI: 10.1016/j.bbr.2021.113436.
Abstract
In natural conversations, visible mouth and lip movements play an important role in speech comprehension. There is evidence that visual speech information improves speech comprehension, especially for older adults and under difficult listening conditions. However, the neurocognitive basis is still poorly understood. The present EEG experiment investigated the benefits of audiovisual speech in a dynamic cocktail-party scenario with 22 younger (aged 20-34 years) and 20 older (aged 55-74 years) participants. We presented three simultaneously talking faces with a varying amount of visual speech input (still faces, visually unspecific, and audiovisually congruent). In a two-alternative forced-choice task, participants had to discriminate target words ("yes" or "no") among two distractors (one-digit number words). In half of the experimental blocks, the target was always presented from a central position; in the other half, occasional switches to a lateral position could occur. We investigated behavioral and electrophysiological modulations due to age, location switches, and the content of visual information, analyzing response times and accuracy as well as the P1, N1, P2, and N2 event-related potentials (ERPs) and the contingent negative variation (CNV) in the EEG. We found that audiovisually congruent speech information improved performance and modulated ERP amplitudes in both age groups, suggesting enhanced preparation and integration of the subsequent auditory input. In the older group, larger amplitude measures were found in early phases of processing (P1-N1), and these amplitudes were reduced in response to audiovisually congruent stimuli. In later processing phases (P2-N2), we found decreased amplitude measures in the older group, while an amplitude reduction for audiovisually congruent compared to visually unspecific stimuli was still observable. However, these benefits were only observed as long as no location switches occurred; location switches led to enhanced amplitude measures in later processing phases (P2-N2). To conclude, meaningful visual information in a multi-talker setting, when presented from the expected location, is shown to be beneficial for both younger and older adults.
Affiliation(s)
- Alexandra Begau: Leibniz Research Centre for Working Environment and Human Factors, TU Dortmund, Germany
- Laura-Isabelle Klatt: Leibniz Research Centre for Working Environment and Human Factors, TU Dortmund, Germany
- Edmund Wascher: Leibniz Research Centre for Working Environment and Human Factors, TU Dortmund, Germany
- Daniel Schneider: Leibniz Research Centre for Working Environment and Human Factors, TU Dortmund, Germany
- Stephan Getzmann: Leibniz Research Centre for Working Environment and Human Factors, TU Dortmund, Germany
20. Lindborg A, Andersen TS. Bayesian binding and fusion models explain illusion and enhancement effects in audiovisual speech perception. PLoS One 2021; 16:e0246986. PMID: 33606815; PMCID: PMC7895372; DOI: 10.1371/journal.pone.0246986.
Abstract
Speech is perceived with both the ears and the eyes. Adding congruent visual speech improves the perception of a faint auditory speech stimulus, whereas adding incongruent visual speech can alter the perception of the utterance. The latter phenomenon is exemplified by the McGurk illusion, where an auditory stimulus such as "ba" dubbed onto a visual stimulus such as "ga" produces the illusion of hearing "da". Bayesian models of multisensory perception suggest that both the enhancement and the illusion case can be described as a two-step process of binding (informed by prior knowledge) and fusion (informed by the information reliability of each sensory cue). However, no study to date has accounted for how each stage contributes to audiovisual speech perception. In this study, we exposed subjects to both congruent and incongruent audiovisual speech, manipulating the binding and fusion stages simultaneously by varying both the temporal offset (binding) and the auditory and visual signal-to-noise ratios (fusion). We fit two Bayesian models to the behavioural data and show that both can account for the enhancement effect in congruent audiovisual speech as well as for the McGurk illusion. This modelling approach allows us to disentangle the effects of binding and fusion on behavioural responses. Moreover, we find that these models have greater predictive power than a forced-fusion model. This study provides a systematic and quantitative approach to measuring audiovisual integration in the perception of the McGurk illusion as well as congruent audiovisual speech, which we hope will inform future work on audiovisual speech perception.
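In the fusion stage of such Bayesian models, once the auditory and visual cues are treated as bound, the optimal combined estimate is the reliability-weighted average of the two cues, with reliability defined as inverse variance; the binding stage (a prior over a common cause) is omitted in the forced-fusion sketch below. The cue values and noise levels are illustrative only:

```python
def fuse(x_aud, var_aud, x_vis, var_vis):
    """Reliability-weighted (inverse-variance) fusion of an auditory and a visual estimate."""
    w_aud = (1 / var_aud) / (1 / var_aud + 1 / var_vis)
    fused = w_aud * x_aud + (1 - w_aud) * x_vis
    fused_var = 1 / (1 / var_aud + 1 / var_vis)   # the fused estimate is more reliable than either cue
    return fused, fused_var

# Example: a noisy auditory /ba/-like cue and a clearer visual /ga/-like cue
# on an arbitrary one-dimensional phonetic continuum (0 = /ba/, 1 = /ga/)
fused, fused_var = fuse(x_aud=0.2, var_aud=0.3, x_vis=0.9, var_vis=0.05)
print(f"fused percept: {fused:.2f} (variance {fused_var:.3f})")
```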
Affiliation(s)
- Alma Lindborg: Department of Psychology, University of Potsdam, Potsdam, Germany; Section for Cognitive Systems, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark
- Tobias S. Andersen: Section for Cognitive Systems, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark
21. Sorati M, Behne DM. Considerations in Audio-Visual Interaction Models: An ERP Study of Music Perception by Musicians and Non-musicians. Front Psychol 2021; 11:594434. PMID: 33551911; PMCID: PMC7854916; DOI: 10.3389/fpsyg.2020.594434.
Abstract
Previous research with speech and non-speech stimuli suggested that, in audiovisual perception, visual information that starts prior to the onset of the corresponding sound can provide visual cues and form a prediction about the upcoming auditory sound. This prediction leads to audiovisual (AV) interaction: auditory and visual perception interact, inducing suppression and speeding up of early auditory event-related potentials (ERPs) such as the N1 and P2. To investigate AV interaction, previous research examined N1 and P2 amplitudes and latencies in response to audio-only (AO), video-only (VO), audiovisual, and control (CO) stimuli, and compared AV with auditory perception based on four AV interaction models (AV vs. AO+VO, AV-VO vs. AO, AV-VO vs. AO-CO, AV vs. AO). The current study addresses how these different models of AV interaction express N1 and P2 suppression in music perception. Furthermore, the current study went one step further and examined whether previous musical experience, which can potentially lead to higher N1 and P2 amplitudes in auditory perception, influenced AV interaction in the different models. Musicians and non-musicians were presented with recordings (AO, AV, VO) of a keyboard /C4/ key being played, as well as CO stimuli. Results showed that the AV interaction models differ in their expression of N1 and P2 amplitude and latency suppression. The calculation of the (AV-VO vs. AO) and (AV-VO vs. AO-CO) models has consequences for the resulting N1 and P2 difference waves. Furthermore, while musicians, compared to non-musicians, showed higher N1 amplitudes in auditory perception, suppression of N1 and P2 amplitudes and latencies was similar for the two groups across the AV models. Collectively, these results suggest that when visual cues from finger and hand movements predict the upcoming sound in AV music perception, suppression of early ERPs is similar for musicians and non-musicians. Notably, the calculation differences across models do not lead to the same pattern of results for N1 and P2, demonstrating that the four models are not interchangeable and are not directly comparable.
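The four AV interaction models named above reduce to comparing component measures taken from different pairs of waveforms: AV vs. AO+VO, (AV-VO) vs. AO, (AV-VO) vs. (AO-CO), and AV vs. AO. A minimal sketch of how those comparison pairs could be formed from averaged ERP arrays and scored for N1 suppression; the toy data, N1 window, and sign convention are illustrative assumptions, not the authors' pipeline:

```python
import numpy as np

rng = np.random.default_rng(4)
times = np.arange(-0.1, 0.6, 0.001)                 # -100 to 600 ms at 1000 Hz
AO, VO, AV, CO = (rng.normal(0, 1, times.size) for _ in range(4))  # toy averaged ERPs

def n1_peak(wave):
    """Most negative value within an (illustrative) 80-150 ms N1 window."""
    mask = (times >= 0.08) & (times <= 0.15)
    return wave[mask].min()

# Each model compares the N1 measured on an audiovisual-side and an auditory-side waveform.
comparisons = {
    "AV vs AO+VO":    (AV, AO + VO),
    "AV-VO vs AO":    (AV - VO, AO),
    "AV-VO vs AO-CO": (AV - VO, AO - CO),
    "AV vs AO":       (AV, AO),
}
for name, (audiovisual_side, auditory_side) in comparisons.items():
    # Positive values indicate a smaller (less negative) N1 on the audiovisual side, i.e. suppression.
    suppression = n1_peak(audiovisual_side) - n1_peak(auditory_side)
    print(f"{name}: N1 amplitude difference = {suppression:.2f} µV")
```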
Affiliation(s)
- Marzieh Sorati: Department of Psychology, Norwegian University of Science and Technology, Trondheim, Norway
- Dawn M Behne: Department of Psychology, Norwegian University of Science and Technology, Trondheim, Norway
22. Lalonde K, Werner LA. Development of the Mechanisms Underlying Audiovisual Speech Perception Benefit. Brain Sci 2021; 11:49. PMID: 33466253; PMCID: PMC7824772; DOI: 10.3390/brainsci11010049.
Abstract
The natural environments in which infants and children learn speech and language are noisy and multimodal. Adults rely on the multimodal nature of speech to compensate for noisy environments during speech communication. Multiple mechanisms underlie mature audiovisual benefit to speech perception, including reduced uncertainty as to when auditory speech will occur, use of correlations between the amplitude envelope of auditory and visual signals in fluent speech, and use of visual phonetic knowledge for lexical access. This paper reviews evidence regarding infants' and children's use of temporal and phonetic mechanisms in audiovisual speech perception benefit. The ability to use temporal cues for audiovisual speech perception benefit emerges in infancy. Although infants are sensitive to the correspondence between auditory and visual phonetic cues, the ability to use this correspondence for audiovisual benefit may not emerge until age four. A more cohesive account of the development of audiovisual speech perception may follow from a more thorough understanding of the development of sensitivity to and use of various temporal and phonetic cues.
Collapse
Affiliation(s)
- Kaylah Lalonde
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE 68131, USA
| | - Lynne A. Werner
- Department of Speech and Hearing Sciences, University of Washington, Seattle, WA 98105, USA;
| |
Collapse
|
23
|
Zhao Z, Lei S, Weiqi H, Suyong Y, Wenbo L. The influence of the cross-modal emotional pre-preparation effect on audiovisual integration. Neuroreport 2020; 31:1161-1166. [PMID: 32991523 DOI: 10.1097/wnr.0000000000001530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
Previous studies have shown that the cross-modal pre-preparation effect is an important factor in audiovisual integration. However, the facilitating influence of the pre-preparation effect on the integration of emotional cues remains unclear. Therefore, this study examined the emotional pre-preparation effect during the multistage process of audiovisual integration. Event-related potentials (ERPs) were recorded while participants performed a synchronous or asynchronous integration task with fearful or neutral stimuli. The results indicated that, compared with the sum of the responses to unisensory visual (V) and auditory (A) stimuli (A+V), only fearful audiovisual stimuli induced a decreased N1 and an enhanced P2; this was not found for the neutral stimuli. Moreover, the fearful stimuli triggered a larger P2 than the neutral stimuli in the audiovisual condition, but not in the sum of the combined (A+V) waveforms. Our findings imply that, in the early perceptual processing stage and the perceptual fine-processing stage, fear improves the efficiency of emotional audiovisual integration. In the final, cognitive assessment stage, fearful audiovisual stimuli induced a larger late positive component (LPC) than neutral audiovisual stimuli. Moreover, asynchronous audiovisual stimuli induced a greater LPC than synchronous audiovisual stimuli during the 400-550 ms period. The different integration effects between the fearful and neutral stimuli may reflect distinct pre-preparation mechanisms along the emotional dimension. In light of these results, we propose a cross-modal emotional pre-preparation effect involving a three-phase emotional audiovisual integration.
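The additive-model logic used in this abstract (comparing the AV response with the sum of the unisensory responses in component time windows) can be sketched as below. This is a hypothetical illustration, not the study's pipeline; the condition labels, trial arrays, and component windows are placeholders.

```python
# Minimal sketch (assumed workflow): AV vs. (A + V) mean amplitude in a time window.
import numpy as np

def mean_window_amplitude(erp, times, window):
    """Mean amplitude of an averaged waveform within a (start, end) window in seconds."""
    mask = (times >= window[0]) & (times <= window[1])
    return erp[mask].mean()

def additive_model_difference(epochs, times, window):
    """AV minus (A + V) mean amplitude; non-zero values indicate supra-/sub-additivity."""
    av = epochs["AV"].mean(axis=0)                       # average over trials
    a_plus_v = epochs["A"].mean(axis=0) + epochs["V"].mean(axis=0)
    return (mean_window_amplitude(av, times, window)
            - mean_window_amplitude(a_plus_v, times, window))

times = np.linspace(-0.1, 0.6, 700)                      # -100 to 600 ms (toy axis)
rng = np.random.default_rng(0)
epochs = {c: rng.normal(size=(60, times.size)) for c in ("A", "V", "AV")}  # toy trials
print("N1 window:", additive_model_difference(epochs, times, (0.08, 0.12)))
print("P2 window:", additive_model_difference(epochs, times, (0.15, 0.25)))
```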
Collapse
Affiliation(s)
- Zhang Zhao
- Institute of Psychology, Weifang Medical University, Weifang; Research Center of Brain and Cognitive Neuroscience, Liaoning Normal University; Key Laboratory of Brain and Cognitive Neuroscience, Dalian, Liaoning Province
| | - Sun Lei
- Research Center of Brain and Cognitive Neuroscience, Liaoning Normal University; Key Laboratory of Brain and Cognitive Neuroscience, Dalian, Liaoning Province
| | - He Weiqi
- Research Center of Brain and Cognitive Neuroscience, Liaoning Normal University; Key Laboratory of Brain and Cognitive Neuroscience, Dalian, Liaoning Province
| | - Yang Suyong
- School of Psychology, Shanghai University of Sport, Shanghai, China
| | - Luo Wenbo
- Research Center of Brain and Cognitive Neuroscience, Liaoning Normal University; Key Laboratory of Brain and Cognitive Neuroscience, Dalian, Liaoning Province
| |
Collapse
|
24
|
The Cross-Modal Suppressive Role of Visual Context on Speech Intelligibility: An ERP Study. Brain Sci 2020; 10:brainsci10110810. [PMID: 33147691 PMCID: PMC7692090 DOI: 10.3390/brainsci10110810] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2020] [Revised: 10/21/2020] [Accepted: 10/29/2020] [Indexed: 11/20/2022] Open
Abstract
The efficacy of audiovisual (AV) integration is reflected in the degree of cross-modal suppression of the auditory event-related potentials (ERPs, P1-N1-P2), while stronger semantic encoding is reflected in enhanced late ERP negativities (e.g., N450). We hypothesized that increasing visual stimulus reliability should lead to more robust AV integration and enhanced semantic prediction, reflected in suppression of auditory ERPs and an enhanced N450, respectively. EEG was acquired while individuals watched and listened to clear and blurred videos of a speaker uttering intact or highly intelligible degraded (vocoded) words and made binary judgments about word meaning (animate or inanimate). We found that intact speech evoked a larger negativity between 280 and 527 ms than vocoded speech, suggestive of more robust semantic prediction for the intact signal. For visual reliability, we found that greater cross-modal ERP suppression occurred for clear than for blurred videos prior to sound onset and for the P2 ERP. Additionally, the later semantic-related negativity tended to be larger for clear than for blurred videos. These results suggest that the cross-modal effect is largely confined to suppression of early auditory networks, with a weak effect on networks associated with semantic prediction. However, the semantic-related visual effect on the late negativity may have been tempered by the vocoded signal's high reliability.
Collapse
|
25
|
Sorati M, Behne DM. Audiovisual Modulation in Music Perception for Musicians and Non-musicians. Front Psychol 2020; 11:1094. [PMID: 32547458 PMCID: PMC7273518 DOI: 10.3389/fpsyg.2020.01094] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2020] [Accepted: 04/29/2020] [Indexed: 11/13/2022] Open
Abstract
In audiovisual music perception, visual information from a musical instrument being played is available prior to the onset of the corresponding musical sound and consequently allows a perceiver to form a prediction about the upcoming music. This prediction in audiovisual music perception, compared to auditory music perception, leads to lower N1 and P2 amplitudes and latencies. Although previous research suggests that audiovisual experience, such as previous musical experience, may enhance this prediction, a remaining question is to what extent musical experience modifies N1 and P2 amplitudes and latencies. Furthermore, corresponding event-related phase modulations, quantified as inter-trial phase coherence (ITPC), have not previously been reported for audiovisual music perception. In the current study, audio-video recordings of a keyboard key being played were presented to musicians and non-musicians in audio-only (AO), video-only (VO), and audiovisual (AV) conditions. With predictive movements from playing the keyboard isolated from AV music perception (AV-VO), the current findings demonstrated that, compared to the AO condition, both groups had a similar decrease in N1 amplitude and latency and in P2 amplitude, along with correspondingly lower ITPC values in the delta, theta, and alpha frequency bands. However, while musicians showed lower ITPC values in the beta band in AV-VO compared to AO, non-musicians did not show this pattern. Findings indicate that AV perception may be broadly correlated with auditory perception, and differences between musicians and non-musicians further indicate musical experience to be a specific factor influencing AV perception. Predicting an upcoming sound in AV music perception may involve visual predictive processes, as well as beta-band oscillations, which may be influenced by years of musical training. This study highlights possible interconnectivity in AV perception as well as potential modulation with experience.
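ITPC, as used in this abstract, is the length of the across-trial mean of unit phase vectors, ITPC(t) = |(1/N) Σ_n exp(i·φ_n(t))|. The sketch below is illustrative only; the filter settings, band edges, and random data are placeholders rather than the study's parameters.

```python
# Minimal sketch (assumed parameters): band-limited inter-trial phase coherence.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def itpc(epochs, sfreq, band):
    """ITPC over time for one frequency band, from band-pass filtered single trials."""
    sos = butter(4, [band[0], band[1]], btype="bandpass", fs=sfreq, output="sos")
    filtered = sosfiltfilt(sos, epochs, axis=1)            # band-pass each trial
    phases = np.angle(hilbert(filtered, axis=1))           # instantaneous phase per trial
    return np.abs(np.mean(np.exp(1j * phases), axis=0))    # resultant vector length in [0, 1]

sfreq = 250.0
rng = np.random.default_rng(1)
epochs = rng.normal(size=(80, 500))                        # 80 toy trials x 2 s of data
for name, band in {"delta": (1, 3), "theta": (4, 7),
                   "alpha": (8, 12), "beta": (13, 30)}.items():
    print(name, round(float(itpc(epochs, sfreq, band).mean()), 3))
```

Values near 1 indicate that the phase in that band is tightly locked to stimulus onset across trials; lower ITPC in AV than AO conditions is the kind of reduction the abstract describes.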
Collapse
Affiliation(s)
- Marzieh Sorati
- Department of Psychology, Norwegian University of Science and Technology, Trondheim, Norway
| | - Dawn Marie Behne
- Department of Psychology, Norwegian University of Science and Technology, Trondheim, Norway
| |
Collapse
|
26
|
Jaha N, Shen S, Kerlin JR, Shahin AJ. Visual Enhancement of Relevant Speech in a 'Cocktail Party'. Multisens Res 2020; 33:277-294. [PMID: 32508080 DOI: 10.1163/22134808-20191423] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
Abstract
Lip-reading improves intelligibility in noisy acoustical environments. We hypothesized that watching mouth movements benefits speech comprehension in a 'cocktail party' by strengthening the encoding of the neural representations of the visually paired speech stream. In an audiovisual (AV) task, EEG was recorded as participants watched and listened to videos of a speaker uttering a sentence while also hearing a concurrent sentence by a speaker of the opposite gender. A key manipulation was that each audio sentence had a 200-ms segment replaced by white noise. To assess comprehension, subjects were tasked with transcribing the AV-attended sentence on randomly selected trials. In the auditory-only trials, subjects listened to the same sentences and completed the same task while watching a static picture of a speaker of either gender. Subjects directed their listening to the voice of the gender of the speaker in the video. We found that the N1 auditory-evoked potential (AEP) time-locked to white noise onsets was significantly more inhibited for the AV-attended sentences than for those of the auditorily-attended (A-attended) and AV-unattended sentences. N1 inhibition to noise onsets has been shown to index restoration of phonemic representations of degraded speech. These results underscore that attention and congruency in the AV setting help streamline the complex auditory scene, partly by reinforcing the neural representations of the visually attended stream, heightening the perception of continuity and comprehension.
Collapse
Affiliation(s)
- Niti Jaha
- Center for Mind and Brain, University of California, Davis, 95618, USA
| | - Stanley Shen
- Center for Mind and Brain, University of California, Davis, 95618, USA
| | - Jess R Kerlin
- Center for Mind and Brain, University of California, Davis, 95618, USA
| | - Antoine J Shahin
- Center for Mind and Brain, University of California, Davis, 95618, USA; Department of Cognitive and Information Sciences, University of California, Merced, CA 95343, USA
| |
Collapse
|
27
|
Lip-Reading Enables the Brain to Synthesize Auditory Features of Unknown Silent Speech. J Neurosci 2019; 40:1053-1065. [PMID: 31889007 DOI: 10.1523/jneurosci.1101-19.2019] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2019] [Revised: 11/28/2019] [Accepted: 12/04/2019] [Indexed: 11/21/2022] Open
Abstract
Lip-reading is crucial for understanding speech in challenging conditions. But how the brain extracts meaning from silent, visual speech is still under debate. Lip-reading in silence activates the auditory cortices, but it is not known whether such activation reflects immediate synthesis of the corresponding auditory stimulus or imagery of unrelated sounds. To disentangle these possibilities, we used magnetoencephalography to evaluate how cortical activity in 28 healthy adult humans (17 females) entrained to the auditory speech envelope and lip movements (mouth opening) when listening to a spoken story without visual input (audio-only), and when seeing a silent video of a speaker articulating another story (video-only). In video-only, auditory cortical activity entrained to the absent auditory signal at frequencies <1 Hz more than to the seen lip movements. This entrainment process was characterized by an auditory-speech-to-brain delay of ∼70 ms in the left hemisphere, compared with ∼20 ms in audio-only. Entrainment to mouth opening was found in the right angular gyrus at <1 Hz, and in early visual cortices at 1-8 Hz. These findings demonstrate that the brain can use a silent lip-read signal to synthesize a coarse-grained auditory speech representation in early auditory cortices. Our data indicate the following underlying oscillatory mechanism: seeing lip movements first modulates neuronal activity in early visual cortices at frequencies that match articulatory lip movements; the right angular gyrus then extracts slower features of lip movements, mapping them onto the corresponding speech sound features; this information is fed to auditory cortices, most likely facilitating speech parsing. SIGNIFICANCE STATEMENT Lip-reading consists in decoding speech based on visual information derived from observation of a speaker's articulatory facial gestures. Lip-reading is known to improve auditory speech understanding, especially when speech is degraded. Interestingly, lip-reading in silence still activates the auditory cortices, even when participants do not know what the absent auditory signal should be. However, it was uncertain what such activation reflected. Here, using magnetoencephalographic recordings, we demonstrate that it reflects fast synthesis of the auditory stimulus rather than mental imagery of unrelated speech or non-speech sounds. Our results also shed light on the oscillatory dynamics underlying lip-reading.
Collapse
|
28
|
Sorati M, Behne DM. Musical Expertise Affects Audiovisual Speech Perception: Findings From Event-Related Potentials and Inter-trial Phase Coherence. Front Psychol 2019; 10:2562. [PMID: 31803107 PMCID: PMC6874039 DOI: 10.3389/fpsyg.2019.02562] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2019] [Accepted: 10/29/2019] [Indexed: 12/03/2022] Open
Abstract
In audiovisual speech perception, visual information from a talker's face during mouth articulation is available before the onset of the corresponding audio speech, and thereby allows the perceiver to use visual information to predict the upcoming audio. This prediction from phonetically congruent visual information modulates audiovisual speech perception and leads to a decrease in N1 and P2 amplitudes and latencies compared to the perception of audio speech alone. Whether audiovisual experience, such as with musical training, influences this prediction is unclear, but if so, it may explain some of the variations observed in previous research. The current study addresses whether audiovisual speech perception is affected by musical training, first assessing N1 and P2 event-related potentials (ERPs) and, in addition, inter-trial phase coherence (ITPC). Musicians and non-musicians were presented with the syllable /ba/ in audio-only (AO), video-only (VO), and audiovisual (AV) conditions. With the predictive effect of mouth movement isolated from the AV speech (AV-VO), results showed that, compared to audio speech, both groups had a lower N1 latency and lower P2 amplitude and latency. Moreover, they also showed lower ITPCs in the delta, theta, and beta bands in audiovisual speech perception. However, musicians showed significant suppression of N1 amplitude and desynchronization in the alpha band in audiovisual speech, not present for non-musicians. Collectively, the current findings indicate that early sensory processing can be modified by musical experience, which in turn can explain some of the variations in previous AV speech perception research.
Collapse
Affiliation(s)
- Marzieh Sorati
- Department of Psychology, Norwegian University of Science and Technology, Trondheim, Norway
| | | |
Collapse
|
29
|
Lalonde K, Werner LA. Infants and Adults Use Visual Cues to Improve Detection and Discrimination of Speech in Noise. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2019; 62:3860-3875. [PMID: 31618097 PMCID: PMC7201336 DOI: 10.1044/2019_jslhr-h-19-0106] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/06/2019] [Revised: 05/30/2019] [Accepted: 07/08/2019] [Indexed: 06/10/2023]
Abstract
Purpose This study assessed the extent to which 6- to 8.5-month-old infants and 18- to 30-year-old adults detect and discriminate auditory syllables in noise better in the presence of visual speech than in auditory-only conditions. In addition, we examined whether visual cues to the onset and offset of the auditory signal account for this benefit. Method Sixty infants and 24 adults were randomly assigned to speech detection or discrimination tasks and were tested using a modified observer-based psychoacoustic procedure. Each participant completed 1-3 conditions: auditory-only, with visual speech, and with a visual signal that only cued the onset and offset of the auditory syllable. Results Mixed linear modeling indicated that infants and adults benefited from visual speech on both tasks. Adults relied on the onset-offset cue for detection, but the same cue did not improve their discrimination. The onset-offset cue benefited infants for both detection and discrimination. Whereas the onset-offset cue improved detection similarly for infants and adults, the full visual speech signal benefited infants to a lesser extent than adults on the discrimination task. Conclusions These results suggest that infants' use of visual onset-offset cues is mature, but their ability to use more complex visual speech cues is still developing. Additional research is needed to explore differences in audiovisual enhancement (a) of speech discrimination across speech targets and (b) with increasingly complex tasks and stimuli.
Collapse
Affiliation(s)
- Kaylah Lalonde
- Department of Speech & Hearing Sciences, University of Washington, Seattle
| | - Lynne A. Werner
- Department of Speech & Hearing Sciences, University of Washington, Seattle
| |
Collapse
|
30
|
The impact of when, what and how predictions on auditory speech perception. Exp Brain Res 2019; 237:3143-3153. [DOI: 10.1007/s00221-019-05661-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2019] [Accepted: 09/24/2019] [Indexed: 11/26/2022]
|
31
|
Abstract
The present study aimed to investigate whether or not the so-called "bouba-kiki" effect is mediated by speech-specific representations. Sine-wave versions of naturally produced pseudowords were used as auditory stimuli in an implicit association task (IAT) and an explicit cross-modal matching (CMM) task to examine cross-modal shape-sound correspondences. A group of participants trained to hear the sine-wave stimuli as speech was compared to a group that heard them as non-speech sounds. Sound-shape correspondence effects were observed in both groups and tasks, indicating that speech-specific processing is not fundamental to the "bouba-kiki" phenomenon. Effects were similar across groups in the IAT, while in the CMM task the speech-mode group showed a stronger effect compared with the non-speech group. This indicates that, while both tasks reflect auditory-visual associations, only the CMM task is additionally sensitive to associations involving speech-specific representations.
Collapse
|
32
|
Kolozsvári OB, Xu W, Leppänen PHT, Hämäläinen JA. Top-Down Predictions of Familiarity and Congruency in Audio-Visual Speech Perception at Neural Level. Front Hum Neurosci 2019; 13:243. [PMID: 31354459 PMCID: PMC6639789 DOI: 10.3389/fnhum.2019.00243] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2019] [Accepted: 06/28/2019] [Indexed: 11/13/2022] Open
Abstract
During speech perception, listeners rely on multimodal input and make use of both auditory and visual information. When presented with speech, for example syllables, the differences in brain responses to distinct stimuli are not, however, caused merely by the acoustic or visual features of the stimuli. The congruency of the auditory and visual information and the familiarity of a syllable, that is, whether it appears in the listener's native language or not, also modulate brain responses. We investigated how the congruency and familiarity of the presented stimuli affect brain responses to audio-visual (AV) speech in 12 adult Finnish native speakers and 12 adult Chinese native speakers. They watched videos of a Chinese speaker pronouncing syllables (/pa/, /pha/, /ta/, /tha/, /fa/) during a magnetoencephalography (MEG) measurement; only /pa/ and /ta/ were part of Finnish phonology, while all the stimuli were part of Chinese phonology. The stimuli were presented in audio-visual (congruent or incongruent), audio-only, or visual-only conditions. The brain responses were examined in five time-windows: 75-125, 150-200, 200-300, 300-400, and 400-600 ms. We found significant differences for the congruency comparison in the fourth time-window (300-400 ms) in both sensor- and source-level analyses. Larger responses were observed for the incongruent stimuli than for the congruent stimuli. For the familiarity comparisons, no significant differences were found. The results are in line with earlier studies reporting modulation of brain responses to audio-visual congruency around 250-500 ms. This suggests a much stronger process for the general detection of a mismatch between predictions based on lip movements and the auditory signal than for the top-down modulation of brain responses based on phonological information.
Collapse
Affiliation(s)
- Orsolya B Kolozsvári
- Department of Psychology, University of Jyväskylä, Jyväskylä, Finland; Jyväskylä Centre for Interdisciplinary Brain Research (CIBR), University of Jyväskylä, Jyväskylä, Finland
| | - Weiyong Xu
- Department of Psychology, University of Jyväskylä, Jyväskylä, Finland; Jyväskylä Centre for Interdisciplinary Brain Research (CIBR), University of Jyväskylä, Jyväskylä, Finland
| | - Paavo H T Leppänen
- Department of Psychology, University of Jyväskylä, Jyväskylä, Finland; Jyväskylä Centre for Interdisciplinary Brain Research (CIBR), University of Jyväskylä, Jyväskylä, Finland
| | - Jarmo A Hämäläinen
- Department of Psychology, University of Jyväskylä, Jyväskylä, Finland; Jyväskylä Centre for Interdisciplinary Brain Research (CIBR), University of Jyväskylä, Jyväskylä, Finland
| |
Collapse
|
33
|
Fixating the eyes of a speaker provides sufficient visual information to modulate early auditory processing. Biol Psychol 2019; 146:107724. [PMID: 31323242 DOI: 10.1016/j.biopsycho.2019.107724] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2017] [Revised: 02/26/2019] [Accepted: 06/29/2019] [Indexed: 11/23/2022]
Abstract
In face-to-face conversations, when listeners process and combine information obtained from hearing and seeing a speaker, they mostly look at the eyes rather than at the more informative mouth region. Measuring event-related potentials, we tested whether fixating the speaker's eyes is sufficient for gathering enough visual speech information to modulate early auditory processing, or whether covert attention to the speaker's mouth is needed. Results showed that when listeners fixated the eye region of the speaker, the amplitudes of the auditory evoked N1 and P2 were reduced when listeners heard and saw the speaker than when they only heard her. These cross-modal interactions also occurred when, in addition, attention was restricted to the speaker's eye region. Fixating the speaker's eyes thus provides listeners with sufficient visual information to facilitate early auditory processing. The spread of covert attention to the mouth area is not needed to observe audiovisual interactions.
Collapse
|
34
|
Lindborg A, Baart M, Stekelenburg JJ, Vroomen J, Andersen TS. Speech-specific audiovisual integration modulates induced theta-band oscillations. PLoS One 2019; 14:e0219744. [PMID: 31310616 PMCID: PMC6634411 DOI: 10.1371/journal.pone.0219744] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2018] [Accepted: 07/02/2019] [Indexed: 11/18/2022] Open
Abstract
Speech perception is influenced by vision through a process of audiovisual integration. This is demonstrated by the McGurk illusion, where visual speech (for example /ga/) dubbed with incongruent auditory speech (such as /ba/) leads to a modified auditory percept (/da/). Recent studies have indicated that perception of the incongruent speech stimuli used in McGurk paradigms involves mechanisms of both general and audiovisual speech-specific mismatch processing and that general mismatch processing modulates induced theta-band (4–8 Hz) oscillations. Here, we investigated whether the theta modulation merely reflects mismatch processing or, alternatively, audiovisual integration of speech. We used electroencephalographic recordings from two previously published studies using audiovisual sine-wave speech (SWS), a spectrally degraded speech signal sounding nonsensical to naïve perceivers but perceived as speech by informed subjects. Earlier studies have shown that informed, but not naïve, subjects integrate SWS phonetically with visual speech. In an N1/P2 event-related potential paradigm, we found a significant difference in theta-band activity between informed and naïve perceivers of audiovisual speech, suggesting that audiovisual integration modulates induced theta-band oscillations. In a McGurk mismatch negativity (MMN) paradigm, where infrequent McGurk stimuli were embedded in a sequence of frequent audio-visually congruent stimuli, we found no difference between congruent and McGurk stimuli. The infrequent stimuli in this paradigm violate both the general prediction of stimulus content and that of audiovisual congruence. Hence, we found no support for the hypothesis that audiovisual mismatch modulates induced theta-band oscillations. We also did not find any effects of audiovisual integration in the MMN paradigm, possibly due to the experimental design.
Collapse
Affiliation(s)
- Alma Lindborg
- Section for Cognitive Systems, DTU Compute, Technical University of Denmark, Lyngby, Denmark
| | - Martijn Baart
- Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands; BCBL. Basque Center on Cognition, Brain and Language, Donostia, Spain
| | - Jeroen J Stekelenburg
- Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
| | - Jean Vroomen
- Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
| | - Tobias S Andersen
- Section for Cognitive Systems, DTU Compute, Technical University of Denmark, Lyngby, Denmark
| |
Collapse
|
35
|
Abstract
Even when speakers are not actively performing another task, concurrent auditory stimuli can interfere with their speech planning. In this study, we used picture naming with passive hearing, or active listening, combined with high-density electroencephalographic (EEG) recordings to investigate the locus and origin of interference on speech production. Participants named pictures while ignoring (or paying attention to) auditory syllables presented at different intervals (+150 ms, +300 ms or +450 ms). Interference from passive hearing was observed at all positive stimulus onset asynchronies (SOAs), including when distractors appeared 450 ms after picture onset. Analyses of ERPs and microstates revealed modulations appearing in a time-window close to verbal response onset, likely relating to post-lexical planning processes. A shift in the latency of the auditory N1 component for syllables displayed 450 ms after picture onset, relative to hearing in isolation, was also observed. Data from picture naming with active listening to auditory syllables also pointed to post-lexical interference. The present study suggests that, beyond the lexical stage, post-lexical processes can be subject to interference and that the reciprocal interference between utterance planning and hearing depends on attentional demand and possibly competing neural substrates.
Collapse
|
36
|
Jerger S, Damian MF, Karl C, Abdi H. Developmental Shifts in Detection and Attention for Auditory, Visual, and Audiovisual Speech. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2018; 61:3095-3112. [PMID: 30515515 PMCID: PMC6440305 DOI: 10.1044/2018_jslhr-h-17-0343] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/10/2017] [Revised: 01/02/2018] [Accepted: 07/16/2018] [Indexed: 06/09/2023]
Abstract
PURPOSE Successful speech processing depends on our ability to detect and integrate multisensory cues, yet there is minimal research on multisensory speech detection and integration by children. To address this need, we studied the development of speech detection for auditory (A), visual (V), and audiovisual (AV) input. METHOD Participants were 115 typically developing children clustered into age groups between 4 and 14 years. Speech detection (quantified by response times [RTs]) was determined for 1 stimulus, /buh/, presented in A, V, and AV modes (articulating vs. static facial conditions). Performance was analyzed not only in terms of traditional mean RTs but also in terms of the faster versus slower RTs (defined by the 1st vs. 3rd quartiles of RT distributions). These time regions were conceptualized respectively as reflecting optimal detection with efficient focused attention versus less optimal detection with inefficient focused attention due to attentional lapses. RESULTS Mean RTs indicated better detection (a) of multisensory AV speech than A speech only in 4- to 5-year-olds and (b) of A and AV inputs than V input in all age groups. The faster RTs revealed that AV input did not improve detection in any group. The slower RTs indicated that (a) the processing of silent V input was significantly faster for the articulating than static face and (b) AV speech or facial input significantly minimized attentional lapses in all groups except 6- to 7-year-olds (a peaked U-shaped curve). Apparently, the AV benefit observed for mean performance in 4- to 5-year-olds arose from effects of attention. CONCLUSIONS The faster RTs indicated that AV input did not enhance detection in any group, but the slower RTs indicated that AV speech and dynamic V speech (mouthing) significantly minimized attentional lapses and thus did influence performance. Overall, A and AV inputs were detected consistently faster than V input; this result endorsed stimulus-bound auditory processing by these children.
Collapse
Affiliation(s)
- Susan Jerger
- School of Behavioral and Brain Sciences, GR4.1, University of Texas at Dallas, Richardson
- Callier Center for Communication Disorders, Richardson, TX
| | - Markus F. Damian
- School of Experimental Psychology, University of Bristol, United Kingdom
| | - Cassandra Karl
- School of Behavioral and Brain Sciences, GR4.1, University of Texas at Dallas, Richardson
- Callier Center for Communication Disorders, Richardson, TX
| | - Hervé Abdi
- School of Behavioral and Brain Sciences, GR4.1, University of Texas at Dallas, Richardson
| |
Collapse
|
37
|
Biau E, Kotz SA. Lower Beta: A Central Coordinator of Temporal Prediction in Multimodal Speech. Front Hum Neurosci 2018; 12:434. [PMID: 30405383 PMCID: PMC6207805 DOI: 10.3389/fnhum.2018.00434] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2018] [Accepted: 10/03/2018] [Indexed: 12/18/2022] Open
Abstract
How the brain decomposes and integrates information in multimodal speech perception is linked to oscillatory dynamics. However, how speech takes advantage of redundancy between different sensory modalities, and how this translates into specific oscillatory patterns, remains unclear. We address the role of lower beta activity (~20 Hz), generally associated with motor functions, as an amodal central coordinator that receives bottom-up delta-theta copies from specific sensory areas and generates top-down temporal predictions for auditory entrainment. Dissociating temporal prediction from entrainment may explain how and why visual input benefits speech processing rather than adding cognitive load in multimodal speech perception. On the one hand, body movements convey prosodic and syllabic features at delta and theta rates (i.e., 1–3 Hz and 4–7 Hz). On the other hand, the natural precedence of visual input before auditory onsets may prepare the brain to anticipate and facilitate the integration of auditory delta-theta copies of the prosodic-syllabic structure. Here, we identify three fundamental criteria, based on recent evidence and hypotheses, which support the notion that lower motor beta frequency may play a central and generic role in temporal prediction during speech perception. First, beta activity must respond to rhythmic stimulation across modalities. Second, beta power must respond to biological motion and speech-related movements conveying temporal information in multimodal speech processing. Third, temporal prediction may recruit a communication loop between motor and primary auditory cortices (PACs) via delta-to-beta cross-frequency coupling. We discuss evidence related to each criterion and extend these concepts to a beta-motivated framework of multimodal speech processing.
Collapse
Affiliation(s)
- Emmanuel Biau
- Basic and Applied Neuro Dynamics Laboratory, Department of Neuropsychology and Psychopharmacology, Faculty of Psychology and Neuroscience, University of Maastricht, Maastricht, Netherlands
| | - Sonja A Kotz
- Basic and Applied Neuro Dynamics Laboratory, Department of Neuropsychology and Psychopharmacology, Faculty of Psychology and Neuroscience, University of Maastricht, Maastricht, Netherlands; Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
| |
Collapse
|
38
|
Hernández-Gutiérrez D, Abdel Rahman R, Martín-Loeches M, Muñoz F, Schacht A, Sommer W. Does dynamic information about the speaker's face contribute to semantic speech processing? ERP evidence. Cortex 2018; 104:12-25. [DOI: 10.1016/j.cortex.2018.03.031] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2017] [Revised: 11/08/2017] [Accepted: 03/31/2018] [Indexed: 11/26/2022]
|
39
|
Shatzer H, Shen S, Kerlin JR, Pitt MA, Shahin AJ. Neurophysiology underlying influence of stimulus reliability on audiovisual integration. Eur J Neurosci 2018; 48:2836-2848. [PMID: 29363844 DOI: 10.1111/ejn.13843] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2017] [Revised: 12/15/2017] [Accepted: 01/08/2018] [Indexed: 12/01/2022]
Abstract
We tested the predictions of the dynamic reweighting model (DRM) of audiovisual (AV) speech integration, which posits that spectrotemporally reliable (informative) AV speech stimuli induce a reweighting of processing from low-level to high-level auditory networks. This reweighting decreases sensitivity to acoustic onsets and in turn increases tolerance to AV onset asynchronies (AVOA). EEG was recorded while subjects watched videos of a speaker uttering trisyllabic nonwords that varied in spectrotemporal reliability and asynchrony of the visual and auditory inputs. Subjects judged the stimuli as in-sync or out-of-sync. Results showed that subjects exhibited greater AVOA tolerance for non-blurred than blurred visual speech and for less degraded than for more degraded acoustic speech. Increased AVOA tolerance was reflected in reduced amplitude of the P1-P2 auditory evoked potentials, a neurophysiological indication of reduced sensitivity to acoustic onsets and successful AV integration. There was also sustained visual alpha-band (8-14 Hz) suppression (desynchronization) following acoustic speech onsets for non-blurred vs. blurred visual speech, consistent with continuous engagement of the visual system as the speech unfolds. The current findings suggest that increased spectrotemporal reliability of acoustic and visual speech promotes robust AV integration, partly by suppressing sensitivity to acoustic onsets, in support of the DRM's reweighting mechanism. Increased visual signal reliability also sustains the engagement of the visual system with the auditory system to maintain alignment of information across modalities.
Collapse
Affiliation(s)
- Hannah Shatzer
- Department of Psychology, The Ohio State University, Columbus, OH, USA
| | - Stanley Shen
- Center for Mind and Brain, University of California, 267 Cousteau Place, Davis, CA, 95618, USA
| | - Jess R Kerlin
- Center for Mind and Brain, University of California, 267 Cousteau Place, Davis, CA, 95618, USA
| | - Mark A Pitt
- Department of Psychology, The Ohio State University, Columbus, OH, USA
| | - Antoine J Shahin
- Center for Mind and Brain, University of California, 267 Cousteau Place, Davis, CA, 95618, USA
| |
Collapse
|
40
|
Treille A, Vilain C, Schwartz JL, Hueber T, Sato M. Electrophysiological evidence for Audio-visuo-lingual speech integration. Neuropsychologia 2018; 109:126-133. [DOI: 10.1016/j.neuropsychologia.2017.12.024] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2017] [Revised: 11/21/2017] [Accepted: 12/13/2017] [Indexed: 01/25/2023]
|
41
|
Burnham D, Dodd B. Language–General Auditory–Visual Speech Perception: Thai–English and Japanese–English McGurk Effects. Multisens Res 2018; 31:79-110. [DOI: 10.1163/22134808-00002590] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2017] [Accepted: 06/19/2017] [Indexed: 11/19/2022]
Abstract
Cross-language McGurk Effects are used to investigate the locus of auditory–visual speech integration. Experiment 1 uses the fact that [ŋ], as in 'sing', is phonotactically legal in word-final position in English and Thai, but in word-initial position only in Thai. English and Thai language participants were tested for 'n' perception from auditory [m]/visual [ŋ] (A[m]V[ŋ]) in word-initial and -final positions. Despite English speakers' native language bias to label word-initial [ŋ] as 'n', the incidence of 'n' percepts to A[m]V[ŋ] was equivalent for English and Thai speakers in final and initial positions. Experiment 2 used the facts that (i) [ð] as in 'that' is not present in Japanese, and (ii) English speakers respond more often with 'tha' than 'da' to A[ba]V[ga], but more often with 'di' than 'thi' to A[bi]V[gi]. English and three groups of Japanese language participants (Beginner, Intermediate, Advanced English knowledge) were presented with A[ba]V[ga] and A[bi]V[gi] by an English (Experiment 2a) or a Japanese (Experiment 2b) speaker. Despite Japanese participants' native language bias to perceive 'd' more often than 'th', the four groups showed a similar phonetic level effect of [a]/[i] vowel context × 'th' vs. 'd' responses to A[b]V[g] presentations. In Experiment 2b this phonetic level interaction held, but was more one-sided as very few 'th' responses were evident, even in Australian English participants. Results are discussed in terms of a phonetic plus postcategorical model, in which incoming auditory and visual information is integrated at a phonetic level, after which there are post-categorical phonemic influences.
Collapse
Affiliation(s)
- Denis Burnham
- MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, Australia
| | | |
Collapse
|
42
|
Baart M, Lindborg A, Andersen TS. Electrophysiological evidence for differences between fusion and combination illusions in audiovisual speech perception. Eur J Neurosci 2017; 46:2578-2583. [PMID: 28976045 PMCID: PMC5725699 DOI: 10.1111/ejn.13734] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2017] [Revised: 09/27/2017] [Accepted: 09/27/2017] [Indexed: 11/30/2022]
Abstract
Incongruent audiovisual speech stimuli can lead to perceptual illusions such as fusions or combinations. Here, we investigated the underlying audiovisual integration process by measuring ERPs. We observed that visual speech‐induced suppression of P2 amplitude (which is generally taken as a measure of audiovisual integration) for fusions was similar to suppression obtained with fully congruent stimuli, whereas P2 suppression for combinations was larger. We argue that these effects arise because the phonetic incongruency is solved differently for both types of stimuli.
Collapse
Affiliation(s)
- Martijn Baart
- Department of Cognitive Neuropsychology, Tilburg University, Warandelaan 2, Tilburg, 5000 LE, The Netherlands; BCBL. Basque Center on Cognition, Brain and Language, Donostia, Spain
| | - Alma Lindborg
- Section for Cognitive Systems, DTU Compute, Technical University of Denmark, Lyngby, Denmark
| | - Tobias S Andersen
- Section for Cognitive Systems, DTU Compute, Technical University of Denmark, Lyngby, Denmark
| |
Collapse
|
43
|
Eye Can Hear Clearly Now: Inverse Effectiveness in Natural Audiovisual Speech Processing Relies on Long-Term Crossmodal Temporal Integration. J Neurosci 2017; 36:9888-95. [PMID: 27656026 DOI: 10.1523/jneurosci.1396-16.2016] [Citation(s) in RCA: 76] [Impact Index Per Article: 10.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2016] [Accepted: 08/03/2016] [Indexed: 11/21/2022] Open
Abstract
Speech comprehension is improved by viewing a speaker's face, especially in adverse hearing conditions, a principle known as inverse effectiveness. However, the neural mechanisms that help to optimize how we integrate auditory and visual speech in such suboptimal conversational environments are not yet fully understood. Using human EEG recordings, we examined how visual speech enhances the cortical representation of auditory speech at a signal-to-noise ratio that maximized the perceptual benefit conferred by multisensory processing relative to unisensory processing. We found that the influence of visual input on the neural tracking of the audio speech signal was significantly greater in noisy than in quiet listening conditions, consistent with the principle of inverse effectiveness. Although envelope tracking during audio-only speech was greatly reduced by background noise at an early processing stage, it was markedly restored by the addition of visual speech input. In background noise, multisensory integration occurred at much lower frequencies and was shown to predict the multisensory gain in behavioral performance at a time lag of ∼250 ms. Critically, we demonstrated that inverse effectiveness, in the context of natural audiovisual (AV) speech processing, relies on crossmodal integration over long temporal windows. Our findings suggest that disparate integration mechanisms contribute to the efficient processing of AV speech in background noise. SIGNIFICANCE STATEMENT The behavioral benefit of seeing a speaker's face during conversation is especially pronounced in challenging listening environments. However, the neural mechanisms underlying this phenomenon, known as inverse effectiveness, have not yet been established. Here, we examine this in the human brain using natural speech-in-noise stimuli that were designed specifically to maximize the behavioral benefit of audiovisual (AV) speech. We find that this benefit arises from our ability to integrate multimodal information over longer periods of time. Our data also suggest that the addition of visual speech restores early tracking of the acoustic speech signal during excessive background noise. These findings support and extend current mechanistic perspectives on AV speech perception.
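Envelope tracking of the kind described here is often quantified by relating the speech amplitude envelope to the EEG across a range of stimulus-to-brain lags. The sketch below is a deliberately simple lagged-correlation stand-in, not the study's analysis (which used more sophisticated tracking measures); all names and signals here are toy assumptions.

```python
# Minimal sketch (illustrative only): lagged correlation between the speech
# envelope and one EEG channel as a crude envelope-tracking measure.
import numpy as np
from scipy.signal import hilbert

def envelope(audio):
    """Broadband amplitude envelope via the Hilbert transform."""
    return np.abs(hilbert(audio))

def lagged_tracking(env, eeg, sfreq, max_lag_s=0.4):
    """Pearson correlation for stimulus-to-brain lags from 0 to max_lag_s (EEG lagging)."""
    max_lag = int(max_lag_s * sfreq)
    corrs = []
    for lag in range(max_lag + 1):
        corrs.append(np.corrcoef(env[: env.size - lag], eeg[lag:])[0, 1])
    return np.array(corrs)

sfreq = 128.0
rng = np.random.default_rng(2)
audio = rng.normal(size=int(60 * sfreq))                 # 60 s of toy "audio"
# Toy EEG that tracks the envelope with a 250-ms delay plus noise
eeg = np.roll(envelope(audio), int(0.25 * sfreq)) + rng.normal(size=audio.size)
corrs = lagged_tracking(envelope(audio), eeg, sfreq)
print("best lag:", 1000 * int(np.argmax(corrs)) / sfreq, "ms")
```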
Collapse
|
44
|
Semantic congruent audiovisual integration during the encoding stage of working memory: an ERP and sLORETA study. Sci Rep 2017; 7:5112. [PMID: 28698594 PMCID: PMC5505990 DOI: 10.1038/s41598-017-05471-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2016] [Accepted: 05/31/2017] [Indexed: 11/09/2022] Open
Abstract
Although multisensory integration is an inherent component of functional brain organization, multisensory integration during working memory (WM) has attracted little attention. The present study investigated the neural properties underlying the multisensory integration of WM by comparing semantically related bimodal stimulus presentations with unimodal stimulus presentations and analysing the results using the standardized low-resolution brain electromagnetic tomography (sLORETA) source location approach. The results showed that the memory retrieval reaction times during congruent audiovisual conditions were faster than those during unisensory conditions. Moreover, our findings indicated that the event-related potential (ERP) for simultaneous audiovisual stimuli differed from the ERP for the sum of unisensory constituents during the encoding stage and occurred within a 236-530 ms timeframe over the frontal and parietal-occipital electrodes. The sLORETA images revealed a distributed network of brain areas that participate in the multisensory integration of WM. These results suggested that information inputs from different WM subsystems yielded nonlinear multisensory interactions and became integrated during the encoding stage. The multicomponent model of WM indicates that the central executive could play a critical role in the integration of information from different slave systems.
Collapse
|
45
|
Electrophysiological evidence for a self-processing advantage during audiovisual speech integration. Exp Brain Res 2017; 235:2867-2876. [PMID: 28676921 DOI: 10.1007/s00221-017-5018-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2017] [Accepted: 06/23/2017] [Indexed: 10/19/2022]
Abstract
Previous electrophysiological studies have provided strong evidence for early multisensory integrative mechanisms during audiovisual speech perception. From these studies, one unanswered issue is whether hearing our own voice and seeing our own articulatory gestures facilitate speech perception, possibly through a better processing and integration of sensory inputs with our own sensory-motor knowledge. The present EEG study examined the impact of self-knowledge during the perception of auditory (A), visual (V) and audiovisual (AV) speech stimuli that were previously recorded from the participant or from a speaker he/she had never met. Audiovisual interactions were estimated by comparing N1 and P2 auditory evoked potentials during the bimodal condition (AV) with the sum of those observed in the unimodal conditions (A + V). In line with previous EEG studies, our results revealed an amplitude decrease of P2 auditory evoked potentials in AV compared to A + V conditions. Crucially, a temporal facilitation of N1 responses was observed during the visual perception of self speech movements compared to those of another speaker. This facilitation was negatively correlated with the saliency of visual stimuli. These results provide evidence for a temporal facilitation of the integration of auditory and visual speech signals when the visual situation involves our own speech gestures.
Collapse
|
46
|
Kokinous J, Tavano A, Kotz SA, Schröger E. Perceptual integration of faces and voices depends on the interaction of emotional content and spatial frequency. Biol Psychol 2017; 123:155-165. [DOI: 10.1016/j.biopsycho.2016.12.007] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2016] [Revised: 10/11/2016] [Accepted: 12/11/2016] [Indexed: 10/20/2022]
|
47
|
Baart M. Quantifying lip-read-induced suppression and facilitation of the auditory N1 and P2 reveals peak enhancements and delays. Psychophysiology 2016; 53:1295-306. [DOI: 10.1111/psyp.12683] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2015] [Accepted: 05/09/2016] [Indexed: 11/29/2022]
Affiliation(s)
- Martijn Baart
- BCBL. Basque Center on Cognition, Brain and Language; Donostia-San Sebastián, Spain
- Department of Cognitive Neuropsychology; Tilburg University; Tilburg The Netherlands
| |
Collapse
|
48
|
Chen YC, Shore DI, Lewis TL, Maurer D. The development of the perception of audiovisual simultaneity. J Exp Child Psychol 2016; 146:17-33. [DOI: 10.1016/j.jecp.2016.01.010] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2015] [Revised: 01/09/2016] [Accepted: 01/12/2016] [Indexed: 10/22/2022]
|
49
|
Kaganovich N, Schumaker J. Electrophysiological correlates of individual differences in perception of audiovisual temporal asynchrony. Neuropsychologia 2016; 86:119-30. [PMID: 27094850 PMCID: PMC5137199 DOI: 10.1016/j.neuropsychologia.2016.04.015] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2015] [Revised: 04/14/2016] [Accepted: 04/15/2016] [Indexed: 10/21/2022]
Abstract
Sensitivity to the temporal relationship between auditory and visual stimuli is key to efficient audiovisual integration. However, even adults vary greatly in their ability to detect audiovisual temporal asynchrony. What underlies this variability is currently unknown. We recorded event-related potentials (ERPs) while participants performed a simultaneity judgment task on a range of audiovisual (AV) and visual-auditory (VA) stimulus onset asynchronies (SOAs) and compared ERP responses in good and poor performers to the 200-ms SOA, which showed the largest individual variability in the number of synchronous perceptions. Analysis of ERPs to the VA200 stimulus yielded no significant results. However, those individuals who were more sensitive to the AV200 SOA had significantly more positive voltage between 210 and 270 ms following the sound onset. In a follow-up analysis, we showed that the mean voltage within this window predicted approximately 36% of the variability in sensitivity to AV temporal asynchrony in a larger group of participants. The relationship between the ERP measure in the 210-270 ms window and accuracy on the simultaneity judgment task also held for two other AV SOAs with significant individual variability, 100 and 300 ms. Because the identified window was time-locked to the onset of sound in the AV stimulus, we conclude that sensitivity to AV temporal asynchrony is shaped to a large extent by the efficiency of the neural encoding of sound onsets.
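The individual-differences analysis described here (mean voltage in a post-sound window predicting behavioral sensitivity) can be expressed compactly. This is a minimal sketch with simulated data; the variable names, window, and the simple regression shortcut are assumptions rather than the authors' pipeline.

```python
# Minimal sketch (assumed analysis shape): per-participant window voltage vs. behavior.
import numpy as np

def window_mean(erps, times, window=(0.210, 0.270)):
    """Mean voltage per participant in a time window (seconds after sound onset)."""
    mask = (times >= window[0]) & (times <= window[1])
    return erps[:, mask].mean(axis=1)

def variance_explained(x, y):
    """Squared Pearson correlation, i.e., the R^2 of a simple linear regression."""
    return np.corrcoef(x, y)[0, 1] ** 2

times = np.linspace(-0.1, 0.6, 700)                      # toy ERP time axis
rng = np.random.default_rng(3)
erps = rng.normal(size=(30, times.size))                 # 30 simulated participants
voltage = window_mean(erps, times)
sensitivity = 0.6 * voltage + rng.normal(scale=0.5, size=voltage.size)  # simulated behavior
print("R^2:", round(float(variance_explained(voltage, sensitivity)), 2))
```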
Collapse
Affiliation(s)
- Natalya Kaganovich
- Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907-2038, United States; Department of Psychological Sciences, Purdue University, 703 Third Street, West Lafayette, IN 47907-2038, United States.
| | - Jennifer Schumaker
- Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907-2038, United States
| |
Collapse
|
50
|
Rosenblum LD, Dias JW, Dorsi J. The supramodal brain: implications for auditory perception. JOURNAL OF COGNITIVE PSYCHOLOGY 2016. [DOI: 10.1080/20445911.2016.1181691] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
|