1. Williams BT, Viswanathan N, Brouwer S. The effect of visual speech information on linguistic release from masking. J Acoust Soc Am 2023; 153:602. [PMID: 36732222] [PMCID: PMC10162837] [DOI: 10.1121/10.0016865]
Abstract
Listeners often experience challenges understanding a person (target) in the presence of competing talkers (maskers). This difficulty is reduced by the availability of visual speech information (VSI; lip movements, degree of mouth opening) and by linguistic release from masking (LRM; masking decreases when maskers are in a dissimilar language). We investigated whether and how LRM occurs when VSI is available. We presented English targets with either Dutch or English maskers in audio-only and audiovisual conditions to 62 American English participants. The signal-to-noise ratio (SNR) was easy (0 dB audio-only, -8 dB audiovisual) in Experiment 1 and hard (-8 and -16 dB) in Experiment 2, allowing us to assess the effects of modality on LRM across the same and different SNRs. We found LRM in the audiovisual condition at all SNRs and in the audio-only condition at -8 dB, demonstrating reliable LRM for audiovisual conditions. Results also revealed that LRM is modulated by modality, with larger LRM in audio-only conditions, indicating that introducing VSI weakens LRM. Furthermore, participants performed better with Dutch maskers than with English maskers both with and without VSI. This establishes that listeners use both VSI and dissimilar-language maskers to overcome masking. Our study shows that LRM persists in the audiovisual modality and that its strength depends on the modality.
Affiliation(s)
- Brittany T Williams, Department of Communication Sciences and Disorders, The Pennsylvania State University, State College, Pennsylvania 16801, USA
- Navin Viswanathan, Department of Communication Sciences and Disorders, The Pennsylvania State University, State College, Pennsylvania 16801, USA
- Susanne Brouwer, Department of Modern Languages and Cultures, Radboud University, Nijmegen, The Netherlands
2. Van Engen KJ, Dey A, Sommers MS, Peelle JE. Audiovisual speech perception: Moving beyond McGurk. J Acoust Soc Am 2022; 152:3216. [PMID: 36586857] [PMCID: PMC9894660] [DOI: 10.1121/10.0015262]
Abstract
Although it is clear that sighted listeners use both auditory and visual cues during speech perception, the manner in which multisensory information is combined is a matter of debate. One approach to measuring multisensory integration is to use variants of the McGurk illusion, in which discrepant auditory and visual cues produce auditory percepts that differ from those based on unimodal input. Not all listeners show the same degree of susceptibility to the McGurk illusion, and these individual differences are frequently used as a measure of audiovisual integration ability. However, despite their popularity, we join the voices of others in the field to argue that McGurk tasks are ill-suited for studying real-life multisensory speech perception: McGurk stimuli are often based on isolated syllables (which are rare in conversations) and necessarily rely on audiovisual incongruence that does not occur naturally. Furthermore, recent data show that susceptibility to McGurk tasks does not correlate with performance during natural audiovisual speech perception. Although the McGurk effect is a fascinating illusion, truly understanding the combined use of auditory and visual information during speech perception requires tasks that more closely resemble everyday communication: namely, words, sentences, and narratives with congruent auditory and visual speech cues.
Affiliation(s)
- Kristin J Van Engen, Department of Psychological and Brain Sciences, Washington University, St. Louis, Missouri 63130, USA
- Avanti Dey, PLOS ONE, 1265 Battery Street, San Francisco, California 94111, USA
- Mitchell S Sommers, Department of Psychological and Brain Sciences, Washington University, St. Louis, Missouri 63130, USA
- Jonathan E Peelle, Department of Otolaryngology, Washington University, St. Louis, Missouri 63130, USA
3.
Abstract
Visual speech cues play an important role in speech recognition, and the McGurk effect is a classic demonstration of this. In the original McGurk and Macdonald (Nature, 264, 746-748, 1976) experiment, 98% of participants reported an illusory "fusion" percept of /d/ when listening to the spoken syllable /b/ and watching the visual speech movements for /g/. However, more recent work shows that subject and task differences influence the proportion of fusion responses. In the current study, we varied task (forced-choice vs. open-ended), stimulus set (including /d/ exemplars vs. not), and data collection environment (lab vs. Mechanical Turk) to investigate the robustness of the McGurk effect. Across experiments using the same stimuli to elicit the McGurk effect, we found fusion responses ranging from 10% to 60%, showing large variability in the likelihood of experiencing the McGurk effect across factors that are unrelated to the perceptual information provided by the stimuli. Rather than a robust perceptual illusion, we therefore argue that the McGurk effect exists only for some individuals under specific task conditions. Significance: This series of studies re-evaluates the classic McGurk effect, which shows the relevance of visual cues to speech perception. We highlight the importance of taking into account subject variables and task differences, and challenge future researchers to think carefully about the perceptual basis of the McGurk effect, how it is defined, and what it can tell us about audiovisual integration in speech.
4. Evans KK. The role of selective attention in cross-modal interactions between auditory and visual features. Cognition 2019; 196:104119. [PMID: 31751823] [DOI: 10.1016/j.cognition.2019.104119]
Abstract
Evans and Treisman (2010) showed systematic interactions between audition and vision when participants made speeded classifications in one modality while supposedly ignoring another. We found perceptual facilitation between high pitch and high visual position, high spatial frequency and small size, and interference between high pitch and low position, low spatial frequency and large size, while the converse was the case between low pitch and the same visual features. The present study examined the role of selective attention in these cross-modal interactions. Participants performed speeded classification or search tasks of low or high load while attempting to ignore irrelevant stimuli in a different modality. In both paradigms, congruency between the visual and the irrelevant auditory stimulus had an equal effect in the low and in the high perceptual load conditions. A third experiment tested divided attention, requiring participants to compare stimuli across modalities and respond to the visual-auditory compound. The congruency effect was as large with attention focused on one modality as when it was divided across both. These findings offer converging evidence that cross-modal interactions between corresponding basic features are independent of selective attention.
Affiliation(s)
- Karla K Evans, Department of Psychology, University of York, Heslington, York, YO10 5DD, United Kingdom
5. Morís Fernández L, Torralba M, Soto-Faraco S. Theta oscillations reflect conflict processing in the perception of the McGurk illusion. Eur J Neurosci 2018; 48:2630-2641. [DOI: 10.1111/ejn.13804]
Affiliation(s)
- Luis Morís Fernández, Multisensory Research Group, Center for Brain and Cognition, Dept. de Tecnologies de la Informació i les Comunicacions, Universitat Pompeu Fabra, Roc Boronat 138, 08018 Barcelona, Spain
- Mireia Torralba, Multisensory Research Group, Center for Brain and Cognition, Dept. de Tecnologies de la Informació i les Comunicacions, Universitat Pompeu Fabra, Roc Boronat 138, 08018 Barcelona, Spain
- Salvador Soto-Faraco, Multisensory Research Group, Center for Brain and Cognition, Dept. de Tecnologies de la Informació i les Comunicacions, Universitat Pompeu Fabra, Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
6. Alsius A, Paré M, Munhall KG. Forty years after hearing lips and seeing voices: the McGurk effect revisited. Multisens Res 2018; 31:111-144. [PMID: 31264597] [DOI: 10.1163/22134808-00002565]
Abstract
Since its discovery 40 years ago, the McGurk illusion has usually been cited as a prototypical case of multisensory binding in humans, and has been extensively used in speech perception studies as a proxy measure for audiovisual integration mechanisms. Despite the well-established practice of using the McGurk illusion as a tool for studying the mechanisms underlying audiovisual speech integration, the magnitude of the illusion varies enormously across studies. Furthermore, the processing of McGurk stimuli differs from congruent audiovisual processing at both phenomenological and neural levels. This calls into question the suitability of the illusion as a tool for quantifying the necessary and sufficient conditions under which audiovisual integration occurs in natural conditions. In this paper, we review some of the practical and theoretical issues related to the use of the McGurk illusion as an experimental paradigm. We believe that, without a richer understanding of the mechanisms involved in processing the McGurk effect, experimenters should be cautious when generalizing data generated by McGurk stimuli to matching audiovisual speech events.
Affiliation(s)
- Agnès Alsius, Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6, Canada
- Martin Paré, Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6, Canada
- Kevin G Munhall, Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6, Canada
7. Morís Fernández L, Macaluso E, Soto-Faraco S. Audiovisual integration as conflict resolution: the conflict of the McGurk illusion. Hum Brain Mapp 2017; 38:5691-5705. [PMID: 28792094] [DOI: 10.1002/hbm.23758]
Abstract
There are two main behavioral expressions of multisensory integration (MSI) in speech: the perceptual enhancement produced by the sight of the congruent lip movements of the speaker, and the illusory sound perceived when a speech syllable is dubbed with incongruent lip movements, as in the McGurk effect. These two models have often been used to study MSI. Here, we contend that, unlike congruent audiovisual (AV) speech, the McGurk effect involves brain areas related to conflict detection and resolution. To test this hypothesis, we used fMRI to measure blood oxygen level dependent responses to AV speech syllables. We analyzed brain activity as a function of the nature of the stimuli (McGurk or non-McGurk) and the perceptual outcome regarding MSI (integrated or non-integrated response) in a 2 × 2 factorial design. The results showed that, regardless of perceptual outcome, AV mismatch activated general-purpose conflict areas (e.g., anterior cingulate cortex) as well as specific AV speech conflict areas (e.g., inferior frontal gyrus), compared with AV matching stimuli. Moreover, these conflict areas showed stronger activation on trials where the McGurk illusion was perceived compared with non-illusory trials, even though the stimuli were physically identical. We conclude that the AV incongruence in McGurk stimuli triggers the activation of conflict processing areas and that the process of resolving the cross-modal conflict is critical for the McGurk illusion to arise.
Affiliation(s)
- Luis Morís Fernández, Multisensory Research Group, Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain
- Emiliano Macaluso, Neuroimaging Laboratory, Santa Lucia Foundation, Rome, Italy; ImpAct Team, Lyon Neuroscience Research Center (UCBL1, INSERM 1028, CNRS 5292), Lyon, France
- Salvador Soto-Faraco, Multisensory Research Group, Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
8.
Abstract
Adults use vision to perceive low-fidelity speech; yet how children acquire this ability is not well understood. The literature indicates that children show reduced sensitivity to visual speech from kindergarten to adolescence. We hypothesized that this pattern reflects the effects of complex tasks and a growth period with harder-to-utilize cognitive resources, not a lack of sensitivity. We investigated sensitivity to visual speech in children via the phonological priming produced by low-fidelity (non-intact onset) auditory speech presented audiovisually (see a dynamic face articulate the consonant/rhyme b/ag; hear the non-intact onset/rhyme -b/ag) vs. auditorily (see a still face; hear exactly the same auditory input). Audiovisual speech produced greater priming from four to fourteen years, indicating that visual speech filled in the non-intact auditory onsets. The influence of visual speech depended uniquely on phonology and speechreading. Children, like adults, perceive speech onsets multimodally. These findings are critical for incorporating visual speech into developmental theories of speech perception.
Affiliation(s)
- Susan Jerger, School of Behavioral and Brain Sciences, GR4.1, University of Texas at Dallas, and Callier Center for Communication Disorders, Richardson, Texas
- Nancy Tye-Murray, Department of Otolaryngology-Head and Neck Surgery, Washington University School of Medicine
- Hervé Abdi, School of Behavioral and Brain Sciences, GR4.1, University of Texas at Dallas
9. Szycik G, Ye Z, Mohammadi B, Dillo W, te Wildt B, Samii A, Frieling H, Bleich S, Münte T. Maladaptive connectivity of Broca's area in schizophrenia during audiovisual speech perception: an fMRI study. Neuroscience 2013; 253:274-82. [DOI: 10.1016/j.neuroscience.2013.08.041]
10. Setti A, Burke KE, Kenny R, Newell FN. Susceptibility to a multisensory speech illusion in older persons is driven by perceptual processes. Front Psychol 2013; 4:575. [PMID: 24027544] [PMCID: PMC3760087] [DOI: 10.3389/fpsyg.2013.00575]
Abstract
Recent studies suggest that multisensory integration is enhanced in older adults, but it is not known whether this enhancement is solely driven by perceptual processes or is also affected by cognitive processes. Using the "McGurk illusion," in Experiment 1 we found that audio-visual integration of incongruent audio-visual words was higher in older adults than in younger adults, although recognition of either audio- or visual-only presented words was the same across groups. In Experiment 2 we tested recall of sentences within which an incongruent audio-visual speech word was embedded. The overall semantic meaning of the sentence was compatible with either one of the unisensory components of the target word and/or with the illusory percept. Older participants recalled more illusory audio-visual words in sentences than younger adults; however, there was no differential effect of word compatibility on recall for the two groups. Our findings suggest that the relatively high susceptibility to the audio-visual speech illusion in older participants is due more to perceptual than to cognitive processing.
Affiliation(s)
- Annalisa Setti, Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland; TRIL Centre, Trinity College Dublin, Dublin, Ireland
11. Williams JA. Discrepant visual speech facilitates covert selective listening in "cocktail party" conditions. Percept Mot Skills 2012; 114:903-14. [PMID: 22913029] [DOI: 10.2466/22.20.21.28.pms.114.3.903-914]
Abstract
The presence of congruent visual speech information facilitates the identification of auditory speech, while the addition of incongruent visual speech information often impairs accuracy. This latter arrangement occurs naturally when one is being directly addressed in conversation but listens to a different speaker. Under these conditions, performance may diminish since: (a) one is bereft of the facilitative effects of the corresponding lip motion and (b) one becomes subject to visual distortion by incongruent visual speech; by contrast, speech intelligibility may be improved due to (c) bimodal localization of the central unattended stimulus. Participants were exposed to centrally presented visual and auditory speech while attending to a peripheral speech stream. In some trials, the lip movements of the central visual stimulus matched the unattended speech stream; in others, the lip movements matched the attended peripheral speech. Accuracy for the peripheral stimulus was nearly one standard deviation greater with incongruent visual information, compared to the congruent condition which provided bimodal pattern recognition cues. Likely, the bimodal localization of the central stimulus further differentiated the stimuli and thus facilitated intelligibility. Results are discussed with regard to similar findings in an investigation of the ventriloquist effect, and the relative strength of localization and speech cues in covert listening.
Affiliation(s)
- Jason A Williams, Department of Psychology and Child Development, California Polytechnic State University, San Luis Obispo, 93401, USA
12. Viviani P, Figliozzi F, Lacquaniti F. The perception of visible speech: estimation of speech rate and detection of time reversals. Exp Brain Res 2011; 215:141-61. [PMID: 21986668] [DOI: 10.1007/s00221-011-2883-9]
Abstract
Four experiments investigated the perception of visible speech. Experiment 1 addressed the perception of speech rate. Observers were shown video clips of the lower face of actors speaking at their spontaneous rate. Then, they were shown muted versions of the video clips, which were either accelerated or decelerated. The task (scaling) was to compare visually the speech rate of the stimulus to the spontaneous rate of the actor being shown. Rate estimates were accurate when the video clips were shown in the normal direction (forward mode). In contrast, speech rate was underestimated when the video clips were shown in reverse (backward mode). Experiments 2-4 (2AFC) investigated how accurately one discriminates forward and backward speech movements. Unlike in Experiment 1, observers were never exposed to the sound track of the video clips. Performance was well above chance when playback mode was crossed with rate modulation, and the number of repetitions of the stimuli allowed some amount of speechreading to take place in forward mode (Experiment 2). In Experiment 3, speechreading was made much more difficult by using a different and larger set of muted video clips. Yet, accuracy decreased only slightly with respect to Experiment 2. Thus, kinematic rather than speechreading cues are most important for discriminating movement direction. Performance worsened, but remained above chance level, when the same stimuli of Experiment 3 were rotated upside down (Experiment 4). We argue that the results are in keeping with the hypothesis that visual perception taps into implicit motor competence. Thus, lawful instances of biological movements (forward stimuli) are processed differently from backward stimuli representing movements that the observer cannot perform.
Affiliation(s)
- Paolo Viviani, Laboratory of Neuromotor Physiology, Santa Lucia Foundation, via Ardeatina 306, 00179 Rome, Italy
13. Szycik GR, Münte TF, Dillo W, Mohammadi B, Samii A, Emrich HM, Dietrich DE. Audiovisual integration of speech is disturbed in schizophrenia: an fMRI study. Schizophr Res 2009; 110:111-8. [PMID: 19303257] [DOI: 10.1016/j.schres.2009.03.003]
Abstract
Speech perception is an essential part of social interaction. Visual information (lip movements, facial expression) may supplement auditory information, in particular under adverse listening conditions. Schizophrenia patients have been shown to have a deficit in integrating articulatory motions with the auditory speech input. The goal of this study was to investigate the neural basis of this deficit in audiovisual speech processing in schizophrenia patients using fMRI. Disyllabic nouns were presented in congruent (audio matches visual information) and incongruent conditions in a slow event-related fMRI design. Schizophrenia patients (n=15) were compared to age- and gender-matched control participants. The statistical examination was conducted by analysis of variance with the main factors audiovisual congruency and group membership. The patients' brain activity differed from the control group as evidenced by congruency-by-group interaction effects. The pertinent brain sites were located predominantly in the right hemisphere and comprised the pars opercularis, middle frontal sulcus, and superior temporal gyrus. In addition, we observed interactions bilaterally in the fusiform gyrus and the nucleus accumbens. We suggest that schizophrenia patients' deficits in audiovisual integration during speech perception are due to a dysfunction of the speech motor system in the right hemisphere. Furthermore, the results can also be seen as a reflection of reduced lateralization of language functions to the left hemisphere in schizophrenia.
Affiliation(s)
- G R Szycik, Clinic for Psychiatry, Social Psychiatry and Psychotherapy, Medical School Hannover, Germany
14. Barutchu A, Crewther SG, Kiely P, Murphy MJ, Crewther DP. When /b/ill with /g/ill becomes /d/ill: evidence for a lexical effect in audiovisual speech perception. Eur J Cogn Psychol 2008. [DOI: 10.1080/09541440601125623]
15. Ross LA, Saint-Amour D, Leavitt VM, Molholm S, Javitt DC, Foxe JJ. Impaired multisensory processing in schizophrenia: deficits in the visual enhancement of speech comprehension under noisy environmental conditions. Schizophr Res 2007; 97:173-83. [PMID: 17928202] [DOI: 10.1016/j.schres.2007.08.008]
Abstract
BACKGROUND: Viewing a speaker's articulatory movements substantially improves a listener's ability to understand spoken words, especially under noisy environmental conditions. In this study we investigated the ability of patients with schizophrenia to integrate visual and auditory speech. Our objective was to determine to what extent they experience benefit from visual articulation and to detail under what listening conditions they might show the greatest impairments.
METHODS: We assessed the ability to recognize auditory and audiovisual speech at different levels of noise in 18 patients with schizophrenia and compared their performance with that of 18 healthy volunteers. We used a large set of monosyllabic words as our stimuli in order to more closely approximate performance in everyday situations.
RESULTS: Patients with schizophrenia showed deficits in their ability to derive benefit from visual articulatory motion. This impairment was most pronounced at signal-to-noise levels where multisensory gain is known to be maximal in healthy control subjects. A surprising finding was that, despite known early auditory sensory processing deficits and reports of impairments in speech processing in schizophrenia, patients' performance in unisensory auditory speech perception remained fully intact.
CONCLUSIONS: The results showed a specific deficit in multisensory speech processing in the absence of any measurable deficit in unisensory speech processing, and suggest that sensory integration dysfunction may be an important and, to date, rather overlooked aspect of schizophrenia.
Affiliation(s)
- Lars A Ross, Program in Cognitive Neuroscience, Department of Psychology, The City College of City University of New York, 138th St. and Convent Avenue, New York, New York 10031, USA
16. Tremblay C, Champoux F, Voss P, Bacon BA, Lepore F, Théoret H. Speech and non-speech audio-visual illusions: a developmental study. PLoS One 2007; 2:e742. [PMID: 17710142] [PMCID: PMC1937019] [DOI: 10.1371/journal.pone.0000742]
Abstract
It is well known that simultaneous presentation of incongruent audio and visual stimuli can lead to illusory percepts. Recent data suggest that distinct processes underlie non-specific intersensory speech as opposed to non-speech perception. However, the development of both speech and non-speech intersensory perception across childhood and adolescence remains poorly defined. Thirty-eight observers aged 5 to 19 were tested on the McGurk effect (an audio-visual illusion involving speech), the Illusory Flash effect and the Fusion effect (two audio-visual illusions not involving speech) to investigate the development of audio-visual interactions and contrast speech vs. non-speech developmental patterns. Whereas the strength of audio-visual speech illusions varied as a direct function of maturational level, performance on non-speech illusory tasks appeared to be homogeneous across all ages. These data support the existence of independent maturational processes underlying speech and non-speech audio-visual illusory effects.
Affiliation(s)
- Corinne Tremblay, Department of Psychology, University of Montreal, Montreal, Canada; Research Center, Sainte-Justine Hospital, Montreal, Canada
- François Champoux, Speech Language Pathology and Audiology, University of Montreal, Montreal, Canada
- Patrice Voss, Department of Psychology, University of Montreal, Montreal, Canada
- Benoit A. Bacon, Department of Psychology, Bishop's University, Sherbrooke, Quebec, Canada
- Franco Lepore, Department of Psychology, University of Montreal, Montreal, Canada; Research Center, Sainte-Justine Hospital, Montreal, Canada
- Hugo Théoret, Department of Psychology, University of Montreal, Montreal, Canada; Research Center, Sainte-Justine Hospital, Montreal, Canada (corresponding author)
17.
Abstract
Although many audio-visual speech experiments have focused on situations where the presence of an incongruent visual speech signal influences the perceived utterance heard by an observer, there are also documented examples of a related effect in which the presence of an incongruent audio speech signal influences the perceived utterance seen by an observer. This study examined the effects that different distracting audio signals had on performance in a color and number keyword speechreading task. When the distracting sound was noise, time-reversed speech, or continuous speech, it had no effect on speechreading. However, when the distracting audio signal consisted of speech that started at the same time as the visual stimulus, speechreading performance was substantially degraded. This degradation did not depend on the semantic similarity between the target and masker speech, but it was substantially reduced when the onset of the audio speech was shifted relative to that of the visual stimulus. Overall, these results suggest that visual speech perception is impaired by the presence of a simultaneous mismatched audio speech signal, but that other types of audio distracters have little effect on speechreading performance.
18.
Abstract
OBJECTIVE: The purpose of the present study was to examine the effects of age on the ability to benefit from combining auditory and visual speech information, relative to listening or speechreading alone. In addition, the study was designed to compare visual enhancement (VE) and auditory enhancement (AE) for consonants, words, and sentences in older and younger adults.
DESIGN: Forty-four older adults and 38 younger adults with clinically normal thresholds for frequencies of 4 kHz and below were asked to identify vowel-consonant-vowels (VCVs), words in a carrier phrase, and semantically meaningful sentences in auditory-only (A), visual-only (V), and auditory-visual (AV) conditions. All stimuli were presented in a background of 20-talker babble, and signal-to-babble ratios were set individually for each participant and each stimulus type to produce approximately 50% correct in the A condition.
RESULTS: For all three types of stimuli, older and younger adults obtained similar scores in the A condition, indicating that the procedure for individually adjusting signal-to-babble ratios was successful at equating A scores for the two age groups. Older adults, however, had significantly poorer performance than younger adults in the AV and V modalities. Analyses of both AE and VE indicated no age differences in the ability to benefit from combining auditory and visual speech signals after controlling for age differences in the V condition. Correlations between scores for the three types of stimuli (consonants, words, and sentences) indicated moderate correlations in the V condition but small correlations for AV, AE, and VE.
CONCLUSIONS: Overall, the findings suggest that the poorer performance of older adults in the AV condition was a result of reduced speechreading abilities rather than a consequence of impaired integration capacities. The pattern of correlations across the three stimulus types indicates some overlap in the mechanisms mediating AV perception of words and sentences and that these mechanisms are largely independent from those used for AV perception of consonants.
Affiliation(s)
- Mitchell S Sommers, Department of Psychology, Washington University, St. Louis, Missouri 63130, USA
19. Alsius A, Navarra J, Campbell R, Soto-Faraco S. Audiovisual integration of speech falters under high attention demands. Curr Biol 2005; 15:839-43. [PMID: 15886102] [DOI: 10.1016/j.cub.2005.03.046]
Abstract
One of the most commonly cited examples of human multisensory integration occurs during exposure to natural speech, when the vocal and the visual aspects of the signal are integrated in a unitary percept. Audiovisual association of facial gestures and vocal sounds has been demonstrated in nonhuman primates and in prelinguistic children, arguing for a general basis for this capacity. One critical question, however, concerns the role of attention in such multisensory integration. Although both behavioral and neurophysiological studies have converged on a preattentive conceptualization of audiovisual speech integration, this mechanism has rarely been measured under conditions of high attentional load, when the observers' attention resources are depleted. We tested the extent to which audiovisual integration was modulated by the amount of available attentional resources by measuring the observers' susceptibility to the classic McGurk illusion in a dual-task paradigm. The proportion of visually influenced responses was severely, and selectively, reduced if participants were concurrently performing an unrelated visual or auditory task. In contrast with the assumption that crossmodal speech integration is automatic, our results suggest that these multisensory binding processes are subject to attentional demands.
Affiliation(s)
- Agnès Alsius, Cognitive Neuroscience Group, Parc Científic de Barcelona, Departament de Psicologia Bàsica, Universitat de Barcelona, Passeig de la Vall d'Hebron 171, 08035 Barcelona, Spain
20.
Abstract
Phoneme identification with audiovisually discrepant stimuli is influenced by information in the visual signal (the McGurk effect). Additionally, lexical status affects identification of auditorily presented phonemes. The present study tested for lexical influences on the McGurk effect. Participants identified phonemes in audiovisually discrepant stimuli in which the lexical status of the auditory component and of a visually influenced percept was independently varied. Visually influenced (McGurk) responses were more frequent when they formed a word and when the auditory signal was a nonword (Experiment 1). Lexical effects were larger for slow than for fast responses (Experiment 2), as with auditory speech, and were replicated with stimuli matched on physical properties (Experiment 3). These results are consistent with models in which lexical processing of speech is modality independent.
Affiliation(s)
- Lawrence Brancazio, Department of Psychology, Southern Connecticut State University, New Haven, CT, USA
21.
Abstract
OBJECTIVE: This experiment was designed to assess the integration of auditory and visual information for speech perception in older adults. The integration of place and voicing information was assessed across modalities using the McGurk effect. The following questions were addressed: 1) Are older adults as successful as younger adults at integrating auditory and visual information for speech perception? 2) Is successful integration of this information related to lipreading performance?
DESIGN: The performance of three groups of participants was compared: young adults with normal hearing and vision, older adults with normal to near-normal hearing and vision, and young controls whose hearing thresholds were shifted with noise to match the older adults. Each participant completed a lipreading test and auditory and auditory-plus-visual identification of syllables with conflicting auditory and visual cues.
RESULTS: The results show that, on average, older adults are as successful as young adults at integrating auditory and visual information for speech perception at the syllable level. The number of fused responses did not differ for the CV tokens across the ages tested. Although there were no significant differences between groups in integration at the syllable level, there were differences in the response alternatives chosen. Young adults with normal peripheral sensitivity often chose an auditory alternative, whereas older adults and control participants leaned toward visual alternatives. In addition, older adults demonstrated poorer lipreading performance than their younger counterparts. This was not related to successful integration of information at the syllable level.
CONCLUSIONS: Based on the findings of this study, when auditory and visual integration of speech information fails to occur, producing a nonfused response, participants select an alternative response from the modality with the least ambiguous signal.
22. Campbell T, Beaman CP, Berry DC. Changing-state disruption of lip-reading by irrelevant sound in perceptual and memory tasks. Eur J Cogn Psychol 2002. [DOI: 10.1080/09541440143000168]
23.
Abstract
In this study of visual phonetic speech perception without accompanying auditory speech stimuli, adults with normal hearing (NH; n = 96) and with severely to profoundly impaired hearing (IH; n = 72) identified consonant-vowel (CV) nonsense syllables and words in isolation and in sentences. The measures of phonetic perception were the proportion of phonemes correct and the proportion of transmitted feature information for CVs, the proportion of phonemes correct for words, and the proportion of phonemes correct and the amount of phoneme substitution entropy for sentences. The results demonstrated greater sensitivity to phonetic information in the IH group. Transmitted feature information was related to isolated word scores for the IH group, but not for the NH group. Phoneme errors in sentences were more systematic in the IH than in the NH group. Individual differences in phonetic perception for CVs were more highly associated with word and sentence performance for the IH than for the NH group. The results suggest that the necessity to perceive speech without hearing can be associated with enhanced visual phonetic perception in some individuals.
Affiliation(s)
- L E Bernstein, Department of Communication Neuroscience, House Ear Institute, Los Angeles, California 90057, USA
24.
Abstract
The "McGurk effect" demonstrates that visual (lip-read) information is used during speech perception even when it is discrepant with auditory information. While this has been established as a robust effect in subjects from Western cultures, our own earlier results had suggested that Japanese subjects use visual information much less than American subjects do (Sekiyama & Tohkura, 1993). The present study examined whether Chinese subjects would also show a reduced McGurk effect due to their cultural similarities with the Japanese. The subjects were 14 native speakers of Chinese living in Japan. Stimuli consisted of 10 syllable (/ba/, /pa/, /ma/, /wa/, /da/, /ta/, /na/, /ga/, /ka/, /ra/) pronounced by two speakers, one Japanese and one American. Each auditory syllable was dubbed onto every, visual syllable within one speaker, resulting in 100 audiovisual stimuli in each language. The subjects' main task was to report what they thought they had heard while looking at and listening to the speaker while the stimuli were being uttered. Compared with previous results obtained with American subjects, the Chinese subjects showed a weaker McGurk effect. The results also showed that the magnitude of the McGurk effect depends on the length of time the Chinese subjects had lived in Japan. Factors that foster and alter the Chinese subjects' reliance on auditory information are discussed.
25. Hippel WV, Sekaquaptewa D, Vargas P. On the role of encoding processes in stereotype maintenance. Advances in Experimental Social Psychology 1995. [DOI: 10.1016/s0065-2601(08)60406-2]
26. Baynes K, Funnell MG, Fowler CA. Hemispheric contributions to the integration of visual and auditory information in speech perception. Percept Psychophys 1994; 55:633-41. [PMID: 8058451] [DOI: 10.3758/bf03211678]
Abstract
Differential hemispheric contributions to the perceptual phenomenon known as the McGurk effect were examined in normal subjects, 1 callosotomy patient, and 4 patients with intractable epilepsy. Twenty-five right-handed subjects were more likely to demonstrate an influence of a mouthed word on identification of a dubbed acoustic word when the speaker's face was lateralized to the LVF as compared with the RVF. In contrast, display of printed response alternatives in the RVF elicited a greater percentage of McGurk responses than display in the LVF. Visual field differences were absent in a group of 15 left-handed subjects. These results suggest that in right-handers, the two hemispheres may make distinct contributions to the McGurk effect. The callosotomy patient demonstrated reliable McGurk effects, but at a lower rate than the normal subjects and the epileptic control subjects. These data support the view that both the right and left hemisphere can make significant contributions to the McGurk effect.
Affiliation(s)
- K Baynes, Center for Neuroscience, University of California, Davis 95616