1. Dorsi J, Ostrand R, Rosenblum LD. Semantic priming from McGurk words: Priming depends on perception. Atten Percept Psychophys 2023; 85:1219-1237. [PMID: 37155085; DOI: 10.3758/s13414-023-02689-2]
Abstract
The McGurk effect is an illusion in which visible articulations alter the perception of auditory speech (e.g., video 'da' dubbed with audio 'ba' may be heard as 'da'). To test the timing of the multisensory processes that underlie the McGurk effect, Ostrand et al. (Cognition, 151, 96-107, 2016) used incongruent stimuli, such as auditory 'bait' + visual 'date', as primes in a lexical decision task. These authors reported that the auditory word, but not the perceived (visual) word, induced semantic priming, suggesting that the auditory signal alone can provide the input for lexical access before multisensory integration is complete. Here, we conceptually replicate the design of Ostrand et al. (2016), using different stimuli chosen to optimize the success of the McGurk illusion. In contrast to the results of Ostrand et al. (2016), we find that the perceived (i.e., visual) word of the incongruent stimulus usually induced semantic priming. We further find that the strength of this priming corresponded to the magnitude of the McGurk effect for each word combination. These findings suggest, contrary to Ostrand et al. (2016), that lexical access makes use of the integrated multisensory information that the listener perceives, and that which unimodal signal of a multisensory stimulus feeds lexical access depends on how that stimulus is perceived.
Affiliation(s)
- Josh Dorsi
- Department of Psychology, University of California, Riverside, 900 University Ave, Riverside, CA, 92521, USA.
- Penn State University, College of Medicine, State College, PA, USA.
- Lawrence D Rosenblum
- Department of Psychology, University of California, Riverside, 900 University Ave, Riverside, CA, 92521, USA

2. Iqbal ZJ, Shahin AJ, Bortfeld H, Backer KC. The McGurk Illusion: A Default Mechanism of the Auditory System. Brain Sci 2023; 13:510. [PMID: 36979322; PMCID: PMC10046462; DOI: 10.3390/brainsci13030510]
Abstract
Recent studies have questioned past conclusions regarding the mechanisms of the McGurk illusion, especially how McGurk susceptibility might inform our understanding of audiovisual (AV) integration. We previously proposed that the McGurk illusion is likely attributable to a default mechanism, whereby either the visual system, the auditory system, or both default to specific phonemes, namely those implicated in the McGurk illusion. We hypothesized that the default mechanism occurs because visual stimuli with an indiscernible place of articulation (like those traditionally used in the McGurk illusion) lead to an ambiguous perceptual environment and thus a failure in AV integration. In the current study, we tested the default hypothesis as it pertains to the auditory system. Participants performed two tasks. One task was a typical McGurk illusion task, in which individuals listened to auditory /ba/ paired with visual /ga/ and judged what they heard. The second task was an auditory-only task, in which individuals transcribed trisyllabic words with a phoneme replaced by silence. We found that individuals' transcriptions of the missing phonemes often defaulted to /d/, /t/, or /th/, the same phonemes often experienced during the McGurk illusion. Importantly, individuals' default rate was positively correlated with their McGurk rate. We conclude that the McGurk illusion arises when people fail to integrate visual percepts with auditory percepts, due to visual ambiguity, leading the auditory system to default to phonemes often implicated in the McGurk illusion.
Affiliation(s)
- Zunaira J. Iqbal
- Department of Cognitive and Information Sciences, University of California, Merced, CA 95343, USA
- Antoine J. Shahin
- Department of Cognitive and Information Sciences, University of California, Merced, CA 95343, USA
- Health Sciences Research Institute, University of California, Merced, CA 95343, USA
- Heather Bortfeld
- Department of Cognitive and Information Sciences, University of California, Merced, CA 95343, USA
- Health Sciences Research Institute, University of California, Merced, CA 95343, USA
- Department of Psychological Sciences, University of California, Merced, CA 95353, USA
- Kristina C. Backer
- Department of Cognitive and Information Sciences, University of California, Merced, CA 95343, USA
- Health Sciences Research Institute, University of California, Merced, CA 95343, USA

3.
Abstract
Visual speech cues play an important role in speech recognition, and the McGurk effect is a classic demonstration of this. In the original McGurk & MacDonald (Nature, 264, 746-748, 1976) experiment, 98% of participants reported an illusory "fusion" percept of /d/ when listening to the spoken syllable /b/ while watching the visual speech movements for /g/. However, more recent work shows that subject and task differences influence the proportion of fusion responses. In the current study, we varied task (forced-choice vs. open-ended), stimulus set (including /d/ exemplars vs. not), and data collection environment (lab vs. Mechanical Turk) to investigate the robustness of the McGurk effect. Across experiments using the same stimuli to elicit the McGurk effect, we found fusion responses ranging from 10% to 60%, showing large variability in the likelihood of experiencing the McGurk effect across factors that are unrelated to the perceptual information provided by the stimuli. Rather than a robust perceptual illusion, we therefore argue that the McGurk effect exists only for some individuals under specific task conditions.
Significance: This series of studies re-evaluates the classic McGurk effect, which shows the influence of visual cues on speech perception. We highlight the importance of taking into account subject variables and task differences, and challenge future researchers to think carefully about the perceptual basis of the McGurk effect, how it is defined, and what it can tell us about audiovisual integration in speech.

4. Gonzales MG, Backer KC, Mandujano B, Shahin AJ. Rethinking the Mechanisms Underlying the McGurk Illusion. Front Hum Neurosci 2021; 15:616049. [PMID: 33867954; PMCID: PMC8046930; DOI: 10.3389/fnhum.2021.616049]
Abstract
The McGurk illusion occurs when listeners hear an illusory percept (e.g., "da") resulting from mismatched pairings of audiovisual (AV) speech stimuli (e.g., auditory /ba/ paired with visual /ga/). Hearing a third percept, distinct from both the auditory and visual input, has been used as evidence of AV fusion. We examined whether the McGurk illusion is instead driven by visual dominance, whereby the third percept, e.g., "da", represents a default percept for visemes with an ambiguous place of articulation (POA), like /ga/. Participants watched videos of a talker uttering various consonant-vowels (CVs) with (AV) and without (V-only) audio of /ba/. Individuals transcribed the CV they saw (V-only) or heard (AV). In the V-only condition, individuals predominantly saw "da"/"ta" when viewing CVs with indiscernible POAs. Likewise, in the AV condition, upon perceiving an illusion, they predominantly heard "da"/"ta" for CVs with indiscernible POAs. The illusion was stronger in individuals who exhibited weak /ba/ auditory encoding (examined using a control auditory-only task). In Experiment 2, we attempted to replicate these findings using stimuli recorded from a different talker. The V-only results were not replicated, but again individuals predominantly heard "da"/"ta"/"tha" as an illusory percept for various AV combinations, and the illusion was stronger in individuals who exhibited weak /ba/ auditory encoding. These results demonstrate that when visual CVs with indiscernible POAs are paired with a weakly encoded auditory /ba/, listeners default to hearing "da"/"ta"/"tha", thus tempering the AV fusion account and favoring a default mechanism triggered when both AV stimuli are ambiguous.
Affiliation(s)
- Mariel G. Gonzales
- Department of Cognitive and Information Sciences, University of California, Merced, Merced, CA, United States
- Kristina C. Backer
- Department of Cognitive and Information Sciences, University of California, Merced, Merced, CA, United States
- Brenna Mandujano
- Department of Psychology, California State University, Fresno, Fresno, CA, United States
- Antoine J. Shahin
- Department of Cognitive and Information Sciences, University of California, Merced, Merced, CA, United States

5. Thézé R, Gadiri MA, Albert L, Provost A, Giraud AL, Mégevand P. Animated virtual characters to explore audio-visual speech in controlled and naturalistic environments. Sci Rep 2020; 10:15540. [PMID: 32968127; PMCID: PMC7511320; DOI: 10.1038/s41598-020-72375-y]
Abstract
Natural speech is processed in the brain as a mixture of auditory and visual features. An example of the importance of visual speech is the McGurk effect and related perceptual illusions that result from mismatching auditory and visual syllables. Although the McGurk effect has been widely applied to the exploration of audio-visual speech processing, it relies on isolated syllables, which severely limits the conclusions that can be drawn from the paradigm. In addition, the extreme variability and the quality of the stimuli usually employed prevent comparability across studies. To overcome these limitations, we present an innovative methodology using 3D virtual characters with realistic lip movements synchronized with computer-synthesized speech. We used commercially accessible and affordable tools to facilitate reproducibility and comparability, and the set-up was validated on 24 participants performing a perception task. Within complete and meaningful French sentences, we paired a labiodental fricative viseme (i.e., /v/) with a bilabial occlusive phoneme (i.e., /b/). This audiovisual mismatch is known to induce the illusion of hearing /v/ in a proportion of trials. We tested the rate of the illusion while varying the magnitude of background noise and audiovisual lag. Overall, the effect was observed in 40% of trials. The proportion rose to about 50% with added background noise and up to 66% when controlling for phonetic features. Our results conclusively demonstrate that computer-generated speech stimuli are a judicious choice that can supplement natural speech with greater control over stimulus timing and content.
Affiliation(s)
- Raphaël Thézé
- Department of Basic Neurosciences, University of Geneva, Campus Biotech, Chemin des Mines 9, 1202, Geneva, Switzerland
- Mehdi Ali Gadiri
- Department of Basic Neurosciences, University of Geneva, Campus Biotech, Chemin des Mines 9, 1202, Geneva, Switzerland
- Louis Albert
- Human Neuroscience Platform, Fondation Campus Biotech Geneva, Geneva, Switzerland
- Antoine Provost
- Human Neuroscience Platform, Fondation Campus Biotech Geneva, Geneva, Switzerland
- Anne-Lise Giraud
- Department of Basic Neurosciences, University of Geneva, Campus Biotech, Chemin des Mines 9, 1202, Geneva, Switzerland
- Pierre Mégevand
- Department of Basic Neurosciences, University of Geneva, Campus Biotech, Chemin des Mines 9, 1202 Geneva, Switzerland
- Division of Neurology, Geneva University Hospitals, Geneva, Switzerland

6. "Paying" attention to audiovisual speech: Do incongruent stimuli incur greater costs? Atten Percept Psychophys 2019; 81:1743-1756. [PMID: 31197661; DOI: 10.3758/s13414-019-01772-x]
Abstract
The McGurk effect is a multisensory phenomenon in which discrepant auditory and visual speech signals typically result in an illusory percept. McGurk stimuli are often used in studies assessing the attentional requirements of audiovisual integration, but no study has directly compared the costs associated with integrating congruent versus incongruent audiovisual speech. Some evidence suggests that the McGurk effect may not be representative of naturalistic audiovisual speech processing: susceptibility to the McGurk effect is not associated with the ability to derive benefit from the addition of the visual signal, and distinct cortical regions are recruited when processing congruent versus incongruent speech. In two experiments, one using response times to identify congruent and incongruent syllables and one using a dual-task paradigm, we assessed whether congruent and incongruent audiovisual speech incur different attentional costs. We demonstrated that response times in both the speech task (Experiment 1) and a secondary vibrotactile task (Experiment 2) were indistinguishable for congruent compared to incongruent syllables, but McGurk fusions were responded to more quickly than McGurk non-fusions. These results suggest that, despite documented differences in how congruent and incongruent stimuli are processed, they do not appear to differ in terms of processing time or effort, at least in the open-set speech task used here. However, responses that result in McGurk fusions are processed more quickly than those that result in non-fusions, though attentional cost is comparable for the two response types.

7. Alsius A, Paré M, Munhall KG. Forty Years After Hearing Lips and Seeing Voices: the McGurk Effect Revisited. Multisens Res 2018; 31:111-144. [PMID: 31264597; DOI: 10.1163/22134808-00002565]
Abstract
Since its discovery 40 years ago, the McGurk illusion has usually been cited as a prototypical case of multisensory binding in humans, and it has been used extensively in speech perception studies as a proxy measure for audiovisual integration mechanisms. Despite the well-established practice of using the McGurk illusion as a tool for studying the mechanisms underlying audiovisual speech integration, the magnitude of the illusion varies enormously across studies. Furthermore, the processing of McGurk stimuli differs from congruent audiovisual processing at both phenomenological and neural levels. This calls into question the suitability of this illusion as a tool for quantifying the necessary and sufficient conditions under which audiovisual integration occurs in natural conditions. In this paper, we review some of the practical and theoretical issues related to the use of the McGurk illusion as an experimental paradigm. We believe that, without a richer understanding of the mechanisms involved in the processing of the McGurk effect, experimenters should be cautious when generalizing data generated by McGurk stimuli to matching audiovisual speech events.
Affiliation(s)
- Agnès Alsius
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
- Martin Paré
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
- Kevin G Munhall
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada

8. Morís Fernández L, Macaluso E, Soto-Faraco S. Audiovisual integration as conflict resolution: The conflict of the McGurk illusion. Hum Brain Mapp 2017; 38:5691-5705. [PMID: 28792094; DOI: 10.1002/hbm.23758]
Abstract
There are two main behavioral expressions of multisensory integration (MSI) in speech: the perceptual enhancement produced by the sight of the congruent lip movements of the speaker, and the illusory sound perceived when a speech syllable is dubbed with incongruent lip movements, as in the McGurk effect. These two models have been used very often to study MSI. Here, we contend that, unlike congruent audiovisual (AV) speech, the McGurk effect involves brain areas related to conflict detection and resolution. To test this hypothesis, we used fMRI to measure blood oxygen level dependent responses to AV speech syllables. We analyzed brain activity as a function of the nature of the stimuli (McGurk or non-McGurk) and the perceptual outcome regarding MSI (integrated or non-integrated response) in a 2 × 2 factorial design. The results showed that, regardless of perceptual outcome, AV mismatch activated general-purpose conflict areas (e.g., anterior cingulate cortex) as well as specific AV speech conflict areas (e.g., inferior frontal gyrus), compared with AV matching stimuli. Moreover, these conflict areas showed stronger activation on trials where the McGurk illusion was perceived compared with non-illusory trials, even though the stimuli were physically identical. We conclude that the AV incongruence in McGurk stimuli triggers the activation of conflict processing areas and that the process of resolving the cross-modal conflict is critical for the McGurk illusion to arise.
Affiliation(s)
- Luis Morís Fernández
- Multisensory Research Group, Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain
- Emiliano Macaluso
- Neuroimaging Laboratory, Santa Lucia Foundation, Rome, Italy
- ImpAct Team, Lyon Neuroscience Research Center (UCBL1, INSERM 1028, CNRS 5292), Lyon, France
- Salvador Soto-Faraco
- Multisensory Research Group, Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain

9. Danielson DK, Bruderer AG, Kandhadai P, Vatikiotis-Bateson E, Werker JF. The organization and reorganization of audiovisual speech perception in the first year of life. Cogn Dev 2017; 42:37-48. [PMID: 28970650; PMCID: PMC5621752; DOI: 10.1016/j.cogdev.2017.02.004]
Abstract
The period between six and 12 months is a sensitive period for language learning, during which infants undergo auditory perceptual attunement, and recent results indicate that this sensitive period may exist across sensory modalities. We tested infants at three stages of perceptual attunement (six, nine, and 11 months) to determine (1) whether they were sensitive to the congruence between heard and seen speech stimuli in an unfamiliar language, and (2) whether familiarization with congruent audiovisual speech could boost subsequent non-native auditory discrimination. Infants at six and nine months, but not 11 months, detected audiovisual congruence of non-native syllables. Familiarization to incongruent, but not congruent, audiovisual speech changed auditory discrimination at test for six-month-olds but not for nine- or 11-month-olds. These results advance the proposal that speech perception is audiovisual from early in ontogeny, and that the sensitive period for audiovisual speech perception may last somewhat longer than that for auditory perception alone.
Affiliation(s)
- D. Kyle Danielson
- Department of Psychology, The University of British Columbia, 2136 West Mall, Vancouver BC V6T 1Z4, Canada
- Alison G. Bruderer
- School of Audiology and Speech Sciences, The University of British Columbia, 2177 Wesbrook Mall, Vancouver BC V6T 1Z3, Canada
- Padmapriya Kandhadai
- Department of Psychology, The University of British Columbia, 2136 West Mall, Vancouver BC V6T 1Z4, Canada
- Eric Vatikiotis-Bateson
- Department of Linguistics, The University of British Columbia, 2613 West Mall, Vancouver BC V6T 1Z4, Canada
- Janet F. Werker
- Department of Psychology, The University of British Columbia, 2136 West Mall, Vancouver BC V6T 1Z4, Canada

10. Jerger S, Damian MF, Tye-Murray N, Abdi H. Children perceive speech onsets by ear and eye. J Child Lang 2017; 44:185-215. [PMID: 26752548; PMCID: PMC4940343; DOI: 10.1017/s030500091500077x]
Abstract
Adults use vision to perceive low-fidelity speech; yet how children acquire this ability is not well understood. The literature indicates that children show reduced sensitivity to visual speech from kindergarten to adolescence. We hypothesized that this pattern reflects the effects of complex tasks and a growth period with harder-to-utilize cognitive resources, not a lack of sensitivity. We investigated sensitivity to visual speech in children via the phonological priming produced by low-fidelity (non-intact onset) auditory speech presented audiovisually (see a dynamic face articulate the consonant/rhyme 'b/ag'; hear the non-intact onset/rhyme '-b/ag') vs. auditorily (see a still face; hear exactly the same auditory input). Audiovisual speech produced greater priming from four to fourteen years, indicating that visual speech filled in the non-intact auditory onsets. The influence of visual speech depended uniquely on phonology and speechreading. Children, like adults, perceive speech onsets multimodally. These findings are critical for incorporating visual speech into developmental theories of speech perception.
Affiliation(s)
- Susan Jerger
- School of Behavioral and Brain Sciences, GR4.1, University of Texas at Dallas, and Callier Center for Communication Disorders, Richardson, Texas
- Nancy Tye-Murray
- Department of Otolaryngology-Head and Neck Surgery, Washington University School of Medicine
- Hervé Abdi
- School of Behavioral and Brain Sciences, GR4.1, University of Texas at Dallas

11. McCotter MV, Jordan TR. The Role of Facial Colour and Luminance in Visual and Audiovisual Speech Perception. Perception 2003; 32:921-936. [PMID: 14580139; DOI: 10.1068/p3316]
Abstract
We conducted four experiments to investigate the role of colour and luminance information in visual and audiovisual speech perception. In Experiments 1a (stimuli presented in quiet conditions) and 1b (stimuli presented in auditory noise), face display types comprised naturalistic colour (NC), grey-scale (GS), and luminance inverted (LI) faces. In Experiments 2a (quiet) and 2b (noise), face display types comprised NC, colour inverted (CI), LI, and colour and luminance inverted (CLI) faces. Six syllables and twenty-two words were used to produce auditory and visual speech stimuli. Auditory and visual signals were combined to produce congruent and incongruent audiovisual speech stimuli. Experiments 1a and 1b showed that perception of visual speech, and its influence on identifying the auditory components of congruent and incongruent audiovisual speech, was less for LI than for either NC or GS faces, which produced identical results. Experiments 2a and 2b showed that perception of visual speech, and influences on perception of incongruent auditory speech, were less for LI and CLI faces than for NC and CI faces (which produced identical patterns of performance). Our findings for NC and CI faces suggest that colour is not critical for perception of visual and audiovisual speech. The effect of luminance inversion on performance accuracy was relatively small (5%), which suggests that the luminance information preserved in LI faces is important for the processing of visual and audiovisual speech.
Affiliation(s)
- Maxine V McCotter
- School of Psychology, University of Nottingham, University Park, Nottingham NG7 2RD, UK.

12. Dias JW, Cook TC, Rosenblum LD. Influences of selective adaptation on perception of audiovisual speech. J Phon 2016; 56:75-84. [PMID: 27041781; PMCID: PMC4815035; DOI: 10.1016/j.wocn.2016.02.004]
Abstract
Research suggests that selective adaptation in speech is a low-level process dependent on sensory-specific information shared between the adaptor and test stimuli. However, previous research has only examined how adaptors shift perception of unimodal test stimuli, either auditory or visual. In the current series of experiments, we investigated whether adaptation to cross-sensory phonetic information can influence perception of integrated audiovisual phonetic information. We examined how selective adaptation to audio and visual adaptors shifts perception of speech along an audiovisual test continuum. This test continuum consisted of nine audio-/ba/ visual-/va/ stimuli, ranging in visual clarity of the mouth. When the mouth was clearly visible, perceivers "heard" the audiovisual stimulus as an integrated "va" percept 93.7% of the time (e.g., McGurk & MacDonald, 1976). As visibility of the mouth became less clear across the nine-item continuum, the audiovisual "va" percept weakened, resulting in a continuum ranging in audiovisual percepts from /va/ to /ba/. Perception of the test stimuli was measured before and after adaptation. Changes in audiovisual speech perception were observed following adaptation to visual-/va/ and audiovisual-/va/, but not following adaptation to auditory-/va/, auditory-/ba/, or visual-/ba/. Adaptation modulates perception of integrated audiovisual speech by modulating the processing of sensory-specific information. The results suggest that auditory and visual speech information are not completely integrated at the level of selective adaptation.

14. Benoit MM, Raij T, Lin FH, Jääskeläinen IP, Stufflebeam S. Primary and multisensory cortical activity is correlated with audiovisual percepts. Hum Brain Mapp 2010; 31:526-538. [PMID: 19780040; DOI: 10.1002/hbm.20884]
Abstract
Incongruent auditory and visual stimuli can elicit audiovisual illusions such as the McGurk effect, where visual /ka/ and auditory /pa/ fuse into another percept such as /ta/. In the present study, human brain activity was measured with adaptation functional magnetic resonance imaging to investigate which brain areas support such audiovisual illusions. Subjects viewed trains of four movies beginning with three congruent /pa/ stimuli to induce adaptation. The fourth stimulus could be (i) another congruent /pa/, (ii) a congruent /ka/, (iii) an incongruent stimulus that evokes the McGurk effect in susceptible individuals (lips /ka/, voice /pa/), or (iv) the converse combination that does not cause the McGurk effect (lips /pa/, voice /ka/). This paradigm was predicted to show increased release from adaptation (i.e., stronger brain activation) as the fourth movie and the related percept became increasingly different from the three previous movies. A stimulus change in either the auditory or the visual stimulus from /pa/ to /ka/ (iii, iv) produced within-modality and cross-modal responses in primary auditory and visual areas. A greater release from adaptation was observed for incongruent non-McGurk (iv) compared to incongruent McGurk (iii) trials. A network including the primary auditory and visual cortices, nonprimary auditory cortex, and several multisensory areas (superior temporal sulcus, intraparietal sulcus, insula, and precentral cortex) showed a correlation between perceiving the McGurk effect and the fMRI signal, suggesting that these areas support the audiovisual illusion.
Affiliation(s)
- Margo McKenna Benoit
- Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA 02114, USA.

15. Crossmodal binding: Evaluating the "unity assumption" using audiovisual speech stimuli. Percept Psychophys 2007; 69:744-756. [PMID: 17929697; DOI: 10.3758/bf03193776]

16. Saint-Amour D, De Sanctis P, Molholm S, Ritter W, Foxe JJ. Seeing voices: High-density electrical mapping and source-analysis of the multisensory mismatch negativity evoked during the McGurk illusion. Neuropsychologia 2006; 45:587-597. [PMID: 16757004; PMCID: PMC1705816; DOI: 10.1016/j.neuropsychologia.2006.03.036]
Abstract
Seeing a speaker's facial articulatory gestures powerfully affects speech perception, helping us overcome noisy acoustical environments. One particularly dramatic illustration of visual influences on speech perception is the "McGurk illusion", where dubbing an auditory phoneme onto video of an incongruent articulatory movement can often lead to illusory auditory percepts. This illusion is so strong that even in the absence of any real change in auditory stimulation, it activates the automatic auditory change-detection system, as indexed by the mismatch negativity (MMN) component of the auditory event-related potential (ERP). We investigated the putative left hemispheric dominance of McGurk-MMN using high-density ERPs in an oddball paradigm. Topographic mapping of the initial McGurk-MMN response showed a highly lateralized left hemisphere distribution, beginning at 175 ms. Subsequently, scalp activity was also observed over bilateral fronto-central scalp with a maximal amplitude at approximately 290 ms, suggesting later recruitment of right temporal cortices. Strong left hemisphere dominance was again observed during the last phase of the McGurk-MMN waveform (350-400 ms). Source analysis indicated bilateral sources in the temporal lobe just posterior to primary auditory cortex. While a single source in the right superior temporal gyrus (STG) accounted for the right hemisphere activity, two separate sources were required, one in the left transverse gyrus and the other in STG, to account for left hemisphere activity. These findings support the notion that visually driven multisensory illusory phonetic percepts produce an auditory-MMN cortical response and that left hemisphere temporal cortex plays a crucial role in this process.
Affiliation(s)
- Dave Saint-Amour
- Cognitive Neurophysiology Laboratory, Nathan S. Kline Institute for Psychiatric Research, Program in Cognitive Neuroscience and Schizophrenia, 140 Old Orangeburg Road, Orangeburg, NY 10962, USA
- Pierfilippo De Sanctis
- Cognitive Neurophysiology Laboratory, Nathan S. Kline Institute for Psychiatric Research, Program in Cognitive Neuroscience and Schizophrenia, 140 Old Orangeburg Road, Orangeburg, NY 10962, USA
- Sophie Molholm
- Cognitive Neurophysiology Laboratory, Nathan S. Kline Institute for Psychiatric Research, Program in Cognitive Neuroscience and Schizophrenia, 140 Old Orangeburg Road, Orangeburg, NY 10962, USA
- Program in Cognitive Neuroscience, Department of Psychology, The City College of the City University of New York, 138th Street & Convent Avenue, New York, NY 10031, USA
- Walter Ritter
- Cognitive Neurophysiology Laboratory, Nathan S. Kline Institute for Psychiatric Research, Program in Cognitive Neuroscience and Schizophrenia, 140 Old Orangeburg Road, Orangeburg, NY 10962, USA
- Program in Cognitive Neuroscience, Department of Psychology, The City College of the City University of New York, 138th Street & Convent Avenue, New York, NY 10031, USA
- John J. Foxe
- Cognitive Neurophysiology Laboratory, Nathan S. Kline Institute for Psychiatric Research, Program in Cognitive Neuroscience and Schizophrenia, 140 Old Orangeburg Road, Orangeburg, NY 10962, USA
- Program in Cognitive Neuroscience, Department of Psychology, The City College of the City University of New York, 138th Street & Convent Avenue, New York, NY 10031, USA

17. Brancazio L, Best CT, Fowler CA. Visual influences on perception of speech and nonspeech vocal-tract events. Lang Speech 2006; 49:21-53. [PMID: 16922061; PMCID: PMC2773261; DOI: 10.1177/00238309060490010301]
Abstract
We report four experiments designed to determine whether visual information affects judgments of acoustically-specified nonspeech events as well as speech events (the "McGurk effect"). Previous findings have shown only weak McGurk effects for nonspeech stimuli, whereas strong effects are found for consonants. We used click sounds that serve as consonants in some African languages, but that are perceived as nonspeech by American English listeners. We found a significant McGurk effect for clicks presented in isolation that was much smaller than that found for stop-consonant-vowel syllables. In subsequent experiments, we found strong McGurk effects, comparable to those found for English syllables, for click-vowel syllables, and weak effects, comparable to those found for isolated clicks, for excised release bursts of stop consonants presented in isolation. We interpret these findings as evidence that the potential contributions of speech-specific processes on the McGurk effect are limited, and discuss the results in relation to current explanations for the McGurk effect.
Affiliation(s)
- Lawrence Brancazio
- Department of Psychology, Southern Connecticut State University, New Haven 06515, USA.

18.
Abstract
Phoneme identification with audiovisually discrepant stimuli is influenced by information in the visual signal (the McGurk effect). Additionally, lexical status affects identification of auditorily presented phonemes. The present study tested for lexical influences on the McGurk effect. Participants identified phonemes in audiovisually discrepant stimuli in which the lexical status of the auditory component and of a visually influenced percept was independently varied. Visually influenced (McGurk) responses were more frequent when they formed a word and when the auditory signal was a nonword (Experiment 1). Lexical effects were larger for slow than for fast responses (Experiment 2), as with auditory speech, and were replicated with stimuli matched on physical properties (Experiment 3). These results are consistent with models in which lexical processing of speech is modality independent.
Affiliation(s)
- Lawrence Brancazio
- Department of Psychology, Southern Connecticut State University, New Haven, CT, USA.

19. Thomas SM, Jordan TR. Determining the influence of Gaussian blurring on inversion effects with talking faces. Percept Psychophys 2002; 64:932-944. [PMID: 12269300; DOI: 10.3758/bf03196797]
Abstract
Perception of visual speech and the influence of visual speech on auditory speech perception are affected by the orientation of a talker's face, but the nature of the visual information underlying this effect has yet to be established. Here, we examine the contributions of visually coarse (configural) and fine (featural) facial movement information to inversion effects in the perception of visual and audiovisual speech. We describe two experiments in which we disrupted perception of fine facial detail by decreasing spatial frequency (blurring) and disrupted perception of coarse configural information by facial inversion. For normal, unblurred talking faces, facial inversion had no influence on visual speech identification or on the effects of congruent or incongruent visual speech movements on perception of auditory speech. However, for blurred faces, facial inversion reduced identification of unimodal visual speech and effects of visual speech on perception of congruent and incongruent auditory speech. These effects were more pronounced for words whose appearance may be defined by fine featural detail. Implications for the nature of inversion effects in visual and audiovisual speech are discussed.

20. Jordan TR, Thomas SM. Effects of horizontal viewing angle on visual and audiovisual speech recognition. J Exp Psychol Hum Percept Perform 2001. [DOI: 10.1037/0096-1523.27.6.1386]

21. Jordan TR, McCotter MV, Thomas SM. Visual and audiovisual speech perception with color and gray-scale facial images. Percept Psychophys 2000; 62:1394-1404. [PMID: 11143451; DOI: 10.3758/bf03212141]
Abstract
Research has shown that auditory speech recognition is influenced by the appearance of a talker's face, but the actual nature of this visual information has yet to be established. Here, we report three experiments that investigated visual and audiovisual speech recognition using color, gray-scale, and point-light talking faces (which allowed comparison with the influence of isolated kinematic information). Auditory and visual forms of the syllables /ba/, /bi/, /ga/, /gi/, /va/, and /vi/ were used to produce auditory, visual, congruent, and incongruent audiovisual speech stimuli. Visual speech identification and visual influences on identifying the auditory components of congruent and incongruent audiovisual speech were identical for color and gray-scale faces and were much greater than for point-light faces. These results indicate that luminance, rather than color, underlies visual and audiovisual speech perception and that this information is more than the kinematic information provided by point-light faces. Implications for processing visual and audiovisual speech are discussed.
Affiliation(s)
- T R Jordan
- School of Psychology, University of Nottingham, University Park, Nottingham, NG7 2RD, England.

22. Rosenblum LD, Schmuckler MA, Johnson JA. The McGurk effect in infants. Percept Psychophys 1997; 59:347-357. [PMID: 9136265; DOI: 10.3758/bf03211902]
Abstract
In the McGurk effect, perceptual identification of auditory speech syllables is influenced by simultaneous presentation of discrepant visible speech syllables. This effect has been found in subjects of different ages and with various native language backgrounds. But no McGurk tests have been conducted with prelinguistic infants. In the present series of experiments, 5-month-old English-exposed infants were tested for the McGurk effect. Infants were first gaze-habituated to an audiovisual /va/. Two different dishabituation stimuli were then presented: audio /ba/-visual /va/ (perceived by adults as /va/), and audio /da/-visual /va/ (perceived by adults as /da/). The infants showed generalization from the audiovisual /va/ to the audio /ba/-visual /va/ stimulus but not to the audio /da/-visual /va/ stimulus. Follow-up experiments revealed that these generalization differences were not due to a general preference for the audio /da/-visual /va/ stimulus or to the auditory similarity of /ba/ to /va/ relative to /da/. These results suggest that the infants were visually influenced in the same way as English-speaking adults are visually influenced.
Affiliation(s)
- L D Rosenblum
- Department of Psychology, University of California, Riverside 92521, USA.

23. Sekiyama K. Cultural and linguistic factors in audiovisual speech processing: the McGurk effect in Chinese subjects. Percept Psychophys 1997; 59:73-80. [PMID: 9038409; DOI: 10.3758/bf03206849]
Abstract
The "McGurk effect" demonstrates that visual (lip-read) information is used during speech perception even when it is discrepant with auditory information. While this has been established as a robust effect in subjects from Western cultures, our own earlier results had suggested that Japanese subjects use visual information much less than American subjects do (Sekiyama & Tohkura, 1993). The present study examined whether Chinese subjects would also show a reduced McGurk effect due to their cultural similarities with the Japanese. The subjects were 14 native speakers of Chinese living in Japan. Stimuli consisted of 10 syllable (/ba/, /pa/, /ma/, /wa/, /da/, /ta/, /na/, /ga/, /ka/, /ra/) pronounced by two speakers, one Japanese and one American. Each auditory syllable was dubbed onto every, visual syllable within one speaker, resulting in 100 audiovisual stimuli in each language. The subjects' main task was to report what they thought they had heard while looking at and listening to the speaker while the stimuli were being uttered. Compared with previous results obtained with American subjects, the Chinese subjects showed a weaker McGurk effect. The results also showed that the magnitude of the McGurk effect depends on the length of time the Chinese subjects had lived in Japan. Factors that foster and alter the Chinese subjects' reliance on auditory information are discussed.

24. Saldaña HM, Rosenblum LD. Visual influences on auditory pluck and bow judgments. Percept Psychophys 1993; 54:406-416. [PMID: 8414899; DOI: 10.3758/bf03205276]
Abstract
In the McGurk effect, visual information specifying a speaker's articulatory movements can influence auditory judgments of speech. In the present study, we attempted to find an analogue of the McGurk effect using nonspeech stimuli: discrepant audiovisual tokens of plucks and bows on a cello. The results of an initial experiment revealed that subjects' auditory judgments were influenced significantly by the visual pluck and bow stimuli. However, a second experiment in which speech syllables were used demonstrated that the visual influence on consonants was significantly greater than the visual influence observed for pluck-bow stimuli. This result could be interpreted to suggest that the nonspeech visual influence was not a true McGurk effect. In a third experiment, visual stimuli consisting of the words "pluck" and "bow" were found to have no influence over auditory pluck and bow judgments. This result could suggest that the nonspeech effects found in Experiment 1 were based on the audio and visual information having an ostensive lawful relation to the specified event. These results are discussed in terms of motor-theory, ecological, and FLMP approaches to speech perception.