1. Tiippana K, Ujiie Y, Peromaa T, Takahashi K. Investigation of Cross-Language and Stimulus-Dependent Effects on the McGurk Effect with Finnish and Japanese Speakers and Listeners. Brain Sci 2023;13:1198. PMID: 37626554; PMCID: PMC10452414; DOI: 10.3390/brainsci13081198.
Abstract
In the McGurk effect, perception of a spoken consonant is altered when an auditory (A) syllable is presented with an incongruent visual (V) syllable (e.g., A/pa/V/ka/ is often heard as /ka/ or /ta/). The McGurk effect provides a measure of visual influence on speech perception, and is considered stronger the lower the proportion of correct auditory responses. Cross-language effects are studied to understand processing differences between one's own and foreign languages. Regarding the McGurk effect, some studies have found it to be stronger with foreign speakers, whereas others have shown the opposite, or no difference between languages. Most studies have compared English with other languages. We investigated cross-language effects with native Finnish and Japanese speakers and listeners, with 49 participants in each listener group. The stimuli (/ka/, /pa/, /ta/) were uttered by two female and male Finnish and Japanese speakers and presented in A, V, and AV modalities, including the McGurk stimulus A/pa/V/ka/. The McGurk effect was stronger with Japanese stimuli in both listener groups. Differences in speech perception were prominent between individual speakers but less so between native languages. Unisensory perception correlated with McGurk perception. These findings suggest that stimulus-dependent features contribute to the McGurk effect, and may influence syllable perception more strongly than cross-language factors.
Affiliation(s)
- Kaisa Tiippana
- Department of Psychology and Logopedics, University of Helsinki, 00014 Helsinki, Finland
- Yuta Ujiie
- Department of Psychology, College of Contemporary Psychology, Rikkyo University, Saitama 352-8558, Japan
- Research Organization of Open Innovation and Collaboration, Ritsumeikan University, Osaka 567-8570, Japan
- Tarja Peromaa
- Department of Psychology and Logopedics, University of Helsinki, 00014 Helsinki, Finland
- Kohske Takahashi
- College of Comprehensive Psychology, Ritsumeikan University, Osaka 567-8570, Japan

2. A value-driven McGurk effect: Value-associated faces enhance the influence of visual information on audiovisual speech perception and its eye movement pattern. Atten Percept Psychophys 2020;82:1928-1941. PMID: 31898072; DOI: 10.3758/s13414-019-01918-x.
Abstract
This study investigates whether and how value-associated faces affect audiovisual speech perception and its eye-movement pattern. Participants learned to associate particular faces with or without monetary reward in a training phase and, in the subsequent test phase, identified syllables that the talkers had said in video clips in which the talkers' faces had or had not been associated with reward. The syllables were either congruent or incongruent with the talkers' mouth movements; crucially, some of the incongruent syllables could elicit the McGurk effect. Results showed that the McGurk effect occurred more often for reward-associated faces than for non-reward-associated faces. Moreover, signal detection analysis revealed that participants had a lower criterion and higher discriminability for reward-associated faces than for non-reward-associated faces. Surprisingly, eye-movement data showed that participants spent more time looking at, and fixated more often on, the extraoral (nose/cheek) area for reward-associated faces than for non-reward-associated faces, while the opposite pattern was observed for the oral (mouth) area. A correlation analysis demonstrated that, across participants, the more reward increased looking at the extraoral area in the training phase, the larger the increase in the McGurk proportion (and the less they looked at the oral area) in the test phase. These findings not only demonstrate that value-associated faces enhance the influence of visual information on audiovisual speech perception but also highlight the importance of the extraoral facial area in the value-driven McGurk effect.
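The criterion and discriminability measures mentioned above come from standard signal detection theory. As a minimal sketch of how such measures are conventionally computed, assuming the common equal-variance Gaussian model and purely hypothetical hit/false-alarm rates (the abstract does not specify the authors' exact analysis pipeline):

```python
from scipy.stats import norm

def sdt_measures(hit_rate, fa_rate):
    """Equal-variance Gaussian SDT: d' = z(H) - z(FA); c = -(z(H) + z(FA)) / 2."""
    z_h, z_fa = norm.ppf(hit_rate), norm.ppf(fa_rate)
    return z_h - z_fa, -(z_h + z_fa) / 2  # (discriminability d', criterion c)

# Hypothetical rates illustrating the reported pattern: higher d' and a
# lower (more liberal) criterion for reward-associated faces.
print(sdt_measures(0.85, 0.20))  # reward-associated faces
print(sdt_measures(0.75, 0.25))  # non-reward-associated faces
```

Hit and false-alarm rates of exactly 0 or 1 are usually nudged (e.g., by 1/(2N)) before the z-transform, since norm.ppf(0) and norm.ppf(1) are infinite.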

3. Ullas S, Formisano E, Eisner F, Cutler A. Audiovisual and lexical cues do not additively enhance perceptual adaptation. Psychon Bull Rev 2020;27:707-715. PMID: 32319002; PMCID: PMC7398951; DOI: 10.3758/s13423-020-01728-5.
Abstract
When listeners experience difficulty in understanding a speaker, lexical and audiovisual (lipreading) information can be a helpful source of guidance. These two types of information embedded in speech can also guide perceptual adjustment, known as recalibration or perceptual retuning, whereby listeners use contextual cues to temporarily or permanently reconfigure internal representations of phoneme categories and thus adjust to and understand novel interlocutors more easily. These two types of perceptual learning, previously investigated largely separately, are highly similar in allowing listeners to use speech-external information to make phoneme-boundary adjustments. This study explored whether the two sources may work in conjunction to induce adaptation, thus emulating real life, in which listeners are indeed likely to encounter both types of cue together. Listeners who received combined audiovisual and lexical cues showed perceptual learning effects similar to listeners who received only audiovisual cues, while listeners who received only lexical cues showed weaker effects than the other two groups. The combination of cues did not lead to additive retuning or recalibration effects, suggesting that lexical and audiovisual cues operate differently in how listeners use them to reshape perceptual categories. Reaction times did not differ significantly across the three conditions, so none of the forms of adjustment was either aided or hindered by processing-time differences. The mechanisms underlying these forms of perceptual learning may diverge in numerous ways despite similarities in experimental applications.
Affiliation(s)
- Shruti Ullas
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6200 MD, Maastricht, The Netherlands
- Elia Formisano
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6200 MD, Maastricht, The Netherlands
- Frank Eisner
- Donders Centre for Cognition, Radboud University Nijmegen, 6500 AH, Nijmegen, The Netherlands
- Anne Cutler
- MARCS Institute and ARC Centre of Excellence for the Dynamics of Language, Western Sydney University, Penrith, NSW, 2751, Australia

4. Devaraju DS, U AK, Maruthy S. Comparison of McGurk Effect across Three Consonant-Vowel Combinations in Kannada. J Audiol Otol 2019;23:39-43. PMID: 30518196; PMCID: PMC6348306; DOI: 10.7874/jao.2018.00234.
Abstract
BACKGROUND AND OBJECTIVES The influence of the visual stimulus on the auditory component in the perception of auditory-visual (AV) consonant-vowel syllables has been demonstrated in different languages. Inherent properties of the unimodal stimuli are known to modulate AV integration. The present study investigated how the amount of the McGurk effect (an outcome of AV integration) varies across three different consonant combinations in the Kannada language, and examined the contribution of unimodal syllable identification to the amount of the McGurk effect.
SUBJECTS AND METHODS Twenty-eight individuals performed an AV identification task with ba/ga, pa/ka, and ma/ṇa consonant combinations in AV congruent, AV incongruent (McGurk combination), audio-alone, and visual-alone conditions. Cluster analysis was performed on the identification scores for the incongruent stimuli to classify the individuals into two groups: one with high and the other with low McGurk scores. The audio-alone and visual-alone scores of these two groups were then compared.
RESULTS The results showed significantly higher McGurk scores for the ma/ṇa combination than for the ba/ga and pa/ka combinations in both the high and low McGurk score groups. No significant difference was noted between the ba/ga and pa/ka combinations in either group. Identification of /ṇa/ in the visual-alone condition correlated negatively with higher McGurk scores.
CONCLUSIONS The results suggest that the final percept following AV integration is not exclusively explained by unimodal identification of the syllables; other factors may also contribute to the final percept.
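As an illustration of the grouping step described above, the sketch below splits hypothetical per-participant McGurk scores into high- and low-score clusters; k-means with k = 2 is assumed here as a common default, since the abstract does not name the specific clustering algorithm:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical per-participant proportions of McGurk responses to the
# incongruent (McGurk) stimuli.
scores = np.array([0.10, 0.15, 0.20, 0.25, 0.65, 0.70, 0.75, 0.80]).reshape(-1, 1)

# Two-cluster k-means stands in for the unspecified cluster analysis.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)

# Call the cluster containing the largest score the high-McGurk group.
high = labels == labels[np.argmax(scores)]
print("high McGurk group:", scores[high].ravel())
print("low McGurk group:", scores[~high].ravel())
```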
Affiliation(s)
- Dhatri S Devaraju
- Department of Audiology, All India Institute of Speech and Hearing, Manasagangothri, Mysuru, Karnataka, India
- Ajith Kumar U
- Department of Audiology, All India Institute of Speech and Hearing, Manasagangothri, Mysuru, Karnataka, India
- Santosh Maruthy
- Department of Speech-Language Sciences, All India Institute of Speech and Hearing, Manasagangothri, Mysuru, Karnataka, India

5. Alsius A, Paré M, Munhall KG. Forty Years After Hearing Lips and Seeing Voices: the McGurk Effect Revisited. Multisens Res 2018;31:111-144. PMID: 31264597; DOI: 10.1163/22134808-00002565.
Abstract
Since its discovery 40 years ago, the McGurk illusion has usually been cited as a paradigmatic case of multisensory binding in humans, and it has been used extensively in speech perception studies as a proxy measure for audiovisual integration mechanisms. Despite the well-established practice of using the McGurk illusion as a tool for studying the mechanisms underlying audiovisual speech integration, the magnitude of the illusion varies enormously across studies. Furthermore, the processing of McGurk stimuli differs from congruent audiovisual processing at both the phenomenological and the neural level. This calls into question the suitability of the illusion as a tool for quantifying the necessary and sufficient conditions under which audiovisual integration occurs in natural conditions. In this paper, we review some of the practical and theoretical issues related to the use of the McGurk illusion as an experimental paradigm. We believe that, without a richer understanding of the mechanisms involved in the processing of the McGurk effect, experimenters should be cautious when generalizing data generated by McGurk stimuli to matching audiovisual speech events.
Affiliation(s)
- Agnès Alsius
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
- Martin Paré
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
- Kevin G Munhall
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada

6.
Affiliation(s)
- Kaisa Tiippana
- Division of Cognitive Psychology and Neuropsychology, Institute of Behavioural Sciences, University of Helsinki, Helsinki, Finland

7. Sanchez K, Miller RM, Rosenblum LD. Visual influences on alignment to voice onset time. J Speech Lang Hear Res 2010;53:262-272. PMID: 20220027; DOI: 10.1044/1092-4388(2009/08-0247).
Abstract
PURPOSE Speech shadowing experiments were conducted to test whether alignment (inadvertent imitation) to voice onset time (VOT) can be influenced by visual speech information. METHOD Experiment 1 examined whether alignment would occur to auditory /pa/ syllables manipulated to have 3 different VOTs. Nineteen female participants were asked to listen to 180 syllables over headphones and to say each syllable out loud quickly and clearly. In Experiment 2, visual speech tokens composed of a face articulating /pa/ syllables at 2 different rates were dubbed onto the audio /pa/ syllables of Experiment 1. Sixteen new female participants were asked to listen to and watch (over a video monitor) 180 syllables and to say each syllable out loud quickly and clearly. RESULTS Results of Experiment 1 showed that the 3 VOTs of the audio /pa/ stimuli influenced the VOTs of the participants' produced syllables. Results of Experiment 2 revealed that both the visible syllable rate and audio VOT of the audiovisual /pa/ stimuli influenced the VOTs of the participants' produced syllables. CONCLUSION These results show that, like auditory speech, visual speech information can induce speech alignment to a phonetically relevant property of an utterance.
Affiliation(s)
- Kauyumari Sanchez
- University of California, 900 University Avenue, Riverside, CA 92521, USA

8. Shahin AJ, Miller LM. Multisensory integration enhances phonemic restoration. J Acoust Soc Am 2009;125:1744-1750. PMID: 19275331; PMCID: PMC2663900; DOI: 10.1121/1.3075576.
Abstract
Phonemic restoration occurs when speech is perceived to be continuous through noisy interruptions, even when the speech signal is artificially removed from the interrupted epochs. This temporal filling-in illusion helps maintain robust comprehension in adverse environments and illustrates how contextual knowledge through the auditory modality (e.g., lexical) can improve perception. This study investigated how one important form of context, visual speech, affects phonemic restoration. The hypothesis was that audio-visual integration of speech should improve phonemic restoration, allowing the perceived continuity to span longer temporal gaps. Subjects listened to tri-syllabic words with a portion of each word replaced by white noise while watching lip-movement that was either congruent, temporally reversed (incongruent), or static. For each word, subjects judged whether the utterance sounded continuous or interrupted, where a "continuous" response indicated an illusory percept. Results showed that illusory filling-in of longer white noise durations (longer missing segments) occurred when the mouth movement was congruent with the spoken word compared to the other conditions, with no differences occurring between the static and incongruent conditions. Thus, phonemic restoration is enhanced when applying contextual knowledge through multisensory integration.
Affiliation(s)
- Antoine J Shahin
- Center for Mind & Brain, University of California, Davis, California 95618, USA.

9. Brancazio L, Miller JL. Use of visual information in speech perception: Evidence for a visual rate effect both with and without a McGurk effect. Percept Psychophys 2005;67:759-769. PMID: 16334050; DOI: 10.3758/bf03193531.
Abstract
The McGurk effect, where an incongruent visual syllable influences identification of an auditory syllable, does not always occur, suggesting that perceivers sometimes fail to use relevant visual phonetic information. We tested whether another visual phonetic effect, which involves the influence of visual speaking rate on perceived voicing (Green & Miller, 1985), would occur in instances when the McGurk effect does not. In Experiment 1, we established this visual rate effect using auditory and visual stimuli matching in place of articulation, finding a shift in the voicing boundary along an auditory voice-onset-time continuum with fast versus slow visual speech tokens. In Experiment 2, we used auditory and visual stimuli differing in place of articulation and found a shift in the voicing boundary due to visual rate when the McGurk effect occurred and, more critically, when it did not. The latter finding indicates that phonetically relevant visual information is used in speech perception even when the McGurk effect does not occur, suggesting that the incidence of the McGurk effect underestimates the extent of audio-visual integration.
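The voicing boundary in studies like this is typically estimated as the 50% crossover of a psychometric function fitted to identification responses along the VOT continuum. The sketch below illustrates that idea with a logistic fit to hypothetical data; it is not the authors' actual procedure or stimuli:

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(vot, boundary, slope):
    """Proportion of voiceless responses as a function of VOT (ms)."""
    return 1.0 / (1.0 + np.exp(-slope * (vot - boundary)))

# Hypothetical identification data along a 7-step VOT continuum.
vot_ms = np.array([0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
p_voiceless = np.array([0.02, 0.05, 0.20, 0.55, 0.85, 0.97, 0.99])

(boundary, slope), _ = curve_fit(logistic, vot_ms, p_voiceless, p0=[30.0, 0.2])
print(f"estimated voicing boundary: {boundary:.1f} ms VOT")
# A visual rate effect appears as a shift in this boundary between
# fast and slow visual speech conditions.
```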
Affiliation(s)
- Lawrence Brancazio
- Department of Psychology, Southern Connecticut State University, 501 Crescent St., New Haven, CT 06515, USA.