1. Syed I, Baart M, Vroomen J. The Multimodal Trust Effects of Face, Voice, and Sentence Content. Multisens Res 2024; 37:125-141. PMID: 38714314. DOI: 10.1163/22134808-bja10119.
Abstract
Trust is an aspect critical to human social interaction and research has identified many cues that help in the assimilation of this social trait. Two of these cues are the pitch of the voice and the width-to-height ratio of the face (fWHR). Additionally, research has indicated that the content of a spoken sentence itself has an effect on trustworthiness; a finding that has not yet been brought into multisensory research. The current research aims to investigate previously developed theories on trust in relation to vocal pitch, fWHR, and sentence content in a multimodal setting. Twenty-six female participants were asked to judge the trustworthiness of a voice speaking a neutral or romantic sentence while seeing a face. The average pitch of the voice and the fWHR were varied systematically. Results indicate that the content of the spoken message was an important predictor of trustworthiness extending into multimodality. Further, the mean pitch of the voice and fWHR of the face appeared to be useful indicators in a multimodal setting. These effects interacted with one another across modalities. The data demonstrate that trust in the voice is shaped by task-irrelevant visual stimuli. Future research is encouraged to clarify whether these findings remain consistent across genders, age groups, and languages.
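
The facial width-to-height ratio (fWHR) varied here is conventionally computed from facial landmarks as bizygomatic width divided by upper-face height (brow to upper lip). A minimal sketch of that arithmetic, with hypothetical landmark coordinates rather than anything from the study:

```python
# Sketch: facial width-to-height ratio (fWHR) from four landmark points.
# Landmark coordinates are hypothetical; real studies extract them from
# annotated face images. fWHR = bizygomatic width / upper-face height.

def fwhr(left_zygion, right_zygion, brow, upper_lip):
    """Each argument is an (x, y) pixel coordinate."""
    width = abs(right_zygion[0] - left_zygion[0])   # bizygomatic width
    height = abs(upper_lip[1] - brow[1])            # upper-face height
    return width / height

# Example with made-up coordinates (x grows rightward, y grows downward):
print(fwhr(left_zygion=(60, 220), right_zygion=(260, 220),
           brow=(160, 150), upper_lip=(160, 260)))  # -> about 1.82
```
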
Affiliations
- Isar Syed: Department of Cognitive Neuropsychology, Tilburg University, 5000 LE Tilburg, the Netherlands
- Martijn Baart: Department of Cognitive Neuropsychology, Tilburg University, 5000 LE Tilburg, the Netherlands
- Jean Vroomen: Department of Cognitive Neuropsychology, Tilburg University, 5000 LE Tilburg, the Netherlands

2. Pourhashemi F, Baart M, van Laarhoven T, Vroomen J. Want to quickly adapt to distorted speech and become a better listener? Read lips, not text. PLoS One 2022; 17:e0278986. PMID: 36580461. PMCID: PMC9799298. DOI: 10.1371/journal.pone.0278986.
Abstract
When listening to distorted speech, does one become a better listener by looking at the face of the speaker or by reading subtitles that are presented along with the speech signal? We examined this question in two experiments in which we presented participants with spectrally distorted speech (4-channel noise-vocoded speech). During short training sessions, listeners received auditorily distorted words or pseudowords that were partially disambiguated by concurrently presented lipread information or text. After each training session, listeners were tested with new degraded auditory words. Learning effects (based on proportions of correctly identified words) were stronger if listeners had trained with words rather than with pseudowords (a lexical boost), and adding lipread information during training was more effective than adding text (a lipread boost). Moreover, the advantage of lipread speech over text training was also found when participants were tested more than a month later. The current results thus suggest that lipread speech may have surprisingly long-lasting effects on adaptation to distorted speech.
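
Noise-vocoded speech of the kind used here is typically generated by splitting the signal into a few frequency bands, extracting each band's amplitude envelope, and using that envelope to modulate band-limited noise. A minimal four-channel sketch, assuming a mono signal array `x` at sampling rate `fs`; the band edges are illustrative, not those of the study:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, edges=(100, 562, 1501, 3364, 7000)):
    """4-channel noise vocoder: filter -> envelope -> modulate noise -> sum."""
    rng = np.random.default_rng(0)
    out = np.zeros_like(x, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)                  # band-limit the speech
        env = np.abs(hilbert(band))                 # amplitude envelope
        noise = sosfiltfilt(sos, rng.standard_normal(len(x)))  # matched noise
        out += env * noise                          # envelope-modulated noise
    return out / np.max(np.abs(out))                # normalise peak level

# Usage (hypothetical input): y = noise_vocode(x, fs=16000)
```
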
Affiliations
- Faezeh Pourhashemi: Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Martijn Baart: Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands; BCBL, Basque Center on Cognition, Brain, and Language, Donostia, Spain
- Thijs van Laarhoven: Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Jean Vroomen: Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands

3. López Zunini RA, Baart M, Samuel AG, Armstrong BC. Lexico-semantic access and audiovisual integration in the aging brain: Insights from mixed-effects regression analyses of event-related potentials. Neuropsychologia 2021; 165:108107. PMID: 34921819. DOI: 10.1016/j.neuropsychologia.2021.108107.
Abstract
We investigated how aging modulates lexico-semantic processes in the visual (seeing written items), auditory (hearing spoken items) and audiovisual (seeing written items while hearing congruent spoken items) modalities. Participants were young and older adults who performed a delayed lexical decision task (LDT) presented in blocks of visual, auditory, and audiovisual stimuli. Event-related potentials (ERPs) revealed differences between young and older adults despite older adults' ability to identify words and pseudowords as accurately as young adults. The observed differences included more focalized lexico-semantic access in the N400 time window in older relative to young adults, stronger re-instantiation and/or more widespread activity of the lexicality effect at the time of responding, and stronger multimodal integration for older relative to young adults. Our results offer new insights into how functional neural differences in older adults can result in efficient access to lexico-semantic representations across the lifespan.
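
Mixed-effects regression of single-trial ERP amplitudes, as in the title, is commonly set up with crossed random effects for participants and items. A minimal sketch using statsmodels; the column names, file name, and predictors are hypothetical, not the study's actual model:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format table: one row per trial, with the mean ERP
# amplitude in some window (e.g., N400) plus the design variables.
trials = pd.read_csv("single_trial_erps.csv")  # subject, item, amplitude, ...

# Random intercepts for subjects as the grouping factor; items enter as a
# variance component, since MixedLM supports one grouping factor directly.
model = smf.mixedlm(
    "amplitude ~ age_group * lexicality * modality",
    data=trials,
    groups="subject",
    re_formula="1",
    vc_formula={"item": "0 + C(item)"},
)
result = model.fit()
print(result.summary())
```
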
Affiliations
- Martijn Baart: BCBL, Basque Center on Cognition, Brain and Language, San Sebastian, Spain; Department of Cognitive Neuropsychology, Tilburg University, Tilburg, the Netherlands
- Arthur G Samuel: BCBL, Basque Center on Cognition, Brain and Language, San Sebastian, Spain; IKERBASQUE, Basque Foundation for Science, Spain; Department of Psychology, Stony Brook University, Stony Brook, NY, United States
- Blair C Armstrong: BCBL, Basque Center on Cognition, Brain and Language, San Sebastian, Spain; Department of Psychology and Department of Language Studies, University of Toronto, Toronto, ON, Canada

4. Guediche S, de Bruin A, Caballero-Gaudes C, Baart M, Samuel AG. Second-language word recognition in noise: Interdependent neuromodulatory effects of semantic context and crosslinguistic interactions driven by word form similarity. Neuroimage 2021; 237:118168. PMID: 34000398. DOI: 10.1016/j.neuroimage.2021.118168.
Abstract
Spoken language comprehension is a fundamental component of our cognitive skills. We are quite proficient at deciphering words from the auditory input despite the fact that the speech we hear is often masked by noise such as background babble originating from talkers other than the one we are attending to. To perceive spoken language as intended, we rely on prior linguistic knowledge and context. Prior knowledge includes all sounds and words that are familiar to a listener and depends on linguistic experience. For bilinguals, the phonetic and lexical repertoire encompasses two languages, and the degree of overlap between word forms across languages affects the degree to which they influence one another during auditory word recognition. To support spoken word recognition, listeners often rely on semantic information (i.e., the words we hear are usually related in a meaningful way). Although the number of multilinguals across the globe is increasing, little is known about how crosslinguistic effects (i.e., word overlap) interact with semantic context and affect the flexible neural systems that support accurate word recognition. The current multi-echo functional magnetic resonance imaging (fMRI) study addresses this question by examining how prime-target word pair semantic relationships interact with the target word's form similarity (cognate status) to the translation equivalent in the dominant language (L1) during accurate word recognition of a non-dominant (L2) language. We tested 26 early-proficient Spanish-Basque (L1-L2) bilinguals. When L2 targets matching L1 translation-equivalent phonological word forms were preceded by unrelated semantic contexts that drive lexical competition, a flexible language control (fronto-parietal-subcortical) network was upregulated, whereas when they were preceded by related semantic contexts that reduce lexical competition, it was downregulated. We conclude that an interplay between semantic and crosslinguistic effects regulates flexible control mechanisms of speech processing to facilitate L2 word recognition in noise.
Affiliations
- Sara Guediche: BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastián 20009, Spain
- Martijn Baart: BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastián 20009, Spain; Department of Cognitive Neuropsychology, Tilburg University, P.O. Box 90153, 5000 LE Tilburg, the Netherlands
- Arthur G Samuel: BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastián 20009, Spain; Stony Brook University, NY 11794-2500, United States; Ikerbasque Foundation, Spain

5. McLean MA, Van den Bergh BR, Baart M, Vroomen J. The late positive potential (LPP): A neural marker of internalizing problems in early childhood. Int J Psychophysiol 2020; 155:78-86. DOI: 10.1016/j.ijpsycho.2020.06.005.

6. López Zunini RA, Baart M, Samuel AG, Armstrong BC. Lexical access versus lexical decision processes for auditory, visual, and audiovisual items: Insights from behavioral and neural measures. Neuropsychologia 2020; 137:107305. PMID: 31838100. DOI: 10.1016/j.neuropsychologia.2019.107305.
Abstract
In two experiments, we investigated the relationship between lexical access processes, and processes that are specifically related to making lexical decisions. In Experiment 1, participants performed a standard lexical decision task in which they had to respond as quickly and as accurately as possible to visual (written), auditory (spoken) and audiovisual (written + spoken) items. In Experiment 2, a different group of participants performed the same task but were required to make responses after a delay. Linear mixed effect models on reaction times and single trial Event-Related Potentials (ERPs) revealed that ERP lexicality effects started earlier in the visual than auditory modality, and that effects were driven by the written input in the audiovisual modality. More negative ERP amplitudes predicted slower reaction times in all modalities in both experiments. However, these predictive amplitudes were mainly observed within the window of the lexicality effect in Experiment 1 (the speeded task), and shifted to post-response-probe time windows in Experiment 2 (the delayed task). The lexicality effects lasted longer in Experiment 1 than in Experiment 2, and in the delayed task, we additionally observed a "re-instantiation" of the lexicality effect related to the delayed response. Delaying the response in an otherwise identical lexical decision task thus allowed us to separate lexical access processes from processes specific to lexical decision.
Affiliations
- Martijn Baart: BCBL, Basque Center on Cognition, Brain and Language, San Sebastian, Spain; Department of Cognitive Neuropsychology, Tilburg University, Tilburg, the Netherlands
- Arthur G Samuel: BCBL, Basque Center on Cognition, Brain and Language, San Sebastian, Spain; IKERBASQUE, Basque Foundation for Science, Spain; Department of Psychology, Stony Brook University, Stony Brook, NY, United States
- Blair C Armstrong: BCBL, Basque Center on Cognition, Brain and Language, San Sebastian, Spain; Department of Psychology and Centre for French & Linguistics, University of Toronto, Toronto, ON, Canada

7. Burgering MA, van Laarhoven T, Baart M, Vroomen J. Fluidity in the perception of auditory speech: Cross-modal recalibration of voice gender and vowel identity by a talking face. Q J Exp Psychol (Hove) 2020; 73:957-967. PMID: 31931664. DOI: 10.1177/1747021819900884.
Abstract
Humans quickly adapt to variations in the speech signal. Adaptation may surface as recalibration, a learning effect driven by error-minimisation between a visual face and an ambiguous auditory speech signal, or as selective adaptation, a contrastive aftereffect driven by the acoustic clarity of the sound. Here, we examined whether these aftereffects occur for vowel identity and voice gender. Participants were exposed to male, female, or androgynous tokens of speakers pronouncing /e/ or /ø/ (embedded in words with a consonant-vowel-consonant structure), or an ambiguous vowel halfway between /e/ and /ø/ dubbed onto the video of a male or female speaker pronouncing /e/ or /ø/. For both voice gender and vowel identity, we found assimilative aftereffects after exposure to auditory ambiguous adapter sounds, and contrastive aftereffects after exposure to auditory clear adapter sounds. This demonstrates that similar principles for adaptation in these dimensions are at play.
Affiliations
- Merel A Burgering: Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Thijs van Laarhoven: Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Martijn Baart: Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands; BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain
- Jean Vroomen: Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands

8. Lindborg A, Baart M, Stekelenburg JJ, Vroomen J, Andersen TS. Speech-specific audiovisual integration modulates induced theta-band oscillations. PLoS One 2019; 14:e0219744. PMID: 31310616. PMCID: PMC6634411. DOI: 10.1371/journal.pone.0219744.
Abstract
Speech perception is influenced by vision through a process of audiovisual integration. This is demonstrated by the McGurk illusion where visual speech (for example /ga/) dubbed with incongruent auditory speech (such as /ba/) leads to a modified auditory percept (/da/). Recent studies have indicated that perception of the incongruent speech stimuli used in McGurk paradigms involves mechanisms of both general and audiovisual speech specific mismatch processing and that general mismatch processing modulates induced theta-band (4–8 Hz) oscillations. Here, we investigated whether the theta modulation merely reflects mismatch processing or, alternatively, audiovisual integration of speech. We used electroencephalographic recordings from two previously published studies using audiovisual sine-wave speech (SWS), a spectrally degraded speech signal sounding nonsensical to naïve perceivers but perceived as speech by informed subjects. Earlier studies have shown that informed, but not naïve subjects integrate SWS phonetically with visual speech. In an N1/P2 event-related potential paradigm, we found a significant difference in theta-band activity between informed and naïve perceivers of audiovisual speech, suggesting that audiovisual integration modulates induced theta-band oscillations. In a McGurk mismatch negativity paradigm (MMN) where infrequent McGurk stimuli were embedded in a sequence of frequent audio-visually congruent stimuli we found no difference between congruent and McGurk stimuli. The infrequent stimuli in this paradigm are violating both the general prediction of stimulus content, and that of audiovisual congruence. Hence, we found no support for the hypothesis that audiovisual mismatch modulates induced theta-band oscillations. We also did not find any effects of audiovisual integration in the MMN paradigm, possibly due to the experimental design.
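
Induced (as opposed to evoked) oscillatory activity is commonly obtained by removing the phase-locked average from every trial before time-frequency analysis. A minimal filter-Hilbert sketch of that step, assuming `epochs` is a trials x samples array at sampling rate `fs`; this illustrates the general technique, not the paper's exact pipeline:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def induced_theta_power(epochs, fs, band=(4.0, 8.0)):
    """Induced theta power: subtract the ERP, then filter-Hilbert each trial."""
    erp = epochs.mean(axis=0)                  # phase-locked (evoked) part
    induced = epochs - erp                     # remove it from every trial
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    theta = sosfiltfilt(sos, induced, axis=1)  # 4-8 Hz band, per trial
    power = np.abs(hilbert(theta, axis=1)) ** 2
    return power.mean(axis=0)                  # average power over trials

# Usage (hypothetical data): p = induced_theta_power(epochs, fs=512)
```
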
Affiliations
- Alma Lindborg: Section for Cognitive Systems, DTU Compute, Technical University of Denmark, Lyngby, Denmark
- Martijn Baart: Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands; BCBL, Basque Center on Cognition, Brain and Language, Donostia, Spain
- Jeroen J Stekelenburg: Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Jean Vroomen: Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Tobias S Andersen: Section for Cognitive Systems, DTU Compute, Technical University of Denmark, Lyngby, Denmark

9. Modelska M, Pourquié M, Baart M.
Abstract
Although the default state of the world is that we see and hear other people talking, there is evidence that seeing and hearing ourselves rather than someone else may lead to visual (i.e., lip-read) or auditory "self" advantages. We assessed whether there is a "self" advantage for phonetic recalibration (a lip-read driven cross-modal learning effect) and selective adaptation (a contrastive effect in the opposite direction of recalibration). We observed both aftereffects as well as an on-line effect of lip-read information on auditory perception (i.e., immediate capture), but there was no evidence for a "self" advantage in any of the tasks (as additionally supported by Bayesian statistics). These findings strengthen the emerging notion that recalibration reflects a general learning mechanism, and bolster the argument that adaptation depends on rather low-level auditory/acoustic features of the speech signal.
Affiliations
- Maria Modelska: BCBL, Basque Center on Cognition, Brain and Language, Donostia, Spain
- Marie Pourquié: BCBL, Basque Center on Cognition, Brain and Language, Donostia, Spain; UPPA, IKER (UMR5478), Bayonne, France
- Martijn Baart: BCBL, Basque Center on Cognition, Brain and Language, Donostia, Spain; Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands

10. Barraza P, Dumas G, Liu H, Blanco-Gomez G, van den Heuvel MI, Baart M, Pérez A. Implementing EEG hyperscanning setups. MethodsX 2019; 6:428-436. PMID: 30906698. PMCID: PMC6411510. DOI: 10.1016/j.mex.2019.02.021.
Abstract
Hyperscanning refers to obtaining simultaneous neural recordings from more than one person (Montague et al., 2002 [1]) that can be used to study interactive situations. In particular, hyperscanning with electroencephalography (EEG) is becoming increasingly popular, since it allows researchers to explore the interactive brain with a high temporal resolution. Notably, there is a 40-year gap between the first instance in which simultaneous measurement of EEG activity was mentioned in the literature (Duane and Behrendt, 1965 [2]) and the first actual description of an EEG hyperscanning setup being implemented (Babiloni et al., 2006 [3]). To date, specific EEG hyperscanning devices have not yet been developed, and EEG hyperscanning setups are not usually described with sufficient detail to be easily reproduced. Here, we offer a step-by-step description of solutions to many of these technological challenges. Specifically, we describe and provide customized implementations of EEG hyperscanning setups using hardware and software from different companies: Brain Products, ANT, EGI, and BioSemi.
- Necessary details to set up a functioning EEG hyperscanning protocol are provided.
- The setups allow independent measures and measures of synchronization between the signals of two different brains.
- Individual electrical Ground and Reference is obtained in all discussed systems.
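
Once the two participants' EEG streams are recorded simultaneously and aligned on shared triggers, inter-brain synchronization can be quantified, for example, with the phase-locking value between channel pairs. A minimal sketch of that downstream analysis (the paper itself covers the acquisition setups, not this measure):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def plv(sig_a, sig_b, fs, band=(8.0, 13.0)):
    """Phase-locking value between two equally long, time-aligned signals."""
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    phase_a = np.angle(hilbert(sosfiltfilt(sos, sig_a)))
    phase_b = np.angle(hilbert(sosfiltfilt(sos, sig_b)))
    # Mean resultant length of the phase difference:
    # 0 = no phase locking, 1 = perfect locking.
    return np.abs(np.mean(np.exp(1j * (phase_a - phase_b))))

# Usage (hypothetical arrays): value = plv(eeg_subj1_ch, eeg_subj2_ch, fs=500)
```
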
Affiliations
- Paulo Barraza: Centro de Investigación Avanzada en Educación (CIAE), Universidad de Chile, Santiago de Chile, Chile
- Guillaume Dumas: Human Genetics and Cognitive Functions Unit, Institut Pasteur, Paris, France; CNRS UMR 3571 Genes, Synapses and Cognition, Institut Pasteur, Paris, France; Human Genetics and Cognitive Functions, University Paris Diderot, Sorbonne Paris Cité, Paris, France
- Huanhuan Liu: Research Center of Brain and Cognitive Neuroscience, Liaoning Normal University, Dalian, China; Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University, Beijing, China
- Gabriel Blanco-Gomez: Centre for French & Linguistics, University of Toronto Scarborough, Toronto, Canada
- Martijn Baart: Department of Cognitive Neuropsychology, Tilburg University, Tilburg, the Netherlands; BCBL, Basque Center on Cognition, Brain and Language, Donostia, Spain
- Alejandro Pérez: Centre for French & Linguistics, University of Toronto Scarborough, Toronto, Canada; Psychology Department, University of Toronto Scarborough, Toronto, Canada

11. Baart M, Vroomen J.
Abstract
Perception of vocal affect is influenced by the concurrent sight of an emotional face. We demonstrate that the sight of an emotional face also can induce recalibration of vocal affect. Participants were exposed to videos of a ‘happy’ or ‘fearful’ face in combination with a slightly incongruous sentence with ambiguous prosody. After this exposure, ambiguous test sentences were rated as more ‘happy’ when the exposure phase contained ‘happy’ instead of ‘fearful’ faces. This auditory shift likely reflects recalibration that is induced by error minimization of the inter-sensory discrepancy. In line with this view, when the prosody of the exposure sentence was non-ambiguous and congruent with the face (without audiovisual discrepancy), aftereffects went in the opposite direction, likely reflecting adaptation. Our results demonstrate, for the first time, that perception of vocal affect is flexible and can be recalibrated by slightly discrepant visual information.
Affiliations
- Martijn Baart: Department of Cognitive Neuropsychology, Tilburg University, P.O. Box 90153, 5000 LE Tilburg, The Netherlands; BCBL, Basque Center on Cognition, Brain and Language, Donostia, Spain
- Jean Vroomen: Department of Cognitive Neuropsychology, Tilburg University, P.O. Box 90153, 5000 LE Tilburg, The Netherlands

12. Baart M, Lindborg A, Andersen TS. Electrophysiological evidence for differences between fusion and combination illusions in audiovisual speech perception. Eur J Neurosci 2017; 46:2578-2583. PMID: 28976045. PMCID: PMC5725699. DOI: 10.1111/ejn.13734.
Abstract
Incongruent audiovisual speech stimuli can lead to perceptual illusions such as fusions or combinations. Here, we investigated the underlying audiovisual integration process by measuring ERPs. We observed that visual speech‐induced suppression of P2 amplitude (which is generally taken as a measure of audiovisual integration) for fusions was similar to suppression obtained with fully congruent stimuli, whereas P2 suppression for combinations was larger. We argue that these effects arise because the phonetic incongruency is solved differently for both types of stimuli.
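
The P2 suppression measure referenced here compares component amplitude between audiovisual and auditory-only averaged waveforms. A minimal peak-picking sketch on synthetic grand averages; the search windows and the toy waveforms are illustrative assumptions, not the study's data:

```python
import numpy as np

def peak(erp, times, window, polarity):
    """Peak amplitude and latency in a search window (+1 for P2, -1 for N1)."""
    mask = (times >= window[0]) & (times <= window[1])
    seg = polarity * erp[mask]
    i = int(np.argmax(seg))
    return polarity * seg[i], float(times[mask][i])

# Synthetic stand-ins for auditory-only and audiovisual grand averages (volts).
times = np.linspace(-0.1, 0.5, 601)
erp_a = -2e-6 * np.exp(-((times - 0.10) / 0.02) ** 2) \
        + 3e-6 * np.exp(-((times - 0.20) / 0.03) ** 2)   # N1 then P2
erp_av = 0.8 * erp_a                                     # attenuated AV response

p2_a, lat_a = peak(erp_a, times, window=(0.15, 0.25), polarity=+1)
p2_av, lat_av = peak(erp_av, times, window=(0.15, 0.25), polarity=+1)
print(f"P2 suppression: {(p2_a - p2_av) * 1e6:.2f} uV")  # positive = suppressed
```
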
Affiliations
- Martijn Baart: Department of Cognitive Neuropsychology, Tilburg University, Warandelaan 2, 5000 LE Tilburg, The Netherlands; BCBL, Basque Center on Cognition, Brain and Language, Donostia, Spain
- Alma Lindborg: Section for Cognitive Systems, DTU Compute, Technical University of Denmark, Lyngby, Denmark
- Tobias S Andersen: Section for Cognitive Systems, DTU Compute, Technical University of Denmark, Lyngby, Denmark

13. Baart M, Armstrong BC, Martin CD, Frost R, Carreiras M. Cross-modal noise compensation in audiovisual words. Sci Rep 2017; 7:42055. PMID: 28169316. PMCID: PMC5294401. DOI: 10.1038/srep42055.
Abstract
Perceiving linguistic input is vital for human functioning, but the process is complicated by the fact that the incoming signal is often degraded. However, humans can compensate for unimodal noise by relying on simultaneous sensory input from another modality. Here, we investigated noise-compensation for spoken and printed words in two experiments. In the first behavioral experiment, we observed that accuracy was modulated by reaction time, bias and sensitivity, but noise compensation could nevertheless be explained via accuracy differences when controlling for RT, bias and sensitivity. In the second experiment, we also measured Event Related Potentials (ERPs) and observed robust electrophysiological correlates of noise compensation starting at around 350 ms after stimulus onset, indicating that noise compensation is most prominent at lexical/semantic processing levels.
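
The sensitivity and bias measures mentioned here are standard signal-detection quantities. A minimal sketch of d' and criterion from hit and false-alarm counts, with a log-linear correction for extreme proportions; the counts are made up:

```python
from scipy.stats import norm

def dprime(hits, misses, fas, crs):
    """d' and criterion with a log-linear correction for 0 or 1 rates."""
    hr = (hits + 0.5) / (hits + misses + 1.0)   # corrected hit rate
    far = (fas + 0.5) / (fas + crs + 1.0)       # corrected false-alarm rate
    zh, zf = norm.ppf(hr), norm.ppf(far)
    return zh - zf, -0.5 * (zh + zf)            # sensitivity, bias

d, c = dprime(hits=78, misses=22, fas=14, crs=86)
print(f"d' = {d:.2f}, criterion = {c:.2f}")
```
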
Affiliations
- Martijn Baart: BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain; Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Blair C Armstrong: BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain; Department of Psychology & Centre for French & Linguistics at Scarborough, University of Toronto, Toronto, Canada
- Clara D Martin: BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain; IKERBASQUE, Basque Foundation for Science, Bilbao, Spain
- Ram Frost: BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain; Department of Psychology, The Hebrew University of Jerusalem, Jerusalem, Israel; Haskins Laboratories, New Haven, CT, USA
- Manuel Carreiras: BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain; IKERBASQUE, Basque Foundation for Science, Bilbao, Spain; University of the Basque Country, UPV/EHU, Bilbao, Spain

14. Baart M. Quantifying lip-read-induced suppression and facilitation of the auditory N1 and P2 reveals peak enhancements and delays. Psychophysiology 2016; 53:1295-1306. DOI: 10.1111/psyp.12683.
Affiliations
- Martijn Baart: BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain; Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands

15. Shaw K, Baart M, Depowski N, Bortfeld H. Infants' preference for native audiovisual speech dissociated from congruency preference. PLoS One 2015; 10:e0126059. PMID: 25927529. PMCID: PMC4415951. DOI: 10.1371/journal.pone.0126059.
Abstract
Although infant speech perception is often studied in isolated modalities, infants' experience with speech is largely multimodal (i.e., speech sounds they hear are accompanied by articulating faces). Across two experiments, we tested infants' sensitivity to the relationship between the auditory and visual components of audiovisual speech in their native (English) and non-native (Spanish) language. In Experiment 1, infants' looking times were measured during a preferential looking task in which they saw two simultaneous visual speech streams articulating a story, one in English and the other in Spanish, while they heard either the English or the Spanish version of the story. In Experiment 2, looking times from another group of infants were measured as they watched single displays of congruent and incongruent combinations of English and Spanish audio and visual speech streams. Findings demonstrated an age-related increase in looking towards the native relative to non-native visual speech stream when accompanied by the corresponding (native) auditory speech. This increase in native language preference did not appear to be driven by a difference in preference for native vs. non-native audiovisual congruence, as we observed no difference in looking times at the audiovisual streams in Experiment 2.
Affiliations
- Kathleen Shaw: Department of Psychology, University of Connecticut, Storrs, CT, United States of America
- Martijn Baart: BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain
- Nicole Depowski: Department of Psychology, University of Connecticut, Storrs, CT, United States of America
- Heather Bortfeld: Department of Psychology, University of Connecticut, Storrs, CT, United States of America; Haskins Laboratories, New Haven, CT, United States of America

16. Baart M, Samuel AG. Early processing of auditory lexical predictions revealed by ERPs. Neurosci Lett 2015; 585:98-102. DOI: 10.1016/j.neulet.2014.11.044.

17. Baart M, Bortfeld H, Vroomen J. Phonetic matching of auditory and visual speech develops during childhood: evidence from sine-wave speech. J Exp Child Psychol 2014; 129:157-164. PMID: 25258018. DOI: 10.1016/j.jecp.2014.08.002.
Abstract
The correspondence between auditory speech and lip-read information can be detected based on a combination of temporal and phonetic cross-modal cues. Here, we determined the point in developmental time at which children start to effectively use phonetic information to match a speech sound with one of two articulating faces. We presented 4- to 11-year-olds (N=77) with three-syllabic sine-wave speech replicas of two pseudo-words that were perceived as non-speech and asked them to match the sounds with the corresponding lip-read video. At first, children had no phonetic knowledge about the sounds, and matching was thus based on the temporal cues that are fully retained in sine-wave speech. Next, we trained all children to perceive the phonetic identity of the sine-wave speech and repeated the audiovisual (AV) matching task. Only at around 6.5 years of age did the benefit of having phonetic knowledge about the stimuli become apparent, thereby indicating that AV matching based on phonetic cues presumably develops more slowly than AV matching based on temporal cues.
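
Sine-wave speech replaces the speech signal with a few time-varying sinusoids that track the formant centre frequencies, removing most phonetic detail while preserving temporal structure. A minimal synthesis sketch under the assumption that formant frequency and amplitude tracks are already available; the flat toy tracks below stand in for real formant analysis of the recordings:

```python
import numpy as np

def sine_wave_speech(formant_tracks, amp_tracks, fs):
    """Sum sinusoids whose frequency/amplitude follow per-formant tracks.

    formant_tracks, amp_tracks: lists of arrays (one pair per formant),
    sampled at audio rate fs. Phase is the running integral of frequency.
    """
    out = np.zeros(len(formant_tracks[0]))
    for freq, amp in zip(formant_tracks, amp_tracks):
        phase = 2 * np.pi * np.cumsum(freq) / fs
        out += amp * np.sin(phase)
    return out / np.max(np.abs(out))

# Toy example: three flat "formants" for 0.5 s at 16 kHz.
fs, n = 16000, 8000
tracks = [np.full(n, f) for f in (500.0, 1500.0, 2500.0)]
amps = [np.full(n, a) for a in (1.0, 0.6, 0.3)]
y = sine_wave_speech(tracks, amps, fs)
```
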
Affiliations
- Martijn Baart: BCBL, Basque Center on Cognition, Brain and Language, 20009 Donostia (San Sebastián), Spain
- Heather Bortfeld: Department of Psychology, University of Connecticut, Storrs, CT 06269, USA; Haskins Laboratories, New Haven, CT 06511, USA
- Jean Vroomen: Department of Cognitive Neuropsychology, Tilburg University, 5000 LE Tilburg, The Netherlands

18. Baart M, Stekelenburg JJ, Vroomen J. Electrophysiological evidence for speech-specific audiovisual integration. Neuropsychologia 2014; 53:115-121. DOI: 10.1016/j.neuropsychologia.2013.11.011.

19. Baart M, Vroomen J, Shaw K, Bortfeld H. Degrading phonetic information affects matching of audiovisual speech in adults, but not in infants. Cognition 2013; 130:31-43. PMID: 24141035. DOI: 10.1016/j.cognition.2013.09.006.
Abstract
Infants and adults are well able to match auditory and visual speech, but the cues on which they rely (viz. temporal, phonetic and energetic correspondence in the auditory and visual speech streams) may differ. Here we assessed the relative contribution of the different cues using sine-wave speech (SWS). Adults (N=52) and infants (N=34, ages ranged between 5 and 15 months) matched two trisyllabic speech sounds ('kalisu' and 'mufapi'), either natural or SWS, with visual speech information. On each trial, adults saw two articulating faces and matched a sound to one of these, while infants were presented the same stimuli in a preferential looking paradigm. Adults' performance was almost flawless with natural speech, but was significantly less accurate with SWS. In contrast, infants matched the sound to the articulating face equally well for natural speech and SWS. These results suggest that infants rely to a lesser extent on phonetic cues than adults do to match audio to visual speech. This is in line with the notion that the ability to extract phonetic information from the visual signal increases during development, and suggests that phonetic knowledge might not be the basis for early audiovisual correspondence detection in speech.
Affiliations
- Martijn Baart: Basque Center on Cognition, Brain and Language, Donostia, Spain; Department of Psychology, Tilburg University, Tilburg, The Netherlands

20. Baart M, de Boer-Schellekens L, Vroomen J. Lipread-induced phonetic recalibration in dyslexia. Acta Psychol (Amst) 2012; 140:91-95. PMID: 22484551. DOI: 10.1016/j.actpsy.2012.03.003.
Abstract
Auditory phoneme categories are less well-defined in developmental dyslexic readers than in fluent readers. Here, we examined whether poor recalibration of phonetic boundaries might be associated with this deficit. Twenty-two adult dyslexic readers were compared with 22 fluent readers on a phoneme identification task and a task that measured phonetic recalibration by lipread speech (Bertelson, Vroomen, & De Gelder, 2003). In line with previous reports, we found that dyslexics were less categorical in the labeling of the speech sounds. The size of their phonetic recalibration effect, though, was comparable to that of normal readers. This result indicates that phonetic recalibration is unaffected in dyslexic readers, and that it is unlikely to lie at the foundation of their auditory phoneme categorization impairments. For normal readers, however, it appeared that a well-calibrated system is related to auditory precision, as the steepness of the auditory identification curve positively correlated with recalibration.

23. Baart M, Vroomen J.
Abstract
Listeners use lipread information to adjust the phonetic boundary between two speech categories (phonetic recalibration; Bertelson et al., 2003). Here, we examined phonetic recalibration while listeners were engaged in a visuospatial or verbal working memory task under different memory load conditions. Phonetic recalibration was, like selective speech adaptation, not affected by a concurrent verbal or visuospatial memory task. This result indicates that phonetic recalibration is a low-level process that does not critically depend on processes used in verbal or visuospatial working memory.
Affiliations
- Martijn Baart: Department of Medical Psychology and Neuropsychology, Tilburg University, Warandelaan 2, P.O. Box 90153, 5000 LE Tilburg, The Netherlands
- Jean Vroomen: Department of Medical Psychology and Neuropsychology, Tilburg University, Warandelaan 2, P.O. Box 90153, 5000 LE Tilburg, The Netherlands

24. Baart M, Vroomen J. Do you see what you are hearing? Cross-modal effects of speech sounds on lipreading. Neurosci Lett 2010; 471:100-103. PMID: 20080146. DOI: 10.1016/j.neulet.2010.01.019.
Abstract
It is well known that visual information derived from mouth movements (i.e., lipreading) can have profound effects on auditory speech identification (e.g. the McGurk-effect). Here we examined the reverse phenomenon, namely whether auditory speech affects lipreading. We report that speech sounds dubbed onto lipread speech affect immediate identification of lipread tokens. This effect likely reflects genuine cross-modal integration of sensory signals and not just a simple response bias because we also observed adaptive shifts in visual identification of the ambiguous lipread tokens after exposure to incongruent audiovisual adapter stimuli. Presumably, listeners had learned to label the lipread stimulus in accordance with the sound, thus demonstrating that the interaction between hearing and lipreading is genuinely bi-directional.
Affiliations
- Martijn Baart: Department of Medical Psychology and Neuropsychology, Tilburg University, P.O. Box 90153, Warandelaan 2, 5000 LE Tilburg, The Netherlands

25. Vroomen J, Baart M. Phonetic recalibration only occurs in speech mode. Cognition 2009; 110:254-259. DOI: 10.1016/j.cognition.2008.10.015.

26. Vroomen J, Baart M.
Abstract
Listeners hearing an ambiguous speech sound flexibly adjust their phonetic categories in accordance with lipread information telling what the phoneme should be (recalibration). Here, we tested the stability of lipread-induced recalibration over time. Listeners were exposed to an ambiguous sound halfway between /t/ and /p/ that was dubbed onto a face articulating either /t/ or /p/. When tested immediately, listeners exposed to lipread /t/ were more likely to categorize the ambiguous sound as /t/ than listeners exposed to /p/. This aftereffect dissipated quickly with prolonged testing and did not reappear after a 24-hour delay. Recalibration of phonetic categories is thus a fragile phenomenon.
Affiliations
- Jean Vroomen: Department of Psychology, Tilburg University, Tilburg, The Netherlands