1. Sandlund J, Duriseti R, Ladhani SN, Stuart K, Noble J, Høeg TB. Child mask mandates for COVID-19: a systematic review. Arch Dis Child 2024;109:e2. PMID: 38050026; PMCID: PMC10894839; DOI: 10.1136/archdischild-2023-326215.
Abstract
BACKGROUND Mask mandates for children during the COVID-19 pandemic varied in different locations. A risk-benefit analysis of this intervention has not yet been performed. In this study, we performed a systematic review to assess research on the effectiveness of mask wearing in children. METHODS We performed database searches up to February 2023. The studies were screened by title and abstract, and included studies were further screened as full-text references. A risk-of-bias analysis was performed by two independent reviewers and adjudicated by a third reviewer. RESULTS We screened 597 studies and included 22 in the final analysis. There were no randomised controlled trials in children assessing the benefits of mask wearing to reduce SARS-CoV-2 infection or transmission. The six observational studies reporting an association between child masking and lower infection rate or antibody seropositivity had critical (n=5) or serious (n=1) risk of bias; all six were potentially confounded by important differences between masked and unmasked groups and two were shown to have non-significant results when reanalysed. Sixteen other observational studies found no association between mask wearing and infection or transmission. CONCLUSIONS Real-world effectiveness of child mask mandates against SARS-CoV-2 transmission or infection has not been demonstrated with high-quality evidence. The current body of scientific data does not support masking children for protection against COVID-19.
Affiliations
- Johanna Sandlund: Board-Certified Clinical Microbiologist and Independent Scholar, Alameda, California, USA
- Ram Duriseti: Stanford University School of Medicine, Stanford, California, USA
- Shamez N Ladhani: Immunisation Department, UK Health Security Agency, London, UK; Centre for Neonatal and Paediatric Infection, St. George's University of London, London, UK
- Kelly Stuart: SmallTalk Pediatric Therapy, San Diego, California, USA
- Jeanne Noble: Emergency Medicine, University of California San Francisco, San Francisco, California, USA
- Tracy Beth Høeg: Epidemiology and Biostatistics, University of California San Francisco, San Francisco, California, USA; Clinical Research, University of Southern Denmark, Odense, Denmark
2. Ross LA, Molholm S, Butler JS, Del Bene VA, Brima T, Foxe JJ. Neural correlates of audiovisual narrative speech perception in children and adults on the autism spectrum: A functional magnetic resonance imaging study. Autism Res 2024;17:280-310. PMID: 38334251; DOI: 10.1002/aur.3104.
Abstract
Autistic individuals show substantially reduced benefit from observing visual articulations during audiovisual speech perception, a multisensory integration deficit that is particularly relevant to social communication. This has mostly been studied using simple syllabic or word-level stimuli, and it remains unclear how altered lower-level multisensory integration translates to the processing of more complex natural multisensory stimulus environments in autism. Here, functional neuroimaging was used to compare neural correlates of audiovisual gain (AV-gain) in 41 autistic individuals with those of 41 age-matched non-autistic controls when presented with a complex audiovisual narrative. Participants were presented with continuous narration of a story in auditory-alone, visual-alone, and both synchronous and asynchronous audiovisual speech conditions. We hypothesized that previously identified differences in audiovisual speech processing in autism would be characterized by activation differences in brain regions well known to be associated with audiovisual enhancement in neurotypicals. However, our results did not provide evidence for altered processing of the auditory-alone, visual-alone, or audiovisual conditions, or of AV-gain, in regions associated with the respective task when comparing activation patterns between groups. Instead, we found that autistic individuals responded with higher activations in mostly frontal regions where the activation to the experimental conditions was below baseline (de-activations) in the control group. These frontal effects were observed in both unisensory and audiovisual conditions, suggesting that these altered activations were not specific to multisensory processing but reflective of more general mechanisms, such as an altered disengagement of Default Mode Network processes during the observation of the language stimulus across conditions.
Affiliations
- Lars A Ross: The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, USA; Department of Imaging Sciences, University of Rochester Medical Center, University of Rochester School of Medicine and Dentistry, Rochester, New York, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, USA
- Sophie Molholm: The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, USA
- John S Butler: The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, USA; School of Mathematics and Statistics, Technological University Dublin, City Campus, Dublin, Ireland
- Victor A Del Bene: The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, USA; Heersink School of Medicine, Department of Neurology, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Tufikameni Brima: The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, USA
- John J Foxe: The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, USA
3. Birulés J, Goupil L, Josse J, Fort M. The Role of Talking Faces in Infant Language Learning: Mind the Gap between Screen-Based Settings and Real-Life Communicative Interactions. Brain Sci 2023;13:1167. PMID: 37626523; PMCID: PMC10452843; DOI: 10.3390/brainsci13081167.
Abstract
Over the last few decades, developmental (psycho)linguists have demonstrated that perceiving talking faces audio-visually is important for early language acquisition. Using mostly well-controlled and screen-based laboratory approaches, this line of research has shown that paying attention to talking faces is likely to be one of the powerful strategies infants use to learn their native language(s). In this review, we combine evidence from these screen-based studies with another line of research that has studied how infants learn novel words and deploy their visual attention during naturalistic play. In our view, this is an important step toward developing an integrated account of how infants effectively extract audiovisual information from talkers' faces during early language learning. We identify three factors that have been understudied so far, despite the fact that they are likely to have an important impact on how infants deploy their attention (or not) toward talking faces during social interactions: social contingency, speaker characteristics, and task-dependencies. Last, we propose ideas to address these issues in future research, with the aim of reducing the existing knowledge gap between current experimental studies and the many ways infants can and do effectively rely upon the audiovisual information extracted from talking faces in their real-life language environment.
Affiliations
- Joan Birulés: Laboratoire de Psychologie et NeuroCognition, CNRS UMR 5105, Université Grenoble Alpes, 38058 Grenoble, France
- Louise Goupil: Laboratoire de Psychologie et NeuroCognition, CNRS UMR 5105, Université Grenoble Alpes, 38058 Grenoble, France
- Jérémie Josse: Laboratoire de Psychologie et NeuroCognition, CNRS UMR 5105, Université Grenoble Alpes, 38058 Grenoble, France
- Mathilde Fort: Laboratoire de Psychologie et NeuroCognition, CNRS UMR 5105, Université Grenoble Alpes, 38058 Grenoble, France; Centre de Recherche en Neurosciences de Lyon, INSERM U1028-CNRS UMR 5292, Université Lyon 1, 69500 Bron, France
4. Sato M. The timing of visual speech modulates auditory neural processing. Brain Lang 2022;235:105196. PMID: 36343508; DOI: 10.1016/j.bandl.2022.105196.
Abstract
In face-to-face communication, visual information from a speaker's face and the time-varying kinematics of articulatory movements have been shown to fine-tune auditory neural processing and improve speech recognition. To further determine whether the timing of visual gestures modulates auditory cortical processing, three sets of syllables differing only in the onset and duration of silent prephonatory movements preceding the acoustic speech signal were contrasted using EEG. Despite similar visual recognition rates, an increase in the amplitude of P2 auditory evoked responses was observed from the longest to the shortest movements. Taken together, these results clarify how audiovisual speech perception partly operates through visually based predictions and related processing time, with acoustic-phonetic neural processing paralleling the timing of visual prephonatory gestures.
Affiliations
- Marc Sato: Laboratoire Parole et Langage, Centre National de la Recherche Scientifique, Aix-Marseille Université, Aix-en-Provence, France
5. Ashokumar M, Guichet C, Schwartz JL, Ito T. Correlation between the effect of orofacial somatosensory inputs in speech perception and speech production performance. Audit Percept Cogn 2022;6:97-107. PMID: 37260602; PMCID: PMC10229140; DOI: 10.1080/25742442.2022.2134674.
Abstract
Introduction: Orofacial somatosensory inputs modify the perception of speech sounds. Such auditory-somatosensory integration likely develops alongside speech production acquisition. We examined whether the somatosensory effect in speech perception varies depending on individual characteristics of speech production. Methods: The somatosensory effect in speech perception was assessed as the change in the category boundary between /e/ and /ø/ in a vowel identification test when somatosensory stimulation (facial skin deformation in the rearward direction, corresponding to the articulatory movement for /e/) was applied together with the auditory input. Speech production performance was quantified by the acoustic distances between the average first, second and third formants of /e/ and /ø/ utterances recorded in a separate test. Results: The category boundary between /e/ and /ø/ was significantly shifted towards /ø/ by the somatosensory stimulation, consistent with previous research. The amplitude of the category boundary shift was significantly correlated with the acoustic distance between the mean second formants (and, marginally, the third formants) of /e/ and /ø/ productions, with no correlation with the first formant distance. Discussion: Greater acoustic distances can be related to larger contrasts between the articulatory targets of vowels in speech production. These results suggest that the somatosensory effect in speech perception can be linked to speech production performance.
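As a rough illustration of the correlational analysis this abstract describes, the sketch below computes per-speaker acoustic distances between the mean F2 of two vowel categories and correlates them with perceptual boundary shifts. All data, values, and variable names are invented for illustration; this is not the authors' pipeline.

```python
import numpy as np
from scipy.stats import pearsonr

# Illustrative sketch (not the authors' analysis): correlate the perceptual
# boundary shift with the F2 distance between each speaker's /e/ and /oe/.
rng = np.random.default_rng(0)
n_speakers = 20

# Hypothetical per-speaker mean formants (Hz) for /e/ and /oe/ productions.
f2_e = rng.normal(2000, 100, n_speakers)   # /e/ has a higher F2
f2_oe = rng.normal(1500, 100, n_speakers)  # /oe/ has a lower F2
f2_distance = np.abs(f2_e - f2_oe)

# Hypothetical boundary shifts (continuum steps) induced by the somatosensory
# stimulation, loosely coupled to the acoustic distance for demonstration.
boundary_shift = 0.002 * f2_distance + rng.normal(0, 0.2, n_speakers)

r, p = pearsonr(f2_distance, boundary_shift)
print(f"F2 distance vs. boundary shift: r = {r:.2f}, p = {p:.3f}")
```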
Affiliations
- Monica Ashokumar: Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, Grenoble, France
- Clément Guichet: Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, Grenoble, France
- Jean-Luc Schwartz: Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, Grenoble, France
- Takayuki Ito: Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, Grenoble, France; Haskins Laboratories, New Haven, USA
6. Intelligibility of speech produced by sighted and blind adults. PLoS One 2022;17:e0272127. PMID: 36107945; PMCID: PMC9477328; DOI: 10.1371/journal.pone.0272127.
Abstract
Purpose: It is well known that speech uses both the auditory and visual modalities to convey information. In cases of congenital sensory deprivation, the feedback language learners have access to for mapping visible and invisible orofacial articulation is impoverished. Although the effects of blindness on the movements of the lips, jaw, and tongue have been documented in francophone adults, not much is known about their consequences for speech intelligibility. The objective of this study is to investigate the effects of congenital visual deprivation on vowel intelligibility in adult speakers of Canadian French. Method: Twenty adult listeners performed two perceptual identification tasks in which vowels produced by congenitally blind adults and sighted adults were used as stimuli. The vowels were presented in the auditory, visual, and audiovisual modalities (experiment 1) and at different signal-to-noise ratios in the audiovisual modality (experiment 2). Correct identification scores were calculated. Sequential information analyses were also conducted to assess the amount of information transmitted to the listeners along the three vowel features of height, place of articulation, and rounding. Results: The results showed that, although blind speakers did not differ from their sighted peers in the auditory modality, they had lower scores in the audiovisual and visual modalities. Some vowels produced by blind speakers are also less robust in noise than those produced by sighted speakers. Conclusion: Together, the results suggest that adult blind speakers have learned to adapt to their sensory loss so that they can successfully achieve intelligible vowel targets in non-noisy conditions but that they produce less intelligible speech in noisy conditions. Thus, the trade-off between visible (lips) and invisible (tongue) articulatory cues observed between vowels produced by blind and sighted speakers is not equivalent in terms of perceptual efficiency.
7. Ross LA, Molholm S, Butler JS, Del Bene VA, Foxe JJ. Neural correlates of multisensory enhancement in audiovisual narrative speech perception: an fMRI investigation. Neuroimage 2022;263:119598. PMID: 36049699; DOI: 10.1016/j.neuroimage.2022.119598.
Abstract
This fMRI study investigated the effect of seeing the articulatory movements of a speaker while listening to a naturalistic narrative stimulus. Its goal was to identify regions of the language network showing multisensory enhancement under synchronous audiovisual conditions. We expected this enhancement to emerge in regions known to underlie the integration of auditory and visual information, such as the posterior superior temporal gyrus, as well as parts of the broader language network, including the semantic system. To this end, we presented 53 participants with a continuous narration of a story in auditory-alone, visual-alone, and both synchronous and asynchronous audiovisual speech conditions while recording brain activity using BOLD fMRI. We found multisensory enhancement in an extensive network of regions underlying multisensory integration and parts of the semantic network, as well as extralinguistic regions not usually associated with multisensory integration, namely the primary visual cortex and the bilateral amygdala. Analysis also revealed involvement of thalamic brain regions along the visual and auditory pathways more commonly associated with early sensory processing. We conclude that under natural listening conditions, multisensory enhancement not only involves sites of multisensory integration but many regions of the wider semantic network, and includes regions associated with extralinguistic sensory, perceptual and cognitive processing.
Affiliations
- Lars A Ross: The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642, USA; Department of Imaging Sciences, University of Rochester Medical Center, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York 10461, USA
- Sophie Molholm: The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York 10461, USA
- John S Butler: The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York 10461, USA; School of Mathematical Sciences, Technological University Dublin, Kevin Street Campus, Dublin, Ireland
- Victor A Del Bene: The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York 10461, USA; University of Alabama at Birmingham, Heersink School of Medicine, Department of Neurology, Birmingham, Alabama 35233, USA
- John J Foxe: The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York 10461, USA
8. Kulkarni A, Kegler M, Reichenbach T. Effect of visual input on syllable parsing in a computational model of a neural microcircuit for speech processing. J Neural Eng 2021;18. PMID: 34547737; DOI: 10.1088/1741-2552/ac28d3.
Abstract
Objective. Seeing a person talking can help us understand them, particularly in a noisy environment. However, how the brain integrates the visual information with the auditory signal to enhance speech comprehension remains poorly understood. Approach. Here we address this question in a computational model of a cortical microcircuit for speech processing. The model consists of an excitatory and an inhibitory neural population that together create oscillations in the theta frequency range. When stimulated with speech, the theta rhythm becomes entrained to the onsets of syllables, such that the onsets can be inferred from the network activity. We investigate how well the obtained syllable parsing performs when different types of visual stimuli are added. In particular, we consider currents related to the rate of syllables as well as currents related to the mouth-opening area of the talking faces. Main results. We find that currents that target the excitatory neuronal population can influence speech comprehension, either boosting or impeding it, depending on the temporal delay and on whether the currents are excitatory or inhibitory. In contrast, currents that act on the inhibitory neurons do not impact speech comprehension significantly. Significance. Our results suggest neural mechanisms for the integration of visual information with the acoustic information in speech and make experimentally testable predictions.
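As a loose illustration of the kind of model this abstract describes, the sketch below simulates a Wilson-Cowan-style excitatory-inhibitory pair driven by pulses at syllable onsets, with an additional current (standing in for visual input) injected into the excitatory population. The equations, parameters, and pulse train are illustrative assumptions, not the published microcircuit.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate(t_max=2.0, dt=1e-3, syllable_rate=5.0, visual_gain=0.0, visual_delay=0.0):
    """Wilson-Cowan-style E-I pair entrained by syllable-onset pulses.
    All parameter values are illustrative, not those of the cited model."""
    n = int(t_max / dt)
    t = np.arange(n) * dt
    E, I = np.zeros(n), np.zeros(n)
    tau_e, tau_i = 0.05, 0.10                       # slow time constants -> slow rhythm
    w_ee, w_ei, w_ie, w_ii = 16.0, 12.0, 15.0, 3.0  # population coupling weights
    theta_e, theta_i = 4.0, 3.7                     # activation thresholds
    # "Auditory" drive: brief pulses at syllable onsets (perfectly periodic here).
    audio = (np.sin(2 * np.pi * syllable_rate * t) > 0.95).astype(float)
    # "Visual" drive: same rhythm shifted by a delay, injected into the E cells.
    visual = (np.sin(2 * np.pi * syllable_rate * (t - visual_delay)) > 0.95).astype(float)
    for k in range(n - 1):  # forward-Euler integration
        drive = 2.0 * audio[k] + visual_gain * visual[k]
        dE = (-E[k] + sigmoid(w_ee * E[k] - w_ei * I[k] + drive - theta_e)) / tau_e
        dI = (-I[k] + sigmoid(w_ie * E[k] - w_ii * I[k] - theta_i)) / tau_i
        E[k + 1], I[k + 1] = E[k] + dt * dE, I[k] + dt * dI
    return t, E, audio

t, E, onsets = simulate(visual_gain=1.0, visual_delay=0.05)
# Parsing can then be scored by comparing peaks of E against the true onset
# times, and the score compared across visual_gain / visual_delay settings.
```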
Affiliations
- Anirudh Kulkarni: Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, SW7 2AZ London, United Kingdom
- Mikolaj Kegler: Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, SW7 2AZ London, United Kingdom
- Tobias Reichenbach: Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, SW7 2AZ London, United Kingdom; Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Konrad-Zuse-Strasse 3/5, 91056 Erlangen, Germany
9. Sekiyama K, Hisanaga S, Mugitani R. Selective attention to the mouth of a talker in Japanese-learning infants and toddlers: Its relationship with vocabulary and compensation for noise. Cortex 2021;140:145-156. PMID: 33989900; DOI: 10.1016/j.cortex.2021.03.023.
Abstract
Infants increasingly gaze at the mouth of talking faces during the latter half of the first postnatal year. This study investigated the mouth-looking behavior of 120 full-term infants and toddlers (6 months to 3 years) and 12 young adults (21-24 years) from Japanese monolingual families. The study addressed three questions: (1) Is the attentional shift to the mouth in infancy similarly observed in a Japanese environment, where the contribution of visual speech is known to be relatively weak? (2) Can noisy conditions increase mouth-looking behavior in Japanese young children? (3) Is mouth-looking behavior related to language acquisition? To this end, movies of a talker speaking short phrases were presented while manipulating the signal-to-noise ratio (SNR: Clear, SN+4, and SN-4). Toddlers' expressive vocabulary was assessed through parental report. The results indicated that Japanese infants initially have a strong preference for the eyes over the mouth, which weakens toward 10 months, but the shift occurred later and was milder than known results for English-learning infants. Even after 10 months, no clear-cut preference for the mouth was observed until 3 years of age, even in linguistically challenging situations with strong noise. As early as 3 years of age, gaze returned to the eyes in the Clear condition, while attention to the mouth increased with increasing noise level. In addition, multiple regression analyses revealed a tendency for 2- and 3-year-olds with larger vocabularies to look increasingly at the eyes. Overall, the gaze of Japanese-learning infants and toddlers was more biased toward the eyes in various respects than known results for English-learning infants. The present findings shed new light on our understanding of the development of selective attention to the mouth in non-Western populations.
Affiliations
- Kaoru Sekiyama: Graduate School of Advanced Integrated Studies in Human Survivability, Kyoto University, Kyoto, Japan; Cognitive Psychology Laboratory, Faculty of Letters, Kumamoto University, Kumamoto, Japan
- Satoko Hisanaga: Cognitive Psychology Laboratory, Faculty of Letters, Kumamoto University, Kumamoto, Japan
- Ryoko Mugitani: Department of Psychology, Faculty of Integrated Arts and Social Sciences, Japan Women's University, Kanagawa, Japan
10. Basirat A, Allart É, Brunellière A, Martin Y. Audiovisual speech segmentation in post-stroke aphasia: a pilot study. Top Stroke Rehabil 2019;26:588-594. PMID: 31369358; DOI: 10.1080/10749357.2019.1643566.
Abstract
Background: Stroke may cause sentence comprehension disorders. Speech segmentation, i.e. the ability to detect word boundaries while listening to continuous speech, is an initial step allowing the successful identification of words and the accurate understanding of meaning within sentences. It has received little attention in people with post-stroke aphasia (PWA). Objectives: Our goal was to study speech segmentation in PWA and examine the potential benefit of seeing the speakers' articulatory gestures while segmenting sentences. Methods: Fourteen PWA and twelve healthy controls participated in this pilot study. Performance was measured with a word-monitoring task. In the auditory-only modality, participants were presented with auditory-only stimuli, while in the audiovisual modality, visual speech cues (i.e. the speaker's articulatory gestures) accompanied the auditory input. The proportion of correct responses was calculated for each participant and each modality. Visual enhancement was then calculated in order to estimate the potential benefit of seeing the speaker's articulatory gestures. Results: In both the auditory-only and audiovisual modalities, PWA performed significantly less well than controls, who had 100% correct performance in both modalities. The performance of PWA was correlated with their phonological ability. Six PWA used the visual cues. Group-level analysis of PWA did not show any reliable difference between the auditory-only and audiovisual modalities (median visual enhancement = 7% [Q1-Q3: -5 to 39]). Conclusion: Our findings show that a speech segmentation disorder may exist in PWA. This points to the importance of assessing and training speech segmentation after stroke. Further studies should investigate the characteristics of PWA who use visual speech cues during sentence processing.
Affiliations
- Anahita Basirat: UMR 9193 - SCALab - Sciences Cognitives et Sciences Affectives, Univ. Lille, CNRS, CHU Lille, Lille, France
- Étienne Allart: Neurorehabilitation Unit, Lille University Medical Center, Lille, France; Inserm U1171, University Lille, Degenerative and Vascular Cognitive Disorders, Lille, France
- Angèle Brunellière: UMR 9193 - SCALab - Sciences Cognitives et Sciences Affectives, Univ. Lille, CNRS, CHU Lille, Lille, France
11. Vilain A, Dole M, Lœvenbruck H, Pascalis O, Schwartz JL. The role of production abilities in the perception of consonant category in infants. Dev Sci 2019;22:e12830. PMID: 30908771; DOI: 10.1111/desc.12830.
Abstract
The influence of motor knowledge on speech perception is well established, but the functional role of the motor system is still poorly understood. The present study explores the hypothesis that speech production abilities may help infants discover phonetic categories in the speech stream, in spite of coarticulation effects. To this aim, we examined the influence of babbling abilities on consonant categorization in 6- and 9-month-old infants. Using an intersensory matching procedure, we investigated the infants' capacity to associate auditory information about a consonant in various vowel contexts with visual information about the same consonant, and to map auditory and visual information onto a common phoneme representation. Moreover, a parental questionnaire evaluated the infants' consonantal repertoire. In a first experiment using /b/-/d/ consonants, we found that infants who displayed babbling abilities and produced the /b/ and/or the /d/ consonants in repetitive sequences were able to correctly perform intersensory matching, while non-babblers were not. In a second experiment using the /v/-/z/ pair, which is as visually contrasted as the /b/-/d/ pair but which is usually not produced at the tested ages, no significant matching was observed, for any group of infants, babbling or not. These results demonstrate, for the first time, that the emergence of babbling could play a role in the extraction of vowel-independent representations for consonant place of articulation. They have important implications for speech perception theories, as they highlight the role of sensorimotor interactions in the development of phoneme representations during the first year of life.
Affiliations
- Anne Vilain: GIPSA-Lab, Speech & Cognition Department, CNRS, Université Grenoble Alpes, Grenoble INP, Grenoble, France
- Marjorie Dole: GIPSA-Lab, Speech & Cognition Department, CNRS, Université Grenoble Alpes, Grenoble INP, Grenoble, France
- Hélène Lœvenbruck: LPNC, CNRS, Université Grenoble Alpes, Université Savoie Mont Blanc, Grenoble, France
- Olivier Pascalis: LPNC, CNRS, Université Grenoble Alpes, Université Savoie Mont Blanc, Grenoble, France
- Jean-Luc Schwartz: GIPSA-Lab, Speech & Cognition Department, CNRS, Université Grenoble Alpes, Grenoble INP, Grenoble, France
12. Treille A, Vilain C, Schwartz JL, Hueber T, Sato M. Electrophysiological evidence for audio-visuo-lingual speech integration. Neuropsychologia 2018;109:126-133. DOI: 10.1016/j.neuropsychologia.2017.12.024.
13. Sánchez-García C, Kandel S, Savariaux C, Soto-Faraco S. The Time Course of Audio-Visual Phoneme Identification: A High Temporal Resolution Study. Multisens Res 2018;31:57-78. DOI: 10.1163/22134808-00002560.
Abstract
Speech unfolds in time and, as a consequence, its perception requires temporal integration. Yet, studies addressing audio-visual speech processing have often overlooked this temporal aspect. Here, we address the temporal course of audio-visual speech processing in a phoneme identification task using a Gating paradigm. We created disyllabic Spanish word-like utterances (e.g., /pafa/, /paθa/, …) from high-speed camera recordings. The stimuli differed only in the middle consonant (/f/, /θ/, /s/, /r/, /g/), which varied in visual and auditory saliency. As in classical Gating tasks, the utterances were presented in fragments of increasing length (gates), here in 10 ms steps, for identification and confidence ratings. We measured correct identification as a function of time (at each gate) for each critical consonant in audio, visual and audio-visual conditions, and computed the Identification Point and Recognition Point scores. The results revealed that audio-visual identification is a time-varying process that depends on the relative strength of each modality (i.e., saliency). In some cases, audio-visual identification followed the pattern of one dominant modality (either A or V), when that modality was very salient. In other cases, both modalities contributed to identification, hence resulting in audio-visual advantage or interference with respect to unimodal conditions. Both unimodal dominance and audio-visual interaction patterns may arise within the course of identification of the same utterance, at different times. The outcome of this study suggests that audio-visual speech integration models should take into account the time-varying nature of visual and auditory saliency.
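The Identification Point mentioned above is commonly operationalized as the earliest gate from which responses are correct and stay correct through the last gate. Below is a minimal sketch of that computation; the 10 ms gate step comes from the abstract, while the function and response data are hypothetical, not the authors' scoring code.

```python
import numpy as np

def identification_point(correct, gate_ms=10):
    """Earliest gate (in ms) from which every subsequent response is correct;
    returns None if the item is never stably identified. This is one common
    operationalization, not necessarily the one used in the cited study."""
    correct = np.asarray(correct, dtype=bool)
    # Suffix scan: True where all responses from this gate onward are correct.
    stable = np.logical_and.accumulate(correct[::-1])[::-1]
    if not stable.any():
        return None
    return (int(np.argmax(stable)) + 1) * gate_ms

# Made-up responses for one consonant across 12 gates of 10 ms each:
responses = [0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1]
print(identification_point(responses))  # -> 50 (stable from the fifth gate)
```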
Affiliations
- Carolina Sánchez-García: Departament de Tecnologies de la Informació i les Comunicacions, Universitat Pompeu Fabra, Barcelona, Spain
- Sonia Kandel: Université Grenoble Alpes, GIPSA-lab (CNRS UMR 5216), Grenoble, France
- Salvador Soto-Faraco: Departament de Tecnologies de la Informació i les Comunicacions, Universitat Pompeu Fabra, Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
14. Electrophysiological evidence for a self-processing advantage during audiovisual speech integration. Exp Brain Res 2017;235:2867-2876. PMID: 28676921; DOI: 10.1007/s00221-017-5018-0.
Abstract
Previous electrophysiological studies have provided strong evidence for early multisensory integrative mechanisms during audiovisual speech perception. From these studies, one unanswered issue is whether hearing our own voice and seeing our own articulatory gestures facilitate speech perception, possibly through a better processing and integration of sensory inputs with our own sensory-motor knowledge. The present EEG study examined the impact of self-knowledge during the perception of auditory (A), visual (V) and audiovisual (AV) speech stimuli that were previously recorded from the participant or from a speaker he/she had never met. Audiovisual interactions were estimated by comparing N1 and P2 auditory evoked potentials during the bimodal condition (AV) with the sum of those observed in the unimodal conditions (A + V). In line with previous EEG studies, our results revealed an amplitude decrease of P2 auditory evoked potentials in AV compared to A + V conditions. Crucially, a temporal facilitation of N1 responses was observed during the visual perception of self speech movements compared to those of another speaker. This facilitation was negatively correlated with the saliency of visual stimuli. These results provide evidence for a temporal facilitation of the integration of auditory and visual speech signals when the visual situation involves our own speech gestures.
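As a toy illustration of the additive (AV vs. A + V) comparison described above, the sketch below builds synthetic evoked responses and reads out a P2 amplitude reduction in the bimodal condition. The waveforms, latencies, and amplitudes are invented for demonstration, not the study's data.

```python
import numpy as np

# Synthetic illustration of the additive-model contrast used in audiovisual
# ERP studies: compare AV against the point-wise sum of unimodal A and V.
fs = 1000                          # sampling rate (Hz)
t = np.arange(-0.1, 0.4, 1 / fs)   # epoch from -100 to 400 ms

def evoked(latency, amp, width=0.02):
    """Toy Gaussian 'component' standing in for an averaged ERP deflection."""
    return amp * np.exp(-0.5 * ((t - latency) / width) ** 2)

erp_a = evoked(0.100, -3.0) + evoked(0.200, 4.0)  # N1 and P2 to audio alone
erp_v = evoked(0.170, 1.0)                        # small visual-only response
erp_av = evoked(0.095, -3.0) + evoked(0.200, 3.0) + evoked(0.170, 1.0)

# P2 suppression in AV relative to A + V, measured at the P2 peak (~200 ms):
p2 = np.argmin(np.abs(t - 0.200))
print(f"A+V P2: {(erp_a + erp_v)[p2]:.2f} µV, AV P2: {erp_av[p2]:.2f} µV")
```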
15. Kandel S, Burfin S, Méary D, Ruiz-Tada E, Costa A, Pascalis O. The Impact of Early Bilingualism on Face Recognition Processes. Front Psychol 2016;7:1080. PMID: 27486422; PMCID: PMC4949974; DOI: 10.3389/fpsyg.2016.01080.
Abstract
Early linguistic experience has an impact on the way we decode audiovisual speech in face-to-face communication. The present study examined whether differences in visual speech decoding could be linked to a broader difference in face processing. To identify a phoneme, we have to analyze the speaker's face to focus on the cues relevant for speech decoding (e.g., locating the mouth with respect to the eyes). Face recognition processes were investigated through two classic effects in face recognition studies: the Other-Race Effect (ORE) and the Inversion Effect. Bilingual and monolingual participants did a face recognition task with Caucasian faces (own race), Chinese faces (other race), and cars that were presented in an upright or inverted position. The results revealed that monolinguals exhibited the classic ORE; bilinguals did not. Overall, bilinguals were slower than monolinguals. These results suggest that bilinguals' face processing abilities differ from monolinguals'. Early exposure to more than one language may lead to a perceptual organization that goes beyond language processing and could extend to face analysis. We hypothesize that these differences could be due to the fact that bilinguals focus on different parts of the face than monolinguals, making them more efficient in other-race face processing but slower. However, more studies using eye-tracking techniques are necessary to confirm this explanation.
Affiliations
- Sonia Kandel: GIPSA-Lab (CNRS UMR 5216), Université Grenoble Alpes, Grenoble, France; Institut Universitaire de France, Paris, France
- Sabine Burfin: Laboratoire de Psychologie et NeuroCognition (CNRS UMR 5105), Université Grenoble Alpes, Grenoble, France
- David Méary: Laboratoire de Psychologie et NeuroCognition (CNRS UMR 5105), Université Grenoble Alpes, Grenoble, France
- Albert Costa: Universitat Pompeu Fabra, Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain
- Olivier Pascalis: Laboratoire de Psychologie et NeuroCognition (CNRS UMR 5105), Université Grenoble Alpes, Grenoble, France
16. Nahorna O, Berthommier F, Schwartz JL. Audio-visual speech scene analysis: characterization of the dynamics of unbinding and rebinding the McGurk effect. J Acoust Soc Am 2015;137:362-377. PMID: 25618066; DOI: 10.1121/1.4904536.
Abstract
While audiovisual interactions in speech perception have long been considered automatic, recent data suggest that this is not the case. In a previous study, Nahorna et al. [(2012). J. Acoust. Soc. Am. 132, 1061-1077] showed that the McGurk effect is reduced by a preceding incoherent audiovisual context. This was interpreted as showing the existence of an audiovisual binding stage controlling the fusion process: incoherence would produce unbinding and decrease the weight of the visual input in fusion. The present paper explores the audiovisual binding system to characterize its dynamics. A first experiment assesses the dynamics of unbinding and shows that it is rapid: an incoherent context less than 0.5 s long (typically one syllable) suffices to produce a maximal reduction in the McGurk effect. A second experiment tests the rebinding process by presenting a short period of either coherent material or silence after the incoherent unbinding context. Coherence provides rebinding, with a recovery of the McGurk effect, while silence provides no rebinding and hence freezes the unbinding process. These experiments are interpreted in the framework of an audiovisual speech scene analysis process assessing the perceptual organization of an audiovisual speech input before a decision takes place at a higher processing stage.
Affiliations
- Olha Nahorna: GIPSA-Lab, Speech and Cognition Department, UMR 5216, CNRS, Grenoble University, Grenoble, France
- Frédéric Berthommier: GIPSA-Lab, Speech and Cognition Department, UMR 5216, CNRS, Grenoble University, Grenoble, France
- Jean-Luc Schwartz: GIPSA-Lab, Speech and Cognition Department, UMR 5216, CNRS, Grenoble University, Grenoble, France
17. Ganesh AC, Berthommier F, Vilain C, Sato M, Schwartz JL. A possible neurophysiological correlate of audiovisual binding and unbinding in speech perception. Front Psychol 2014;5:1340. PMID: 25505438; PMCID: PMC4244540; DOI: 10.3389/fpsyg.2014.01340.
Abstract
Audiovisual (AV) speech integration of auditory and visual streams generally results in fusion into a single percept. One classical example is the McGurk effect, in which incongruent auditory and visual speech signals may lead to a fused percept different from either the visual or the auditory input. In a previous set of experiments, we showed that if a McGurk stimulus is preceded by an incongruent AV context (composed of incongruent auditory and visual speech materials), the amount of McGurk fusion is largely decreased. We interpreted this result in the framework of a two-stage "binding and fusion" model of AV speech perception, with an early AV binding stage controlling the fusion/decision process and likely to produce "unbinding", with less fusion, if the context is incoherent. In order to provide further electrophysiological evidence for this binding/unbinding stage, early auditory evoked N1/P2 responses were here compared during auditory, congruent AV and incongruent AV speech perception, following either coherent or incoherent AV contexts. Following the coherent context, in line with previous electroencephalographic/magnetoencephalographic studies, visual information in the congruent AV condition was found to modify auditory evoked potentials, with a latency decrease of P2 responses compared to the auditory condition. Importantly, both P2 amplitude and latency in the congruent AV condition increased from the coherent to the incoherent context. Although potential contamination by visual responses from the visual cortex cannot be ruled out, our results may provide a neurophysiological correlate of an early binding/unbinding process applied to AV interactions.
Affiliations
- Attigodu C Ganesh: CNRS, Grenoble Images Parole Signal Automatique-Lab, Speech and Cognition Department, UMR 5216, Grenoble University, Grenoble, France
- Frédéric Berthommier: CNRS, Grenoble Images Parole Signal Automatique-Lab, Speech and Cognition Department, UMR 5216, Grenoble University, Grenoble, France
- Coriandre Vilain: CNRS, Grenoble Images Parole Signal Automatique-Lab, Speech and Cognition Department, UMR 5216, Grenoble University, Grenoble, France
- Marc Sato: CNRS, Laboratoire Parole et Langage, Brain and Language Research Institute, UMR 7309, Aix-Marseille University, Aix-en-Provence, France
- Jean-Luc Schwartz: CNRS, Grenoble Images Parole Signal Automatique-Lab, Speech and Cognition Department, UMR 5216, Grenoble University, Grenoble, France
18. Strelnikov K, Marx M, Lagleyre S, Fraysse B, Deguine O, Barone P. PET-imaging of brain plasticity after cochlear implantation. Hear Res 2014;322:180-187. PMID: 25448166; DOI: 10.1016/j.heares.2014.10.001.
Abstract
In this article, we review the PET neuroimaging literature, which reveals peculiarities of the brain networks involved in speech restoration after cochlear implantation. We consider data on implanted patients during stimulation as well as during the resting state, which indicate basic long-term reorganisation of the brain's functional architecture. On the basis of our analysis of the neuroimaging literature, and considering our own studies, we indicate that auditory recovery in deaf patients after cochlear implantation partly relies on visual cues. The brain develops mechanisms of audio-visual integration as a strategy to achieve high levels of speech recognition. This neuroimaging evidence is in line with behavioural findings of better audiovisual integration in these patients. Thus, strong visually and audio-visually based rehabilitation during the first months after cochlear implantation would significantly improve and speed up the functional recovery of speech intelligibility and other auditory functions in these patients. We provide perspectives for further neuroimaging studies in cochlear-implanted patients, which would help understand how the brain reorganises to restore auditory cognitive processing and would potentially suggest novel approaches for rehabilitation. This article is part of a Special Issue entitled "Lasker Award".
Affiliations
- K Strelnikov: Université de Toulouse, Cerveau & Cognition, Université Paul Sabatier, Toulouse, France; CerCo, CNRS UMR 5549, Toulouse, France
- M Marx: Service d'Oto-Rhino-Laryngologie, Hôpital Purpan, Toulouse, France
- S Lagleyre: Service d'Oto-Rhino-Laryngologie, Hôpital Purpan, Toulouse, France
- B Fraysse: Service d'Oto-Rhino-Laryngologie, Hôpital Purpan, Toulouse, France
- O Deguine: Université de Toulouse, Cerveau & Cognition, Université Paul Sabatier, Toulouse, France; CerCo, CNRS UMR 5549, Toulouse, France; Service d'Oto-Rhino-Laryngologie, Hôpital Purpan, Toulouse, France
- P Barone: Université de Toulouse, Cerveau & Cognition, Université Paul Sabatier, Toulouse, France; CerCo, CNRS UMR 5549, Toulouse, France
19. So CK, Attina V. Cross-language perception of Cantonese vowels spoken by native and non-native speakers. J Psycholinguist Res 2014;43:611-630. PMID: 24057944; DOI: 10.1007/s10936-013-9270-6.
Abstract
This study examined the effect of native language background on listeners' perception of native and non-native vowels spoken by native (Hong Kong Cantonese) and non-native (Mandarin and Australian English) speakers. Listeners completed a discrimination task and an identification task, with and without visual cues, in clear and noisy conditions. Results indicated that visual cues did not facilitate perception, and performance was better in clear than in noisy conditions. More importantly, the Cantonese talker's vowels were the easiest to discriminate, and the Mandarin talker's vowels were as intelligible as the native talkers' speech. These results supported the interlanguage speech intelligibility benefit patterns proposed by Hayes-Harb et al. (J Phonetics 36:664-679, 2008). The Mandarin and English listeners' identification patterns were similar to those of the Cantonese listeners, suggesting that they might have assimilated Cantonese vowels to their closest native vowels. In addition, listeners' perceptual patterns were consistent with the principles of Best's Perceptual Assimilation Model (Best in Speech perception and linguistic experience: issues in cross-language research. York Press, Timonium, 1995).
Affiliations
- Connie K So: MARCS Institute, University of Western Sydney, Sydney, Australia
20. Reinisch E, Wozny DR, Mitterer H, Holt LL. Phonetic category recalibration: What are the categories? J Phon 2014;45:91-105. PMID: 24932053; PMCID: PMC4052890; DOI: 10.1016/j.wocn.2014.04.002.
Abstract
Listeners use lexical or visual context information to recalibrate auditory speech perception. After hearing an ambiguous auditory stimulus between /aba/ and /ada/ coupled with a clear visual stimulus (e.g., lip closure in /aba/), an ambiguous auditory-only stimulus is perceived in line with the previously seen visual stimulus. What remains unclear, however, is what exactly listeners are recalibrating: phonemes, phone sequences, or acoustic cues. To address this question we tested generalization of visually guided auditory recalibration to (1) the same phoneme contrast cued differently (i.e., /aba/-/ada/ vs. /ibi/-/idi/, where the main cues are formant transitions in the vowels vs. burst and frication of the obstruent), (2) a different phoneme contrast cued identically (/aba/-/ada/ vs. /ama/-/ana/, both cued by formant transitions in the vowels), and (3) the same phoneme contrast with the same cues in a different acoustic context (/aba/-/ada/ vs. /ubu/-/udu/). Whereas recalibration was robust in all recalibration control trials, no generalization was found in any of the experiments. This suggests that perceptual recalibration may be more specific than previously thought, as it appears to be restricted to the phoneme category experienced during exposure as well as to the specific manipulated acoustic cues. We suggest that recalibration affects context-dependent sub-lexical units.
Affiliations
- Eva Reinisch: Department of Phonetics and Speech Processing, Ludwig Maximilian University Munich, Munich, Germany; Department of Psychology and Center for the Neural Basis of Cognition, Carnegie Mellon University
- David R. Wozny: Department of Psychology and Center for the Neural Basis of Cognition, Carnegie Mellon University
- Lori L. Holt: Department of Psychology and Center for the Neural Basis of Cognition, Carnegie Mellon University
21. Pascalis O, Loevenbruck H, Quinn PC, Kandel S, Tanaka JW, Lee K. On the Links Among Face Processing, Language Processing, and Narrowing During Development. Child Dev Perspect 2014;8:65-70. PMID: 25254069; PMCID: PMC4164271; DOI: 10.1111/cdep.12064.
Abstract
From the beginning of life, face and language processing are crucial for establishing social communication. Studies on the development of systems for processing faces and language have yielded such similarities as perceptual narrowing across both domains. In this article, we review several functions of human communication, and then describe how the tools used to accomplish those functions are modified by perceptual narrowing. We conclude that narrowing is common to all forms of social communication. We argue that during evolution, social communication engaged different perceptual and cognitive systems (face, facial expression, gesture, vocalization, sound, and oral language) that emerged at different times. These systems are interactive and linked to some extent. In this framework, narrowing can be viewed as a way infants adapt to their native social group.
Affiliations
- Hélène Loevenbruck: Université Grenoble Alpes, CNRS; Grenoble Images Parole Signal Automatique, CNRS
22. Treille A, Vilain C, Sato M. The sound of your lips: electrophysiological cross-modal interactions during hand-to-face and face-to-face speech perception. Front Psychol 2014;5:420. PMID: 24860533; PMCID: PMC4026678; DOI: 10.3389/fpsyg.2014.00420.
Abstract
Recent magneto-encephalographic and electro-encephalographic studies provide evidence for cross-modal integration during audio-visual and audio-haptic speech perception, with speech gestures viewed or felt from manual tactile contact with the speaker's face. Given the temporal precedence of the haptic and visual signals over the acoustic signal in these studies, the observed modulation of N1/P2 auditory evoked responses during bimodal compared to unimodal speech perception suggests that relevant and predictive visual and haptic cues may facilitate auditory speech processing. To further investigate this hypothesis, auditory evoked potentials were here compared during auditory-only, audio-visual and audio-haptic speech perception in live dyadic interactions between a listener and a speaker. In line with previous studies, auditory evoked potentials were attenuated and speeded up during both audio-haptic and audio-visual compared to auditory speech perception. Importantly, the observed latency and amplitude reduction did not significantly depend on the degree of visual and haptic recognition of the speech targets. Altogether, these results further demonstrate cross-modal interactions between the auditory, visual and haptic speech signals. Although they do not contradict the hypothesis that visual and haptic sensory inputs convey predictive information with respect to the incoming auditory speech input, these results suggest that, at least in live conversational interactions, systematic conclusions on sensory predictability in bimodal speech integration have to be drawn with caution, with the extraction of predictive cues likely depending on the variability of the speech stimuli.
Affiliations
- Avril Treille: CNRS, Département Parole and Cognition, GIPSA-Lab, UMR 5216, Grenoble Université, Grenoble, France
- Coriandre Vilain: CNRS, Département Parole and Cognition, GIPSA-Lab, UMR 5216, Grenoble Université, Grenoble, France
- Marc Sato: CNRS, Département Parole and Cognition, GIPSA-Lab, UMR 5216, Grenoble Université, Grenoble, France
23. Haptic and visual information speed up the neural processing of auditory speech in live dyadic interactions. Neuropsychologia 2014;57:71-77. DOI: 10.1016/j.neuropsychologia.2014.02.004.
24. Fort M, Kandel S, Chipot J, Savariaux C, Granjon L, Spinelli E. Seeing the initial articulatory gestures of a word triggers lexical access. Lang Cogn Process 2013. DOI: 10.1080/01690965.2012.701758.
25. Silent articulation modulates auditory and audiovisual speech perception. Exp Brain Res 2013;227:275-288. DOI: 10.1007/s00221-013-3510-8.
|
26
|
Esposito A, Esposito AM. On the recognition of emotional vocal expressions: motivations for a holistic approach. Cogn Process 2012; 13 Suppl 2:541-50. [PMID: 22872508 DOI: 10.1007/s10339-012-0516-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2012] [Accepted: 07/11/2012] [Indexed: 10/28/2022]
Abstract
Human beings seem to be able to recognize emotions from speech very well, and information communication technology aims to implement machines and agents that can do the same. However, to automatically recognize affective states from speech signals, two main technological problems must be solved. The first concerns identifying effective and efficient processing algorithms capable of capturing emotional acoustic features from speech sentences. The second concerns finding computational models able to classify a given set of emotional states about as accurately as human listeners do. This paper surveys these topics and provides some insights toward a holistic approach to the automatic analysis, recognition and synthesis of affective states.
Collapse
Affiliation(s)
- Anna Esposito
- Department of Psychology, Second University of Naples, Caserta, Italy.
| | | |
Collapse
|
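The two problems the Esposito and Esposito abstract separates, capturing emotional acoustic features and then classifying them, map onto a conventional recognition pipeline. A minimal sketch assuming librosa and scikit-learn; the feature set (MFCC, pitch, and energy statistics) and the SVM classifier are illustrative choices, not the systems surveyed by the authors:

```python
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def acoustic_features(path):
    """Summarize one utterance as a fixed-length acoustic feature vector."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # spectral shape
    f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr)       # pitch contour
    rms = librosa.feature.rms(y=y)                      # energy contour
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1),
                           [np.nanmean(f0), np.nanstd(f0)],
                           [rms.mean(), rms.std()]])

def train_emotion_classifier(paths, labels):
    """paths/labels: a hypothetical labeled emotional-speech corpus."""
    X = np.vstack([acoustic_features(p) for p in paths])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    return clf.fit(X, labels)
```

The split mirrors the abstract's two problems: the feature extractor is the "processing algorithm" stage, and the scaler-plus-SVM is the "computational model" stage; either can be swapped independently.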
27
|
Fort M, Spinelli E, Savariaux C, Kandel S. Audiovisual vowel monitoring and the word superiority effect in children. INTERNATIONAL JOURNAL OF BEHAVIORAL DEVELOPMENT 2012. [DOI: 10.1177/0165025412447752] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
The goal of this study was to explore whether viewing the speaker’s articulatory gestures contributes to lexical access in children (ages 5–10) and in adults. We conducted a vowel monitoring task with words and pseudo-words in audio-only (AO) and audiovisual (AV) contexts with white noise masking the acoustic signal. The results indicated that children clearly benefited from visual speech from age 6–7 onwards. However, unlike adults, the word superiority effect was not greater in the AV than the AO condition in children, suggesting that visual speech mostly contributes to phonemic—rather than lexical—processing during childhood, at least until the age of 10.
Collapse
Affiliation(s)
- Mathilde Fort
- Laboratoire de Psychologie et de NeuroCognition, Grenoble, France
| | - Elsa Spinelli
- Laboratoire de Psychologie et de NeuroCognition, Grenoble, France
- University of California, Berkeley, USA
| | | | - Sonia Kandel
- Laboratoire de Psychologie et de NeuroCognition, Grenoble, France
| |
Collapse
|
28
|
Wild CJ, Davis MH, Johnsrude IS. Human auditory cortex is sensitive to the perceived clarity of speech. Neuroimage 2012; 60:1490-502. [PMID: 22248574 DOI: 10.1016/j.neuroimage.2012.01.035] [Citation(s) in RCA: 84] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2011] [Revised: 11/23/2011] [Accepted: 01/02/2012] [Indexed: 11/29/2022] Open
Affiliation(s)
- Conor J Wild
- Centre for Neuroscience Studies, Queen's University, Kingston ON, Canada.
| | | | | |
Collapse
|
29
|
Recording Event-Related Brain Potentials: Application to Study Auditory Perception. THE HUMAN AUDITORY CORTEX 2012. [DOI: 10.1007/978-1-4614-2314-0_4] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
|
30
|
Abstract
Auditory signals are decomposed into discrete frequency elements early in the transduction process, yet somehow these signals are recombined into the rich acoustic percepts that we readily identify and are familiar with. The cerebral cortex is necessary for the perception of these signals, and studies from several laboratories over the past decade have made significant advances in our understanding of the neuronal mechanisms underlying auditory perception. This review concentrates on recent studies in the macaque monkey indicating that the activity of populations of neurons better accounts for perceptual abilities than the activity of single neurons does. The best examples address how acoustic space is represented along the "where" pathway in the caudal regions of auditory cortex. Our current understanding of how such population activity could also underlie the perception of the nonspatial features of acoustic stimuli is reviewed, as is how multisensory interactions can influence auditory perception.
Collapse
Affiliation(s)
- Gregg H Recanzone
- Center for Neuroscience and Department of Neurobiology, Physiology and Behavior, University of California, Davis, California
| |
Collapse
|
31
|
Kim J, Sironic A, Davis C. Hearing Speech in Noise: Seeing a Loud Talker is Better. Perception 2011; 40:853-62. [DOI: 10.1068/p6941] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/17/2022]
Abstract
Seeing the talker improves the intelligibility of speech degraded by noise (a visual speech benefit). Given that talkers exaggerate spoken articulation in noise, this set of two experiments examined whether the visual speech benefit was greater for speech produced in noise than in quiet. We first examined the extent to which spoken articulation was exaggerated in noise by measuring the motion of face markers as four people uttered 10 sentences either in quiet or in babble-speech noise (these renditions were also filmed). The tracking results showed that articulated motion in speech produced in noise was greater than that produced in quiet and was more highly correlated with speech acoustics. Speech intelligibility was tested in a second experiment using a speech-perception-in-noise task under auditory-visual and auditory-only conditions. The results showed that the visual speech benefit was greater for speech recorded in noise than for speech recorded in quiet. Furthermore, the amount of articulatory movement was related to performance on the perception task, indicating that the enhanced gestures made when speaking in noise function to make speech more intelligible.
Collapse
Affiliation(s)
| | - Amanda Sironic
- Department of Psychology, The University of Melbourne, Australia
| | | |
Collapse
|
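The relation Kim et al. report between articulatory movement and acoustics rests on simple summary statistics: the motion energy of tracked face markers and its correlation with the acoustic amplitude envelope. A NumPy sketch under assumed inputs (marker coordinates from video tracking, mono audio); the frame rates and array shapes are hypothetical:

```python
import numpy as np

def motion_energy(markers):
    """Per-frame articulatory motion: mean frame-to-frame marker displacement.

    markers : (n_frames, n_markers, 2) face-marker pixel coordinates
    """
    step = np.linalg.norm(np.diff(markers, axis=0), axis=2)  # per-marker step
    return step.mean(axis=1)                                 # (n_frames-1,)

def rms_envelope(audio, sr, fps):
    """RMS amplitude envelope of mono audio, one value per video frame."""
    hop = int(sr / fps)
    n = len(audio) // hop
    frames = audio[:n * hop].reshape(n, hop)
    return np.sqrt((frames ** 2).mean(axis=1))

# Correlate articulation with acoustics for one recorded sentence:
# motion = motion_energy(markers)                 # video tracked at fps
# env = rms_envelope(audio, sr=44100, fps=50)
# m = min(len(motion), len(env))
# r = np.corrcoef(motion[:m], env[:m])[0, 1]      # expected higher in noise
```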
32
|
Sato M, Cavé C, Ménard L, Brasseur A. Auditory-tactile speech perception in congenitally blind and sighted adults. Neuropsychologia 2010; 48:3683-6. [DOI: 10.1016/j.neuropsychologia.2010.08.017] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2010] [Revised: 06/28/2010] [Accepted: 08/16/2010] [Indexed: 10/19/2022]
|
33
|
Strelnikov K, Rouger J, Barone P, Deguine O. Role of speechreading in audiovisual interactions during the recovery of speech comprehension in deaf adults with cochlear implants. Scand J Psychol 2010; 50:437-44. [PMID: 19778391 DOI: 10.1111/j.1467-9450.2009.00741.x] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Speechreading is an important form of communicative activity that improves social adaptation in deaf adults. Cochlear implantation allows interaction between the visual speechreading abilities developed during deafness and the auditory sensory experiences acquired through use of the cochlear implant. Crude auditory information provided by the implant is analyzed in parallel with conjectural information from speechreading, thus creating new profiles of audiovisual integration with implications for brain plasticity. Understanding the peculiarities of change in speechreading after cochlear implantation may improve our understanding of brain plasticity and provide useful information for functional rehabilitation of implanted patients. In this article, we present a generalized review of our recent studies and indicate perspectives for further research in this domain.
Collapse
Affiliation(s)
- K Strelnikov
- Université de Toulouse, CerCo, Université Paul Sabatier, Toulouse, France
| | | | | | | |
Collapse
|
34
|
Bishop CW, Miller LM. A multisensory cortical network for understanding speech in noise. J Cogn Neurosci 2009; 21:1790-805. [PMID: 18823249 DOI: 10.1162/jocn.2009.21118] [Citation(s) in RCA: 85] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
In noisy environments, listeners tend to hear a speaker's voice yet struggle to understand what is said. The most effective way to improve intelligibility in such conditions is to watch the speaker's mouth movements. Here we identify the neural networks that distinguish understanding from merely hearing speech, and determine how the brain applies visual information to improve intelligibility. Using functional magnetic resonance imaging, we show that understanding speech-in-noise is supported by a network of brain areas including the left superior parietal lobule, the motor/premotor cortex, and the left anterior superior temporal sulcus (STS), a likely apex of the acoustic processing hierarchy. Multisensory integration likely improves comprehension through improved communication between the left temporal-occipital boundary, the left medial-temporal lobe, and the left STS. This demonstrates how the brain uses information from multiple modalities to improve speech comprehension in naturalistic, acoustically adverse conditions.
Collapse
|
35
|
Sodoyer D, Rivet B, Girin L, Savariaux C, Schwartz JL, Jutten C. A study of lip movements during spontaneous dialog and its application to voice activity detection. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2009; 125:1184-1196. [PMID: 19206891 DOI: 10.1121/1.3050257] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/27/2023]
Abstract
This paper presents a quantitative and comprehensive study of the lip movements of a given speaker in different speech/nonspeech contexts, with a particular focus on silences (i.e., when no sound is produced by the speaker). The aim is to characterize the relationship between "lip activity" and "speech activity" and then to use visual speech information as a voice activity detector (VAD). To this end, an original audiovisual corpus was recorded with two speakers engaged in a spontaneous face-to-face dialog while located in separate rooms. Each speaker communicated with the other using a microphone, a camera, a screen, and headphones. This setup captured a separate audio signal for each speaker while synchronously monitoring the speaker's lip movements. A comprehensive analysis was carried out on the lip shapes and lip movements in either silence or nonsilence (i.e., speech plus nonspeech audible events). A single visual parameter, defined to characterize the lip movements, was shown to be efficient for detecting silent sections. The result is a visual VAD that can be used in any kind of environmental noise, including intricate and highly nonstationary noises, e.g., multiple and/or moving noise sources or competing speech signals.
Collapse
Affiliation(s)
- David Sodoyer
- Department of Speech and Cognition, GIPSA-lab, UMR 5216 CNRS, Grenoble-INP, Université Stendhal, Université Joseph Fourier, Grenoble, France
| | | | | | | | | | | |
Collapse
|
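The visual VAD Sodoyer et al. describe rests on a single lip-motion parameter tracked over time. A minimal sketch of that idea, thresholding smoothed lip velocity to label silence; the threshold, smoothing window, and minimum-run duration are assumptions, not the paper's calibrated values:

```python
import numpy as np

def visual_vad(lip_aperture, fps, thresh=0.05, min_silence=0.25):
    """Label each video frame as speech (True) or silence (False).

    lip_aperture : (n_frames,) normalized lip opening from video tracking
    thresh       : minimum smoothed lip velocity counted as "lip activity"
    min_silence  : shortest low-activity run (s) accepted as true silence
    """
    vel = np.abs(np.gradient(lip_aperture))          # lip movement speed
    k = max(int(0.2 * fps), 1)
    smooth = np.convolve(vel, np.ones(k) / k, mode="same")  # ~200 ms smoothing
    speech = smooth > thresh
    # reclassify silence runs shorter than min_silence (articulatory holds,
    # hesitations) as speech, keeping only sustained quiet stretches
    min_run = int(min_silence * fps)
    edges = np.flatnonzero(np.diff(speech.astype(int))) + 1
    for run in np.split(np.arange(len(speech)), edges):
        if not speech[run[0]] and len(run) < min_run:
            speech[run] = True
    return speech
```

Because the decision uses only video, the detector is indifferent to the acoustic background, which is exactly the property the abstract highlights for nonstationary or competing-speech noise.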
36
|
Allen K, Carlile S, Alais D. Contributions of talker characteristics and spatial location to auditory streaming. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2008; 123:1562-1570. [PMID: 18345844 DOI: 10.1121/1.2831774] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/26/2023]
Abstract
To examine whether auditory streaming contributes to unmasking, the intelligibility of target sentences against two competing talkers was measured using the coordinate response measure (CRM) corpus [Bolia et al., J. Acoust. Soc. Am. 107, 1065-1066 (2000)]. In the control condition, the speech reception threshold (50% correct) was measured when the target and two maskers were collocated straight ahead. Separating the maskers from the target by +/-30 degrees resulted in a spatial release from masking of 12 dB. CRM sentences contain an identifier in the first part and two target words in the second part. In the experimental conditions, the masking talkers started out spatially separated at +/-30 degrees but became collocated with the target before the scoring words. In one experiment, one target and two different maskers were randomly selected from a mixed-sex corpus. Significant unmasking of 4 dB remained despite the absence of persistent location cues. When same-sex talkers were used as maskers and target, unmasking was reduced. These data suggest that initial separation may permit confident identification and streaming of the target and masker speech where significant differences between target and masker voice characteristics exist; where target and masker characteristics are similar, listeners must rely more heavily on continuing spatial cues.
Collapse
Affiliation(s)
- Kachina Allen
- Department of Physiology, University of Sydney, Sydney, NSW 2006, Australia.
| | | | | |
Collapse
|
37
|
McGurk effects in cochlear-implanted deaf subjects. Brain Res 2008; 1188:87-99. [PMID: 18062941 DOI: 10.1016/j.brainres.2007.10.049] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2007] [Revised: 08/28/2007] [Accepted: 10/10/2007] [Indexed: 11/22/2022]
|
39
|
Miller LM, D'Esposito M. Perceptual fusion and stimulus coincidence in the cross-modal integration of speech. J Neurosci 2005; 25:5884-93. [PMID: 15976077 PMCID: PMC6724802 DOI: 10.1523/jneurosci.0896-05.2005] [Citation(s) in RCA: 266] [Impact Index Per Article: 14.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
Human speech perception is profoundly influenced by vision. Watching a speaker's mouth movements significantly improves comprehension, both for normal listeners in noisy environments and especially for the hearing impaired. A number of brain regions have been implicated in audiovisual speech tasks, but little evidence distinguishes them functionally. In an event-related functional magnetic resonance imaging study, we differentiate neural systems that evaluate cross-modal coincidence of the physical stimuli from those that mediate perceptual binding. Regions consistently involved in perceptual fusion per se included Heschl's gyrus, superior temporal sulcus, middle intraparietal sulcus, and inferior frontal gyrus. Successful fusion elicited activity biased toward the left hemisphere, although failed cross-modal binding recruited regions in both hemispheres. A broad network of other areas, including the superior colliculus, anterior insula, and anterior intraparietal sulcus, were more involved with evaluating the spatiotemporal correspondence of speech stimuli, regardless of a subject's perception. All of these showed greater activity to temporally offset stimuli than to audiovisually synchronous stimuli. Our results demonstrate how elements of the cross-modal speech integration network differ in their sensitivity to physical reality versus perceptual experience.
Collapse
Affiliation(s)
- Lee M Miller
- Section of Neurobiology, Physiology, and Behavior and Center for Mind and Brain, University of California, Davis, California 95616, USA
| | | |
Collapse
|
40
|
Andersen TS, Tiippana K, Sams M. Factors influencing audiovisual fission and fusion illusions. Brain Res Cogn Brain Res 2004; 21:301-8. [PMID: 15511646 DOI: 10.1016/j.cogbrainres.2004.06.004] [Citation(s) in RCA: 133] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 06/10/2004] [Indexed: 10/25/2022]
Abstract
Information processing in auditory and visual modalities interacts in many circumstances. Spatially and temporally coincident acoustic and visual information are often bound together to form multisensory percepts [B.E. Stein, M.A. Meredith, The Merging of the Senses, A Bradford Book, Cambridge, MA, (1993), 211 pp.; Psychol. Bull. 88 (1980) 638]. Shams et al. recently reported a multisensory fission illusion where a single flash is perceived as two flashes when two rapid tone beeps are presented concurrently [Nature 408 (2000) 788; Cogn. Brain Res. 14 (2002) 147]. The absence of a fusion illusion, where two flashes would fuse to one when accompanied by one beep, indicated a perceptual rather than cognitive nature of the illusion. Here we report both fusion and fission illusions using stimuli very similar to those used by Shams et al. By instructing subjects to count beeps rather than flashes and decreasing the sound intensity to near threshold, we also created a corresponding visually induced auditory illusion. We discuss our results in light of four hypotheses of multisensory integration, each advocating a condition for modality dominance. According to the discontinuity hypothesis [Cogn. Brain Res. 14 (2002) 147], the modality in which stimulation is discontinuous dominates. The modality appropriateness hypothesis [Psychol. Bull. 88 (1980) 638] states that the modality more appropriate for the task at hand dominates. The information reliability hypothesis [J.-L. Schwartz, J. Robert-Ribes, P. Escudier, Ten years after Summerfield: a taxonomy of models for audio-visual fusion in speech perception. In: R. Campbell (Ed.), Hearing by Eye: The Psychology of Lipreading, Lawrence Earlbaum Associates, Hove, UK, (1998), pp. 3-51] claims that the modality providing more reliable information dominates. In strong forms, none of these three hypotheses applies to our data. We re-state the hypotheses in weak forms so that discontinuity, modality appropriateness and information reliability are factors which increase a modality's tendency to dominate. All these factors are important in explaining our data. Finally, we interpret the effect of instructions in light of the directed attention hypothesis which states that the attended modality is dominant [Psychol. Bull. 88 (1980) 638].
Collapse
Affiliation(s)
- Tobias S Andersen
- Laboratory of Computational Engineering, Helsinki University of Technology, P.O. Box 3000, Espoo 02015 HUT, Finland.
| | | | | |
Collapse
|
41
|
Leung SH, Wang SL, Lau WH. Lip image segmentation using fuzzy clustering incorporating an elliptic shape function. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2004; 13:51-62. [PMID: 15376957 DOI: 10.1109/tip.2003.818116] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/24/2023]
Abstract
Lip image analysis has recently received much attention because the visual information it provides has been shown to improve speech recognition and speaker authentication. Lip image segmentation plays an important role in lip image analysis. In this paper, a new fuzzy clustering method for lip image segmentation is presented. This clustering method takes both color information and spatial distance into account, whereas most current clustering methods deal only with the former. A new dissimilarity measure is introduced that integrates color dissimilarity and spatial distance in terms of an elliptic shape function. Because of the elliptic shape function, the new measure can differentiate pixels that have similar color information but are located in different regions. A new iterative algorithm for determining the membership and centroid of each class is derived and is shown to provide good differentiation between the lip region and the non-lip region. Experimental results show that the new algorithm yields better membership distributions and lip shapes than the standard fuzzy c-means algorithm and four other methods investigated in the paper.
Collapse
Affiliation(s)
- Shu-Hung Leung
- Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China.
| | | | | |
Collapse
|
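The dissimilarity measure Leung et al. introduce combines color distance with an elliptic spatial term, so pixels with lip-like color that lie far from the mouth ellipse are still pushed toward the non-lip class. A compact two-class fuzzy c-means sketch along those lines; the ellipse parameters, spatial weight, and initialization are illustrative and not the published algorithm:

```python
import numpy as np

def fuzzy_lip_segment(color, xy, center, axes, n_iter=30, m=2.0, w=0.5):
    """Two-class fuzzy clustering of pixels into lip / non-lip.

    color  : (n_pixels, 3) pixel colors (e.g., CIELAB)
    xy     : (n_pixels, 2) pixel coordinates
    center : (2,) assumed mouth-ellipse center; axes : (2,) semi-axes
    w      : weight of the elliptic spatial term (an assumption)
    """
    # elliptic shape function: ~0 inside the assumed mouth ellipse, grows outside
    ell = np.clip((((xy - center) / axes) ** 2).sum(axis=1) - 1.0, 0.0, None)

    # initialize class centroids from pixels inside / outside the ellipse
    inside = ell == 0
    cents = np.stack([color[inside].mean(axis=0), color[~inside].mean(axis=0)])

    for _ in range(n_iter):
        dc = ((color[:, None, :] - cents[None]) ** 2).sum(axis=2)  # color term
        # spatial penalty applies to the lip class only, so lip-colored pixels
        # far from the ellipse drift to the non-lip class
        d = dc + w * np.stack([ell, np.zeros_like(ell)], axis=1) + 1e-9
        u = 1.0 / d
        u /= u.sum(axis=1, keepdims=True)            # fuzzy memberships (m=2)
        um = u ** m
        cents = (um.T @ color) / um.sum(axis=0)[:, None]
    return u[:, 0]                                   # lip-class membership map
```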
42
|
Girin L, Schwartz JL, Feng G. Audio-visual enhancement of speech in noise. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2001; 109:3007-3020. [PMID: 11425143 DOI: 10.1121/1.1358887] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/23/2023]
Abstract
A key problem for telecommunication and human-machine communication systems is speech enhancement in noise. A number of techniques exist in this domain, all based on an acoustic-only approach, that is, processing the corrupted audio signal using audio information alone (from the corrupted signal itself or from additional audio inputs). In this paper, an audio-visual approach to the problem is considered, since several studies have demonstrated that viewing the speaker's face improves message intelligibility, especially in noisy environments. A speech enhancement prototype system that takes advantage of visual inputs is developed. A filtering approach is proposed in which the enhancement filters are estimated with the help of lip shape information. The estimation process is based on linear regression or simple neural networks trained on a training corpus. A set of experiments assessed by Gaussian classification and perceptual tests demonstrates that it is indeed possible to enhance simple stimuli (vowel-plosive-vowel sequences) embedded in white Gaussian noise.
Collapse
Affiliation(s)
- L Girin
- Institut de la Communication Parlée, INPG/Université Stendhal/CNRS UMR 5009, Grenoble, France.
| | | | | |
Collapse
|
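The enhancement scheme Girin et al. describe amounts to predicting per-band filter gains from lip-shape parameters and applying those gains to the noisy spectrum. A minimal sketch of the linear-regression variant using NumPy least squares; the lip parameters, band count, and training pairs are assumptions:

```python
import numpy as np

def train_gain_regression(lip_params, clean_gains):
    """Fit a linear map from lip-shape parameters to per-band filter gains.

    lip_params  : (n_frames, n_lip)   e.g., lip width, height, area per frame
    clean_gains : (n_frames, n_bands) ideal gains from a clean training corpus
    """
    X = np.hstack([lip_params, np.ones((len(lip_params), 1))])  # bias column
    W, *_ = np.linalg.lstsq(X, clean_gains, rcond=None)
    return W                                    # (n_lip + 1, n_bands)

def enhance(noisy_spectrum, lip_params, W):
    """Apply lip-predicted gains to a noisy magnitude spectrogram."""
    X = np.hstack([lip_params, np.ones((len(lip_params), 1))])
    gains = np.clip(X @ W, 0.0, 1.0)            # (n_frames, n_bands)
    return noisy_spectrum * gains
```

The design choice matches the abstract: because the gains come from video, the filter estimate does not degrade as the acoustic SNR drops, which is where visual input pays off most.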
43
|
Robert-Ribes J, Schwartz JL, Lallouache T, Escudier P. Complementarity and synergy in bimodal speech: auditory, visual, and audio-visual identification of French oral vowels in noise. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 1998; 103:3677-3689. [PMID: 9637049 DOI: 10.1121/1.423069] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/22/2023]
Abstract
The efficacy of audio-visual interactions in speech perception comes from two kinds of factors. First, at the information level, there is some "complementarity" of audition and vision: It seems that some speech features, mainly concerned with manner of articulation, are best transmitted by the audio channel, while some other features, mostly describing place of articulation, are best transmitted by the video channel. Second, at the information processing level, there is some "synergy" between audition and vision: The audio-visual global identification scores in a number of different tasks involving acoustic noise are generally greater than both the auditory-alone and the visual-alone scores. However, these two properties have been generally demonstrated until now in rather global terms. In the present work, audio-visual interactions at the feature level are studied for French oral vowels which contrast three series, namely front unrounded, front rounded, and back rounded vowels. A set of experiments on the auditory, visual, and audio-visual identification of vowels embedded in various amounts of noise demonstrate that complementarity and synergy in bimodal speech appear to hold for a bundle of individual phonetic features describing place contrasts in oral vowels. At the information level (complementarity), in the audio channel the height feature is the most robust, backness the second most robust one, and rounding the least, while in the video channel rounding is better than height, and backness is almost invisible. At the information processing (synergy) level, transmitted information scores show that all individual features are better transmitted with the ear and the eye together than with each sensor individually.
Collapse
Affiliation(s)
- J Robert-Ribes
- Institut de la Communication Parlée CNRS UPRESA, Grenoble, France.
| | | | | | | |
Collapse
|
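The transmitted information scores Robert-Ribes et al. report are, in the Miller-Nicely tradition, the mutual information of a stimulus-by-response confusion matrix computed per phonetic feature (height, backness, rounding). A short sketch of that computation; the example counts are invented:

```python
import numpy as np

def transmitted_information(confusion):
    """Mutual information (bits) of a stimulus x response confusion matrix."""
    p = confusion / confusion.sum()            # joint probabilities
    ps = p.sum(axis=1, keepdims=True)          # stimulus marginals
    pr = p.sum(axis=0, keepdims=True)          # response marginals
    nz = p > 0                                 # skip empty cells in the log
    return float((p[nz] * np.log2(p[nz] / (ps @ pr)[nz])).sum())

# e.g., the rounding feature under audio-alone vs audio-visual presentation:
# ti_a  = transmitted_information(np.array([[40, 10], [12, 38]]))
# ti_av = transmitted_information(np.array([[48,  2], [ 3, 47]]))
# Synergy in the abstract's sense shows as ti_av exceeding both unimodal scores.
```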