1. Ito T, Ogane R. Repetitive Exposure to Orofacial Somatosensory Inputs in Speech Perceptual Training Modulates Vowel Categorization in Speech Perception. Front Psychol 2022; 13:839087. PMID: 35558689; PMCID: PMC9088678; DOI: 10.3389/fpsyg.2022.839087.
Abstract
Orofacial somatosensory inputs may play a role in the link between speech perception and production. Given that speech motor learning, which involves paired auditory and somatosensory inputs, results in changes to speech perceptual representations, somatosensory inputs may also be involved in learning or adaptive processes of speech perception. Here we show that repetitive pairing of somatosensory inputs and sounds, such as occurs during speech production and motor learning, can also induce a change in speech perception. We examined whether the category boundary between /ε/ and /a/ changed as a result of perceptual training with orofacial somatosensory inputs. The experiment consisted of three phases: Baseline, Training, and Aftereffect. In all phases, a vowel identification test was used to identify the perceptual boundary between /ε/ and /a/. In the Baseline and Aftereffect phases, an adaptive method based on the maximum-likelihood procedure was applied to detect the category boundary using a small number of trials. In the Training phase, we used the method of constant stimuli in order to expose participants evenly to stimulus variants covering the range between /ε/ and /a/. In this phase, for the experimental group, somatosensory stimulation was applied in the upward direction whenever a stimulus sound was presented, mimicking the sensory input that accompanies speech production and learning. A control group (CTL) followed the same training procedure in the absence of somatosensory stimulation. When we compared category boundaries prior to and following paired auditory-somatosensory training, the boundary for participants in the experimental group reliably shifted in the direction of /ε/, indicating that these participants perceived /a/ more often than /ε/ as a consequence of training. In contrast, the CTL group did not show any change. Although only a limited number of participants were tested at follow-up, the perceptual shift was reduced and almost eliminated 1 week later. Our data suggest that repetitive exposure to somatosensory inputs, in a task that simulates the sensory pairing occurring during speech production, changes the perceptual system. This supports the idea that somatosensory inputs play a role in speech perceptual adaptation, probably contributing to the formation of sound representations for speech perception.
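The boundary measure here is essentially the 50% point of a psychometric function relating continuum step to the proportion of /a/ responses. As a rough illustration only (not the authors' adaptive maximum-likelihood procedure, and with invented identification data), such a boundary could be estimated from constant-stimuli responses with a logistic fit:

```python
# Hypothetical sketch: estimating a vowel category boundary from identification data.
# Stimulus steps and response proportions are illustrative, not the study's data.
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, boundary, slope):
    """Psychometric function: probability of an /a/ response at continuum step x."""
    return 1.0 / (1.0 + np.exp(-slope * (x - boundary)))

# Continuum steps between /ε/ (step 1) and /a/ (step 9), and the proportion of /a/
# responses obtained with the method of constant stimuli (made-up numbers).
steps = np.arange(1, 10)
p_a = np.array([0.02, 0.05, 0.12, 0.30, 0.55, 0.78, 0.90, 0.96, 0.99])

(boundary, slope), _ = curve_fit(logistic, steps, p_a, p0=[5.0, 1.0])
print(f"Estimated /ε/-/a/ boundary: step {boundary:.2f} (slope {slope:.2f})")

# A shift of the fitted boundary toward /ε/ after training would mean that more
# continuum steps are categorized as /a/, as reported for the experimental group.
```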
Affiliation(s)
- Takayuki Ito
- Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, Grenoble, France
- Haskins Laboratories, New Haven, CT, United States
- Rintaro Ogane
- Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, Grenoble, France
- Haskins Laboratories, New Haven, CT, United States
2. Olasagasti I, Giraud AL. Integrating prediction errors at two time scales permits rapid recalibration of speech sound categories. eLife 2020; 9:e44516. PMID: 32223894; PMCID: PMC7217692; DOI: 10.7554/elife.44516.
Abstract
Speech perception presumably arises from internal models of how specific sensory features are associated with speech sounds. These features change constantly (e.g., different speakers or articulation modes), and listeners need to recalibrate their internal models by appropriately weighing new versus old evidence. Models of speech recalibration classically ignore this volatility. The effect of volatility in tasks where sensory cues were associated with arbitrary experimenter-defined categories was well described by models that continuously adapt the learning rate while keeping a single representation of the category. Using neurocomputational modelling, we show that recalibration of natural speech sound categories is better described by representing these categories at different time scales. We illustrate our proposal by modeling fast recalibration of speech sounds after experiencing the McGurk effect. We propose that working representations of speech categories are driven both by their current environment and their long-term memory representations. People can distinguish words or syllables even though they may sound different with every speaker. This striking ability reflects the fact that our brain is continually modifying the way we recognise and interpret the spoken word based on what we have heard before, comparing past experience with the most recent one to update expectations. This phenomenon also occurs in the McGurk effect: an auditory illusion in which someone hears one syllable but sees a person saying another syllable and ends up perceiving a third distinct sound. Abstract models, which provide a functional rather than a mechanistic description of what the brain does, can test how humans use expectations and prior knowledge to interpret the information delivered by the senses at any given moment. Olasagasti and Giraud have now built an abstract model of how brains recalibrate perception of natural speech sounds. Fitting the model to existing experimental data on the McGurk effect suggests that, rather than using a single sound representation that is adjusted with each sensory experience, the brain recalibrates sounds at two different timescales. Over and above slow "procedural" learning, the findings show that there is also rapid recalibration of how different sounds are interpreted. This working representation of speech enables adaptation to changing or noisy environments and illustrates that the process is far more dynamic and flexible than previously thought.
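The core idea of separate fast and slow representations can be caricatured with a simple two-rate update rule; the sketch below is an illustrative simplification with arbitrary parameters, not the published generative model:

```python
# Simplified caricature of recalibration at two time scales (not the published model):
# a fast "working" estimate of a category's acoustic center tracks recent evidence,
# while a slow long-term estimate anchors it. Values are arbitrary illustrative units.

def recalibrate(observations, long_term=0.0, fast_rate=0.5, slow_rate=0.02):
    working = long_term
    for obs in observations:
        error = obs - working           # prediction error for the current percept
        working += fast_rate * error    # rapid recalibration of the working representation
        long_term += slow_rate * error  # slow drift of the long-term representation
        # the working estimate is continually pulled back toward long-term memory
        working += slow_rate * (long_term - working)
    return working, long_term

# A run of McGurk-like exposures that all deviate by +1 unit from the stored category:
final_working, final_long_term = recalibrate([1.0] * 20)
print(final_working, final_long_term)  # working estimate shifts quickly; long-term barely moves
```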
Affiliation(s)
- Itsaso Olasagasti
- Department of Basic Neuroscience, University of Geneva, Geneva, Switzerland
- Anne-Lise Giraud
- Department of Basic Neuroscience, University of Geneva, Geneva, Switzerland
3. Attentional resources contribute to the perceptual learning of talker idiosyncrasies in audiovisual speech. Atten Percept Psychophys 2019; 81:1006-1019. PMID: 30684204; DOI: 10.3758/s13414-018-01651-x.
Abstract
To recognize audiovisual speech, listeners evaluate and combine information obtained from the auditory and visual modalities. Listeners also use information from one modality to adjust their phonetic categories to a talker's idiosyncrasy encountered in the other modality. In this study, we examined whether the outcome of this cross-modal recalibration relies on attentional resources. In a standard recalibration experiment in Experiment 1, participants heard an ambiguous sound, disambiguated by the accompanying visual speech as either /p/ or /t/. Participants' primary task was to attend to the audiovisual speech while either monitoring a tone sequence for a target tone or ignoring the tones. Listeners subsequently categorized the steps of an auditory /p/-/t/ continuum more often in line with their exposure. The aftereffect of phonetic recalibration was reduced, but not eliminated, by attentional load during exposure. In Experiment 2, participants saw an ambiguous visual speech gesture that was disambiguated auditorily as either /p/ or /t/. At test, listeners categorized the steps of a visual /p/-/t/ continuum more often in line with the prior exposure. Imposing load in the auditory modality during exposure did not reduce the aftereffect of this type of cross-modal phonetic recalibration. Together, these results suggest that auditory attentional resources are needed for the processing of auditory speech and/or for the shifting of auditory phonetic category boundaries. Listeners thus need to dedicate attentional resources in order to accommodate talker idiosyncrasies in audiovisual speech.
4. Noppeney U, Lee HL. Causal inference and temporal predictions in audiovisual perception of speech and music. Ann N Y Acad Sci 2018; 1423:102-116. PMID: 29604082; DOI: 10.1111/nyas.13615.
Abstract
To form a coherent percept of the environment, the brain must integrate sensory signals emanating from a common source but segregate those from different sources. Temporal regularities are prominent cues for multisensory integration, particularly for speech and music perception. In line with models of predictive coding, we suggest that the brain adapts an internal model to the statistical regularities in its environment. This internal model enables cross-sensory and sensorimotor temporal predictions as a mechanism to arbitrate between integration and segregation of signals from different senses.
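The arbitration between integration and segregation can be framed as Bayesian causal inference over, for example, audiovisual timing. The toy sketch below, with assumed likelihoods and parameters rather than the authors' model, shows how the posterior probability of a common cause falls as asynchrony grows:

```python
# Toy sketch of Bayesian causal inference over audiovisual timing (illustrative only;
# parameter values and the uniform "independent causes" likelihood are assumptions).
import math

def prob_common_cause(asynchrony_ms, sigma_ms=80.0, window_ms=1000.0, prior_common=0.5):
    """Posterior probability that auditory and visual signals share a common source,
    given their measured onset asynchrony."""
    # If the signals come from one event, small asynchronies are likely (Gaussian around 0).
    like_common = math.exp(-0.5 * (asynchrony_ms / sigma_ms) ** 2) / (sigma_ms * math.sqrt(2 * math.pi))
    # If they are unrelated, any asynchrony within the inspection window is equally likely.
    like_separate = 1.0 / window_ms
    return (like_common * prior_common) / (
        like_common * prior_common + like_separate * (1 - prior_common))

for dt in (0, 100, 300):
    print(dt, round(prob_common_cause(dt), 3))
# Small asynchronies favor integration (common cause); large ones favor segregation.
```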
Affiliation(s)
- Uta Noppeney
- Computational Neuroscience and Cognitive Robotics Centre, University of Birmingham, Birmingham, UK
- Hwee Ling Lee
- German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
5.
Abstract
Multisensory integration can play a critical role in producing unified and reliable perceptual experience. When sensory information in one modality is degraded or ambiguous, information from other senses can crossmodally resolve perceptual ambiguities. Prior research suggests that auditory information can disambiguate the contents of visual awareness by facilitating perception of intermodally consistent stimuli. However, it is unclear whether these effects are truly due to crossmodal facilitation or are mediated by voluntary selective attention to audiovisually congruent stimuli. Here, we demonstrate that sounds can bias competition in binocular rivalry toward audiovisually congruent percepts, even when participants have no recognition of the congruency. When speech sounds were presented in synchrony with speech-like deformations of rivalling ellipses, ellipses with crossmodally congruent deformations were perceptually dominant over those with incongruent deformations. This effect was observed in participants who could not identify the crossmodal congruency in an open-ended interview (Experiment 1) or detect it in a simple 2AFC task (Experiment 2), suggesting that the effect was not due to voluntary selective attention or response bias. These results suggest that sound can automatically disambiguate the contents of visual awareness by facilitating perception of audiovisually congruent stimuli.
6. Stevenson RA, Baum SH, Krueger J, Newhouse PA, Wallace MT. Links between temporal acuity and multisensory integration across life span. J Exp Psychol Hum Percept Perform 2017; 44:106-116. PMID: 28447850; DOI: 10.1037/xhp0000424.
Abstract
The temporal relationship between individual pieces of information from the different sensory modalities is one of the stronger cues to integrate such information into a unified perceptual gestalt, conveying numerous perceptual and behavioral advantages. Temporal acuity, however, varies greatly over the life span. It has previously been hypothesized that changes in temporal acuity in both development and healthy aging may thus play a key role in integrative abilities. This study tested the temporal acuity of 138 individuals ranging in age from 5 to 80. Temporal acuity and multisensory integration abilities were tested both within and across modalities (audition and vision) with simultaneity judgment and temporal order judgment tasks. We observed that temporal acuity, both within and across modalities, improved throughout development into adulthood and subsequently declined with healthy aging, as did the ability to integrate multisensory speech information. Of importance, throughout development, temporal acuity of simple stimuli (i.e., flashes and beeps) predicted individuals' abilities to integrate more complex speech information. However, in the aging population, although temporal acuity declined with healthy aging and was accompanied by declines in integrative abilities, temporal acuity was not able to predict integration at the individual level. Together, these results suggest that the impact of temporal acuity on multisensory integration varies throughout the life span. Although the maturation of temporal acuity drives the rise of multisensory integrative abilities during development, it is unable to account for changes in integrative abilities in healthy aging. The differential relationships between age, temporal acuity, and multisensory integration suggest an important role for experience in these processes.
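Temporal acuity in simultaneity-judgment tasks is commonly summarized by fitting a function to the proportion of "simultaneous" responses across stimulus onset asynchronies; the width of the fit indexes the temporal binding window. The sketch below uses a Gaussian fit and invented data purely for illustration, not the study's analysis:

```python
# Hypothetical sketch: estimating a temporal binding window from simultaneity-judgment data.
# SOAs and response proportions below are invented for illustration.
import numpy as np
from scipy.optimize import curve_fit

def gaussian(soa, amp, center, width):
    return amp * np.exp(-0.5 * ((soa - center) / width) ** 2)

soas = np.array([-400, -300, -200, -100, 0, 100, 200, 300, 400], dtype=float)  # ms, audio-lead negative
p_simultaneous = np.array([0.05, 0.15, 0.45, 0.80, 0.95, 0.85, 0.60, 0.25, 0.10])

(amp, center, width), _ = curve_fit(gaussian, soas, p_simultaneous, p0=[1.0, 0.0, 150.0])
print(f"Point of subjective simultaneity: {center:.0f} ms; window SD: {width:.0f} ms")
# A narrower fitted width corresponds to finer temporal acuity; the study relates such
# individual estimates to the integration of audiovisual speech across the life span.
```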
Affiliation(s)
- Ryan A Stevenson
- Department of Psychology, Brain and Mind Institute, University of Western Ontario
- Sarah H Baum
- Department of Psychology, University of Washington
- Paul A Newhouse
- Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center
7. Baart M, Armstrong BC, Martin CD, Frost R, Carreiras M. Cross-modal noise compensation in audiovisual words. Sci Rep 2017; 7:42055. PMID: 28169316; PMCID: PMC5294401; DOI: 10.1038/srep42055.
Abstract
Perceiving linguistic input is vital for human functioning, but the process is complicated by the fact that the incoming signal is often degraded. However, humans can compensate for unimodal noise by relying on simultaneous sensory input from another modality. Here, we investigated noise-compensation for spoken and printed words in two experiments. In the first behavioral experiment, we observed that accuracy was modulated by reaction time, bias and sensitivity, but noise compensation could nevertheless be explained via accuracy differences when controlling for RT, bias and sensitivity. In the second experiment, we also measured Event Related Potentials (ERPs) and observed robust electrophysiological correlates of noise compensation starting at around 350 ms after stimulus onset, indicating that noise compensation is most prominent at lexical/semantic processing levels.
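The sensitivity and bias measures referred to above are typically computed with equal-variance signal detection theory. The sketch below (with made-up trial counts, not the paper's analysis code) shows one common way to obtain d' and criterion c from hit and false-alarm counts:

```python
# Illustrative sketch of sensitivity and bias measures: d' and criterion c from hit and
# false-alarm rates under an equal-variance signal detection model. Counts are made up.
from scipy.stats import norm

def dprime_criterion(hits, misses, false_alarms, correct_rejections):
    # Loglinear correction keeps the z-transform finite when rates are 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    d_prime = norm.ppf(hit_rate) - norm.ppf(fa_rate)
    criterion = -0.5 * (norm.ppf(hit_rate) + norm.ppf(fa_rate))
    return d_prime, criterion

print(dprime_criterion(hits=42, misses=8, false_alarms=12, correct_rejections=38))
```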
Affiliation(s)
- Martijn Baart
- BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain
- Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Blair C Armstrong
- BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain
- Department of Psychology & Centre for French & Linguistics at Scarborough, University of Toronto, Toronto, Canada
- Clara D Martin
- BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain
- IKERBASQUE, Basque Foundation for Science, Bilbao, Spain
- Ram Frost
- BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain
- Department of Psychology, The Hebrew University of Jerusalem, Jerusalem, Israel
- Haskins Laboratories, New Haven, CT, USA
- Manuel Carreiras
- BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain
- IKERBASQUE, Basque Foundation for Science, Bilbao, Spain
- University of the Basque Country (UPV/EHU), Bilbao, Spain
8. Dias JW, Cook TC, Rosenblum LD. Influences of selective adaptation on perception of audiovisual speech. J Phon 2016; 56:75-84. PMID: 27041781; PMCID: PMC4815035; DOI: 10.1016/j.wocn.2016.02.004.
Abstract
Research suggests that selective adaptation in speech is a low-level process dependent on sensory-specific information shared between the adaptor and test-stimuli. However, previous research has only examined how adaptors shift perception of unimodal test stimuli, either auditory or visual. In the current series of experiments, we investigated whether adaptation to cross-sensory phonetic information can influence perception of integrated audio-visual phonetic information. We examined how selective adaptation to audio and visual adaptors shift perception of speech along an audiovisual test continuum. This test-continuum consisted of nine audio-/ba/-visual-/va/ stimuli, ranging in visual clarity of the mouth. When the mouth was clearly visible, perceivers "heard" the audio-visual stimulus as an integrated "va" percept 93.7% of the time (e.g., McGurk & MacDonald, 1976). As visibility of the mouth became less clear across the nine-item continuum, the audio-visual "va" percept weakened, resulting in a continuum ranging in audio-visual percepts from /va/ to /ba/. Perception of the test-stimuli was tested before and after adaptation. Changes in audiovisual speech perception were observed following adaptation to visual-/va/ and audiovisual-/va/, but not following adaptation to auditory-/va/, auditory-/ba/, or visual-/ba/. Adaptation modulates perception of integrated audio-visual speech by modulating the processing of sensory-specific information. The results suggest that auditory and visual speech information are not completely integrated at the level of selective adaptation.
9. van der Zande P, Jesse A, Cutler A. Lexically guided retuning of visual phonetic categories. J Acoust Soc Am 2013; 134:562-571. PMID: 23862831; DOI: 10.1121/1.4807814.
Abstract
Listeners retune the boundaries between phonetic categories to adjust to individual speakers' productions. Lexical information, for example, indicates what an unusual sound is supposed to be, and boundary retuning then enables the speaker's sound to be included in the appropriate auditory phonetic category. This study investigated whether lexical knowledge, known to guide the retuning of auditory phonetic categories, can also retune visual phonetic categories. In Experiment 1, exposure to a visual idiosyncrasy in ambiguous audiovisually presented target words in a lexical decision task indeed resulted in retuning of the visual category boundary based on the disambiguating lexical context. Experiment 2 tested whether lexical information retunes visual categories directly, or indirectly through generalization from retuned auditory phonetic categories. Here, participants were exposed to auditory-only versions of the same ambiguous target words as in Experiment 1. Auditory phonetic categories were retuned by lexical knowledge, but no shifts were observed for the visual phonetic categories. Lexical knowledge can therefore guide retuning of visual phonetic categories, but lexically guided retuning of auditory phonetic categories is not generalized to visual categories. Rather, listeners adjust auditory and visual phonetic categories to talker idiosyncrasies separately.
Affiliation(s)
- Patrick van der Zande
- Max Planck Institute for Psycholinguistics, P.O. Box 310, 6500 AH Nijmegen, The Netherlands
10. Beer AL, Plank T, Meyer G, Greenlee MW. Combined diffusion-weighted and functional magnetic resonance imaging reveals a temporal-occipital network involved in auditory-visual object processing. Front Integr Neurosci 2013; 7:5. PMID: 23407860; PMCID: PMC3570774; DOI: 10.3389/fnint.2013.00005.
Abstract
Functional magnetic resonance imaging (MRI) showed that the superior temporal and occipital cortex are involved in multisensory integration. Probabilistic fiber tracking based on diffusion-weighted MRI suggests that multisensory processing is supported by white matter connections between auditory cortex and the temporal and occipital lobe. Here, we present a combined functional MRI and probabilistic fiber tracking study that reveals multisensory processing mechanisms that remained undetected by either technique alone. Ten healthy participants passively observed visually presented lip or body movements, heard speech or body action sounds, or were exposed to a combination of both. Bimodal stimulation engaged a temporal-occipital brain network including the multisensory superior temporal sulcus (msSTS), the lateral superior temporal gyrus (lSTG), and the extrastriate body area (EBA). A region-of-interest (ROI) analysis showed multisensory interactions (e.g., subadditive responses to bimodal compared to unimodal stimuli) in the msSTS, the lSTG, and the EBA region. Moreover, sounds elicited responses in the medial occipital cortex. Probabilistic tracking revealed white matter tracts between the auditory cortex and the medial occipital cortex, the inferior occipital cortex (IOC), and the superior temporal sulcus (STS). However, STS terminations of auditory cortex tracts showed limited overlap with the msSTS region. Instead, msSTS was connected to primary sensory regions via intermediate nodes in the temporal and occipital cortex. Similarly, the lSTG and EBA regions showed limited direct white matter connections but instead were connected via intermediate nodes. Our results suggest that multisensory processing in the STS is mediated by separate brain areas that form a distinct network in the lateral temporal and inferior occipital cortex.
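A common region-of-interest criterion for the multisensory interactions described above is subadditivity: the bimodal response is reliably smaller than the sum of the unimodal responses (AV < A + V). The sketch below illustrates that comparison with invented per-participant estimates, not the study's data:

```python
# Rough sketch of a subadditivity test for a region of interest: compare the bimodal
# response against the sum of the unimodal responses. Values below are invented
# per-participant response estimates (e.g., beta weights), purely for illustration.
import numpy as np
from scipy.stats import ttest_rel

audio_only = np.array([0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2, 0.9, 1.1, 1.0])
visual_only = np.array([1.0, 0.9, 1.2, 1.1, 0.8, 1.0, 1.3, 0.9, 1.0, 1.1])
audiovisual = np.array([1.4, 1.6, 1.7, 1.9, 1.3, 1.2, 2.0, 1.4, 1.6, 1.7])

summed_unimodal = audio_only + visual_only
t, p = ttest_rel(audiovisual, summed_unimodal)
print(f"AV mean {audiovisual.mean():.2f} vs A+V mean {summed_unimodal.mean():.2f}: "
      f"t = {t:.2f}, p = {p:.4f}")
# A significantly negative t (AV < A + V) would indicate a subadditive multisensory response.
```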
Affiliation(s)
- Anton L. Beer
- Institut für Psychologie, Universität Regensburg, Regensburg, Germany
- Experimental and Clinical Neurosciences Programme, Universität Regensburg, Regensburg, Germany
- Tina Plank
- Institut für Psychologie, Universität Regensburg, Regensburg, Germany
- Georg Meyer
- Department of Experimental Psychology, University of Liverpool, Liverpool, UK
- Mark W. Greenlee
- Institut für Psychologie, Universität Regensburg, Regensburg, Germany
- Experimental and Clinical Neurosciences Programme, Universität Regensburg, Regensburg, Germany
11. Blank H, von Kriegstein K. Mechanisms of enhancing visual-speech recognition by prior auditory information. Neuroimage 2012; 65:109-118. PMID: 23023154; DOI: 10.1016/j.neuroimage.2012.09.047.
Abstract
Speech recognition from visual-only faces is difficult, but can be improved by prior information about what is said. Here, we investigated how the human brain uses prior information from auditory speech to improve visual-speech recognition. In a functional magnetic resonance imaging study, participants performed a visual-speech recognition task, indicating whether the word spoken in visual-only videos matched the preceding auditory-only speech, and a control task (face-identity recognition) containing exactly the same stimuli. We localized a visual-speech processing network by contrasting activity during visual-speech recognition with the control task. Within this network, the left posterior superior temporal sulcus (STS) showed increased activity and interacted with auditory-speech areas if prior information from auditory speech did not match the visual speech. This mismatch-related activity and the functional connectivity to auditory-speech areas were specific for speech, i.e., they were not present in the control task. The mismatch-related activity correlated positively with performance, indicating that posterior STS was behaviorally relevant for visual-speech recognition. In line with predictive coding frameworks, these findings suggest that prediction error signals are produced if visually presented speech does not match the prediction from preceding auditory speech, and that this mechanism plays a role in optimizing visual-speech recognition by prior information.
Affiliation(s)
- Helen Blank
- Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstr 1A, 04103 Leipzig, Germany.
12. Lansford KL, Liss JM, Caviness JN, Utianski RL. A cognitive-perceptual approach to conceptualizing speech intelligibility deficits and remediation practice in hypokinetic dysarthria. Parkinsons Dis 2011; 2011:150962. PMID: 21918728; PMCID: PMC3171761; DOI: 10.4061/2011/150962.
Abstract
Hypokinetic dysarthria is a common manifestation of Parkinson's disease (PD) that negatively influences quality of life. Behavioral techniques that aim to improve speech intelligibility constitute the bulk of intervention strategies for this population, as the dysarthria often does not respond robustly to medical interventions. Although several case and group studies generally support the efficacy of behavioral treatment, much work remains to establish a rigorous evidence base. This absence of definitive research leaves both the speech-language pathologist and the referring physician with the task of determining the feasibility and nature of therapy for intelligibility remediation in PD. The purpose of this paper is to introduce a novel framework in which medical practitioners can conceptualize and justify potential targets for speech remediation. This approach supports the most commonly targeted deficits (e.g., speaking rate and vocal loudness), as well as underutilized and novel treatment targets aimed at the listener's perceptual skills.
Affiliation(s)
- Kaitlin L Lansford
- Motor Speech Disorders Laboratory, Department of Speech and Hearing Science, Arizona State University, P.O. Box 870102, Tempe, AZ 85287-0102, USA