1
Pouw W, Dixon JA. What you hear and see specifies the perception of a limb-respiratory-vocal act. Proc Biol Sci 2022; 289:20221026. [PMID: 35855599 PMCID: PMC9297030 DOI: 10.1098/rspb.2022.1026]
Affiliation(s)
- Wim Pouw
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Gelderland, The Netherlands; Max Planck Institute for Psycholinguistics, Nijmegen, Gelderland, The Netherlands
- James A. Dixon
- Department of Psychology, University of Connecticut, Storrs, CT, USA
2
Patri JF, Ostry DJ, Diard J, Schwartz JL, Trudeau-Fisette P, Savariaux C, Perrier P. Speakers are able to categorize vowels based on tongue somatosensation. Proc Natl Acad Sci U S A 2020; 117:6255-63. [PMID: 32123070 DOI: 10.1073/pnas.1911142117]
Abstract
Auditory speech perception enables listeners to access phonological categories from speech sounds. During speech production and speech motor learning, speakers experience matched auditory and somatosensory input. Accordingly, access to phonetic units might also be provided by somatosensory information. The present study assessed whether humans can identify vowels using somatosensory feedback, without auditory feedback. A tongue-positioning task was used in which participants were required to achieve different tongue postures within the /e, ε, a/ articulatory range, in a procedure that was totally nonspeech-like, involving distorted visual feedback of tongue shape. Tongue postures were measured using electromagnetic articulography. At the end of each tongue-positioning trial, subjects were required to whisper the corresponding vocal tract configuration with masked auditory feedback and to identify the vowel associated with the reached tongue posture. Masked auditory feedback ensured that vowel categorization was based on somatosensory feedback rather than auditory feedback. A separate group of subjects was required to auditorily classify the whispered sounds. In addition, we modeled the link between vowel categories and tongue postures in normal speech production with a Bayesian classifier based on the tongue postures recorded from the same speakers for several repetitions of the /e, ε, a/ vowels during a separate speech production task. Overall, our results indicate that vowel categorization is possible with somatosensory feedback alone, with an accuracy that is similar to the accuracy of the auditory perception of whispered sounds, and in congruence with normal speech articulation, as accounted for by the Bayesian classifier.
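As a rough illustration of the classifier described in this abstract (not the authors' code), the mapping from tongue postures to vowel categories can be sketched as a Gaussian Bayes classifier fit to a speaker's own productions: each vowel gets a mean and covariance over articulatory coordinates, and a new posture is assigned the most probable category. The two-coordinate feature layout, variable names, and toy data are assumptions for illustration.

# Hypothetical sketch: Gaussian Bayes classifier from tongue postures (e.g., EMA
# sensor coordinates) to vowel categories, in the spirit of the classifier
# described above; not the authors' implementation.
import numpy as np
from scipy.stats import multivariate_normal

def fit_vowel_model(postures_by_vowel):
    """postures_by_vowel: dict vowel -> (n_trials, n_coords) array of postures."""
    return {v: (X.mean(axis=0), np.cov(X, rowvar=False))
            for v, X in postures_by_vowel.items()}

def classify_posture(model, posture, priors=None):
    """Return the vowel with the highest posterior probability for one posture."""
    priors = priors or {v: 1.0 / len(model) for v in model}
    log_post = {v: multivariate_normal.logpdf(posture, mean=m, cov=c, allow_singular=True)
                + np.log(priors[v]) for v, (m, c) in model.items()}
    return max(log_post, key=log_post.get)

# Toy usage with simulated 2-D postures (front-back, high-low) per vowel:
rng = np.random.default_rng(0)
means = {"e": (2.0, 2.0), "E": (1.5, 1.0), "a": (0.5, 0.0)}
train = {v: rng.normal(loc=mu, scale=0.5, size=(20, 2)) for v, mu in means.items()}
model = fit_vowel_model(train)
print(classify_posture(model, np.array([1.9, 1.8])))  # expected: "e"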
3
Rizza A, Terekhov AV, Montone G, Olivetti-Belardinelli M, O'Regan JK. Why Early Tactile Speech Aids May Have Failed: No Perceptual Integration of Tactile and Auditory Signals. Front Psychol 2018; 9:767. [PMID: 29875719 PMCID: PMC5974558 DOI: 10.3389/fpsyg.2018.00767]
Abstract
Tactile speech aids, though extensively studied in the 1980s and 1990s, never became a commercial success. A hypothesis to explain this failure might be that it is difficult to obtain true perceptual integration of a tactile signal with information from auditory speech: exploitation of tactile cues from a tactile aid might require cognitive effort and so prevent speech understanding at the high rates typical of everyday speech. To test this hypothesis, we attempted to create true perceptual integration of tactile with auditory information in what might be considered the simplest situation encountered by a hearing-impaired listener. We created an auditory continuum between the syllables /BA/ and /VA/, and trained participants to associate /BA/ to one tactile stimulus and /VA/ to another tactile stimulus. After training, we tested whether auditory discrimination along the continuum between the two syllables could be biased by incongruent tactile stimulation. We found that such a bias occurred only when the tactile stimulus was above, but not when it was below, its previously measured tactile discrimination threshold. Such a pattern is compatible with the idea that the effect is due to a cognitive or decisional strategy, rather than to true perceptual integration. We therefore ran a further study (Experiment 2), in which we created a tactile version of the McGurk effect. We extensively trained two subjects over 6 days to associate four recorded auditory syllables with four corresponding apparent-motion tactile patterns. In a subsequent test, we presented stimulation that was either congruent or incongruent with the learnt association, and asked subjects to report the syllable they perceived. We found no analog to the McGurk effect, suggesting that the tactile stimulation was not being perceptually integrated with the auditory syllable. These findings strengthen our hypothesis that tactile aids failed because integration of tactile cues with auditory speech occurred at a cognitive or decisional level, rather than at a truly perceptual level.
Affiliation(s)
- Aurora Rizza
- Department of Psychology, Faculty of Medicine and Psychology, Sapienza University of Rome, Rome, Italy
- Alexander V Terekhov
- Laboratoire Psychologie de la Perception, Université Paris Descartes, Paris, France
- Guglielmo Montone
- Laboratoire Psychologie de la Perception, Université Paris Descartes, Paris, France
- Marta Olivetti-Belardinelli
- Department of Psychology, Faculty of Medicine and Psychology, Sapienza University of Rome, Rome, Italy; ECONA Interuniversity Centre for Research on Cognitive Processing in Natural and Artificial Systems, Rome, Italy
- J Kevin O'Regan
- Laboratoire Psychologie de la Perception, Université Paris Descartes, Paris, France
4
Havenhill J, Do Y. Visual Speech Perception Cues Constrain Patterns of Articulatory Variation and Sound Change. Front Psychol 2018; 9:728. [PMID: 29867686 PMCID: PMC5962885 DOI: 10.3389/fpsyg.2018.00728]
Abstract
What are the factors that contribute to (or inhibit) diachronic sound change? While acoustically motivated sound changes are well-documented, research on the articulatory and audiovisual-perceptual aspects of sound change is limited. This paper investigates the interaction of articulatory variation and audiovisual speech perception in the Northern Cities Vowel Shift (NCVS), a pattern of sound change observed in the Great Lakes region of the United States. We focus specifically on the maintenance of the contrast between the vowels /ɑ/ and /ɔ/, both of which are fronted as a result of the NCVS. We present results from two experiments designed to test how the NCVS is produced and perceived. In the first experiment, we present data from an articulatory and acoustic analysis of the production of fronted /ɑ/ and /ɔ/. We find that some speakers distinguish /ɔ/ from /ɑ/ with a combination of both tongue position and lip rounding, while others do so using either tongue position or lip rounding alone. For speakers who distinguish /ɔ/ from /ɑ/ along only one articulatory dimension, /ɑ/ and /ɔ/ are acoustically more similar than for speakers who produce multiple articulatory distinctions. While all three groups of speakers maintain some degree of acoustic contrast between the vowels, the question is raised as to whether these articulatory strategies differ in their perceptibility. In the perception experiment, we test the hypothesis that visual speech cues play a role in maintaining contrast between the two sounds. The results of this experiment suggest that articulatory configurations in which /ɔ/ is produced with unround lips are perceptually weaker than those in which /ɔ/ is produced with rounding, even though these configurations result in acoustically similar output. We argue that these findings have implications for theories of sound change and variation in at least two respects: (1) visual cues can shape phonological systems through misperception-based sound change, and (2) phonological systems may be optimized not only for auditory but also for visual perceptibility.
Affiliation(s)
- Jonathan Havenhill
- Department of Linguistics, Georgetown University, Washington, DC, United States
- Youngah Do
- Department of Linguistics, University of Hong Kong, Hong Kong, Hong Kong
5
Stekelenburg JJ, Keetels M, Vroomen J. Multisensory integration of speech sounds with letters vs. visual speech: only visual speech induces the mismatch negativity. Eur J Neurosci 2018. [PMID: 29537657 PMCID: PMC5969231 DOI: 10.1111/ejn.13908]
Abstract
Numerous studies have demonstrated that the vision of lip movements can alter the perception of auditory speech syllables (McGurk effect). While there is ample evidence for integration of text and auditory speech, there are only a few studies on the orthographic equivalent of the McGurk effect. Here, we examined whether written text, like visual speech, can induce an illusory change in the perception of speech sounds on both the behavioural and neural levels. In a sound categorization task, we found that both text and visual speech changed the identity of speech sounds from an /aba/-/ada/ continuum, but the size of this audiovisual effect was considerably smaller for text than visual speech. To examine at which level in the information processing hierarchy these multisensory interactions occur, we recorded electroencephalography in an audiovisual mismatch negativity (MMN, a component of the event-related potential reflecting preattentive auditory change detection) paradigm in which deviant text or visual speech was used to induce an illusory change in a sequence of ambiguous sounds halfway between /aba/ and /ada/. We found that only deviant visual speech induced an MMN, but not deviant text, which induced a late P3-like positive potential. These results demonstrate that text has much weaker effects on sound processing than visual speech does, possibly because text has different biological roots than visual speech.
Affiliation(s)
- Jeroen J Stekelenburg
- Department of Cognitive Neuropsychology, Tilburg University, Warandelaan 2, PO box 90153, 5000 LE, Tilburg, the Netherlands
- Mirjam Keetels
- Department of Cognitive Neuropsychology, Tilburg University, Warandelaan 2, PO box 90153, 5000 LE, Tilburg, the Netherlands
- Jean Vroomen
- Department of Cognitive Neuropsychology, Tilburg University, Warandelaan 2, PO box 90153, 5000 LE, Tilburg, the Netherlands
6
Treille A, Vilain C, Kandel S, Sato M. Electrophysiological evidence for a self-processing advantage during audiovisual speech integration. Exp Brain Res 2017; 235:2867-76. [PMID: 28676921 DOI: 10.1007/s00221-017-5018-0]
Abstract
Previous electrophysiological studies have provided strong evidence for early multisensory integrative mechanisms during audiovisual speech perception. From these studies, one unanswered issue is whether hearing our own voice and seeing our own articulatory gestures facilitate speech perception, possibly through a better processing and integration of sensory inputs with our own sensory-motor knowledge. The present EEG study examined the impact of self-knowledge during the perception of auditory (A), visual (V) and audiovisual (AV) speech stimuli that were previously recorded from the participant or from a speaker he/she had never met. Audiovisual interactions were estimated by comparing N1 and P2 auditory evoked potentials during the bimodal condition (AV) with the sum of those observed in the unimodal conditions (A + V). In line with previous EEG studies, our results revealed an amplitude decrease of P2 auditory evoked potentials in AV compared to A + V conditions. Crucially, a temporal facilitation of N1 responses was observed during the visual perception of self speech movements compared to those of another speaker. This facilitation was negatively correlated with the saliency of visual stimuli. These results provide evidence for a temporal facilitation of the integration of auditory and visual speech signals when the visual situation involves our own speech gestures.
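The additive-model comparison used here (the bimodal AV response versus the sum of the unimodal A and V responses) can be sketched in a few lines of analysis code; the epoch shapes, sampling rate, and analysis window below are illustrative assumptions rather than parameters taken from the study.

# Hypothetical sketch of the additive-model ERP comparison: AV vs. (A + V).
# erp_a, erp_v, erp_av: trial-averaged ERPs, shape (n_channels, n_samples).
import numpy as np

def av_interaction(erp_a, erp_v, erp_av, sfreq, tmin, window=(0.05, 0.25)):
    """Return AV - (A + V) restricted to a post-stimulus window (in seconds)."""
    summed = erp_a + erp_v   # prediction if A and V are processed independently
    diff = erp_av - summed   # deviation from additivity = multisensory interaction
    start = int((window[0] - tmin) * sfreq)
    stop = int((window[1] - tmin) * sfreq)
    return diff[:, start:stop]

# Toy usage with simulated epochs (10 channels, 1 s at 500 Hz, tmin = -0.2 s):
rng = np.random.default_rng(1)
a, v, av = (rng.normal(size=(10, 500)) for _ in range(3))
print(av_interaction(a, v, av, sfreq=500, tmin=-0.2).shape)  # channels x window samples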
7
Baart M, Armstrong BC, Martin CD, Frost R, Carreiras M. Cross-modal noise compensation in audiovisual words. Sci Rep 2017; 7:42055. [PMID: 28169316 PMCID: PMC5294401 DOI: 10.1038/srep42055]
Abstract
Perceiving linguistic input is vital for human functioning, but the process is complicated by the fact that the incoming signal is often degraded. However, humans can compensate for unimodal noise by relying on simultaneous sensory input from another modality. Here, we investigated noise-compensation for spoken and printed words in two experiments. In the first behavioral experiment, we observed that accuracy was modulated by reaction time, bias and sensitivity, but noise compensation could nevertheless be explained via accuracy differences when controlling for RT, bias and sensitivity. In the second experiment, we also measured Event Related Potentials (ERPs) and observed robust electrophysiological correlates of noise compensation starting at around 350 ms after stimulus onset, indicating that noise compensation is most prominent at lexical/semantic processing levels.
Affiliation(s)
- Martijn Baart
- BCBL, Basque Center on Cognition, Brain and Language, Donostia - San Sebastián, Spain; Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Blair C Armstrong
- BCBL, Basque Center on Cognition, Brain and Language, Donostia - San Sebastián, Spain; Department of Psychology & Centre for French & Linguistics at Scarborough, University of Toronto, Toronto, Canada
- Clara D Martin
- BCBL, Basque Center on Cognition, Brain and Language, Donostia - San Sebastián, Spain; IKERBASQUE Basque Foundation for Science, Bilbao, Spain
- Ram Frost
- BCBL, Basque Center on Cognition, Brain and Language, Donostia - San Sebastián, Spain; Department of Psychology, The Hebrew University of Jerusalem, Jerusalem, Israel; Haskins Laboratories, New Haven, CT, USA
- Manuel Carreiras
- BCBL, Basque Center on Cognition, Brain and Language, Donostia - San Sebastián, Spain; IKERBASQUE Basque Foundation for Science, Bilbao, Spain; University of the Basque Country (UPV/EHU), Bilbao, Spain
8
Abstract
To become language users, infants must embrace the integrality of speech perception and production. That they do so, and quite rapidly, is implied by the native-language attunement they achieve in each domain by 6-12 months. Yet research has most often addressed one or the other domain, rarely how they interrelate. Moreover, mainstream assumptions that perception relies on acoustic patterns whereas production involves motor patterns entail that the infant would have to translate incommensurable information to grasp the perception-production relationship. We posit the more parsimonious view that both domains depend on commensurate articulatory information. Our proposed framework combines principles of the Perceptual Assimilation Model (PAM) and Articulatory Phonology (AP). According to PAM, infants attune to articulatory information in native speech and detect similarities of nonnative phones to native articulatory patterns. The AP premise that gestures of the speech organs are the basic elements of phonology offers articulatory similarity metrics while satisfying the requirement that phonological information be discrete and contrastive: (a) distinct articulatory organs produce vocal tract constrictions and (b) phonological contrasts recruit different articulators and/or constrictions of a given articulator that differ in degree or location. Various lines of research suggest young children perceive articulatory information, which guides their productions: discrimination of between- versus within-organ contrasts, simulations of attunement to language-specific articulatory distributions, multimodal speech perception, oral/vocal imitation, and perceptual effects of articulator activation or suppression. We conclude that articulatory gesture information serves as the foundation for developmental integrality of speech perception and production.
Affiliation(s)
- Catherine T. Best
- MARCS Institute, Western Sydney University
- School of Humanities and Communication Arts, Western Sydney University
- Haskins Laboratories
- Louis M. Goldstein
- Haskins Laboratories
- Department of Linguistics, University of Southern California
- Hosung Nam
- Haskins Laboratories
- Department of English Language and Literature, Korea University
- Michael D. Tyler
- MARCS Institute, Western Sydney University
- School of Social Sciences and Psychology, Western Sydney University
9
Remez RE, Rubin PE. Perceptual organization and lawful specification. Ecol Psychol 2016; 28:160-165. [PMID: 27642242 DOI: 10.1080/10407413.2016.1195188]
Abstract
When a listener can also see a talker, audible and visible properties are ineluctably combined, perceptually. This perceptual disposition to audiovisual integration has received widely ranging explanations. At one extreme, accounts have likened perception to a blind listener and a deaf viewer combined within a single skin, resolving discrepancies in identification by each modality. At the other extreme, perception has been described as necessarily and automatically synesthetic. Useful descriptive and explanatory evidence was provided in a study of auditory-haptic presentation by Fowler and Dekle (1991), showing that neither familiarity nor congruence is required for perceptual integration to occur across modalities. Instead, the notion of conjoint lawful specification was proposed as a governing constraint. This principle treats sensory activity as proximal sampling of the properties of distal objects and events, and this essay notes that its corollaries offer a broadly applicable guide in contemporary investigations of perception.
Affiliation(s)
- Robert E Remez
- Department of Psychology, Program in Neuroscience and Behavior, Barnard College, Columbia University, 3009 Broadway, New York, New York 10027-6598
- Philip E Rubin
- Haskins Laboratories, 300 George Street, New Haven, Connecticut 06511-6695
10
Bartoli E, Maffongelli L, Campus C, D'Ausilio A. Beta rhythm modulation by speech sounds: somatotopic mapping in somatosensory cortex. Sci Rep 2016; 6:31182. [PMID: 27499204 DOI: 10.1038/srep31182]
Abstract
During speech listening motor regions are somatotopically activated, resembling the activity that subtends actual speech production, suggesting that motor commands can be retrieved from sensory inputs. Crucially, the efficient motor control of the articulators relies on the accurate anticipation of the somatosensory reafference. Nevertheless, evidence about somatosensory activities elicited by auditory speech processing is sparse. The present work looked for specific interactions between auditory speech presentation and somatosensory cortical information processing. We used an auditory speech identification task with sounds having different place of articulation (bilabials and dentals). We tested whether coupling the auditory task with a peripheral electrical stimulation of the lips would affect the pattern of sensorimotor electroencephalographic rhythms. Peripheral electrical stimulation elicits a series of spectral perturbations of which the beta rebound reflects the return-to-baseline stage of somatosensory processing. We show a left-lateralized and selective reduction in the beta rebound following lip somatosensory stimulation when listening to speech sounds produced with the lips (i.e. bilabials). Thus, the somatosensory processing could not return to baseline due to the recruitment of the same neural resources by speech stimuli. Our results are a clear demonstration that heard speech sounds are somatotopically mapped onto somatosensory cortices, according to place of articulation.
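The beta-rebound measure referred to here can be approximated, in outline, as baseline-normalized beta-band power in a post-stimulation window; the hypothetical sketch below uses a Hilbert envelope, with band limits and time windows chosen purely for illustration.

# Hypothetical sketch of quantifying the beta rebound after somatosensory
# stimulation: beta-band power (Hilbert envelope) expressed as percent change
# from a pre-stimulus baseline. Band and windows are illustrative assumptions.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def beta_rebound(epoch, sfreq, tmin, band=(15.0, 30.0),
                 baseline=(-0.5, 0.0), rebound_win=(0.5, 1.0)):
    """epoch: 1-D single-channel signal; returns % power change in the rebound window."""
    b, a = butter(4, np.array(band) / (sfreq / 2.0), btype="band")
    power = np.abs(hilbert(filtfilt(b, a, epoch))) ** 2
    t = tmin + np.arange(epoch.size) / sfreq
    base = power[(t >= baseline[0]) & (t < baseline[1])].mean()
    reb = power[(t >= rebound_win[0]) & (t < rebound_win[1])].mean()
    return 100.0 * (reb - base) / base

# Toy usage (2 s of simulated data at 500 Hz, epoch starting at -0.5 s); a smaller
# rebound for bilabial than for dental trials would mirror the reported reduction.
rng = np.random.default_rng(2)
print(beta_rebound(rng.normal(size=1000), sfreq=500, tmin=-0.5))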
11
McMurray B, Jongman A. What Comes After /f/? Prediction in Speech Derives From Data-Explanatory Processes. Psychol Sci 2015; 27:43-52. [PMID: 26581947 DOI: 10.1177/0956797615609578]
Abstract
Acoustic cues are short-lived and highly variable, which makes speech perception a difficult problem. However, most listeners solve this problem effortlessly. In the present experiment, we demonstrated that part of the solution lies in predicting upcoming speech sounds and that predictions are modulated by high-level expectations about the current sound. Participants heard isolated fricatives (e.g., "s," "sh") and predicted the upcoming vowel. Accuracy was above chance, which suggests that fine-grained detail in the signal can be used for prediction. A second group performed the same task but also saw a still face and a letter corresponding to the fricative. This group performed markedly better, which suggests that high-level knowledge modulates prediction by helping listeners form expectations about what the fricative should have sounded like. This suggests a form of data explanation operating in speech perception: Listeners account for variance due to their knowledge of the talker and current phoneme, and they use what is left over to make more accurate predictions about the next sound.
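The "data explanation" account summarized above, in which listeners partial out variance attributable to what they already know and predict from what remains, can be illustrated with a small hypothetical sketch: an acoustic cue is residualized against talker and fricative identity, and the residuals drive a simple nearest-centroid prediction of the upcoming vowel. The cue values and the predictor are assumptions for illustration, not the authors' model.

# Hypothetical sketch of "data explanation": remove variance explained by known
# factors (talker, current fricative) from a cue, then predict the upcoming
# vowel from the residual. All values are made up for illustration.
import numpy as np

def residualize(cue, talker, fricative):
    """Subtract talker-specific and then fricative-specific means from a cue vector."""
    resid = cue.astype(float)
    for labels in (talker, fricative):
        for lab in np.unique(labels):
            mask = labels == lab
            resid[mask] -= resid[mask].mean()
    return resid

def predict_vowel(train_resid, train_vowel, test_resid):
    """Nearest-centroid prediction of the upcoming vowel from residual cue values."""
    centroids = {v: train_resid[train_vowel == v].mean() for v in np.unique(train_vowel)}
    return [min(centroids, key=lambda v: abs(r - centroids[v])) for r in test_resid]

# Toy usage:
cue = np.array([4.1, 4.6, 3.2, 3.9, 5.0, 4.4])          # e.g., fricative spectral means
talker = np.array(["t1", "t1", "t1", "t2", "t2", "t2"])
fric = np.array(["s", "s", "sh", "s", "s", "sh"])
resid = residualize(cue, talker, fric)
print(predict_vowel(resid[:4], np.array(["i", "u", "i", "u"]), resid[4:]))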
Affiliation(s)
- Bob McMurray
- Department of Psychological and Brain Sciences, University of Iowa
12
Ito T, Gracco VL, Ostry DJ. Temporal factors affecting somatosensory-auditory interactions in speech processing. Front Psychol 2014; 5:1198. [PMID: 25452733 PMCID: PMC4233986 DOI: 10.3389/fpsyg.2014.01198]
Abstract
Speech perception is known to rely on both auditory and visual information. However, sound-specific somatosensory input has been shown also to influence speech perceptual processing (Ito et al., 2009). In the present study, we further examined the relationship between somatosensory information and speech perceptual processing by testing the hypothesis that the temporal relationship between orofacial movement and sound processing contributes to somatosensory–auditory interaction in speech perception. We examined the changes in event-related potentials (ERPs) in response to multisensory synchronous (simultaneous) and asynchronous (90 ms lag and lead) somatosensory and auditory stimulation compared to individual unisensory auditory and somatosensory stimulation alone. We used a robotic device to apply facial skin somatosensory deformations that were similar in timing and duration to those experienced in speech production. Following synchronous multisensory stimulation, the amplitude of the ERP was reliably different from the two unisensory potentials. More importantly, the magnitude of the ERP difference varied as a function of the relative timing of the somatosensory–auditory stimulation. Event-related activity change due to stimulus timing was seen between 160 and 220 ms following somatosensory onset, mostly around the parietal area. The results demonstrate a dynamic modulation of somatosensory–auditory convergence and suggest that the contribution of somatosensory information to speech processing depends on the specific temporal order of sensory inputs in speech production.
Affiliation(s)
- Vincent L Gracco
- Haskins Laboratories, New Haven, CT, USA; McGill University, Montréal, QC, Canada
- David J Ostry
- Haskins Laboratories, New Haven, CT, USA; McGill University, Montréal, QC, Canada
13
Treille A, Vilain C, Sato M. The sound of your lips: electrophysiological cross-modal interactions during hand-to-face and face-to-face speech perception. Front Psychol 2014; 5:420. [PMID: 24860533 PMCID: PMC4026678 DOI: 10.3389/fpsyg.2014.00420]
Abstract
Recent magneto-encephalographic and electro-encephalographic studies provide evidence for cross-modal integration during audio-visual and audio-haptic speech perception, with speech gestures viewed or felt from manual tactile contact with the speaker's face. Given the temporal precedence of the haptic and visual signals relative to the acoustic signal in these studies, the observed modulation of N1/P2 auditory evoked responses during bimodal compared to unimodal speech perception suggests that relevant and predictive visual and haptic cues may facilitate auditory speech processing. To further investigate this hypothesis, auditory evoked potentials were here compared during auditory-only, audio-visual and audio-haptic speech perception in live dyadic interactions between a listener and a speaker. In line with previous studies, auditory evoked potentials were attenuated and speeded up during both audio-haptic and audio-visual compared to auditory speech perception. Importantly, the observed latency and amplitude reduction did not significantly depend on the degree of visual and haptic recognition of the speech targets. Altogether, these results further demonstrate cross-modal interactions between the auditory, visual and haptic speech signals. Although they do not contradict the hypothesis that visual and haptic sensory inputs convey predictive information with respect to the incoming auditory speech input, these results suggest that, at least in live conversational interactions, systematic conclusions on sensory predictability in bimodal speech integration have to be taken with caution, with the extraction of predictive cues likely depending on the variability of the speech stimuli.
Affiliation(s)
- Avril Treille
- CNRS, Département Parole and Cognition, Gipsa-Lab, UMR 5216, Grenoble Université, Grenoble, France
- Coriandre Vilain
- CNRS, Département Parole and Cognition, Gipsa-Lab, UMR 5216, Grenoble Université, Grenoble, France
- Marc Sato
- CNRS, Département Parole and Cognition, Gipsa-Lab, UMR 5216, Grenoble Université, Grenoble, France
14
Mochida T, Kimura T, Hiroya S, Kitagawa N, Gomi H, Kondo T. Speech misperception: speaking and seeing interfere differently with hearing. PLoS One 2013; 8:e68619. [PMID: 23844227 PMCID: PMC3701087 DOI: 10.1371/journal.pone.0068619]
Abstract
Speech perception is thought to be linked to speech motor production. This linkage is considered to mediate multimodal aspects of speech perception, such as audio-visual and audio-tactile integration. However, direct coupling between articulatory movement and auditory perception has been little studied. The present study reveals a clear dissociation between the effects of a listener's own speech action and the effects of viewing another's speech movements on the perception of auditory phonemes. We assessed the intelligibility of the syllables [pa], [ta], and [ka] when listeners silently and simultaneously articulated syllables that were congruent/incongruent with the syllables they heard. The intelligibility was compared with a condition where the listeners simultaneously watched another's mouth producing congruent/incongruent syllables, but did not articulate. The intelligibility of [ta] and [ka] was degraded by articulating [ka] and [ta], respectively, which are associated with the same primary articulator (the tongue) as the heard syllables, but was not affected by articulating [pa], which is associated with a different primary articulator (the lips). In contrast, the intelligibility of [ta] and [ka] was degraded by watching the production of [pa]. These results indicate that the articulatory-induced distortion of speech perception occurs in an articulator-specific manner, while visually induced distortion does not. The articulator-specific nature of the auditory-motor interaction in speech perception suggests that speech motor processing directly contributes to our ability to hear speech.
Affiliation(s)
- Takemi Mochida
- NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Atsugi, Japan.
15
Altieri N, Pisoni DB, Townsend JT. Some behavioral and neurobiological constraints on theories of audiovisual speech integration: a review and suggestions for new directions. Seeing Perceiving 2011; 24:513-39. [PMID: 21968081 DOI: 10.1163/187847611x595864]
Abstract
Summerfield (1987) proposed several accounts of audiovisual speech perception, a field of research that has burgeoned in recent years. The proposed accounts included the integration of discrete phonetic features, vectors describing the values of independent acoustical and optical parameters, the filter function of the vocal tract, and articulatory dynamics of the vocal tract. The latter two accounts assume that the representations of audiovisual speech perception are based on abstract gestures, while the former two assume that the representations consist of symbolic or featural information obtained from visual and auditory modalities. Recent converging evidence from several different disciplines reveals that the general framework of Summerfield's feature-based theories should be expanded. An updated framework building upon the feature-based theories is presented. We propose a processing model arguing that auditory and visual brain circuits provide facilitatory information when the inputs are correctly timed, and that auditory and visual speech representations do not necessarily undergo translation into a common code during information processing. Future research on multisensory processing in speech perception should investigate the connections between auditory and visual brain regions, and utilize dynamic modeling tools to further understand the timing and information processing mechanisms involved in audiovisual speech integration.
Affiliation(s)
- Nicholas Altieri
- Department of Psychology, University of Oklahoma, OK 73072, USA.
16
Altieri N, Townsend JT. An assessment of behavioral dynamic information processing measures in audiovisual speech perception. Front Psychol 2011; 2:238. [PMID: 21980314 PMCID: PMC3180170 DOI: 10.3389/fpsyg.2011.00238]
Abstract
Research has shown that visual speech perception can assist accuracy in identification of spoken words. However, little is known about the dynamics of the processing mechanisms involved in audiovisual integration. In particular, architecture and capacity, measured using response time methodologies, have not been investigated. An issue related to architecture concerns whether the auditory and visual sources of the speech signal are integrated “early” or “late.” We propose that “early” integration most naturally corresponds to coactive processing whereas “late” integration corresponds to separate decisions parallel processing. We implemented the double factorial paradigm in two studies. First, we carried out a pilot study using a two-alternative forced-choice discrimination task to assess architecture, decision rule, and provide a preliminary assessment of capacity (integration efficiency). Next, Experiment 1 was designed to specifically assess audiovisual integration efficiency in an ecologically valid way by including lower auditory S/N ratios and a larger response set size. Results from the pilot study support a separate decisions parallel, late integration model. Results from both studies showed that capacity was severely limited for high auditory signal-to-noise ratios. However, Experiment 1 demonstrated that capacity improved as the auditory signal became more degraded. This evidence strongly suggests that integration efficiency is vitally affected by the S/N ratio.
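The capacity (integration-efficiency) measure used with the double factorial paradigm is conventionally the Townsend-Nozawa capacity coefficient, C(t) = log S_AV(t) / [log S_A(t) + log S_V(t)], computed from the survivor functions of the response-time distributions; C(t) > 1 indicates supercapacity and C(t) < 1 limited capacity. The sketch below estimates it from simulated response times and is an illustration, not the authors' analysis code.

# Hypothetical sketch of the Townsend-Nozawa capacity coefficient
# C(t) = log S_AV(t) / (log S_A(t) + log S_V(t)), from empirical survivor
# functions of correct-response RTs; the RT data here are simulated.
import numpy as np

def survivor(rts, t_grid):
    """Empirical survivor function S(t) = P(RT > t) evaluated on a common grid."""
    rts = np.asarray(rts, dtype=float)
    return np.array([(rts > t).mean() for t in t_grid])

def capacity_coefficient(rt_av, rt_a, rt_v, t_grid):
    s_av, s_a, s_v = (survivor(r, t_grid) for r in (rt_av, rt_a, rt_v))
    with np.errstate(divide="ignore", invalid="ignore"):
        c = np.log(s_av) / (np.log(s_a) + np.log(s_v))
    return c  # > 1: supercapacity; < 1: limited capacity

# Toy usage with simulated RTs in seconds (the redundant AV condition is faster):
rng = np.random.default_rng(3)
rt_a, rt_v, rt_av = rng.gamma(8.0, 0.08, 200), rng.gamma(8.0, 0.09, 200), rng.gamma(8.0, 0.06, 200)
t_grid = np.linspace(0.3, 1.2, 10)
print(np.round(capacity_coefficient(rt_av, rt_a, rt_v, t_grid), 2))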
Affiliation(s)
- Nicholas Altieri
- Department of Psychology, The University of Oklahoma, Norman, OK, USA
17
Abstract
In the present study, we demonstrate an audiotactile effect in which amplitude modulation of auditory feedback during voiced speech induces a throbbing sensation over the lip and laryngeal regions. Control tasks coupled with the examination of speech acoustic parameters allow us to rule out the possibility that the effect may have been due to cognitive factors or motor compensatory effects. We interpret the effect as reflecting the tight interplay between auditory and tactile modalities during vocal production.
Affiliation(s)
- François Champoux
- Centre de recherche interdisciplinaire en réadaptation du Montréal métropolitain/Institut Raymond-Dewar, Montreal, Quebec, Canada.
18
Viswanathan N, Magnuson JS, Fowler CA. Compensation for coarticulation: disentangling auditory and gestural theories of perception of coarticulatory effects in speech. J Exp Psychol Hum Percept Perform 2010; 36:1005-15. [PMID: 20695714 DOI: 10.1037/a0018391]
Abstract
According to one approach to speech perception, listeners perceive speech by applying general pattern matching mechanisms to the acoustic signal (e.g., Diehl, Lotto, & Holt, 2004). An alternative is that listeners perceive the phonetic gestures that structured the acoustic signal (e.g., Fowler, 1986). The two accounts have offered different explanations for the phenomenon of compensation for coarticulation (CfC). An example of CfC is that if a speaker produces a gesture with a front place of articulation, it may be pulled slightly backwards if it follows a back place of articulation, and listeners' category boundaries shift (compensate) accordingly. The gestural account appeals to direct attunement to coarticulation to explain CfC, whereas the auditory account explains it by spectral contrast. In previous studies, spectral contrast and gestural consequences of coarticulation have been correlated, such that both accounts made identical predictions. We identify a liquid context in Tamil that disentangles contrast and coarticulation, such that the two accounts make different predictions. In a standard CfC task in Experiment 1, gestural coarticulation rather than spectral contrast determined the direction of CfC. Experiments 2, 3, and 4 demonstrated that tone analogues of the speech precursors failed to produce the same effects observed in Experiment 1, suggesting that simple spectral contrast cannot account for the findings of Experiment 1.
Affiliation(s)
- Navin Viswanathan
- Department of Psychology, University of Connecticut and Haskins Laboratories, New Haven, Connecticut, USA.
19
Abstract
Language use has a public face that is as important to study as the private faces under intensive psycholinguistic study. In the domain of phonology, public use of speech must meet an interpersonal "parity" constraint if it is to serve to communicate. That is, spoken language forms must reliably be identified by listeners. To that end, language forms are embodied, at the lowest level of description, as phonetic gestures of the vocal tract that lawfully structure informational media such as air and light. Over time, under the parity constraint, sound inventories emerge over communicative exchanges that have the property of sufficient identifiability. Communicative activities involve more than vocal tract actions. Talkers gesture and use facial expressions and eye gaze to communicate. Listeners embody their language understandings, exhibiting dispositions to behave in ways related to language understanding. Moreover, linguistic interchanges are embedded in the larger context of language use. Talkers recruit the environment in their communicative activities, for example, in using deictic points. Finally, in using language as a "coordination device," interlocutors mutually entrain.
Affiliation(s)
- Carol A Fowler
- Department of Psychology, University of Connecticut, Haskins Laboratories
20
Abstract
Until recently, research in speech perception and speech production has largely focused on the search for psychological and phonetic evidence of discrete, abstract, context-free symbolic units corresponding to phonological segments or phonemes. Despite this common conceptual goal and intimately related objects of study, however, research in these two domains of speech communication has progressed more or less independently for more than 60 years. In this article, we present an overview of the foundational works and current trends in the two fields, specifically discussing the progress made in both lines of inquiry as well as the basic fundamental issues that neither has been able to resolve satisfactorily so far. We then discuss theoretical models and recent experimental evidence that point to the deep, pervasive connections between speech perception and production. We conclude that although research focusing on each domain individually has been vital in increasing our basic understanding of spoken language processing, the human capacity for speech communication is so complex that gaining a full understanding will not be possible until speech perception and production are conceptually reunited in a joint approach to problems shared by both modes.
Affiliation(s)
- Elizabeth D Casserly
- Department of Linguistics, Speech Research Laboratory, Indiana University, Bloomington, IN 47405, USA
- David B Pisoni
- Department of Psychological and Brain Sciences, Speech Research Laboratory, Cognitive Science Program, Indiana University, Bloomington, IN 47405, USA
21
Abstract
Galantucci, Fowler, and Turvey (2006) have claimed that perceiving speech is perceiving gestures and that the motor system is recruited for perceiving speech. We make the counterargument that perceiving speech is not perceiving gestures, that the motor system is not recruited for perceiving speech, and that speech perception can be adequately described by a prototypical pattern recognition model, the fuzzy logical model of perception (FLMP). Empirical evidence taken as support for gesture and motor theory is reconsidered in more detail and in the framework of the FLMP. Additional theoretical and logical arguments are made to challenge gesture and motor theory.
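For readers unfamiliar with the model invoked here, the FLMP integration rule, as it is usually presented, multiplies the auditory and visual degrees of support for each response alternative and then normalizes across alternatives (the relative goodness rule). The sketch below is a generic illustration of that rule with made-up support values, not code from the cited work.

# Hypothetical sketch of the FLMP integration rule: multiply auditory and visual
# support for each alternative, then normalize. Support values are illustrative.
def flmp(auditory_support, visual_support):
    """Both arguments: dict alternative -> support in [0, 1]; returns response probabilities."""
    combined = {alt: auditory_support[alt] * visual_support[alt] for alt in auditory_support}
    total = sum(combined.values())
    return {alt: val / total for alt, val in combined.items()}

# Toy usage: an ambiguous auditory /ba/-/da/ token paired with a clear visual /da/.
print(flmp({"ba": 0.5, "da": 0.5}, {"ba": 0.1, "da": 0.9}))
# -> {'ba': 0.1, 'da': 0.9}: vision dominates when the auditory information is ambiguous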
22
Abstract
Four experiments examined the nature of multisensory speech information. In Experiment 1, participants were asked to match heard voices with dynamic visual-alone video clips of speakers' articulating faces. This cross-modal matching task was used to examine whether vocal source matching can be accomplished across sensory modalities. The results showed that observers could match speaking faces and voices, indicating that information about the speaker was available for cross-modal comparisons. In a series of follow-up experiments, several stimulus manipulations were used to determine some of the critical acoustic and optic patterns necessary for specifying cross-modal source information. The results showed that cross-modal source information was not available in static visual displays of faces and was not contingent on a prominent acoustic cue to vocal identity (f0). Furthermore, cross-modal matching was not possible when the acoustic signal was temporally reversed.
Affiliation(s)
- Lorin Lachs
- Department of Psychology, California State University, Fresno