1
Sekine K, Özyürek A. Children benefit from gestures to understand degraded speech but to a lesser extent than adults. Front Psychol 2024; 14:1305562. [PMID: 38303780] [PMCID: PMC10832995] [DOI: 10.3389/fpsyg.2023.1305562]
Abstract
The present study investigated to what extent children, compared to adults, benefit from gestures to disambiguate degraded speech, by manipulating the speech signal and the manual modality. Dutch-speaking adults (N = 20) and 6- and 7-year-old children (N = 15) were presented with a series of video clips in which an actor produced a Dutch action verb with or without an accompanying iconic gesture. Participants were then asked to repeat what they had heard. The speech signal was either clear or altered into 4- or 8-band noise-vocoded speech. Children had more difficulty than adults in disambiguating degraded speech in the speech-only condition. However, when presented with both speech and gestures, children reached a level of accuracy comparable to that of adults in the degraded-speech-only condition. Furthermore, for adults, the enhancement from gestures was greater in the 4-band condition than in the 8-band condition, whereas children showed the opposite pattern. Gestures help children to disambiguate degraded speech, but children need more phonological information than adults to benefit from gestures. Children's multimodal language integration needs to develop further to adapt flexibly to challenging situations such as degraded speech, as tested in our study, or instances where speech is heard with environmental noise or through a face mask.
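This entry (and entry 7 below) degrades the speech signal with noise-band vocoding. For readers unfamiliar with the manipulation, here is a minimal Python sketch of how 4- or 8-band noise-vocoded speech is commonly generated; the band edges, filter order, and the noise_vocode helper are illustrative assumptions, not the processing chain used in either study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(signal, fs, n_bands, f_lo=100.0, f_hi=7000.0):
    """Replace the fine structure in each band with band-limited noise,
    keeping only that band's amplitude envelope (illustrative sketch)."""
    f_hi = min(f_hi, 0.45 * fs)  # keep band edges safely below Nyquist
    # Logarithmically spaced band edges between f_lo and f_hi (one common choice).
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_bands + 1)
    noise = np.random.randn(len(signal))
    out = np.zeros(len(signal))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, signal)         # analysis band of the speech
        env = np.abs(hilbert(band))             # amplitude envelope of that band
        carrier = sosfiltfilt(sos, noise)       # noise carrier limited to the same band
        out += env * carrier                    # envelope-modulated noise
    return out / (np.max(np.abs(out)) + 1e-12)  # normalise to avoid clipping

# Hypothetical usage: vocoded_4 = noise_vocode(clean_speech, fs=44100, n_bands=4)
```

Fewer bands leave less spectral detail in the output, which is why the 4-band condition is harder to understand than the 8-band condition.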
Affiliation(s)
- Kazuki Sekine
- Faculty of Human Sciences, Waseda University, Tokorozawa, Japan
- Aslı Özyürek
- Centre for Language Studies, Radboud University, Nijmegen, Netherlands
- Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
2
Schwarz J, Li KK, Sim JH, Zhang Y, Buchanan-Worster E, Post B, Gibson JL, McDougall K. Semantic Cues Modulate Children's and Adults' Processing of Audio-Visual Face Mask Speech. Front Psychol 2022; 13:879156. [PMID: 35928422] [PMCID: PMC9343587] [DOI: 10.3389/fpsyg.2022.879156]
Abstract
During the COVID-19 pandemic, questions have been raised about the impact of face masks on communication in classroom settings. However, it is unclear to what extent visual obstruction of the speaker’s mouth or changes to the acoustic signal lead to speech processing difficulties, and whether these effects can be mitigated by semantic predictability, i.e., the availability of contextual information. The present study investigated the acoustic and visual effects of face masks on speech intelligibility and processing speed under varying semantic predictability. Twenty-six children (aged 8-12) and twenty-six adults performed an internet-based cued shadowing task, in which they had to repeat aloud the last word of sentences presented in audio-visual format. The results showed that children and adults made more mistakes and responded more slowly when listening to face mask speech compared to speech produced without a face mask. Adults were only significantly affected by face mask speech when both the acoustic and the visual signal were degraded. While acoustic mask effects were similar for children, removal of visual speech cues through the face mask affected children to a lesser degree. However, high semantic predictability reduced audio-visual mask effects, leading to full compensation of the acoustically degraded mask speech in the adult group. Even though children did not fully compensate for face mask speech with high semantic predictability, overall, they still profited from semantic cues in all conditions. Therefore, in classroom settings, strategies that increase contextual information such as building on students’ prior knowledge, using keywords, and providing visual aids, are likely to help overcome any adverse face mask effects.
Affiliation(s)
- Julia Schwarz
- Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge, United Kingdom
- Katrina Kechun Li
- Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge, United Kingdom
- Jasper Hong Sim
- Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge, United Kingdom
- Yixin Zhang
- Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge, United Kingdom
- Elizabeth Buchanan-Worster
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Brechtje Post
- Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge, United Kingdom
- Kirsty McDougall
- Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge, United Kingdom
3
Tan SHJ, Kalashnikova M, Di Liberto GM, Crosse MJ, Burnham D. Seeing a Talking Face Matters: The Relationship between Cortical Tracking of Continuous Auditory-Visual Speech and Gaze Behaviour in Infants, Children and Adults. Neuroimage 2022; 256:119217. [PMID: 35436614] [DOI: 10.1016/j.neuroimage.2022.119217]
Abstract
An auditory-visual speech benefit, the benefit that visual speech cues bring to auditory speech perception, is experienced from early on in infancy and continues to be experienced to an increasing degree with age. While there is both behavioural and neurophysiological evidence for children and adults, only behavioural evidence exists for infants, as no neurophysiological study has provided a comprehensive examination of the auditory-visual speech benefit in infants. It is also surprising that most studies on the auditory-visual speech benefit do not concurrently report looking behaviour, especially since the auditory-visual speech benefit rests on the assumption that listeners attend to a speaker's talking face and that there are meaningful individual differences in looking behaviour. To address these gaps, we simultaneously recorded electroencephalographic (EEG) and eye-tracking data of 5-month-olds, 4-year-olds and adults as they were presented with a speaker in auditory-only (AO), visual-only (VO), and auditory-visual (AV) modes. Cortical tracking analyses that involved forward encoding models of the speech envelope revealed that there was an auditory-visual speech benefit [i.e., AV > (A+V)], evident in 5-month-olds and adults but not 4-year-olds. Examination of cortical tracking accuracy in relation to looking behaviour showed that infants' relative attention to the speaker's mouth (vs. eyes) was positively correlated with cortical tracking accuracy of VO speech, whereas adults' attention to the display overall was negatively correlated with cortical tracking accuracy of VO speech. This study provides the first neurophysiological evidence of an auditory-visual speech benefit in infants, and our results suggest ways in which current models of speech processing can be fine-tuned.
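The cortical tracking analysis in this entry rests on forward encoding models that map the speech envelope onto each EEG channel through time-lagged linear regression, with tracking accuracy scored as the correlation between predicted and recorded EEG. The sketch below illustrates that idea in plain NumPy; the lag range, ridge parameter, and function names are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np

def lagged_design(envelope, lags):
    """Stack time-shifted copies of the stimulus envelope, one column per lag."""
    n = len(envelope)
    X = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = envelope[:n - lag]
        else:
            X[:lag, j] = envelope[-lag:]
    return X

def fit_forward_model(envelope, eeg, lags, lam=1e2):
    """Ridge regression from the lagged envelope to EEG (one weight set per channel)."""
    X = lagged_design(envelope, lags)
    XtX = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ eeg)   # shape: (n_lags, n_channels)

def tracking_accuracy(envelope, eeg, weights, lags):
    """Correlate predicted and recorded EEG per channel (the tracking score)."""
    pred = lagged_design(envelope, lags) @ weights
    return np.array([np.corrcoef(pred[:, c], eeg[:, c])[0, 1]
                     for c in range(eeg.shape[1])])

# Hypothetical usage: envelope is (n_samples,), eeg is (n_samples, n_channels),
# and lags span roughly 0-400 ms expressed in samples at the EEG sampling rate.
```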
Affiliation(s)
- S H Jessica Tan
- The MARCS Institute of Brain, Behaviour and Development, Western Sydney University.
- Marina Kalashnikova
- The Basque Center on Cognition, Brain and Language; IKERBASQUE, Basque Foundation for Science
- Michael J Crosse
- Trinity Center for Biomedical Engineering, Department of Mechanical, Manufacturing & Biomedical Engineering, Trinity College Dublin, Dublin, Ireland
- Denis Burnham
- The MARCS Institute of Brain, Behaviour and Development, Western Sydney University
4
Lalonde K, McCreery RW. Audiovisual Enhancement of Speech Perception in Noise by School-Age Children Who Are Hard of Hearing. Ear Hear 2021; 41:705-719. [PMID: 32032226] [PMCID: PMC7822589] [DOI: 10.1097/aud.0000000000000830]
Abstract
OBJECTIVES The purpose of this study was to examine age- and hearing-related differences in school-age children's benefit from visual speech cues. The study addressed three questions: (1) Do age and hearing loss affect degree of audiovisual (AV) speech enhancement in school-age children? (2) Are there age- and hearing-related differences in the mechanisms underlying AV speech enhancement in school-age children? (3) What cognitive and linguistic variables predict individual differences in AV benefit among school-age children? DESIGN Forty-eight children between 6 and 13 years of age (19 with mild to severe sensorineural hearing loss; 29 with normal hearing) and 14 adults with normal hearing completed measures of auditory and AV syllable detection and/or sentence recognition in a two-talker masker and a spectrally matched noise. Children also completed standardized behavioral measures of receptive vocabulary, visuospatial working memory, and executive attention. Mixed linear modeling was used to examine effects of modality, listener group, and masker on sentence recognition accuracy and syllable detection thresholds. Pearson correlations were used to examine the relationship between individual differences in children's AV enhancement (AV minus auditory-only) and age, vocabulary, working memory, executive attention, and degree of hearing loss. RESULTS Significant AV enhancement was observed across all tasks, masker types, and listener groups. AV enhancement of sentence recognition was similar across maskers, but children with normal hearing exhibited less AV enhancement of sentence recognition than adults with normal hearing and children with hearing loss. AV enhancement of syllable detection was greater in the two-talker masker than the noise masker, but did not vary significantly across listener groups. Degree of hearing loss positively correlated with individual differences in AV benefit on the sentence recognition task in noise, but not on the detection task. None of the cognitive and linguistic variables correlated with individual differences in AV enhancement of syllable detection or sentence recognition. CONCLUSIONS Although AV benefit to syllable detection results from the use of visual speech to increase temporal expectancy, AV benefit to sentence recognition requires that an observer extracts phonetic information from the visual speech signal. The findings from this study suggest that all listener groups were equally good at using temporal cues in visual speech to detect auditory speech, but that adults with normal hearing and children with hearing loss were better than children with normal hearing at extracting phonetic information from the visual signal and/or using visual speech information to access phonetic/lexical representations in long-term memory. These results suggest that standard, auditory-only clinical speech recognition measures likely underestimate real-world speech recognition skills of children with mild to severe hearing loss.
Affiliation(s)
- Kaylah Lalonde
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE, USA
- Ryan W. McCreery
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE, USA
5
Abstract
Visual speech cues play an important role in speech recognition, and the McGurk effect is a classic demonstration of this. In the original McGurk & Macdonald (Nature 264, 746-748, 1976) experiment, 98% of participants reported an illusory "fusion" percept of /d/ when listening to the spoken syllable /b/ and watching the visual speech movements for /g/. However, more recent work shows that subject and task differences influence the proportion of fusion responses. In the current study, we varied task (forced-choice vs. open-ended), stimulus set (including /d/ exemplars vs. not), and data collection environment (lab vs. Mechanical Turk) to investigate the robustness of the McGurk effect. Across experiments, using the same stimuli to elicit the McGurk effect, we found fusion responses ranging from 10% to 60%, thus showing large variability in the likelihood of experiencing the McGurk effect across factors that are unrelated to the perceptual information provided by the stimuli. Rather than a robust perceptual illusion, we therefore argue that the McGurk effect exists only for some individuals under specific task situations. Significance: This series of studies re-evaluates the classic McGurk effect, which shows the relevance of visual cues on speech perception. We highlight the importance of taking into account subject variables and task differences, and challenge future researchers to think carefully about the perceptual basis of the McGurk effect, how it is defined, and what it can tell us about audiovisual integration in speech.
6
The McGurk effect in the time of pandemic: Age-dependent adaptation to an environmental loss of visual speech cues. Psychon Bull Rev 2021; 28:992-1002. [PMID: 33443708] [DOI: 10.3758/s13423-020-01852-2]
Abstract
Seeing a person's mouth move for [ga] while hearing [ba] often results in the perception of "da." Such audiovisual integration of speech cues, known as the McGurk effect, is stable within but variable across individuals. When the visual or auditory cues are degraded, due to signal distortion or the perceiver's sensory impairment, reliance on cues via the impoverished modality decreases. This study tested whether cue-reliance adjustments due to exposure to reduced cue availability are persistent and transfer to subsequent perception of speech with all cues fully available. A McGurk experiment was administered at the beginning and after a month of mandatory face-mask wearing (enforced in Czechia during the 2020 pandemic). Responses to audio-visually incongruent stimuli were analyzed from 292 persons (ages 16-55), representing a cross-sectional sample, and 41 students (ages 19-27), representing a longitudinal sample. The extent to which the participants relied exclusively on visual cues was affected by testing time in interaction with age. After a month of reduced access to lipreading, reliance on visual cues (present at test) somewhat lowered for younger and increased for older persons. This implies that adults adapt their speech perception faculty to an altered environmental availability of multimodal cues, and that younger adults do so more efficiently. This finding demonstrates that besides sensory impairment or signal noise, which reduce cue availability and thus affect audio-visual cue reliance, having experienced a change in environmental conditions can modulate the perceiver's (otherwise relatively stable) general bias towards different modalities during speech communication.
7
Ritter C, Vongpaisal T. Multimodal and Spectral Degradation Effects on Speech and Emotion Recognition in Adult Listeners. Trends Hear 2019; 22:2331216518804966. [PMID: 30378469] [PMCID: PMC6236866] [DOI: 10.1177/2331216518804966]
Abstract
For cochlear implant (CI) users, degraded spectral input hampers the understanding of prosodic vocal emotion, especially in difficult listening conditions. Using a vocoder simulation of CI hearing, we examined the extent to which informative multimodal cues in a talker’s spoken expressions improve normal hearing (NH) adults’ speech and emotion perception under different levels of spectral degradation (two, three, four, and eight spectral bands). Participants repeated the words verbatim and identified emotions (among four alternative options: happy, sad, angry, and neutral) in meaningful sentences that were semantically congruent with the expression of the intended emotion. Sentences were presented in their natural speech form and in speech sampled through a noise-band vocoder, in sound (auditory-only) and video (auditory–visual) recordings of a female talker. Visual information had a more pronounced benefit in enhancing speech recognition in the lower spectral band conditions. Spectral degradation, however, did not interfere with emotion recognition performance when dynamic visual cues in a talker’s expression were provided, as participants scored at ceiling levels across all spectral band conditions. Our use of familiar sentences that contained congruent semantic and prosodic information has high ecological validity, which likely optimized listener performance under simulated CI hearing and may better predict CI users’ outcomes in everyday listening contexts.
Affiliation(s)
- Chantel Ritter
- Department of Psychology, MacEwan University, Alberta, Canada
- Tara Vongpaisal
- Department of Psychology, MacEwan University, Alberta, Canada
8
Looking Behavior and Audiovisual Speech Understanding in Children With Normal Hearing and Children With Mild Bilateral or Unilateral Hearing Loss. Ear Hear 2017; 39:783-794. [PMID: 29252979] [DOI: 10.1097/aud.0000000000000534]
Abstract
OBJECTIVES Visual information from talkers facilitates speech intelligibility for listeners when audibility is challenged by environmental noise and hearing loss. Less is known about how listeners actively process and attend to visual information from different talkers in complex multi-talker environments. This study tracked looking behavior in children with normal hearing (NH), mild bilateral hearing loss (MBHL), and unilateral hearing loss (UHL) in a complex multi-talker environment to examine the extent to which children look at talkers and whether looking patterns relate to performance on a speech-understanding task. It was hypothesized that performance would decrease as perceptual complexity increased and that children with hearing loss would perform more poorly than their peers with NH. Children with MBHL or UHL were expected to demonstrate greater attention to individual talkers during multi-talker exchanges, indicating that they were more likely to attempt to use visual information from talkers to assist in speech understanding in adverse acoustics. It also was of interest to examine whether MBHL, versus UHL, would differentially affect performance and looking behavior. DESIGN Eighteen children with NH, eight children with MBHL, and 10 children with UHL participated (8-12 years). They followed audiovisual instructions for placing objects on a mat under three conditions: a single talker providing instructions via a video monitor, four possible talkers alternately providing instructions on separate monitors in front of the listener, and the same four talkers providing both target and nontarget information. Multi-talker background noise was presented at a 5 dB signal-to-noise ratio during testing. An eye tracker monitored looking behavior while children performed the experimental task. RESULTS Behavioral task performance was higher for children with NH than for either group of children with hearing loss. There were no differences in performance between children with UHL and children with MBHL. Eye-tracker analysis revealed that children with NH looked more at the screens overall than did children with MBHL or UHL, though individual differences were greater in the groups with hearing loss. Listeners in all groups spent a small proportion of time looking at relevant screens as talkers spoke. Although looking was distributed across all screens, there was a bias toward the right side of the display. There was no relationship between overall looking behavior and performance on the task. CONCLUSIONS The present study examined the processing of audiovisual speech in the context of a naturalistic task. Results demonstrated that children distributed their looking to a variety of sources during the task, but that children with NH were more likely to look at screens than were those with MBHL/UHL. However, all groups looked at the relevant talkers as they were speaking only a small proportion of the time. Despite variability in looking behavior, listeners were able to follow the audiovisual instructions and children with NH demonstrated better performance than children with MBHL/UHL. These results suggest that performance on some challenging multi-talker audiovisual tasks is not dependent on visual fixation to relevant talkers for children with NH or with MBHL/UHL.
9
Havy M, Zesiger P. Learning Spoken Words via the Ears and Eyes: Evidence from 30-Month-Old Children. Front Psychol 2017; 8:2122. [PMID: 29276493] [PMCID: PMC5727082] [DOI: 10.3389/fpsyg.2017.02122]
Abstract
From the very first moments of their lives, infants are able to link specific movements of the visual articulators to auditory speech signals. However, recent evidence indicates that infants focus primarily on auditory speech signals when learning new words. Here, we ask whether 30-month-old children are able to learn new words based solely on visible speech information, and whether information from both auditory and visual modalities is available after learning in only one modality. To test this, children were taught new lexical mappings. One group of children experienced the words in the auditory modality (i.e., acoustic form of the word with no accompanying face). Another group experienced the words in the visual modality (seeing a silent talking face). Lexical recognition was tested in either the learning modality or in the other modality. Results revealed successful word learning in either modality. Results further showed cross-modal recognition following an auditory-only, but not a visual-only, experience of the words. Together, these findings suggest that visible speech becomes increasingly informative for the purpose of lexical learning, but that an auditory-only experience evokes a cross-modal representation of the words.
Affiliation(s)
- Mélanie Havy
- Faculty of Psychology and Educational Sciences, University of Geneva, Geneva, Switzerland
10
Modeling the Development of Audiovisual Cue Integration in Speech Perception. Brain Sci 2017; 7:32. [PMID: 28335558] [PMCID: PMC5366831] [DOI: 10.3390/brainsci7030032]
Abstract
Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues.
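The modelling approach described here, GMMs that learn categories from the joint distribution of auditory and visual cues, can be illustrated with a small sketch. The two-dimensional cue space, category means, and token counts below are invented for demonstration and are not the paper's simulations.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy audiovisual tokens for two hypothetical categories (e.g., /b/ vs. /d/):
# column 0 is an auditory cue, column 1 a visual (lip) cue. All values are invented.
cat_a = rng.normal([0.0, 0.0], [1.0, 0.6], size=(500, 2))
cat_b = rng.normal([2.5, 1.5], [1.0, 0.6], size=(500, 2))
tokens = np.vstack([cat_a, cat_b])

# Unsupervised learning of two categories from the joint cue distribution.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(tokens)

# "Perception" of a new token: posterior probability of each learned category.
# A cue-conflict (McGurk-like) token receives graded support from both components.
probe = np.array([[2.5, 0.0]])  # auditory cue favours one category, visual cue the other
print(gmm.predict_proba(probe))
```

Because the mixture is fit to the joint cue distribution, the relative reliability of each cue dimension is absorbed into the component covariances, which is one way distributional statistics can yield cue weighting without supervision.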
11
Havy M, Foroud A, Fais L, Werker JF. The Role of Auditory and Visual Speech in Word Learning at 18 Months and in Adulthood. Child Dev 2017; 88:2043-2059. [PMID: 28124795] [DOI: 10.1111/cdev.12715]
Abstract
Visual information influences speech perception in both infants and adults. It is still unknown whether lexical representations are multisensory. To address this question, we exposed 18-month-old infants (n = 32) and adults (n = 32) to new word-object pairings: Participants either heard the acoustic form of the words or saw the talking face in silence. They were then tested on recognition in the same or the other modality. Both 18-month-old infants and adults learned the lexical mappings when the words were presented auditorily and recognized the mapping at test when the word was presented in either modality, but only adults learned new words in a visual-only presentation. These results suggest developmental changes in the sensory format of lexical representations.
Affiliation(s)
- Mélanie Havy
- University of British Columbia
- Université de Genève
12
Lalonde K, Holt RF. Audiovisual speech perception development at varying levels of perceptual processing. J Acoust Soc Am 2016; 139:1713. [PMID: 27106318] [PMCID: PMC4826374] [DOI: 10.1121/1.4945590]
Abstract
This study used the auditory evaluation framework [Erber (1982). Auditory Training (Alexander Graham Bell Association, Washington, DC)] to characterize the influence of visual speech on audiovisual (AV) speech perception in adults and children at multiple levels of perceptual processing. Six- to eight-year-old children and adults completed auditory and AV speech perception tasks at three levels of perceptual processing (detection, discrimination, and recognition). The tasks differed in the level of perceptual processing required to complete them. Adults and children demonstrated visual speech influence at all levels of perceptual processing. Whereas children demonstrated the same visual speech influence at each level of perceptual processing, adults demonstrated greater visual speech influence on tasks requiring higher levels of perceptual processing. These results support previous research demonstrating multiple mechanisms of AV speech processing (general perceptual and speech-specific mechanisms) with independent maturational time courses. The results suggest that adults rely on both general perceptual mechanisms that apply to all levels of perceptual processing and speech-specific mechanisms that apply when making phonetic decisions and/or accessing the lexicon. Six- to eight-year-old children seem to rely only on general perceptual mechanisms across levels. As expected, developmental differences in AV benefit on this and other recognition tasks likely reflect immature speech-specific mechanisms and phonetic processing in children.
Affiliation(s)
- Kaylah Lalonde
- Department of Speech and Hearing Sciences, Indiana University, 200 South Jordan Avenue, Bloomington, Indiana 47405, USA
- Rachael Frush Holt
- Department of Speech and Hearing Science, Ohio State University, 110 Pressey Hall, 1070 Carmack Road, Columbus, Ohio 43210, USA
13
Shaw K, Baart M, Depowski N, Bortfeld H. Infants' preference for native audiovisual speech dissociated from congruency preference. PLoS One 2015; 10:e0126059. [PMID: 25927529] [PMCID: PMC4415951] [DOI: 10.1371/journal.pone.0126059]
Abstract
Although infant speech perception is often studied in isolated modalities, infants' experience with speech is largely multimodal (i.e., speech sounds they hear are accompanied by articulating faces). Across two experiments, we tested infants’ sensitivity to the relationship between the auditory and visual components of audiovisual speech in their native (English) and non-native (Spanish) language. In Experiment 1, infants’ looking times were measured during a preferential looking task in which they saw two simultaneous visual speech streams articulating a story, one in English and the other in Spanish, while they heard either the English or the Spanish version of the story. In Experiment 2, looking times from another group of infants were measured as they watched single displays of congruent and incongruent combinations of English and Spanish audio and visual speech streams. Findings demonstrated an age-related increase in looking towards the native relative to non-native visual speech stream when accompanied by the corresponding (native) auditory speech. This increase in native language preference did not appear to be driven by a difference in preference for native vs. non-native audiovisual congruence, as we observed no difference in looking times at the audiovisual streams in Experiment 2.
Affiliation(s)
- Kathleen Shaw
- Department of Psychology, University of Connecticut, Storrs, CT, United States of America
- Martijn Baart
- BCBL. Basque Center on Cognition, Brain and Language, Donostia - San Sebastián, Spain
- Nicole Depowski
- Department of Psychology, University of Connecticut, Storrs, CT, United States of America
- Heather Bortfeld
- Department of Psychology, University of Connecticut, Storrs, CT, United States of America
- Haskins Laboratories, New Haven, CT, United States of America
14
Downing HC, Barutchu A, Crewther SG. Developmental trends in the facilitation of multisensory objects with distractors. Front Psychol 2015; 5:1559. [PMID: 25653630] [PMCID: PMC4298743] [DOI: 10.3389/fpsyg.2014.01559]
Abstract
Sensory integration and the ability to discriminate target objects from distractors are critical to survival, yet the developmental trajectories of these abilities are unknown. This study investigated developmental changes in 9- (n = 18) and 11-year-old (n = 20) children, adolescents (n = 19) and adults (n = 22) using an audiovisual object discrimination task with uni- and multisensory distractors. Reaction times (RTs) were slower with visual/audiovisual distractors, and although all groups demonstrated facilitation of multisensory RTs in these conditions, children's and adolescents' responses corresponded to fewer race model violations than adults', suggesting protracted maturation of multisensory processes. Multisensory facilitation could not be explained by changes in RT variability, suggesting that tests of race model violations may still have theoretical value at least for familiar multisensory stimuli.
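The "race model violations" referred to here come from Miller's race model inequality, which bounds the audiovisual RT distribution by the sum of the unisensory distributions, F_AV(t) <= F_A(t) + F_V(t); facilitation beyond that bound points to integration rather than purely statistical facilitation. A rough sketch of the test is given below; the percentile grid and the simulated RTs are illustrative assumptions, not the authors' analysis code.

```python
import numpy as np

def ecdf(rts, t):
    """Empirical cumulative RT distribution evaluated at the times in t."""
    rts = np.sort(np.asarray(rts))
    return np.searchsorted(rts, t, side="right") / len(rts)

def race_model_violation(rt_av, rt_a, rt_v, n_points=20):
    """Positive values mean the audiovisual distribution beats the bound
    F_A(t) + F_V(t) at that time point, i.e., a race model violation."""
    t = np.quantile(np.concatenate([rt_av, rt_a, rt_v]),
                    np.linspace(0.05, 0.95, n_points))
    bound = np.minimum(ecdf(rt_a, t) + ecdf(rt_v, t), 1.0)
    return ecdf(rt_av, t) - bound

# Hypothetical RTs in ms; a real analysis uses each participant's trial-level data.
rng = np.random.default_rng(0)
rt_a = rng.normal(450, 60, 200)
rt_v = rng.normal(470, 60, 200)
rt_av = rng.normal(400, 55, 200)
print(np.round(race_model_violation(rt_av, rt_a, rt_v), 3))
```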
Affiliation(s)
- Harriet C Downing
- School of Psychological Science, La Trobe University, Melbourne, VIC, Australia
- Ayla Barutchu
- Department of Experimental Psychology, University of Oxford, Oxford, UK
- Sheila G Crewther
- School of Psychological Science, La Trobe University, Melbourne, VIC, Australia