51
Simon A, Bech S, Loquet G, Østergaard J. Cortical linear encoding and decoding of sounds: Similarities and differences between naturalistic speech and music listening. Eur J Neurosci 2024; 59:2059-2074. [PMID: 38303522 DOI: 10.1111/ejn.16265]
Abstract
Linear models are becoming increasingly popular to investigate brain activity in response to continuous and naturalistic stimuli. In the context of auditory perception, these predictive models can be 'encoding', when stimulus features are used to reconstruct brain activity, or 'decoding' when neural features are used to reconstruct the audio stimuli. These linear models are a central component of some brain-computer interfaces that can be integrated into hearing assistive devices (e.g., hearing aids). Such advanced neurotechnologies have been widely investigated when listening to speech stimuli but rarely when listening to music. Recent attempts at neural tracking of music show that the reconstruction performances are reduced compared with speech decoding. The present study investigates the performance of stimuli reconstruction and electroencephalogram prediction (decoding and encoding models) based on the cortical entrainment of temporal variations of the audio stimuli for both music and speech listening. Three hypotheses that may explain differences between speech and music stimuli reconstruction were tested to assess the importance of the speech-specific acoustic and linguistic factors. While the results obtained with encoding models suggest different underlying cortical processing between speech and music listening, no differences were found in terms of reconstruction of the stimuli or the cortical data. The results suggest that envelope-based linear modelling can be used to study both speech and music listening, despite the differences in the underlying cortical mechanisms.
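The decoding models referred to in this abstract are commonly implemented as regularized, time-lagged linear regressions (as in the mTRF approach). The sketch below is a minimal, hypothetical illustration of a backward (stimulus-reconstruction) model on simulated data; the lag range, regularization strength, channel count, and sampling rate are arbitrary placeholder choices, not the authors' settings.

```python
import numpy as np

def lagged_design(eeg, max_lag):
    """Stack time-lagged copies of each EEG channel (lags 0..max_lag samples)."""
    n_samples, _ = eeg.shape
    cols = []
    for lag in range(max_lag + 1):
        shifted = np.zeros_like(eeg)
        shifted[lag:] = eeg[:n_samples - lag]
        cols.append(shifted)
    return np.hstack(cols)                     # (n_samples, n_channels * (max_lag + 1))

def fit_ridge_decoder(eeg, envelope, max_lag=10, alpha=1e3):
    """Closed-form ridge regression from lagged EEG to the stimulus envelope."""
    X = lagged_design(eeg, max_lag)
    X = X - X.mean(axis=0)
    y = envelope - envelope.mean()
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def reconstruction_accuracy(eeg, envelope, weights, max_lag=10):
    """Pearson correlation between the reconstructed and actual envelope."""
    X = lagged_design(eeg, max_lag)
    X = X - X.mean(axis=0)
    return np.corrcoef(X @ weights, envelope - envelope.mean())[0, 1]

# Toy data: 64-channel "EEG" (100 Hz, 60 s) that linearly mixes the envelope plus noise.
rng = np.random.default_rng(0)
envelope = rng.standard_normal(6000)
eeg = np.outer(envelope, rng.standard_normal(64)) + rng.standard_normal((6000, 64))
weights = fit_ridge_decoder(eeg, envelope)
print(f"reconstruction r = {reconstruction_accuracy(eeg, envelope, weights):.2f}")
```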
Affiliation(s)
- Adèle Simon
  - Artificial Intelligence and Sound, Department of Electronic Systems, Aalborg University, Aalborg, Denmark
  - Research Department, Bang & Olufsen A/S, Struer, Denmark
- Søren Bech
  - Artificial Intelligence and Sound, Department of Electronic Systems, Aalborg University, Aalborg, Denmark
  - Research Department, Bang & Olufsen A/S, Struer, Denmark
- Gérard Loquet
  - Department of Audiology and Speech Pathology, University of Melbourne, Melbourne, Victoria, Australia
- Jan Østergaard
  - Artificial Intelligence and Sound, Department of Electronic Systems, Aalborg University, Aalborg, Denmark
52
Gómez Varela I, Orpella J, Poeppel D, Ripolles P, Assaneo MF. Syllabic rhythm and prior linguistic knowledge interact with individual differences to modulate phonological statistical learning. Cognition 2024; 245:105737. [PMID: 38342068 DOI: 10.1016/j.cognition.2024.105737]
Abstract
Phonological statistical learning - our ability to extract meaningful regularities from spoken language - is considered critical in the early stages of language acquisition, in particular for helping to identify discrete words in continuous speech. Most phonological statistical learning studies use an experimental task introduced by Saffran et al. (1996), in which the syllables forming the words to be learned are presented continuously and isochronously. This raises the question of the extent to which this purportedly powerful learning mechanism is robust to the kinds of rhythmic variability that characterize natural speech. Here, we tested participants with arhythmic, semi-rhythmic, and isochronous speech during learning. In addition, we investigated how input rhythmicity interacts with two other factors previously shown to modulate learning: prior knowledge (syllable order plausibility with respect to participants' first language) and learners' speech auditory-motor synchronization ability. We show that words are extracted by all learners even when the speech input is completely arhythmic. Interestingly, high auditory-motor synchronization ability increases statistical learning when the speech input is temporally more predictable but only when prior knowledge can also be used. This suggests an additional mechanism for learning based on predictions not only about when but also about what upcoming speech will be.
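As context for the rhythmicity manipulation described above, the sketch below shows one way a Saffran-style syllable stream with fixed within-word transitional probabilities can be generated, with onsets that are either isochronous or jittered to approximate arhythmic timing. The syllable inventory, rate, and jitter range are hypothetical illustrations, not the study's stimuli.

```python
import numpy as np

rng = np.random.default_rng(1)

# Four trisyllabic "words": within-word transitional probability = 1.0,
# between-word transitional probability around 0.33.
words = [("tu", "pi", "ro"), ("go", "la", "bu"), ("bi", "da", "ku"), ("pa", "do", "ti")]

def make_stream(n_words=300):
    """Concatenate words in random order, never repeating a word back-to-back."""
    stream, prev = [], None
    for _ in range(n_words):
        choices = [w for w in words if w != prev]
        word = choices[rng.integers(len(choices))]
        stream.extend(word)
        prev = word
    return stream

def syllable_onsets(n_syllables, rate_hz=3.3, jitter_frac=0.0):
    """Onsets at `rate_hz`, optionally jittered by up to +/- jitter_frac of one period."""
    period = 1.0 / rate_hz
    onsets = np.arange(n_syllables) * period
    return onsets + rng.uniform(-jitter_frac, jitter_frac, n_syllables) * period

syllables = make_stream()
isochronous = syllable_onsets(len(syllables))                  # regular timing
arhythmic = syllable_onsets(len(syllables), jitter_frac=0.4)   # jittered timing
print(syllables[:9], np.diff(isochronous[:4]), np.diff(arhythmic[:4]))
```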
Affiliation(s)
- Ireri Gómez Varela
  - Institute of Neurobiology, National Autonomous University of Mexico, Querétaro, Mexico
- Joan Orpella
  - Department of Psychology, New York University, New York, NY, USA
- David Poeppel
  - Department of Psychology, New York University, New York, NY, USA
  - Ernst Strüngmann Institute for Neuroscience, Frankfurt, Germany
  - Center for Language, Music and Emotion (CLaME), New York University, New York, NY, USA
  - Max Planck Institute for Empirical Aesthetics, Frankfurt, Germany
- Pablo Ripolles
  - Department of Psychology, New York University, New York, NY, USA
  - Center for Language, Music and Emotion (CLaME), New York University, New York, NY, USA
  - Music and Audio Research Lab (MARL), New York University, New York, NY, USA
  - Max Planck Institute for Empirical Aesthetics, Frankfurt, Germany
- M Florencia Assaneo
  - Institute of Neurobiology, National Autonomous University of Mexico, Querétaro, Mexico
53
Fletcher MD, Perry SW, Thoidis I, Verschuur CA, Goehring T. Improved tactile speech robustness to background noise with a dual-path recurrent neural network noise-reduction method. Sci Rep 2024; 14:7357. [PMID: 38548750 PMCID: PMC10978864 DOI: 10.1038/s41598-024-57312-7]
Abstract
Many people with hearing loss struggle to understand speech in noisy environments, making noise robustness critical for hearing-assistive devices. Recently developed haptic hearing aids, which convert audio to vibration, can improve speech-in-noise performance for cochlear implant (CI) users and assist those unable to access hearing-assistive devices. They are typically body-worn rather than head-mounted, allowing additional space for batteries and microprocessors, and so can deploy more sophisticated noise-reduction techniques. The current study assessed whether a real-time-feasible dual-path recurrent neural network (DPRNN) can improve tactile speech-in-noise performance. Audio was converted to vibration on the wrist using a vocoder method, either with or without noise reduction. Performance was tested for speech in a multi-talker noise (recorded at a party) with a 2.5-dB signal-to-noise ratio. An objective assessment showed the DPRNN improved the scale-invariant signal-to-distortion ratio by 8.6 dB and substantially outperformed traditional noise-reduction (log-MMSE). A behavioural assessment in 16 participants showed the DPRNN improved tactile-only sentence identification in noise by 8.2%. This suggests that advanced techniques like the DPRNN could substantially improve outcomes with haptic hearing aids. Low-cost haptic devices could soon be an important supplement to hearing-assistive devices such as CIs or offer an alternative for people who cannot access CI technology.
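For reference, the objective metric mentioned above (the scale-invariant signal-to-distortion ratio) is defined by projecting the processed signal onto the clean reference and comparing target and residual energies. The sketch below is a generic textbook implementation on toy signals, not the authors' evaluation pipeline.

```python
import numpy as np

def si_sdr(estimate, reference):
    """Scale-invariant signal-to-distortion ratio (dB) of `estimate` w.r.t. `reference`."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    scale = np.dot(estimate, reference) / np.dot(reference, reference)
    target = scale * reference         # component of the estimate explained by the reference
    distortion = estimate - target     # everything else (residual noise and artefacts)
    return 10 * np.log10(np.sum(target**2) / np.sum(distortion**2))

# Toy signals standing in for clean speech, noisy input, and a hypothetical NR output.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noisy = clean + 0.5 * rng.standard_normal(16000)
denoised = clean + 0.1 * rng.standard_normal(16000)
print(f"SI-SDR improvement: {si_sdr(denoised, clean) - si_sdr(noisy, clean):.1f} dB")
```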
Affiliation(s)
- Mark D Fletcher
  - University of Southampton Auditory Implant Service, University of Southampton, University Road, Southampton, SO17 1BJ, UK
  - Institute of Sound and Vibration Research, University of Southampton, University Road, Southampton, SO17 1BJ, UK
- Samuel W Perry
  - University of Southampton Auditory Implant Service, University of Southampton, University Road, Southampton, SO17 1BJ, UK
  - Institute of Sound and Vibration Research, University of Southampton, University Road, Southampton, SO17 1BJ, UK
- Iordanis Thoidis
  - School of Electrical and Computer Engineering, Aristotle University of Thessaloniki, 54124, Thessaloniki, Greece
- Carl A Verschuur
  - University of Southampton Auditory Implant Service, University of Southampton, University Road, Southampton, SO17 1BJ, UK
- Tobias Goehring
  - MRC Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge, CB2 7EF, UK
54
Haiduk F, Zatorre RJ, Benjamin L, Morillon B, Albouy P. Spectrotemporal cues and attention jointly modulate fMRI network topology for sentence and melody perception. Sci Rep 2024; 14:5501. [PMID: 38448636 PMCID: PMC10917817 DOI: 10.1038/s41598-024-56139-6]
Abstract
Speech and music are two fundamental modes of human communication. Lateralisation of key processes underlying their perception has been related both to the distinct sensitivity to low-level spectrotemporal acoustic features and to top-down attention. However, the interplay between bottom-up and top-down processes needs to be clarified. In the present study, we investigated the contribution of acoustics and attention to melodies or sentences to lateralisation in fMRI functional network topology. We used sung speech stimuli selectively filtered in temporal or spectral modulation domains with crossed and balanced verbal and melodic content. Perception of speech decreased with degradation of temporal information, whereas perception of melodies decreased with spectral degradation. Applying graph theoretical metrics on fMRI connectivity matrices, we found that local clustering, reflecting functional specialisation, linearly increased when spectral or temporal cues crucial for the task goal were incrementally degraded. These effects occurred in a bilateral fronto-temporo-parietal network for processing temporally degraded sentences and in right auditory regions for processing spectrally degraded melodies. In contrast, global topology remained stable across conditions. These findings suggest that lateralisation for speech and music partially depends on an interplay of acoustic cues and task goals under increased attentional demands.
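The "local clustering" measure referred to above is a standard graph-theoretical quantity. The sketch below computes it for a proportionally thresholded, binarized connectivity matrix; the random matrix, parcel count, and edge density are illustrative assumptions rather than the study's data or exact pipeline.

```python
import numpy as np

def binarize_top_edges(conn, density=0.2):
    """Keep the strongest `density` fraction of off-diagonal edges of a symmetric matrix."""
    c = conn.copy()
    np.fill_diagonal(c, -np.inf)
    upper = c[np.triu_indices_from(c, k=1)]
    thresh = np.quantile(upper, 1 - density)
    adj = (c >= thresh).astype(int)
    return np.maximum(adj, adj.T)              # enforce symmetry

def local_clustering(adj):
    """Watts-Strogatz clustering coefficient of each node in a binary graph."""
    n = adj.shape[0]
    coeffs = np.zeros(n)
    for i in range(n):
        neighbours = np.flatnonzero(adj[i])
        k = len(neighbours)
        if k < 2:
            continue
        sub = adj[np.ix_(neighbours, neighbours)]
        coeffs[i] = sub.sum() / (k * (k - 1))  # closed triangles / possible triangles
    return coeffs

# Toy symmetric "connectivity" matrix for 90 hypothetical parcels.
rng = np.random.default_rng(0)
sim = rng.uniform(size=(90, 90))
conn = (sim + sim.T) / 2
adj = binarize_top_edges(conn, density=0.2)
print("mean local clustering:", round(local_clustering(adj).mean(), 3))
```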
Affiliation(s)
- Felix Haiduk
  - Department of Behavioral and Cognitive Biology, University of Vienna, Vienna, Austria
  - Department of General Psychology, University of Padua, Padua, Italy
- Robert J Zatorre
  - Cognitive Neuroscience Unit, Montreal Neurological Institute, McGill University, Montreal, QC, Canada
  - International Laboratory for Brain, Music and Sound Research (BRAMS) - CRBLM, Montreal, QC, Canada
- Lucas Benjamin
  - Cognitive Neuroscience Unit, Montreal Neurological Institute, McGill University, Montreal, QC, Canada
  - Cognitive Neuroimaging Unit, CNRS ERL 9003, INSERM U992, CEA, Université Paris-Saclay, NeuroSpin Center, 91191, Gif/Yvette, France
- Benjamin Morillon
  - Aix Marseille University, Inserm, INS, Institut de Neurosciences des Systèmes, Marseille, France
- Philippe Albouy
  - Cognitive Neuroscience Unit, Montreal Neurological Institute, McGill University, Montreal, QC, Canada
  - International Laboratory for Brain, Music and Sound Research (BRAMS) - CRBLM, Montreal, QC, Canada
  - CERVO Brain Research Centre, School of Psychology, Laval University, Quebec, QC, Canada
55
Boeve S, Möttönen R, Smalle EHM. Specificity of Motor Contributions to Auditory Statistical Learning. J Cogn 2024; 7:25. [PMID: 38370867 PMCID: PMC10870951 DOI: 10.5334/joc.351]
Abstract
Statistical learning is the ability to extract patterned information from continuous sensory signals. Recent evidence suggests that auditory-motor mechanisms play an important role in auditory statistical learning from speech signals. The question remains whether auditory-motor mechanisms support such learning generally or in a domain-specific manner. In Experiment 1, we tested the specificity of motor processes contributing to learning patterns from speech sequences. Participants either whispered or clapped their hands while listening to structured speech. In Experiment 2, we focused on auditory specificity, testing whether whispering equally affects learning patterns from speech and non-speech sequences. Finally, in Experiment 3, we examined whether learning patterns from speech and non-speech sequences are correlated. Whispering had a stronger effect than clapping on learning patterns from speech sequences in Experiment 1. Moreover, whispering impaired statistical learning more strongly from speech than non-speech sequences in Experiment 2. Interestingly, while participants in the non-speech tasks spontaneously synchronized their motor movements with the auditory stream more than participants in the speech tasks, the effect of the motor movements on learning was stronger in the speech domain. Finally, no correlation between speech and non-speech learning was observed. Overall, our findings support the idea that learning statistical patterns from speech versus non-speech relies on segregated mechanisms, and that the speech motor system contributes to auditory statistical learning in a highly specific manner.
Affiliation(s)
- Sam Boeve
  - Department of Experimental Psychology, Ghent University, Ghent, Belgium
- Riikka Möttönen
  - Cognitive Science, Department of Digital Humanities, University of Helsinki, Helsinki, Finland
- Eleonore H. M. Smalle
  - Department of Experimental Psychology, Ghent University, Ghent, Belgium
  - Department of Developmental Psychology, Tilburg University, Tilburg, Netherlands
56
Petley L, Blankenship C, Hunter LL, Stewart HJ, Lin L, Moore DR. Amplitude Modulation Perception and Cortical Evoked Potentials in Children With Listening Difficulties and Their Typically Developing Peers. J Speech Lang Hear Res 2024; 67:633-656. [PMID: 38241680 PMCID: PMC11000788 DOI: 10.1044/2023_jslhr-23-00317]
Abstract
PURPOSE: Amplitude modulations (AMs) are important for speech intelligibility, and deficits in speech intelligibility are a leading source of impairment in childhood listening difficulties (LiD). The present study aimed to explore the relationships between AM perception and speech-in-noise (SiN) comprehension in children and to determine whether deficits in AM processing contribute to childhood LiD. Evoked responses were used to parse the neural origins of AM processing.
METHOD: Forty-one children with LiD and 44 typically developing children, ages 8-16 years, participated in the study. Behavioral AM depth thresholds were measured at 4 and 40 Hz. SiN tasks included the Listening in Spatialized Noise-Sentences Test (LiSN-S) and a coordinate response measure (CRM)-based task. Evoked responses were obtained during an AM change detection task using alternations between 4 and 40 Hz, including the N1 of the acoustic change complex, auditory steady-state response (ASSR), P300, and a late positive response (late potential [LP]). Maturational effects were explored via age correlations.
RESULTS: Age correlated with 4-Hz AM thresholds, CRM separated talker scores, and N1 amplitude. Age-normed LiSN-S scores obtained without spatial or talker cues correlated with age-corrected 4-Hz AM thresholds and area under the LP curve. CRM separated talker scores correlated with AM thresholds and area under the LP curve. Most behavioral measures of AM perception correlated with the signal-to-noise ratio and phase coherence of the 40-Hz ASSR. AM change response time also correlated with area under the LP curve. Children with LiD exhibited deficits with respect to 4-Hz thresholds, AM change accuracy, and area under the LP curve.
CONCLUSIONS: The observed relationships between AM perception and SiN performance extend the evidence that modulation perception is important for understanding SiN in childhood. In line with this finding, children with LiD demonstrated poorer performance on some measures of AM perception, but their evoked responses implicated a primarily cognitive deficit.
SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.25009103
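For orientation, sinusoidally amplitude-modulated noise of the kind used for the 4- and 40-Hz depth thresholds can be generated as sketched below. The sampling rate, duration, and depths are illustrative placeholders, not the study's stimulus parameters.

```python
import numpy as np

def am_noise(mod_rate_hz, depth_db, dur_s=1.0, fs=44100, seed=0):
    """Gaussian noise carrier modulated at `mod_rate_hz`; depth is 20*log10(m) in dB."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(dur_s * fs)) / fs
    m = 10 ** (depth_db / 20)                        # e.g. -6 dB -> m = 0.5
    carrier = rng.standard_normal(t.size)
    modulator = 1 + m * np.sin(2 * np.pi * mod_rate_hz * t)
    stim = carrier * modulator
    return stim / np.max(np.abs(stim))               # normalise peak amplitude

shallow_4hz = am_noise(4, depth_db=-20)              # hard to detect (m = 0.1)
deep_40hz = am_noise(40, depth_db=0)                 # fully modulated (m = 1.0)
```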
Affiliation(s)
- Lauren Petley
  - Communication Sciences Research Center, Cincinnati Children's Hospital Medical Center, OH
  - Patient Services Research, Cincinnati Children's Hospital Medical Center, OH
  - Department of Psychology, Clarkson University, Potsdam, NY
- Chelsea Blankenship
  - Communication Sciences Research Center, Cincinnati Children's Hospital Medical Center, OH
  - Patient Services Research, Cincinnati Children's Hospital Medical Center, OH
- Lisa L Hunter
  - Communication Sciences Research Center, Cincinnati Children's Hospital Medical Center, OH
  - Patient Services Research, Cincinnati Children's Hospital Medical Center, OH
  - Department of Otolaryngology, College of Medicine, University of Cincinnati, OH
  - Department of Communication Sciences and Disorders, College of Allied Health Sciences, University of Cincinnati, OH
- Li Lin
  - Communication Sciences Research Center, Cincinnati Children's Hospital Medical Center, OH
  - Patient Services Research, Cincinnati Children's Hospital Medical Center, OH
- David R Moore
  - Communication Sciences Research Center, Cincinnati Children's Hospital Medical Center, OH
  - Patient Services Research, Cincinnati Children's Hospital Medical Center, OH
  - Department of Otolaryngology, College of Medicine, University of Cincinnati, OH
  - Manchester Centre for Audiology and Deafness, The University of Manchester, United Kingdom
57
Kanagokar V, Fathima H, Bhat JS, Muthu ANP. Effect of inter-aural temporal envelope differences on inter-aural time difference thresholds for amplitude modulated noise. Codas 2024; 36:e20220261. [PMID: 38324806 PMCID: PMC10903954 DOI: 10.1590/2317-1782/20232022261]
Abstract
PURPOSE: The inter-aural time difference (ITD) and inter-aural level difference (ILD) are important acoustic cues for horizontal localization and spatial release from masking. These cues are encoded based on inter-aural comparisons of tonotopically matched binaural inputs. Therefore, binaural coherence or the interaural spectro-temporal similarity is a pre-requisite for encoding ITD and ILD. The modulation depth of envelope is an important envelope characteristic that helps in encoding the envelope-ITD. However, inter-aural difference in modulation depth can result in reduced binaural coherence and poor representation of binaural cues as in the case with reverberation, noise and compression in cochlear implants and hearing aids. This study investigated the effect of inter-aural modulation depth difference on the ITD thresholds for an amplitude-modulated noise in normal hearing young adults.
METHODS: An amplitude modulated high pass filtered noise with varying modulation depth differences was presented sequentially through headphones. In one ear, the modulation depth was retained at 90% and in the other ear it varied from 90% to 50%. The ITD thresholds for modulation frequencies of 8 Hz and 16 Hz were estimated as a function of the inter-aural modulation depth difference.
RESULTS: The Friedman test findings revealed a statistically significant increase in the ITD threshold with an increase in the inter-aural modulation depth difference for 8 Hz and 16 Hz.
CONCLUSION: The results indicate that the inter-aural differences in the modulation depth negatively impact ITD perception for an amplitude-modulated high pass filtered noise.
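The stimulus manipulation described above can be sketched roughly as follows: a shared high-pass noise carrier is amplitude-modulated with a different depth in each ear, and an envelope ITD is imposed by phase-delaying one ear's modulator. Filter settings, rates, depths, and the ITD value in this sketch are illustrative assumptions, not the study's parameters.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def am_hp_noise_pair(mod_hz=16, depth_left=0.9, depth_right=0.7,
                     itd_us=250, fs=48000, dur_s=0.5, seed=0):
    """Left/right AM high-pass noise with an inter-aural depth difference and envelope ITD."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(dur_s * fs)) / fs
    sos = butter(4, 3000, btype="highpass", fs=fs, output="sos")
    carrier = sosfiltfilt(sos, rng.standard_normal(t.size))   # shared high-pass noise carrier
    itd_s = itd_us * 1e-6
    left = (1 + depth_left * np.sin(2 * np.pi * mod_hz * t)) * carrier
    right = (1 + depth_right * np.sin(2 * np.pi * mod_hz * (t - itd_s))) * carrier
    return left, right

# 16-Hz modulation, 90% vs 70% depth, right-ear envelope lagging by 250 microseconds.
left, right = am_hp_noise_pair()
```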
Affiliation(s)
- Vibha Kanagokar
  - Department of Audiology and Speech Language Pathology, Kasturba Medical College, Mangalore, Manipal Academy of Higher Education, Manipal, Karnataka, India
- Hasna Fathima
  - Department of Audiology and Speech Language Pathology, Kasturba Medical College, Mangalore, Manipal Academy of Higher Education, Manipal, Karnataka, India
  - Department of Audiology and Speech-Language Pathology, National Institute of Speech and Hearing - Trivandrum, Kerala, India
- Arivudai Nambi Pitchai Muthu
  - Department of Audiology and Speech Language Pathology, Kasturba Medical College, Mangalore, Manipal Academy of Higher Education, Manipal, Karnataka, India
  - Department of Audiology, All India Institute of Speech and Hearing - Mysore, Karnataka, India
58
Zoefel B, Kösem A. Neural tracking of continuous acoustics: properties, speech-specificity and open questions. Eur J Neurosci 2024; 59:394-414. [PMID: 38151889 DOI: 10.1111/ejn.16221]
Abstract
Human speech is a particularly relevant acoustic stimulus for our species, due to its role of information transmission during communication. Speech is inherently a dynamic signal, and a recent line of research focused on neural activity following the temporal structure of speech. We review findings that characterise neural dynamics in the processing of continuous acoustics and that allow us to compare these dynamics with temporal aspects in human speech. We highlight properties and constraints that both neural and speech dynamics have, suggesting that auditory neural systems are optimised to process human speech. We then discuss the speech-specificity of neural dynamics and their potential mechanistic origins and summarise open questions in the field.
Affiliation(s)
- Benedikt Zoefel
  - Centre de Recherche Cerveau et Cognition (CerCo), CNRS UMR 5549, Toulouse, France
  - Université de Toulouse III Paul Sabatier, Toulouse, France
- Anne Kösem
  - Lyon Neuroscience Research Center (CRNL), INSERM U1028, Bron, France
59
Gao J, Chen H, Fang M, Ding N. Original speech and its echo are segregated and separately processed in the human brain. PLoS Biol 2024; 22:e3002498. [PMID: 38358954 PMCID: PMC10868781 DOI: 10.1371/journal.pbio.3002498]
Abstract
Speech recognition crucially relies on slow temporal modulations (<16 Hz) in speech. Recent studies, however, have demonstrated that the long-delay echoes, which are common during online conferencing, can eliminate crucial temporal modulations in speech but do not affect speech intelligibility. Here, we investigated the underlying neural mechanisms. MEG experiments demonstrated that cortical activity can effectively track the temporal modulations eliminated by an echo, which cannot be fully explained by basic neural adaptation mechanisms. Furthermore, cortical responses to echoic speech can be better explained by a model that segregates speech from its echo than by a model that encodes echoic speech as a whole. The speech segregation effect was observed even when attention was diverted but would disappear when segregation cues, i.e., speech fine structure, were removed. These results strongly suggested that, through mechanisms such as stream segregation, the auditory system can build an echo-insensitive representation of speech envelope, which can support reliable speech recognition.
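To make the "echoes eliminate specific temporal modulations" point concrete: adding a delayed copy of the envelope acts as a comb filter, with a null at 1/(2 x delay). The sketch below demonstrates this on a toy two-component envelope; the delay and modulation rates are illustrative numbers, not the study's stimuli.

```python
import numpy as np

fs = 1000                        # envelope sampling rate (Hz)
delay_s = 0.125                  # echo delay -> modulation null at 1/(2*0.125) = 4 Hz
t = np.arange(0, 10, 1 / fs)
envelope = 1 + 0.5 * np.sin(2 * np.pi * 4 * t) + 0.5 * np.sin(2 * np.pi * 7 * t)

d = int(delay_s * fs)
echoic = envelope.copy()
echoic[d:] += envelope[:-d]      # add a full-strength delayed copy (the "echo")

def mod_power(x, f):
    """Power of the modulation component closest to frequency f (Hz)."""
    spec = np.fft.rfft(x - x.mean())
    freqs = np.fft.rfftfreq(x.size, 1 / fs)
    return np.abs(spec[np.argmin(np.abs(freqs - f))]) ** 2

for f in (4, 7):
    ratio = mod_power(echoic, f) / mod_power(envelope, f)
    print(f"{f} Hz modulation power after echo: {10 * np.log10(ratio):.1f} dB")
```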
Affiliation(s)
- Jiaxin Gao
  - Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Honghua Chen
  - Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Mingxuan Fang
  - Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Nai Ding
  - Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
  - Nanhu Brain-computer Interface Institute, Hangzhou, China
  - The State Key Lab of Brain-Machine Intelligence; The MOE Frontier Science Center for Brain Science & Brain-machine Integration, Zhejiang University, Hangzhou, China
60
Shan T, Cappelloni MS, Maddox RK. Subcortical responses to music and speech are alike while cortical responses diverge. Sci Rep 2024; 14:789. [PMID: 38191488 PMCID: PMC10774448 DOI: 10.1038/s41598-023-50438-0]
Abstract
Music and speech are encountered daily and are unique to human beings. Both are transformed by the auditory pathway from an initial acoustical encoding to higher level cognition. Studies of cortex have revealed distinct brain responses to music and speech, but differences may emerge in the cortex or may be inherited from different subcortical encoding. In the first part of this study, we derived the human auditory brainstem response (ABR), a measure of subcortical encoding, to recorded music and speech using two analysis methods. The first method, described previously and acoustically based, yielded very different ABRs between the two sound classes. The second method, however, developed here and based on a physiological model of the auditory periphery, gave highly correlated responses to music and speech. We determined the superiority of the second method through several metrics, suggesting there is no appreciable impact of stimulus class (i.e., music vs speech) on the way stimulus acoustics are encoded subcortically. In this study's second part, we considered the cortex. Our new analysis method resulted in cortical music and speech responses becoming more similar but with remaining differences. The subcortical and cortical results taken together suggest that there is evidence for stimulus-class dependent processing of music and speech at the cortical but not subcortical level.
Affiliation(s)
- Tong Shan
  - Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA
  - Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
  - Center for Visual Science, University of Rochester, Rochester, NY, USA
- Madeline S Cappelloni
  - Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA
  - Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
  - Center for Visual Science, University of Rochester, Rochester, NY, USA
- Ross K Maddox
  - Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA
  - Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
  - Center for Visual Science, University of Rochester, Rochester, NY, USA
  - Department of Neuroscience, University of Rochester, Rochester, NY, USA
61
Barchet AV, Henry MJ, Pelofi C, Rimmele JM. Auditory-motor synchronization and perception suggest partially distinct time scales in speech and music. Commun Psychol 2024; 2:2. [PMID: 39242963 PMCID: PMC11332030 DOI: 10.1038/s44271-023-00053-6]
Abstract
Speech and music might involve specific cognitive rhythmic timing mechanisms related to differences in the dominant rhythmic structure. We investigate the influence of different motor effectors on rate-specific processing in both domains. A perception and a synchronization task involving syllable and piano tone sequences and motor effectors typically associated with speech (whispering) and music (finger-tapping) were tested at slow (~2 Hz) and fast rates (~4.5 Hz). Although synchronization performance was generally better at slow rates, the motor effectors exhibited specific rate preferences. Finger-tapping was advantaged compared to whispering at slow but not at faster rates, with synchronization being effector-dependent at slow, but highly correlated at faster rates. Perception of speech and music was better at different rates and predicted by a fast general and a slow finger-tapping synchronization component. Our data suggests partially independent rhythmic timing mechanisms for speech and music, possibly related to a differential recruitment of cortical motor circuitry.
Affiliation(s)
- Alice Vivien Barchet
  - Department of Cognitive Neuropsychology, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- Molly J Henry
  - Research Group 'Neural and Environmental Rhythms', Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
  - Department of Psychology, Toronto Metropolitan University, Toronto, Canada
- Claire Pelofi
  - Music and Audio Research Laboratory, New York University, New York, NY, USA
  - Max Planck NYU Center for Language, Music, and Emotion, New York, NY, USA
- Johanna M Rimmele
  - Department of Cognitive Neuropsychology, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
  - Max Planck NYU Center for Language, Music, and Emotion, New York, NY, USA
62
Assaneo MF, Orpella J. Rhythms in Speech. Adv Exp Med Biol 2024; 1455:257-274. [PMID: 38918356 DOI: 10.1007/978-3-031-60183-5_14]
Abstract
Speech can be defined as the human ability to communicate through a sequence of vocal sounds. Consequently, speech requires an emitter (the speaker) capable of generating the acoustic signal and a receiver (the listener) able to successfully decode the sounds produced by the emitter (i.e., the acoustic signal). Time plays a central role at both ends of this interaction. On the one hand, speech production requires precise and rapid coordination, typically within the order of milliseconds, of the upper vocal tract articulators (i.e., tongue, jaw, lips, and velum), their composite movements, and the activation of the vocal folds. On the other hand, the generated acoustic signal unfolds in time, carrying information at different timescales. This information must be parsed and integrated by the receiver for the correct transmission of meaning. This chapter describes the temporal patterns that characterize the speech signal and reviews research that explores the neural mechanisms underlying the generation of these patterns and the role they play in speech comprehension.
Affiliation(s)
- M Florencia Assaneo
  - Instituto de Neurobiología, Universidad Autónoma de México, Santiago de Querétaro, Mexico
- Joan Orpella
  - Department of Neuroscience, Georgetown University Medical Center, Washington, DC, USA
63
Coull JT, Korolczuk I, Morillon B. The Motor of Time: Coupling Action to Temporally Predictable Events Heightens Perception. Adv Exp Med Biol 2024; 1455:199-213. [PMID: 38918353 DOI: 10.1007/978-3-031-60183-5_11]
Abstract
Timing and motor function share neural circuits and dynamics, which underpin their close and synergistic relationship. For instance, the temporal predictability of a sensory event optimizes motor responses to that event. Knowing when an event is likely to occur lowers response thresholds, leading to faster and more efficient motor behavior, though in situations of response conflict it can induce impulsive and inappropriate responding. In turn, through a process of active sensing, coupling action to temporally predictable sensory input enhances perceptual processing. Action not only hones perception of the event's onset or duration, but also boosts sensory processing of its non-temporal features such as pitch or shape. The effects of temporal predictability on motor behavior and sensory processing involve motor and left parietal cortices and are mediated by changes in delta and beta oscillations in motor areas of the brain.
Affiliation(s)
- Jennifer T Coull
  - Centre for Research in Psychology and Neuroscience (UMR 7077), Aix-Marseille Université & CNRS, Marseille, France
- Inga Korolczuk
  - Department of Pathophysiology, Medical University of Lublin, Lublin, Poland
- Benjamin Morillon
  - Aix Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France
64
Sjuls GS, Vulchanova MD, Assaneo MF. Replication of population-level differences in auditory-motor synchronization ability in a Norwegian-speaking population. Commun Psychol 2023; 1:47. [PMID: 39242904 PMCID: PMC11332004 DOI: 10.1038/s44271-023-00049-2]
Abstract
The Speech-to-Speech Synchronization test is a powerful tool in assessing individuals' auditory-motor synchronization ability, namely the ability to synchronize one's own utterances to the rhythm of an external speech signal. Recent studies using the test have revealed that participants fall into two distinct groups (high synchronizers and low synchronizers) with significant differences in their neural (structural and functional) underpinnings and outcomes on several behavioral tasks. Therefore, it is critical to assess the universality of the population-level distribution (indicating two groups rather than a normal distribution) across populations of speakers. Here we demonstrate that the previous results replicate with a Norwegian-speaking population, indicating that the test is generalizable beyond previously tested populations of native English- and German-speakers.
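The "two groups rather than a normal distribution" claim is the kind of question typically probed by comparing one- versus two-component mixture fits. The sketch below does this with scikit-learn on simulated synchrony scores; the scores, group means, and sample size are hypothetical, not the study's data or its exact statistical procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical synchrony scores for 100 participants drawn from two latent groups.
scores = np.concatenate([rng.normal(0.25, 0.07, 50), rng.normal(0.65, 0.07, 50)])
X = scores.reshape(-1, 1)

# Compare Bayesian information criterion for 1 vs 2 Gaussian components.
bic = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X) for k in (1, 2)}
print(bic)   # a lower BIC for k=2 favours a two-group (bimodal) description
```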
Affiliation(s)
- Guro S Sjuls
  - Language Acquisition and Language Processing Lab, Norwegian University of Science and Technology, Department of Language and Literature, Trondheim, Norway
- Mila D Vulchanova
  - Language Acquisition and Language Processing Lab, Norwegian University of Science and Technology, Department of Language and Literature, Trondheim, Norway
- M Florencia Assaneo
  - Institute of Neurobiology, National Autonomous University of Mexico, Santiago de Querétaro, México
65
He D, Buder EH, Bidelman GM. Cross-linguistic and acoustic-driven effects on multiscale neural synchrony to stress rhythms. bioRxiv [Preprint] 2023:2023.12.04.570012. [PMID: 38106017 PMCID: PMC10723321 DOI: 10.1101/2023.12.04.570012]
Abstract
We investigated how neural oscillations code the hierarchical nature of stress rhythms in speech and how stress processing varies with language experience. By measuring phase synchrony of multilevel EEG-acoustic tracking and intra-brain cross-frequency coupling, we show the encoding of stress involves different neural signatures (delta rhythms = stress foot rate; theta rhythms = syllable rate), is stronger for amplitude vs. duration stress cues, and induces nested delta-theta coherence mirroring the stress-syllable hierarchy in speech. Only native English, but not Mandarin, speakers exhibited enhanced neural entrainment at central stress (2 Hz) and syllable (4 Hz) rates intrinsic to natural English. English individuals with superior cortical-stress tracking capabilities also displayed stronger neural hierarchical coherence, highlighting a nuanced interplay between internal nesting of brain rhythms and external entrainment rooted in language-specific speech rhythms. Our cross-language findings reveal brain-speech synchronization is not purely a "bottom-up" but benefits from "top-down" processing from listeners' language-specific experience.
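The delta-theta "nested coherence" referred to above can be illustrated with an n:m phase-coherence computation, sketched below on a simulated signal containing phase-locked 2-Hz and 4-Hz components. The filter bands, the 2:1 ratio, and the toy signal are assumptions for illustration, not the study's EEG pipeline.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def band_phase(x, lo, hi, fs):
    """Instantaneous phase of x band-passed between lo and hi Hz."""
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return np.angle(hilbert(sosfiltfilt(sos, x)))

def nm_phase_coherence(x, fs, f_slow=(1, 3), f_fast=(3, 5), n=2, m=1):
    """|mean(exp(i*(n*phi_slow - m*phi_fast)))|: 1 = perfect nesting, 0 = none."""
    phi_slow = band_phase(x, f_slow[0], f_slow[1], fs)
    phi_fast = band_phase(x, f_fast[0], f_fast[1], fs)
    return np.abs(np.mean(np.exp(1j * (n * phi_slow - m * phi_fast))))

# Toy "EEG": phase-locked 2-Hz (stress-like) and 4-Hz (syllable-like) components plus noise.
fs = 250
t = np.arange(0, 60, 1 / fs)
eeg = (np.sin(2 * np.pi * 2 * t) + 0.8 * np.sin(2 * np.pi * 4 * t)
       + 0.5 * np.random.default_rng(0).standard_normal(t.size))
print(f"delta-theta 2:1 phase coherence = {nm_phase_coherence(eeg, fs):.2f}")
```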
Affiliation(s)
- Deling He
  - School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA
  - Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA
- Eugene H. Buder
  - School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA
  - Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA
- Gavin M. Bidelman
  - Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, USA
  - Program in Neuroscience, Indiana University, Bloomington, IN, USA
66
Bowling DL. Biological principles for music and mental health. Transl Psychiatry 2023; 13:374. [PMID: 38049408 PMCID: PMC10695969 DOI: 10.1038/s41398-023-02671-4]
Abstract
Efforts to integrate music into healthcare systems and wellness practices are accelerating but the biological foundations supporting these initiatives remain underappreciated. As a result, music-based interventions are often sidelined in medicine. Here, I bring together advances in music research from neuroscience, psychology, and psychiatry to bridge music's specific foundations in human biology with its specific therapeutic applications. The framework I propose organizes the neurophysiological effects of music around four core elements of human musicality: tonality, rhythm, reward, and sociality. For each, I review key concepts, biological bases, and evidence of clinical benefits. Within this framework, I outline a strategy to increase music's impact on health based on standardizing treatments and their alignment with individual differences in responsivity to these musical elements. I propose that an integrated biological understanding of human musicality (describing each element's functional origins, development, phylogeny, and neural bases) is critical to advancing rational applications of music in mental health and wellness.
Affiliation(s)
- Daniel L Bowling
  - Department of Psychiatry and Behavioral Sciences, Stanford University, School of Medicine, Stanford, CA, USA
  - Center for Computer Research in Music and Acoustics (CCRMA), Stanford University, School of Humanities and Sciences, Stanford, CA, USA
67
Cecchetti G, Tomasini CA, Herff SA, Rohrmeier MA. Interpreting Rhythm as Parsing: Syntactic-Processing Operations Predict the Migration of Visual Flashes as Perceived During Listening to Musical Rhythms. Cogn Sci 2023; 47:e13389. [PMID: 38038624 DOI: 10.1111/cogs.13389]
Abstract
Music can be interpreted by attributing syntactic relationships to sequential musical events, and, computationally, such musical interpretation represents an analogous combinatorial task to syntactic processing in language. While this perspective has been primarily addressed in the domain of harmony, we focus here on rhythm in the Western tonal idiom, and we propose for the first time a framework for modeling the moment-by-moment execution of processing operations involved in the interpretation of music. Our approach is based on (1) a music-theoretically motivated grammar formalizing the competence of rhythmic interpretation in terms of three basic types of dependency (preparation, syncopation, and split; Rohrmeier, 2020), and (2) psychologically plausible predictions about the complexity of structural integration and memory storage operations, necessary for parsing hierarchical dependencies, derived from the dependency locality theory (Gibson, 2000). With a behavioral experiment, we exemplify an empirical implementation of the proposed theoretical framework. One hundred listeners were asked to reproduce the location of a visual flash presented while listening to three rhythmic excerpts, each exemplifying a different interpretation under the formal grammar. The hypothesized execution of syntactic-processing operations was found to be a significant predictor of the observed displacement between the reported and the objective location of the flashes. Overall, this study presents a theoretical approach and a first empirical proof-of-concept for modeling the cognitive process resulting in such interpretation as a form of syntactic parsing with algorithmic similarities to its linguistic counterpart. Results from the present small-scale experiment should not be read as a final test of the theory, but they are consistent with the theoretical predictions after controlling for several possible confounding factors and may form the basis for further large-scale and ecological testing.
Affiliation(s)
- Gabriele Cecchetti
  - Digital and Cognitive Musicology Lab, École Polytechnique Fédérale de Lausanne
- Cédric A Tomasini
  - Digital and Cognitive Musicology Lab, École Polytechnique Fédérale de Lausanne
- Steffen A Herff
  - Digital and Cognitive Musicology Lab, École Polytechnique Fédérale de Lausanne
  - The MARCS Institute for Brain, Behaviour and Development, Western Sydney University
- Martin A Rohrmeier
  - Digital and Cognitive Musicology Lab, École Polytechnique Fédérale de Lausanne
68
Inbar M, Genzer S, Perry A, Grossman E, Landau AN. Intonation Units in Spontaneous Speech Evoke a Neural Response. J Neurosci 2023; 43:8189-8200. [PMID: 37793909 PMCID: PMC10697392 DOI: 10.1523/jneurosci.0235-23.2023]
Abstract
Spontaneous speech is produced in chunks called intonation units (IUs). IUs are defined by a set of prosodic cues and presumably occur in all human languages. Recent work has shown that across different grammatical and sociocultural conditions IUs form rhythms of ∼1 unit per second. Linguistic theory suggests that IUs pace the flow of information in the discourse. As a result, IUs provide a promising and hitherto unexplored theoretical framework for studying the neural mechanisms of communication. In this article, we identify a neural response unique to the boundary defined by the IU. We measured the EEG of human participants (of either sex), who listened to different speakers recounting an emotional life event. We analyzed the speech stimuli linguistically and modeled the EEG response at word offset using a GLM approach. We find that the EEG response to IU-final words differs from the response to IU-nonfinal words even when equating acoustic boundary strength. Finally, we relate our findings to the body of research on rhythmic brain mechanisms in speech processing. We study the unique contribution of IUs and acoustic boundary strength in predicting delta-band EEG. This analysis suggests that IU-related neural activity, which is tightly linked to the classic Closure Positive Shift (CPS), could be a time-locked component that captures the previously characterized delta-band neural speech tracking.
SIGNIFICANCE STATEMENT: Linguistic communication is central to human experience, and its neural underpinnings are a topic of much research in recent years. Neuroscientific research has benefited from studying human behavior in naturalistic settings, an endeavor that requires explicit models of complex behavior. Usage-based linguistic theory suggests that spoken language is prosodically structured in intonation units. We reveal that the neural system is attuned to intonation units by explicitly modeling their impact on the EEG response beyond mere acoustics. To our understanding, this is the first time this is demonstrated in spontaneous speech under naturalistic conditions and under a theoretical framework that connects the prosodic chunking of speech, on the one hand, with the flow of information during communication, on the other.
Affiliation(s)
- Maya Inbar
  - Department of Linguistics, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
  - Department of Psychology, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
  - Department of Cognitive and Brain Sciences, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
- Shir Genzer
  - Department of Psychology, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
- Anat Perry
  - Department of Psychology, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
- Eitan Grossman
  - Department of Linguistics, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
- Ayelet N Landau
  - Department of Psychology, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
  - Department of Cognitive and Brain Sciences, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
69
Anikin A, Canessa-Pollard V, Pisanski K, Massenet M, Reby D. Beyond speech: Exploring diversity in the human voice. iScience 2023; 26:108204. [PMID: 37908309 PMCID: PMC10613903 DOI: 10.1016/j.isci.2023.108204]
Abstract
Humans have evolved voluntary control over vocal production for speaking and singing, while preserving the phylogenetically older system of spontaneous nonverbal vocalizations such as laughs and screams. To test for systematic acoustic differences between these vocal domains, we analyzed a broad, cross-cultural corpus representing over 2 h of speech, singing, and nonverbal vocalizations. We show that, while speech is relatively low-pitched and tonal with mostly regular phonation, singing and especially nonverbal vocalizations vary enormously in pitch and often display harsh-sounding, irregular phonation owing to nonlinear phenomena. The evolution of complex supralaryngeal articulatory spectro-temporal modulation has been critical for speech, yet has not significantly constrained laryngeal source modulation. In contrast, articulation is very limited in nonverbal vocalizations, which predominantly contain minimally articulated open vowels and rapid temporal modulation in the roughness range. We infer that vocal source modulation works best for conveying affect, while vocal filter modulation mainly facilitates semantic communication.
Affiliation(s)
- Andrey Anikin
  - Division of Cognitive Science, Lund University, Lund, Sweden
  - ENES Bioacoustics Research Lab, CRNL, University of Saint-Etienne, CNRS, Inserm, 23 rue Michelon, 42023 Saint-Etienne, France
- Valentina Canessa-Pollard
  - ENES Bioacoustics Research Lab, CRNL, University of Saint-Etienne, CNRS, Inserm, 23 rue Michelon, 42023 Saint-Etienne, France
  - Psychology, Institute of Psychology, Business and Human Sciences, University of Chichester, Chichester, West Sussex PO19 6PE, UK
- Katarzyna Pisanski
  - ENES Bioacoustics Research Lab, CRNL, University of Saint-Etienne, CNRS, Inserm, 23 rue Michelon, 42023 Saint-Etienne, France
  - CNRS French National Centre for Scientific Research, DDL Dynamics of Language Lab, University of Lyon 2, 69007 Lyon, France
  - Institute of Psychology, University of Wrocław, Dawida 1, 50-527 Wrocław, Poland
- Mathilde Massenet
  - ENES Bioacoustics Research Lab, CRNL, University of Saint-Etienne, CNRS, Inserm, 23 rue Michelon, 42023 Saint-Etienne, France
- David Reby
  - ENES Bioacoustics Research Lab, CRNL, University of Saint-Etienne, CNRS, Inserm, 23 rue Michelon, 42023 Saint-Etienne, France
70
Petley L, Blankenship C, Hunter LL, Stewart HJ, Lin L, Moore DR. Amplitude modulation perception and cortical evoked potentials in children with listening difficulties and their typically-developing peers. medRxiv [Preprint] 2023:2023.10.26.23297523. [PMID: 37961469 PMCID: PMC10635202 DOI: 10.1101/2023.10.26.23297523]
Abstract
Purpose: Amplitude modulations (AM) are important for speech intelligibility, and deficits in speech intelligibility are a leading source of impairment in childhood listening difficulties (LiD). The present study aimed to explore the relationships between AM perception and speech-in-noise (SiN) comprehension in children and to determine whether deficits in AM processing contribute to childhood LiD. Evoked responses were used to parse the neural origin of AM processing.
Method: Forty-one children with LiD and forty-four typically-developing children, ages 8-16 y.o., participated in the study. Behavioral AM depth thresholds were measured at 4 and 40 Hz. SiN tasks included the LiSN-S and a Coordinate Response Measure (CRM)-based task. Evoked responses were obtained during an AM Change detection task using alternations between 4 and 40 Hz, including the N1 of the acoustic change complex, auditory steady-state response (ASSR), P300, and a late positive response (LP). Maturational effects were explored via age correlations.
Results: Age correlated with 4 Hz AM thresholds, CRM Separated Talker scores, and N1 amplitude. Age-normed LiSN-S scores obtained without spatial or talker cues correlated with age-corrected 4 Hz AM thresholds and area under the LP curve. CRM Separated Talker scores correlated with AM thresholds and area under the LP curve. Most behavioral measures of AM perception correlated with the SNR and phase coherence of the 40 Hz ASSR. AM Change RT also correlated with area under the LP curve. Children with LiD exhibited deficits with respect to 4 Hz thresholds, AM Change accuracy, and area under the LP curve.
Conclusions: The observed relationships between AM perception and SiN performance extend the evidence that modulation perception is important for understanding SiN in childhood. In line with this finding, children with LiD demonstrated poorer performance on some measures of AM perception, but their evoked responses implicated a primarily cognitive deficit.
71
Persici V, Blain SD, Iversen JR, Key AP, Kotz SA, Devin McAuley J, Gordon RL. Individual differences in neural markers of beat processing relate to spoken grammar skills in six-year-old children. Brain Lang 2023; 246:105345. [PMID: 37994830 DOI: 10.1016/j.bandl.2023.105345]
Abstract
Based on the idea that neural entrainment establishes regular attentional fluctuations that facilitate hierarchical processing in both music and language, we hypothesized that individual differences in syntactic (grammatical) skills will be partly explained by patterns of neural responses to musical rhythm. To test this hypothesis, we recorded neural activity using electroencephalography (EEG) while children (N = 25) listened passively to rhythmic patterns that induced different beat percepts. Analysis of evoked beta and gamma activity revealed that individual differences in the magnitude of neural responses to rhythm explained variance in six-year-olds' expressive grammar abilities, beyond and complementarily to their performance in a behavioral rhythm perception task. These results reinforce the idea that mechanisms of neural beat entrainment may be a shared neural resource supporting hierarchical processing across music and language and suggest a relevant marker of the relationship between rhythm processing and grammar abilities in elementary-school-age children, previously observed only behaviorally.
Affiliation(s)
- Valentina Persici
  - Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN, USA
  - Department of Psychology, University of Milano - Bicocca, Milan, Italy
  - Department of Otolaryngology - Head & Neck Surgery, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Human Sciences, University of Verona, Verona, Italy
- Scott D Blain
  - Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
- John R Iversen
  - Department of Psychology, Neuroscience and Behaviour, McMaster University, Hamilton, Ontario, Canada
  - Institute for Neural Computation, University of California San Diego, La Jolla, CA, USA
- Alexandra P Key
  - Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN, USA
  - Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
  - Vanderbilt Kennedy Center, Vanderbilt University Medical Center, Nashville, TN, USA
- Sonja A Kotz
  - Department of Neuropsychology and Psychopharmacology, Maastricht University, Maastricht, the Netherlands
  - Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- J Devin McAuley
  - Department of Psychology, Michigan State University, East Lansing, MI, USA
- Reyna L Gordon
  - Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN, USA
  - Department of Otolaryngology - Head & Neck Surgery, Vanderbilt University Medical Center, Nashville, TN, USA
  - Vanderbilt Kennedy Center, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Psychology, Vanderbilt University, Nashville, TN, USA
72
Schmidt F, Chen Y, Keitel A, Rösch S, Hannemann R, Serman M, Hauswald A, Weisz N. Neural speech tracking shifts from the syllabic to the modulation rate of speech as intelligibility decreases. Psychophysiology 2023; 60:e14362. [PMID: 37350379 PMCID: PMC10909526 DOI: 10.1111/psyp.14362]
Abstract
The most prominent acoustic features in speech are intensity modulations, represented by the amplitude envelope of speech. Synchronization of neural activity with these modulations supports speech comprehension. As the acoustic modulation of speech is related to the production of syllables, investigations of neural speech tracking commonly do not distinguish between lower-level acoustic (envelope modulation) and higher-level linguistic (syllable rate) information. Here we manipulated speech intelligibility using noise-vocoded speech and investigated the spectral dynamics of neural speech processing, across two studies at cortical and subcortical levels of the auditory hierarchy, using magnetoencephalography. Overall, cortical regions mostly track the syllable rate, whereas subcortical regions track the acoustic envelope. Furthermore, with less intelligible speech, tracking of the modulation rate becomes more dominant. Our study highlights the importance of distinguishing between envelope modulation and syllable rate and provides novel possibilities to better understand differences between auditory processing and speech/language processing disorders.
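For readers unfamiliar with noise vocoding, the sketch below shows the basic scheme used to degrade intelligibility: band-split the signal, extract each band's envelope, and re-impose those envelopes on band-matched noise carriers. The band edges, filter order, and toy input are illustrative choices; this is not the authors' stimulus code.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, band_edges):
    """Noise-vocode x: per-band Hilbert envelopes re-imposed on band-limited noise."""
    rng = np.random.default_rng(0)
    out = np.zeros_like(x, dtype=float)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))                      # band envelope
        carrier = sosfiltfilt(sos, rng.standard_normal(x.size))
        out += env * carrier                             # envelope on noise carrier
    return out / np.max(np.abs(out))

# Toy "speech-like" input: a 150-Hz tone with a 4-Hz (syllable-rate) amplitude modulation.
fs = 16000
t = np.arange(0, 1.0, 1 / fs)
toy_speech = np.sin(2 * np.pi * 150 * t) * (1 + 0.8 * np.sin(2 * np.pi * 4 * t))
edges = np.geomspace(100, 7000, num=5)                   # 4 bands: strongly degraded
vocoded = noise_vocode(toy_speech, fs, edges)
```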
Collapse
Affiliation(s)
- Fabian Schmidt
- Center for Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
- Department of Psychology, University of Salzburg, Salzburg, Austria
| | - Ya-Ping Chen
- Center for Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
- Department of Psychology, University of Salzburg, Salzburg, Austria
| | - Anne Keitel
- Psychology, School of Social Sciences, University of Dundee, Dundee, UK
| | - Sebastian Rösch
- Department of Otorhinolaryngology, Paracelsus Medical University, Salzburg, Austria
| | | | - Maja Serman
- Audiological Research Unit, Sivantos GmbH, Erlangen, Germany
| | - Anne Hauswald
- Center for Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
- Department of Psychology, University of Salzburg, Salzburg, Austria
| | - Nathan Weisz
- Center for Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
- Department of Psychology, University of Salzburg, Salzburg, Austria
- Neuroscience Institute, Christian Doppler University Hospital, Paracelsus Medical University, Salzburg, Austria
| |
Collapse
|
73
|
Doelling KB, Arnal LH, Assaneo MF. Adaptive oscillators support Bayesian prediction in temporal processing. PLoS Comput Biol 2023; 19:e1011669. [PMID: 38011225 PMCID: PMC10703266 DOI: 10.1371/journal.pcbi.1011669] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2023] [Revised: 12/07/2023] [Accepted: 11/07/2023] [Indexed: 11/29/2023] Open
Abstract
Humans excel at predictively synchronizing their behavior with external rhythms, as in dance or music performance. The neural processes underlying rhythmic inferences are debated: whether predictive perception relies on high-level generative models or whether it can readily be implemented locally by hard-coded intrinsic oscillators synchronizing to rhythmic input remains unclear, and different underlying computational mechanisms have been proposed. Here we explore human perception of tone sequences with some temporal regularity at varying rates, but with considerable variability. Next, using a dynamical systems perspective, we successfully model the participants' behavior using an adaptive frequency oscillator which adjusts its spontaneous frequency based on the rate of the stimuli. This model reflects human behavior better than a canonical nonlinear oscillator and a predictive ramping model (both widely used for temporal estimation and prediction) and demonstrates that the classical distinction between absolute and relative computational mechanisms can be unified under this framework. In addition, we show that neural oscillators may constitute hard-coded physiological priors, in a Bayesian sense, that reduce temporal uncertainty and facilitate the predictive processing of noisy rhythms. Together, the results show that adaptive oscillators provide an elegant and biologically plausible means to subserve rhythmic inference, reconciling previously incompatible frameworks for temporal inferential processes.
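The following toy simulation illustrates the kind of adaptive frequency oscillator the abstract describes: a phase oscillator whose intrinsic frequency is slowly pulled toward the rate of a rhythmic input. The equations, coupling constants, and rates below are illustrative assumptions, not the published model.

```python
# Adaptive-frequency phase oscillator (toy): the oscillator phase-locks to a
# rhythmic drive and its intrinsic frequency drifts toward the stimulus rate.
import numpy as np

def adaptive_oscillator(stim_rate_hz=2.0, f0_hz=1.5, dur_s=40.0, dt=0.001,
                        coupling=8.0, adapt_rate=1.0):
    n = int(dur_s / dt)
    t = np.arange(n) * dt
    drive = np.sin(2 * np.pi * stim_rate_hz * t)      # rhythmic input
    phase = 0.0
    omega = 2 * np.pi * f0_hz                         # intrinsic frequency (rad/s)
    freq_trace = np.empty(n)
    for i in range(n):
        phase += dt * (omega - coupling * drive[i] * np.sin(phase))
        omega += dt * (-adapt_rate * drive[i] * np.sin(phase))
        freq_trace[i] = omega / (2 * np.pi)
    return t, freq_trace

t, f = adaptive_oscillator()
print(f"intrinsic frequency drifted from 1.50 Hz to {f[-1]:.2f} Hz "
      f"with a 2.00 Hz stimulus")
```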
Collapse
Affiliation(s)
- Keith B. Doelling
- Institut Pasteur, Université Paris Cité, Inserm UA06, Institut de l’Audition, Paris, France
- Center for Language Music and Emotion, New York University, New York, New York, United States of America
| | - Luc H. Arnal
- Institut Pasteur, Université Paris Cité, Inserm UA06, Institut de l’Audition, Paris, France
| | - M. Florencia Assaneo
- Instituto de Neurobiología, Universidad Nacional Autónoma de México, Santiago de Querétaro, México
| |
Collapse
|
74
|
Daikoku T. Temporal dynamics of statistical learning in children's song contributes to phase entrainment and production of novel information in multiple cultures. Sci Rep 2023; 13:18041. [PMID: 37872404 PMCID: PMC10593840 DOI: 10.1038/s41598-023-45493-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2023] [Accepted: 10/20/2023] [Indexed: 10/25/2023] Open
Abstract
Statistical learning is thought to be linked to brain development. For example, statistical learning of language and music starts at an early age and is shown to play a significant role in acquiring the delta-band rhythm that is essential for language and music learning. However, it remains unclear how auditory cultural differences affect the statistical learning process and the resulting probabilistic and acoustic knowledge acquired through it. This study examined how children's songs are acquired through statistical learning, using a Hierarchical Bayesian statistical learning (HBSL) model that mimics the statistical learning processes of the brain. With this model, I conducted a simulation experiment to visualize the temporal dynamics of perception and production processes through statistical learning across cultures. The model was trained on a corpus of children's songs in MIDI format comprising English, German, Spanish, Japanese, and Korean songs. I investigated how the probability distribution of the model was transformed over 15 trials of learning for each song. Furthermore, using the probability distribution of each model after the 15 learning trials for each song, new songs were probabilistically generated. The results suggested that, during learning, chunking and hierarchical knowledge increased gradually over the 15 rounds of statistical learning for each children's song. During production, statistical learning led to a gradual increase of delta-band rhythm (1-3 Hz). Furthermore, by combining the chunks and hierarchy acquired through statistical learning, music that was statistically novel in comparison with the original (training) songs was generated gradually. These findings were observed consistently across cultures. The present study indicates that the statistical learning capacity of the brain contributes, in multiple cultures, to the acquisition and generation of delta-band rhythm, which is critical for acquiring language and music. Cultural differences may not significantly modulate statistical learning effects, since statistical learning and slower rhythm processing are both essential functions of the human brain across cultures. Furthermore, statistical learning of children's songs leads to the acquisition of hierarchical knowledge and the ability to generate novel music. This study may provide a novel perspective on the developmental origins of creativity and the importance of statistical learning in early development.
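As a much-simplified stand-in for the statistical learning process described above, the sketch below estimates first-order transitional probabilities between pitches in a toy MIDI-like melody and marks low-probability transitions as candidate chunk boundaries. The HBSL model in the paper is hierarchical and Bayesian; this example only conveys the core idea, and the melody, threshold, and function names are invented for illustration.

```python
# First-order statistical learning on a toy melody: count pitch transitions and
# flag improbable transitions as candidate chunk boundaries.
from collections import Counter, defaultdict

melody = [60, 62, 64, 60, 62, 64, 67, 65, 64, 62, 60, 62, 64, 60]  # toy MIDI pitches

pair_counts = defaultdict(Counter)
for a, b in zip(melody[:-1], melody[1:]):
    pair_counts[a][b] += 1

def transition_prob(a, b):
    total = sum(pair_counts[a].values())
    return pair_counts[a][b] / total if total else 0.0

# A low-probability transition suggests a boundary between learned chunks
boundaries = [i + 1 for i, (a, b) in enumerate(zip(melody[:-1], melody[1:]))
              if transition_prob(a, b) < 0.3]
print("candidate chunk boundaries at indices:", boundaries)
```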
Collapse
Affiliation(s)
- Tatsuya Daikoku
- Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan.
- Center for Brain, Mind and KANSEI Sciences Research, Hiroshima University, Hiroshima, Japan.
| |
Collapse
|
75
|
L'Hermite S, Zoefel B. Rhythmic Entrainment Echoes in Auditory Perception. J Neurosci 2023; 43:6667-6678. [PMID: 37604689 PMCID: PMC10538584 DOI: 10.1523/jneurosci.0051-23.2023] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2023] [Revised: 03/10/2023] [Accepted: 03/20/2023] [Indexed: 08/23/2023] Open
Abstract
Rhythmic entrainment echoes, rhythmic brain responses that outlast rhythmic stimulation, can demonstrate endogenous neural oscillations entrained by the stimulus rhythm. Here, we tested for such echoes in auditory perception. Participants detected a pure tone target, presented at a variable delay after another pure tone that was rhythmically modulated in amplitude. In four experiments involving 154 human (female and male) participants, we tested (1) which stimulus rate produces the strongest entrainment echo and, inspired by the tonotopic organization of the auditory system and findings in nonhuman primates, (2) whether these are organized according to sound frequency. We found the strongest entrainment echoes after 6 and 8 Hz stimulation, respectively. The best moments for target detection (in phase or antiphase with the preceding rhythm) depended on whether the sound frequencies of entraining and target stimuli matched, which is in line with a tonotopic organization. However, for the same experimental condition, best moments were not always consistent across experiments. We provide a speculative explanation for these differences that relies on the notion that neural entrainment and repetition-related adaptation might exert competing, opposite influences on perception. Together, we find rhythmic echoes in auditory perception that seem more complex than those predicted from initial theories of neural entrainment. SIGNIFICANCE STATEMENT Rhythmic entrainment echoes are rhythmic brain responses that are produced by a rhythmic stimulus and persist after its offset. These echoes play an important role for the identification of endogenous brain oscillations, entrained by rhythmic stimulation, and give us insights into whether and how participants predict the timing of events. In four independent experiments involving >150 participants, we examined entrainment echoes in auditory perception. We found that entrainment echoes have a preferred rate (between 6 and 8 Hz) and seem to follow the tonotopic organization of the auditory system. Although speculative, we also found evidence that several, potentially competing processes might interact to produce such echoes, a notion that might need to be considered for future experimental design.
Collapse
Affiliation(s)
| | - Benedikt Zoefel
- Université de Toulouse III-Paul Sabatier, 31062 Toulouse, France
- Centre National de la Recherche Scientifique, Centre de Recherche Cerveau et Cognition, Centre Hospitalier Universitaire Purpan, 31052 Toulouse, France
| |
Collapse
|
76
|
Xu N, Qin X, Zhou Z, Shan W, Ren J, Yang C, Lu L, Wang Q. Age differentially modulates the cortical tracking of the lower and higher level linguistic structures during speech comprehension. Cereb Cortex 2023; 33:10463-10474. [PMID: 37566910 DOI: 10.1093/cercor/bhad296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2022] [Revised: 07/23/2023] [Accepted: 07/24/2023] [Indexed: 08/13/2023] Open
Abstract
Speech comprehension requires listeners to rapidly parse continuous speech into hierarchically organized linguistic structures (i.e. syllable, word, phrase, and sentence) and to entrain their neural activity to the rhythm of different linguistic levels. Aging is accompanied by changes in speech processing, but it remains unclear how aging affects different levels of linguistic representation. Here, we recorded magnetoencephalography signals in older and younger groups when subjects actively and passively listened to continuous speech in which the hierarchical linguistic structures of word, phrase, and sentence were tagged at 4, 2, and 1 Hz, respectively. A newly developed parameterization algorithm was applied to separate the periodic linguistic tracking from the aperiodic component. We found enhanced lower-level (word-level) tracking, reduced higher-level (phrasal- and sentential-level) tracking, and a reduced aperiodic offset in older compared with younger adults. Furthermore, we observed that the attentional modulation of sentential-level tracking was larger for younger than for older adults. Notably, the neuro-behavioral analyses showed that subjects' behavioral accuracy was positively correlated with higher-level linguistic tracking and negatively correlated with lower-level linguistic tracking. Overall, these results suggest that enhanced lower-level linguistic tracking, reduced higher-level linguistic tracking, and less flexibility of attentional modulation may underpin aging-related decline in speech comprehension.
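A minimal sketch of the frequency-tagging logic described above: estimate a power spectrum, fit the aperiodic trend while excluding the tagged frequencies, and quantify how far the 1, 2, and 4 Hz peaks rise above that trend. The paper uses a dedicated parameterization algorithm on MEG data; the simulated signal and fitting choices here are assumptions made purely for illustration.

```python
# Frequency-tagging toy analysis: tagged peaks vs. an aperiodic spectral fit.
import numpy as np
from scipy.signal import welch

fs = 200.0
t = np.arange(0, 120, 1 / fs)
rng = np.random.default_rng(1)
sig = (rng.standard_normal(t.size)             # broadband "aperiodic" activity
       + 0.4 * np.sin(2 * np.pi * 4 * t)       # word-rate tag
       + 0.3 * np.sin(2 * np.pi * 2 * t)       # phrase-rate tag
       + 0.2 * np.sin(2 * np.pi * 1 * t))      # sentence-rate tag

freqs, psd = welch(sig, fs=fs, nperseg=int(20 * fs))
tags = [1.0, 2.0, 4.0]
keep = (freqs > 0.5) & (freqs < 20)
for f_tag in tags:
    keep &= np.abs(freqs - f_tag) > 0.3        # exclude tagged bins from the fit
slope, intercept = np.polyfit(np.log10(freqs[keep]), np.log10(psd[keep]), 1)
for f_tag in tags:
    idx = np.argmin(np.abs(freqs - f_tag))
    aperiodic = 10 ** (intercept + slope * np.log10(freqs[idx]))
    print(f"{f_tag:.0f} Hz peak: {10 * np.log10(psd[idx] / aperiodic):.1f} dB above aperiodic fit")
```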
Collapse
Affiliation(s)
- Na Xu
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
- National Clinical Research Center for Neurological Diseases, Beijing 100070, China
| | - Xiaoxiao Qin
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
- National Clinical Research Center for Neurological Diseases, Beijing 100070, China
| | - Ziqi Zhou
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
- National Clinical Research Center for Neurological Diseases, Beijing 100070, China
| | - Wei Shan
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
- National Clinical Research Center for Neurological Diseases, Beijing 100070, China
| | - Jiechuan Ren
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
- National Clinical Research Center for Neurological Diseases, Beijing 100070, China
| | - Chunqing Yang
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
- National Clinical Research Center for Neurological Diseases, Beijing 100070, China
| | - Lingxi Lu
- Center for the Cognitive Science of Language, Beijing Language and Culture University, Beijing 100083, China
| | - Qun Wang
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
- National Clinical Research Center for Neurological Diseases, Beijing 100070, China
- Beijing Institute of Brain Disorders, Collaborative Innovation Center for Brain Disorders, Capital Medical University, Beijing 100069, China
| |
Collapse
|
77
|
Quique YM, Gnanateja GN, Dickey MW, Evans WS, Chandrasekaran B. Examining cortical tracking of the speech envelope in post-stroke aphasia. Front Hum Neurosci 2023; 17:1122480. [PMID: 37780966 PMCID: PMC10538638 DOI: 10.3389/fnhum.2023.1122480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2022] [Accepted: 08/28/2023] [Indexed: 10/03/2023] Open
Abstract
Introduction: People with aphasia have been shown to benefit from rhythmic elements for language production during aphasia rehabilitation. However, it is unknown whether rhythmic processing is associated with such benefits. Cortical tracking of the speech envelope (CTenv) may provide a measure of encoding of speech rhythmic properties and serve as a predictor of candidacy for rhythm-based aphasia interventions. Methods: Electroencephalography was used to capture electrophysiological responses while Spanish speakers with aphasia (n = 9) listened to a continuous speech narrative (audiobook). The Temporal Response Function was used to estimate CTenv in the delta (associated with word- and phrase-level properties), theta (syllable-level properties), and alpha (attention-related properties) bands. CTenv estimates were used to predict aphasia severity, performance in rhythmic perception and production tasks, and treatment response in a sentence-level rhythm-based intervention. Results: CTenv in delta and theta, but not alpha, predicted aphasia severity. CTenv in the delta, theta, and alpha bands did not predict performance in rhythmic perception or production tasks. Some evidence supported that CTenv in theta could predict sentence-level learning in aphasia, but alpha and delta did not. Conclusion: CTenv of syllable-level properties was relatively preserved in individuals with less language impairment. In contrast, encoding of the higher-level (word- and phrase-level) properties was relatively impaired and was predictive of more severe language impairments. CTenv and treatment response to sentence-level rhythm-based interventions need to be further investigated.
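The sketch below illustrates one way a forward Temporal Response Function can yield a cortical-tracking estimate like CTenv: band-limit the envelope, build a lagged design matrix, fit ridge-regularized weights, and correlate the predicted with the observed response on held-out data. The band, lags, regularization, and toy envelope/EEG are assumptions; this is an illustrative stand-in, not the study's pipeline.

```python
# Forward TRF sketch for a 1-D envelope and a single EEG channel at a common rate.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def lagged_design(x, max_lag):
    # columns are increasingly delayed copies of the stimulus feature
    X = np.zeros((len(x), max_lag))
    for k in range(max_lag):
        X[k:, k] = x[:len(x) - k]
    return X

def trf_tracking(envelope, eeg, fs, band=(1.0, 4.0), max_lag_s=0.4, ridge=1e2):
    sos = butter(3, band, btype="bandpass", fs=fs, output="sos")
    env = sosfiltfilt(sos, envelope)                  # band-limited envelope
    n_lags = int(max_lag_s * fs)
    half = len(env) // 2                              # train on first half, test on second
    X_tr = lagged_design(env[:half], n_lags)
    X_te = lagged_design(env[half:], n_lags)
    w = np.linalg.solve(X_tr.T @ X_tr + ridge * np.eye(n_lags), X_tr.T @ eeg[:half])
    return np.corrcoef(X_te @ w, eeg[half:])[0, 1]    # cortical tracking estimate

fs = 64
rng = np.random.default_rng(0)
envelope = np.abs(rng.standard_normal(fs * 120))             # toy speech envelope
eeg = np.convolve(envelope, np.hanning(16), mode="same")     # toy "cortical" response
eeg += rng.standard_normal(eeg.size)                         # plus noise
print("delta-band tracking r =", round(trf_tracking(envelope, eeg, fs), 3))
```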
Collapse
Affiliation(s)
- Yina M. Quique
- Center for Education in Health Sciences, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
| | - G. Nike Gnanateja
- Department of Communication Sciences and Disorders, University of Wisconsin-Madison, Madison, WI, United States
| | - Michael Walsh Dickey
- VA Pittsburgh Healthcare System, Pittsburgh, PA, United States
- Department of Communication Sciences and Disorders, University of Pittsburgh, Pittsburgh, PA, United States
| | | | - Bharath Chandrasekaran
- Department of Communication Sciences and Disorders, University of Pittsburgh, Pittsburgh, PA, United States
- Roxelyn and Richard Pepper Department of Communication Science and Disorders, School of Communication, Northwestern University, Evanston, IL, United States
| |
Collapse
|
78
|
James LS, Wang AS, Bertolo M, Sakata JT. Learning to pause: Fidelity of and biases in the developmental acquisition of gaps in the communicative signals of a songbird. Dev Sci 2023; 26:e13382. [PMID: 36861437 DOI: 10.1111/desc.13382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2022] [Revised: 01/21/2023] [Accepted: 02/10/2023] [Indexed: 03/03/2023]
Abstract
The temporal organization of sounds used in social contexts can provide information about signal function and evoke varying responses in listeners (receivers). For example, music is a universal and learned human behavior that is characterized by different rhythms and tempos that can evoke disparate responses in listeners. Similarly, birdsong is a social behavior in songbirds that is learned during critical periods in development and used to evoke physiological and behavioral responses in receivers. Recent investigations have begun to reveal the breadth of universal patterns in birdsong and their similarities to common patterns in speech and music, but relatively little is known about the degree to which biological predispositions and developmental experiences interact to shape the temporal patterning of birdsong. Here, we investigated how biological predispositions modulate the acquisition and production of an important temporal feature of birdsong, namely the duration of silent pauses ("gaps") between vocal elements ("syllables"). Through analyses of semi-naturally raised and experimentally tutored zebra finches, we observed that juvenile zebra finches imitate the durations of the silent gaps in their tutor's song. Further, when juveniles were experimentally tutored with stimuli containing a wide range of gap durations, we observed biases in the prevalence and stereotypy of gap durations. Together, these studies demonstrate how biological predispositions and developmental experiences differently affect distinct temporal features of birdsong and highlight similarities in developmental plasticity across birdsong, speech, and music. RESEARCH HIGHLIGHTS: The temporal organization of learned acoustic patterns can be similar across human cultures and across species, suggesting biological predispositions in acquisition. We studied how biological predispositions and developmental experiences affect an important temporal feature of birdsong, namely the duration of silent intervals between vocal elements ("gaps"). Semi-naturally and experimentally tutored zebra finches imitated the durations of gaps in their tutor's song and displayed some biases in the learning and production of gap durations and in gap variability. These findings in the zebra finch provide parallels with the acquisition of temporal features of speech and music in humans.
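Since the gap measure is central here, a small worked example may help: given syllable onset and offset times, gaps are the silences between one syllable's offset and the next syllable's onset, and imitation fidelity can be summarized by comparing a pupil's gap durations with its tutor's. The timing values and the summary statistic below are hypothetical.

```python
# Gap durations from syllable onset/offset times (toy values in seconds).
import numpy as np

def gap_durations(onsets, offsets):
    onsets, offsets = np.asarray(onsets), np.asarray(offsets)
    return onsets[1:] - offsets[:-1]          # silence between successive syllables

tutor_on, tutor_off = [0.00, 0.20, 0.45, 0.80], [0.15, 0.38, 0.70, 0.95]
pupil_on, pupil_off = [0.00, 0.21, 0.47, 0.83], [0.14, 0.39, 0.71, 0.97]

g_tutor = gap_durations(tutor_on, tutor_off)
g_pupil = gap_durations(pupil_on, pupil_off)
print("tutor gaps (s):", g_tutor, " pupil gaps (s):", g_pupil)
print("mean absolute imitation error (s):", round(float(np.mean(np.abs(g_tutor - g_pupil))), 3))
```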
Collapse
Affiliation(s)
- Logan S James
- Department of Biology, McGill University, Montréal, Quebec, Canada
- Department of Integrative Biology, University of Texas at Austin, Austin, TX, USA
| | - Angela S Wang
- Department of Biology, McGill University, Montréal, Quebec, Canada
| | - Mila Bertolo
- Centre for Research in Brain, Language and Music, McGill University, Montréal, Quebec, Canada
- Integrated Program in Neuroscience, McGill University, Montréal, Quebec, Canada
| | - Jon T Sakata
- Department of Biology, McGill University, Montréal, Quebec, Canada
- Centre for Research in Brain, Language and Music, McGill University, Montréal, Quebec, Canada
- Integrated Program in Neuroscience, McGill University, Montréal, Quebec, Canada
| |
Collapse
|
79
|
Kosakowski HL, Norman-Haignere S, Mynick A, Takahashi A, Saxe R, Kanwisher N. Preliminary evidence for selective cortical responses to music in one-month-old infants. Dev Sci 2023; 26:e13387. [PMID: 36951215 DOI: 10.1111/desc.13387] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2022] [Revised: 02/17/2023] [Accepted: 02/21/2023] [Indexed: 03/24/2023]
Abstract
Prior studies have observed selective neural responses in the adult human auditory cortex to music and speech that cannot be explained by the differing lower-level acoustic properties of these stimuli. Does infant cortex exhibit similarly selective responses to music and speech shortly after birth? To answer this question, we attempted to collect functional magnetic resonance imaging (fMRI) data from 45 sleeping infants (2.0- to 11.9-weeks-old) while they listened to monophonic instrumental lullabies and infant-directed speech produced by a mother. To match acoustic variation between music and speech sounds we (1) recorded music from instruments that had a similar spectral range as female infant-directed speech, (2) used a novel excitation-matching algorithm to match the cochleagrams of music and speech stimuli, and (3) synthesized "model-matched" stimuli that were matched in spectrotemporal modulation statistics to (yet perceptually distinct from) music or speech. Of the 36 infants we collected usable data from, 19 had significant activations to sounds overall compared to scanner noise. From these infants, we observed a set of voxels in non-primary auditory cortex (NPAC) but not in Heschl's Gyrus that responded significantly more to music than to each of the other three stimulus types (but not significantly more strongly than to the background scanner noise). In contrast, our planned analyses did not reveal voxels in NPAC that responded more to speech than to model-matched speech, although other unplanned analyses did. These preliminary findings suggest that music selectivity arises within the first month of life. A video abstract of this article can be viewed at https://youtu.be/c8IGFvzxudk. RESEARCH HIGHLIGHTS: Responses to music, speech, and control sounds matched for the spectrotemporal modulation-statistics of each sound were measured from 2- to 11-week-old sleeping infants using fMRI. Auditory cortex was significantly activated by these stimuli in 19 out of 36 sleeping infants. Selective responses to music compared to the three other stimulus classes were found in non-primary auditory cortex but not in nearby Heschl's Gyrus. Selective responses to speech were not observed in planned analyses but were observed in unplanned, exploratory analyses.
Collapse
Affiliation(s)
- Heather L Kosakowski
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Center for Brains, Minds and Machines, Cambridge, Massachusetts, USA
| | | | - Anna Mynick
- Psychological and Brain Sciences, Dartmouth College, Hannover, New Hampshire, USA
| | - Atsushi Takahashi
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Rebecca Saxe
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Center for Brains, Minds and Machines, Cambridge, Massachusetts, USA
| | - Nancy Kanwisher
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Center for Brains, Minds and Machines, Cambridge, Massachusetts, USA
| |
Collapse
|
80
|
Alviar C, Sahoo M, Edwards L, Jones W, Klin A, Lense M. Infant-directed song potentiates infants' selective attention to adults' mouths over the first year of life. Dev Sci 2023; 26:e13359. [PMID: 36527322 PMCID: PMC10276172 DOI: 10.1111/desc.13359] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2022] [Revised: 11/03/2022] [Accepted: 12/02/2022] [Indexed: 12/23/2022]
Abstract
The mechanisms by which infant-directed (ID) speech and song support language development in infancy are poorly understood, with most prior investigations focused on the auditory components of these signals. However, the visual components of ID communication are also of fundamental importance for language learning: over the first year of life, infants' visual attention to caregivers' faces during ID speech switches from a focus on the eyes to a focus on the mouth, which provides synchronous visual cues that support speech and language development. Caregivers' facial displays during ID song are highly effective for sustaining infants' attention. Here we investigate if ID song specifically enhances infants' attention to caregivers' mouths. 299 typically developing infants watched clips of female actors engaging them with ID song and speech longitudinally at six time points from 3 to 12 months of age while eye-tracking data was collected. Infants' mouth-looking significantly increased over the first year of life with a significantly greater increase during ID song versus speech. This difference was early-emerging (evident in the first 6 months of age) and sustained over the first year. Follow-up analyses indicated specific properties inherent to ID song (e.g., slower tempo, reduced rhythmic variability) in part contribute to infants' increased mouth-looking, with effects increasing with age. The exaggerated and expressive facial features that naturally accompany ID song may make it a particularly effective context for modulating infants' visual attention and supporting speech and language development in both typically developing infants and those with or at risk for communication challenges. A video abstract of this article can be viewed at https://youtu.be/SZ8xQW8h93A. RESEARCH HIGHLIGHTS: Infants' visual attention to adults' mouths during infant-directed speech has been found to support speech and language development. Infant-directed (ID) song promotes mouth-looking by infants to a greater extent than does ID speech across the first year of life. Features characteristic of ID song such as slower tempo, increased rhythmicity, increased audiovisual synchrony, and increased positive affect, all increase infants' attention to the mouth. The effects of song on infants' attention to the mouth are more prominent during the second half of the first year of life.
Collapse
Affiliation(s)
- Camila Alviar
- Department of Otolaryngology - Head & Neck Surgery, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Manash Sahoo
- Marcus Autism Center, Children’s Healthcare of Atlanta, Atlanta, GA, USA
- Emory University School of Medicine, Atlanta, GA, USA
| | - Laura Edwards
- Marcus Autism Center, Children’s Healthcare of Atlanta, Atlanta, GA, USA
- Emory University School of Medicine, Atlanta, GA, USA
| | - Warren Jones
- Marcus Autism Center, Children’s Healthcare of Atlanta, Atlanta, GA, USA
- Emory University School of Medicine, Atlanta, GA, USA
| | - Ami Klin
- Marcus Autism Center, Children’s Healthcare of Atlanta, Atlanta, GA, USA
- Emory University School of Medicine, Atlanta, GA, USA
| | - Miriam Lense
- Department of Otolaryngology - Head & Neck Surgery, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Kennedy Center, Vanderbilt University Medical Center, Nashville, TN, USA
- The Curb Center for Art, Enterprise, and Public Policy, Vanderbilt University, Nashville, TN, USA
| |
Collapse
|
81
|
Liu T, Martinez-Torres K, Mazzone J, Camarata S, Lense M. Brief Report: Telehealth Music-Enhanced Reciprocal Imitation Training in Autism: A Single-Subject Feasibility Study of a Virtual Parent Coaching Intervention. J Autism Dev Disord 2023:10.1007/s10803-023-06053-z. [PMID: 37530912 PMCID: PMC10834845 DOI: 10.1007/s10803-023-06053-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 06/22/2023] [Indexed: 08/03/2023]
Abstract
PURPOSE: Telehealth delivery increases accessibility of parent-mediated interventions that teach parents skills and support autistic children's social communication. Reciprocal Imitation Training (RIT), an evidence-based Naturalistic Developmental Behavioral Intervention (NDBI) focused on imitation skills, a common difficulty in autism, holds promise for telehealth-based parent training. Imitation is also a core component of musical play during childhood, and the affordances of musical play and song naturally shape parent-child interactions. We evaluate the feasibility of a music-based, telehealth adaptation of RIT, music-enhanced RIT (tele-meRIT), as a novel format for coaching parents in NDBI strategies. METHODS: This single-subject, multiple-baseline design study included 4 autistic children (32-53 months old) and their mothers. Parent-child dyads were recorded during 10-min free play probes at baseline, weekly tele-meRIT sessions, and one-week and one-month follow-up. Probes were coded for parents' RIT implementation fidelity, parent vocal musicality, and children's rate of spontaneous imitation. RESULTS: No parent demonstrated implementation fidelity during baseline. All parents increased their use of RIT strategies, met fidelity by the end of treatment, and maintained fidelity at follow-up. Parent vocal musicality also increased from baseline. The intervention did not consistently increase children's imitation skills. A post-intervention evaluation survey indicated high parent satisfaction with tele-meRIT and perceived benefits to their children's social and play skills more broadly. CONCLUSION: Implementing tele-meRIT is feasible. Although tele-meRIT additionally involved coaching in incorporating rhythmicity and song into play interactions, parents achieved fidelity in the RIT principles, suggesting one avenue by which music can be integrated within evidence-based parent-mediated NDBIs.
Collapse
Affiliation(s)
- Talia Liu
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA, USA.
- Department of Otolaryngology-Head and Neck Surgery, Vanderbilt University Medical Center, Nashville, TN, USA.
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA.
| | - Keysha Martinez-Torres
- Department of Otolaryngology-Head and Neck Surgery, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Julie Mazzone
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Stephen Camarata
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Kennedy Center, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Miriam Lense
- Department of Otolaryngology-Head and Neck Surgery, Vanderbilt University Medical Center, Nashville, TN, USA.
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA.
- Vanderbilt Kennedy Center, Vanderbilt University Medical Center, Nashville, TN, USA.
- The Curb Center for Art, Enterprise, and Public Policy, Vanderbilt University, Nashville, TN, USA.
| |
Collapse
|
82
|
Jia Z, Xu C, Li J, Gao J, Ding N, Luo B, Zou J. Phase Property of Envelope-Tracking EEG Response Is Preserved in Patients with Disorders of Consciousness. eNeuro 2023; 10:ENEURO.0130-23.2023. [PMID: 37500493 PMCID: PMC10420405 DOI: 10.1523/eneuro.0130-23.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2023] [Revised: 07/16/2023] [Accepted: 07/20/2023] [Indexed: 07/29/2023] Open
Abstract
When listening to speech, the low-frequency cortical response below 10 Hz can track the speech envelope. Previous studies have demonstrated that the phase lag between the speech envelope and the cortical response can reflect the mechanism by which the envelope-tracking response is generated. Here, we analyze whether the mechanism generating the envelope-tracking response is modulated by the level of consciousness, by studying how the stimulus-response phase lag is modulated by disorders of consciousness (DoC). We observe that DoC patients in general show less reliable neural tracking of speech. Nevertheless, for DoC patients who show reliable cortical tracking of speech, the stimulus-response phase lag changes linearly with frequency between 3.5 and 8 Hz, regardless of the consciousness state. The mean phase lag is also consistent across these DoC patients. These results suggest that the envelope-tracking response to speech can be generated by an automatic process that is barely modulated by the consciousness state.
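The phase-lag analysis described above can be illustrated as follows: compute the cross-spectrum between the speech envelope and the neural response, take its phase in the 3.5-8 Hz range, and fit a line; a linear phase-frequency relationship corresponds to a constant latency. The simulated delay and noise level are assumptions, and this is not the authors' code.

```python
# Estimate a constant stimulus-response latency from the cross-spectral phase.
import numpy as np
from scipy.signal import csd

fs = 100.0
rng = np.random.default_rng(2)
env = rng.standard_normal(int(fs * 200))                 # toy speech envelope
delay_s = 0.12                                           # simulated cortical delay
eeg = np.roll(env, int(delay_s * fs)) + 0.5 * rng.standard_normal(env.size)

freqs, Sxy = csd(env, eeg, fs=fs, nperseg=int(4 * fs))   # cross-spectrum env -> EEG
band = (freqs >= 3.5) & (freqs <= 8.0)
phase = np.unwrap(np.angle(Sxy[band]))                   # phase lag per frequency
slope, _ = np.polyfit(freqs[band], phase, 1)             # linear phase-frequency fit
print(f"estimated latency ~ {-slope / (2 * np.pi) * 1000:.0f} ms (true 120 ms)")
```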
Collapse
Affiliation(s)
- Ziting Jia
- The Second Hospital, Cheeloo College of Medicine, Shandong University, Jinan 250033, China
| | - Chuan Xu
- Department of Neurology, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou 310019, China
| | - Jingqi Li
- Department of Rehabilitation, Hangzhou Mingzhou Brain Rehabilitation Hospital, Hangzhou 311215, China
| | - Jian Gao
- Department of Rehabilitation, Hangzhou Mingzhou Brain Rehabilitation Hospital, Hangzhou 311215, China
| | - Nai Ding
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China
| | - Benyan Luo
- Department of Neurology, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou 310003, China
| | - Jiajie Zou
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China
| |
Collapse
|
83
|
Meng K, Goodarzy F, Kim E, Park YJ, Kim JS, Cook MJ, Chung CK, Grayden DB. Continuous synthesis of artificial speech sounds from human cortical surface recordings during silent speech production. J Neural Eng 2023; 20:046019. [PMID: 37459853 DOI: 10.1088/1741-2552/ace7f6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2023] [Accepted: 07/17/2023] [Indexed: 07/28/2023]
Abstract
Objective. Brain-computer interfaces can restore various forms of communication in paralyzed patients who have lost their ability to articulate intelligible speech. This study aimed to demonstrate the feasibility of closed-loop synthesis of artificial speech sounds from human cortical surface recordings during silent speech production. Approach. Ten participants with intractable epilepsy were temporarily implanted with intracranial electrode arrays over cortical surfaces. A decoding model that predicted audible outputs directly from patient-specific neural feature inputs was trained during overt word reading and immediately tested with overt, mimed and imagined word reading. Predicted outputs were later assessed objectively against corresponding voice recordings and subjectively through human perceptual judgments. Main results. Artificial speech sounds were successfully synthesized during overt and mimed utterances by two participants with some coverage of the precentral gyrus. About a third of these sounds were correctly identified by naïve listeners in two-alternative forced-choice tasks. A similar outcome could not be achieved during imagined utterances by any of the participants. However, neural feature contribution analyses suggested the presence of exploitable activation patterns during imagined speech in the postcentral gyrus and the superior temporal gyrus. In future work, a more comprehensive coverage of cortical surfaces, including posterior parts of the middle frontal gyrus and the inferior frontal gyrus, could improve synthesis performance during imagined speech. Significance. As the field of speech neuroprostheses is rapidly moving toward clinical trials, this study addressed important considerations about task instructions and brain coverage when conducting research on silent speech with non-target participants.
Collapse
Affiliation(s)
- Kevin Meng
- Department of Biomedical Engineering, The University of Melbourne, Melbourne, Australia
- Graeme Clark Institute for Biomedical Engineering, The University of Melbourne, Melbourne, Australia
| | - Farhad Goodarzy
- Department of Medicine, St Vincent's Hospital, The University of Melbourne, Melbourne, Australia
| | - EuiYoung Kim
- Interdisciplinary Program in Neuroscience, Seoul National University, Seoul, Republic of Korea
| | - Ye Jin Park
- Department of Brain and Cognitive Sciences, Seoul National University, Seoul, Republic of Korea
| | - June Sic Kim
- Research Institute of Basic Sciences, Seoul National University, Seoul, Republic of Korea
| | - Mark J Cook
- Department of Biomedical Engineering, The University of Melbourne, Melbourne, Australia
- Graeme Clark Institute for Biomedical Engineering, The University of Melbourne, Melbourne, Australia
- Department of Medicine, St Vincent's Hospital, The University of Melbourne, Melbourne, Australia
| | - Chun Kee Chung
- Department of Brain and Cognitive Sciences, Seoul National University, Seoul, Republic of Korea
- Department of Neurosurgery, Seoul National University Hospital, Seoul, Republic of Korea
| | - David B Grayden
- Department of Biomedical Engineering, The University of Melbourne, Melbourne, Australia
- Graeme Clark Institute for Biomedical Engineering, The University of Melbourne, Melbourne, Australia
- Department of Medicine, St Vincent's Hospital, The University of Melbourne, Melbourne, Australia
| |
Collapse
|
84
|
Deoisres S, Lu Y, Vanheusden FJ, Bell SL, Simpson DM. Continuous speech with pauses inserted between words increases cortical tracking of speech envelope. PLoS One 2023; 18:e0289288. [PMID: 37498891 PMCID: PMC10374040 DOI: 10.1371/journal.pone.0289288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2022] [Accepted: 07/17/2023] [Indexed: 07/29/2023] Open
Abstract
The decoding multivariate Temporal Response Function (decoder), or speech envelope reconstruction approach, is a well-known tool for assessing cortical tracking of the speech envelope. It is used to analyse the correlation between the speech stimulus and the neural response. It is known that auditory late responses are enhanced with longer gaps between stimuli, but it is not clear whether this applies to the decoder, and whether the addition of gaps/pauses in continuous speech could be used to increase envelope reconstruction accuracy. We investigated this in normal-hearing participants who listened to continuous speech with no added pauses (natural speech), and then with short (250 ms) or long (500 ms) silent pauses inserted between each word. The total durations of the continuous speech stimuli with no, short, and long pauses were approximately 10, 16, and 21 minutes, respectively. EEG and the speech envelope were simultaneously acquired and then filtered into delta (1-4 Hz) and theta (4-8 Hz) frequency bands. In addition to analysing responses to the whole speech envelope, the speech envelope was also segmented to focus the response analysis on onset and non-onset regions of speech separately. Our results show that continuous speech with additional pauses inserted between words significantly increases the speech envelope reconstruction correlations compared with natural speech, in both the delta and theta frequency bands. These increases in envelope reconstruction accuracy also appear to be dominated by the onset regions of the speech envelope. Introducing pauses in speech stimuli has potential clinical benefit for increasing auditory evoked response detectability, though with the disadvantage of the speech sounding less natural. The strong effect of pauses and onsets on the decoder should be considered when comparing results from different speech corpora. Whether the increased cortical response, when longer pauses are introduced, reflects improved intelligibility requires further investigation.
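A minimal backward-model (decoder) sketch in the spirit of the envelope-reconstruction approach discussed above: ridge-regress time-lagged EEG channels onto the envelope and use the correlation between reconstructed and actual envelopes as the tracking measure. The channel count, lags, regularization, and toy signals are assumptions, not the study's implementation.

```python
# Envelope reconstruction (backward model) from multichannel EEG.
import numpy as np

def build_lags(eeg, max_lag):
    # eeg: channels x samples; columns of X are lagged copies of each channel
    n_ch, n = eeg.shape
    X = np.zeros((n, n_ch * max_lag))
    for c in range(n_ch):
        for k in range(max_lag):
            X[k:, c * max_lag + k] = eeg[c, :n - k]
    return X

def reconstruct_envelope(eeg, envelope, fs, max_lag_s=0.25, ridge=1e3):
    X = build_lags(eeg, int(max_lag_s * fs))
    half = X.shape[0] // 2                       # train/test split
    w = np.linalg.solve(X[:half].T @ X[:half] + ridge * np.eye(X.shape[1]),
                        X[:half].T @ envelope[:half])
    recon = X[half:] @ w
    return np.corrcoef(recon, envelope[half:])[0, 1]

fs, n_sec, n_ch = 64, 120, 8
rng = np.random.default_rng(3)
env = np.convolve(rng.standard_normal(fs * n_sec), np.hanning(32), mode="same")
eeg = np.stack([np.roll(env, int(rng.integers(5, 15))) + rng.standard_normal(env.size)
                for _ in range(n_ch)])           # delayed, noisy copies of the envelope
print("reconstruction r =", round(reconstruct_envelope(eeg, env, fs), 3))
```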
Collapse
Affiliation(s)
- Suwijak Deoisres
- Institute of Sound and Vibration Research, University of Southampton, Southampton, United Kingdom
| | - Yuhan Lu
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
| | - Frederique J Vanheusden
- Department of Engineering, School of Science and Technology, Nottingham Trent University, Nottingham, United Kingdom
| | - Steven L Bell
- Institute of Sound and Vibration Research, University of Southampton, Southampton, United Kingdom
| | - David M Simpson
- Institute of Sound and Vibration Research, University of Southampton, Southampton, United Kingdom
| |
Collapse
|
85
|
Abbasi O, Steingräber N, Chalas N, Kluger DS, Gross J. Spatiotemporal dynamics characterise spectral connectivity profiles of continuous speaking and listening. PLoS Biol 2023; 21:e3002178. [PMID: 37478152 DOI: 10.1371/journal.pbio.3002178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2023] [Accepted: 05/31/2023] [Indexed: 07/23/2023] Open
Abstract
Speech production and perception are fundamental processes of human cognition that both rely on intricate processing mechanisms that are still poorly understood. Here, we study these processes by using magnetoencephalography (MEG) to comprehensively map the connectivity of regional brain activity within the brain and to the speech envelope during continuous speaking and listening. Our results reveal not only a partly shared neural substrate for both processes but also a dissociation in space, delay, and frequency. Neural activity in motor and frontal areas is coupled to succeeding speech in the delta band (1 to 3 Hz), whereas coupling in the theta range follows speech in temporal areas during speaking. Neural connectivity results showed a separation of bottom-up and top-down signalling in distinct frequency bands during speaking. Here, we show that frequency-specific connectivity channels for bottom-up and top-down signalling support continuous speaking and listening. These findings further shed light on the complex interplay between different brain regions involved in speech production and perception.
Collapse
Affiliation(s)
- Omid Abbasi
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
| | - Nadine Steingräber
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
| | - Nikos Chalas
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
- Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
| | - Daniel S Kluger
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
- Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
| | - Joachim Gross
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
- Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
| |
Collapse
|
86
|
Nocon JC, Gritton HJ, James NM, Mount RA, Qu Z, Han X, Sen K. Parvalbumin neurons enhance temporal coding and reduce cortical noise in complex auditory scenes. Commun Biol 2023; 6:751. [PMID: 37468561 PMCID: PMC10356822 DOI: 10.1038/s42003-023-05126-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2022] [Accepted: 07/10/2023] [Indexed: 07/21/2023] Open
Abstract
Cortical representations supporting many cognitive abilities emerge from underlying circuits comprised of several different cell types. However, cell type-specific contributions to rate- and timing-based cortical coding are not well understood. Here, we investigated the role of parvalbumin neurons in cortical complex scene analysis. Many complex scenes contain sensory stimuli that are highly dynamic in time and compete with stimuli at other spatial locations. Parvalbumin neurons play a fundamental role in balancing excitation and inhibition in cortex and sculpting cortical temporal dynamics; yet their specific role in encoding complex scenes via timing-based coding, and the robustness of temporal representations to spatial competition, has not been investigated. Here, we address these questions in the auditory cortex of mice using a cocktail-party-like paradigm, integrating electrophysiology, optogenetic manipulations, and a family of spike-distance metrics, to dissect parvalbumin neurons' contributions to rate- and timing-based coding. We find that suppressing parvalbumin neurons degrades cortical discrimination of dynamic sounds in a cocktail-party-like setting via changes in rapid temporal modulations in rate and spike timing, over a wide range of time scales. Our findings suggest that parvalbumin neurons play a critical role in enhancing cortical temporal coding and reducing cortical noise, thereby improving representations of dynamic stimuli in complex scenes.
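One member of the spike-distance family mentioned above is the van Rossum distance, sketched below: each spike train is convolved with an exponential kernel whose time constant sets the temporal resolution, and the distance is the integrated squared difference of the filtered traces. The kernel choice, time constant, and example spike trains are illustrative assumptions rather than the study's exact metric or parameters.

```python
# Van Rossum-style spike distance: timing sensitivity is set by the kernel tau.
import numpy as np

def van_rossum_distance(spikes_a, spikes_b, tau=0.01, dt=0.001, t_max=1.0):
    t = np.arange(0, t_max, dt)
    def filtered(spike_times):
        trace = np.zeros_like(t)
        for s in spike_times:
            mask = t >= s
            trace[mask] += np.exp(-(t[mask] - s) / tau)   # causal exponential kernel
        return trace
    diff = filtered(spikes_a) - filtered(spikes_b)
    return np.sqrt(np.sum(diff ** 2) * dt / tau)

a = [0.10, 0.25, 0.40, 0.70]
b = [0.11, 0.26, 0.42, 0.71]          # same rate, jittered spike timing
c = [0.10, 0.25, 0.40, 0.70, 0.90]    # one extra spike (rate difference)
# With a short tau, even small timing shifts already produce a sizeable distance.
print("jittered timing :", round(van_rossum_distance(a, b), 3))
print("extra spike     :", round(van_rossum_distance(a, c), 3))
```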
Collapse
Affiliation(s)
- Jian Carlo Nocon
- Neurophotonics Center, Boston University, Boston, 02215, MA, USA
- Center for Systems Neuroscience, Boston University, Boston, 02215, MA, USA
- Hearing Research Center, Boston University, Boston, 02215, MA, USA
- Department of Biomedical Engineering, Boston University, Boston, 02215, MA, USA
| | - Howard J Gritton
- Department of Comparative Biosciences, University of Illinois, Urbana, 61820, IL, USA
- Department of Bioengineering, University of Illinois, Urbana, 61820, IL, USA
| | - Nicholas M James
- Neurophotonics Center, Boston University, Boston, 02215, MA, USA
- Center for Systems Neuroscience, Boston University, Boston, 02215, MA, USA
- Hearing Research Center, Boston University, Boston, 02215, MA, USA
- Department of Biomedical Engineering, Boston University, Boston, 02215, MA, USA
| | - Rebecca A Mount
- Neurophotonics Center, Boston University, Boston, 02215, MA, USA
- Center for Systems Neuroscience, Boston University, Boston, 02215, MA, USA
- Hearing Research Center, Boston University, Boston, 02215, MA, USA
- Department of Biomedical Engineering, Boston University, Boston, 02215, MA, USA
| | - Zhili Qu
- Department of Comparative Biosciences, University of Illinois, Urbana, 61820, IL, USA
- Department of Bioengineering, University of Illinois, Urbana, 61820, IL, USA
| | - Xue Han
- Neurophotonics Center, Boston University, Boston, 02215, MA, USA
- Center for Systems Neuroscience, Boston University, Boston, 02215, MA, USA
- Hearing Research Center, Boston University, Boston, 02215, MA, USA
- Department of Biomedical Engineering, Boston University, Boston, 02215, MA, USA
| | - Kamal Sen
- Neurophotonics Center, Boston University, Boston, 02215, MA, USA.
- Center for Systems Neuroscience, Boston University, Boston, 02215, MA, USA.
- Hearing Research Center, Boston University, Boston, 02215, MA, USA.
- Department of Biomedical Engineering, Boston University, Boston, 02215, MA, USA.
| |
Collapse
|
87
|
Gunasekaran H, Azizi L, van Wassenhove V, Herbst SK. Characterizing endogenous delta oscillations in human MEG. Sci Rep 2023; 13:11031. [PMID: 37419933 PMCID: PMC10328979 DOI: 10.1038/s41598-023-37514-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2023] [Accepted: 06/22/2023] [Indexed: 07/09/2023] Open
Abstract
Rhythmic activity in the delta frequency range (0.5-3 Hz) is a prominent feature of brain dynamics. Here, we examined whether spontaneous delta oscillations, as found in invasive recordings in awake animals, can be observed in non-invasive recordings performed in humans with magnetoencephalography (MEG). In humans, delta activity is commonly reported when processing rhythmic sensory inputs, with direct relationships to behaviour. However, rhythmic brain dynamics observed during rhythmic sensory stimulation cannot be interpreted as an endogenous oscillation. To test for endogenous delta oscillations we analysed human MEG data during rest. For comparison, we additionally analysed two conditions in which participants engaged in spontaneous finger tapping and silent counting, arguing that internally rhythmic behaviours could incite an otherwise silent neural oscillator. A novel set of analysis steps allowed us to show narrow spectral peaks in the delta frequency range in rest, and during overt and covert rhythmic activity. Additional analyses in the time domain revealed that only the resting state condition warranted an interpretation of these peaks as endogenously periodic neural dynamics. In sum, this work shows that using advanced signal processing techniques, it is possible to observe endogenous delta oscillations in non-invasive recordings of human brain dynamics.
Collapse
Affiliation(s)
- Harish Gunasekaran
- Cognitive Neuroimaging Unit, NeuroSpin, CEA, INSERM, CNRS, Université Paris-Saclay, 91191, Gif/Yvette, France
| | - Leila Azizi
- Cognitive Neuroimaging Unit, NeuroSpin, CEA, INSERM, CNRS, Université Paris-Saclay, 91191, Gif/Yvette, France
| | - Virginie van Wassenhove
- Cognitive Neuroimaging Unit, NeuroSpin, CEA, INSERM, CNRS, Université Paris-Saclay, 91191, Gif/Yvette, France
| | - Sophie K Herbst
- Cognitive Neuroimaging Unit, NeuroSpin, CEA, INSERM, CNRS, Université Paris-Saclay, 91191, Gif/Yvette, France.
| |
Collapse
|
88
|
Roman IR, Roman AS, Kim JC, Large EW. Hebbian learning with elasticity explains how the spontaneous motor tempo affects music performance synchronization. PLoS Comput Biol 2023; 19:e1011154. [PMID: 37285380 DOI: 10.1371/journal.pcbi.1011154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2022] [Accepted: 05/02/2023] [Indexed: 06/09/2023] Open
Abstract
A musician's spontaneous rate of movement, called spontaneous motor tempo (SMT), can be measured while the musician spontaneously plays a simple melody. Data show that the SMT influences the musician's tempo and synchronization. In this study we present a model that captures these phenomena. We review the results from three previously published studies: solo musical performance with a pacing metronome tempo that is different from the SMT, solo musical performance without a metronome at a tempo that is faster or slower than the SMT, and duet musical performance between musicians with matching or mismatching SMTs. These studies showed, respectively, that the asynchrony between the pacing metronome and the musician's tempo grew as a function of the difference between the metronome tempo and the musician's SMT, that musicians drifted away from the initial tempo toward the SMT, and that the absolute asynchronies were smaller if musicians had matching SMTs. We hypothesize that the SMT constantly acts as a pulling force affecting musical actions performed at a tempo different from a musician's SMT. To test this hypothesis, we developed a model consisting of a non-linear oscillator with Hebbian tempo learning and a pulling force toward the model's spontaneous frequency. While the model's spontaneous frequency emulates the SMT, elastic Hebbian learning allows the model's frequency to match a stimulus' frequency. We first fit model parameters to match the data in the first of the three studies and asked whether this same model would explain the data from the remaining two studies without further tuning. Results showed that the model's dynamics allowed it to explain all three experiments with the same set of parameters. Our theory offers a dynamical-systems explanation of how an individual's SMT affects synchronization in realistic music performance settings, and the model also enables predictions about performance settings not yet tested.
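The toy model below sketches the central idea in its simplest possible form: a frequency that is pulled toward the stimulus tempo (Hebbian-like learning) while an elastic force pulls it back toward the spontaneous frequency (the SMT), so the learned tempo settles between the two and drifts back to the SMT once the stimulus stops. It omits the oscillator's phase dynamics entirely, and all parameter values are assumptions, not the published model.

```python
# Elastic frequency learning: stimulus pull vs. pull back toward the SMT.
import numpy as np

def elastic_frequency_learning(f_stim=2.4, f_smt=2.0, learn=0.8, elastic=0.3,
                               dt=0.01, stim_dur=30.0, total_dur=60.0):
    n = int(total_dur / dt)
    f = f_smt
    trace = np.empty(n)
    for i in range(n):
        stim_on = (i * dt) < stim_dur
        pull_stim = learn * (f_stim - f) if stim_on else 0.0   # Hebbian-like learning
        pull_smt = elastic * (f_smt - f)                       # elasticity toward the SMT
        f += dt * (pull_stim + pull_smt)
        trace[i] = f
    return trace

trace = elastic_frequency_learning()
print(f"with the 2.40 Hz stimulus: {trace[int(30 / 0.01) - 1]:.2f} Hz; "
      f"30 s after it stops: {trace[-1]:.2f} Hz (SMT = 2.00 Hz)")
```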
Collapse
Affiliation(s)
- Iran R Roman
- Center for Computer Research in Music and Acoustics, Department of Music, Stanford University, Stanford, California, United States of America
| | - Adrian S Roman
- Department of Mathematics, University of California Davis, Davis, California, United States of America
| | - Ji Chul Kim
- Department of Psychological Sciences, University of Connecticut, Storrs, Connecticut, United States of America
| | - Edward W Large
- Department of Psychological Sciences, University of Connecticut, Storrs, Connecticut, United States of America
- Department of Physics, University of Connecticut, Storrs, Connecticut, United States of America
| |
Collapse
|
89
|
Pomper U. No evidence for tactile entrainment of attention. Front Psychol 2023; 14:1168428. [PMID: 37303888 PMCID: PMC10250593 DOI: 10.3389/fpsyg.2023.1168428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2023] [Accepted: 05/11/2023] [Indexed: 06/13/2023] Open
Abstract
Temporal patterns in our environment provide a rich source of information, to which endogenous neural processes linked to perception and attention can synchronize. This phenomenon, known as entrainment, has so far been studied predominately in the visual and auditory domains. It is currently unknown whether sensory phase-entrainment generalizes to the tactile modality, e.g., for the perception of surface patterns or when reading braille. Here, we address this open question via a behavioral experiment with preregistered experimental and analysis protocols. Twenty healthy participants were presented, on each trial, with 2 s of either rhythmic or arrhythmic 10 Hz tactile stimuli. Their task was to detect a subsequent tactile target either in-phase or out-of-phase with the rhythmic entrainment. Contrary to our hypothesis, we observed no evidence for sensory entrainment in response times, sensitivity or response bias. In line with several other recently reported null findings, our data suggest that behaviorally relevant sensory phase-entrainment might require very specific stimulus parameters, and may not generalize to the tactile domain.
Collapse
|
90
|
He D, Buder EH, Bidelman GM. Effects of Syllable Rate on Neuro-Behavioral Synchronization Across Modalities: Brain Oscillations and Speech Productions. NEUROBIOLOGY OF LANGUAGE (CAMBRIDGE, MASS.) 2023; 4:344-360. [PMID: 37229510 PMCID: PMC10205147 DOI: 10.1162/nol_a_00102] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/08/2022] [Accepted: 01/25/2023] [Indexed: 05/27/2023]
Abstract
Considerable work suggests that the dominant syllable rhythm of the acoustic envelope is remarkably similar across languages (∼4-5 Hz) and that oscillatory brain activity tracks these quasiperiodic rhythms to facilitate speech processing. However, whether this fundamental periodicity represents a common organizing principle in both the auditory and motor systems involved in speech has not been explicitly tested. To evaluate relations between entrainment in the perceptual and production domains, we measured (i) individuals' neuroacoustic tracking of speech trains in the EEG and (ii) their simultaneous and non-simultaneous productions synchronized to syllable rates between 2.5 and 8.5 Hz. Productions made without concurrent auditory presentation isolated motor speech functions more purely. We show that neural synchronization flexibly adapts to the heard stimuli in a rate-dependent manner, but that phase locking is boosted near ∼4.5 Hz, the purported dominant rate of speech. Cued speech productions (which recruit sensorimotor interaction) were optimal between 2.5 and 4.5 Hz, suggesting a low-frequency constraint on motor output and/or sensorimotor integration. In contrast, "pure" motor productions (without concurrent sound cues) were most precisely generated at rates of 4.5 and 5.5 Hz, paralleling the neuroacoustic data. Correlations further revealed strong links between receptive (EEG) and production synchronization abilities; individuals with stronger auditory-perceptual entrainment better matched speech rhythms motorically. Together, our findings support an intimate link between exogenous and endogenous rhythmic processing that is optimized at 4-5 Hz in both the auditory and motor systems. Parallels across modalities could result from dynamics of the speech motor system coupled with experience-dependent tuning of the perceptual system via the sensorimotor interface.
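A sketch of a phase-locking analysis of the kind used to quantify neuroacoustic tracking across syllable rates: band-pass the stimulus and the EEG around the target rate, extract instantaneous phases with the Hilbert transform, and compute the phase-locking value of their phase difference. The simulated signals, bandwidth, and rates are assumptions, not the study's data or exact measure.

```python
# Phase-locking value (PLV) between a rhythmic stimulus and EEG at several rates.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def plv(stimulus, eeg, fs, rate_hz, half_bw=1.0):
    band = (max(rate_hz - half_bw, 0.5), rate_hz + half_bw)
    sos = butter(3, band, btype="bandpass", fs=fs, output="sos")
    ph_s = np.angle(hilbert(sosfiltfilt(sos, stimulus)))   # stimulus phase
    ph_e = np.angle(hilbert(sosfiltfilt(sos, eeg)))        # EEG phase
    return np.abs(np.mean(np.exp(1j * (ph_s - ph_e))))     # consistency of phase lag

fs = 250
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(4)
for rate in (2.5, 4.5, 8.5):
    stim = np.sin(2 * np.pi * rate * t)                                  # syllable-rate train
    eeg = 0.6 * np.sin(2 * np.pi * rate * t - 0.8) + rng.standard_normal(t.size)
    print(f"{rate} Hz: PLV = {plv(stim, eeg, fs, rate):.2f}")
```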
Collapse
Affiliation(s)
- Deling He
- School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA
| | - Eugene H. Buder
- School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA
| | - Gavin M. Bidelman
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, USA
- Program in Neuroscience, Indiana University, Bloomington, IN, USA
| |
Collapse
|
91
|
Bellur A, Thakkar K, Elhilali M. Explicit-memory multiresolution adaptive framework for speech and music separation. EURASIP JOURNAL ON AUDIO, SPEECH, AND MUSIC PROCESSING 2023; 2023:20. [PMID: 37181589 PMCID: PMC10169896 DOI: 10.1186/s13636-023-00286-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/23/2022] [Accepted: 04/21/2023] [Indexed: 05/16/2023]
Abstract
The human auditory system employs a number of principles to facilitate the selection of perceptually separated streams from a complex sound mixture. The brain leverages multi-scale redundant representations of the input and uses memory (or priors) to guide the selection of a target sound from the input mixture. Moreover, feedback mechanisms refine the memory constructs resulting in further improvement of selectivity of a particular sound object amidst dynamic backgrounds. The present study proposes a unified end-to-end computational framework that mimics these principles for sound source separation applied to both speech and music mixtures. While the problems of speech enhancement and music separation have often been tackled separately due to constraints and specificities of each signal domain, the current work posits that common principles for sound source separation are domain-agnostic. In the proposed scheme, parallel and hierarchical convolutional paths map input mixtures onto redundant but distributed higher-dimensional subspaces and utilize the concept of temporal coherence to gate the selection of embeddings belonging to a target stream abstracted in memory. These explicit memories are further refined through self-feedback from incoming observations in order to improve the system's selectivity when faced with unknown backgrounds. The model yields stable outcomes of source separation for both speech and music mixtures and demonstrates benefits of explicit memory as a powerful representation of priors that guide information selection from complex inputs.
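To make the temporal-coherence idea concrete, the following toy sketch gates embedding channels by how strongly their temporal trajectories correlate with a stored "memory" trace of the target. It illustrates the principle only, not the paper's end-to-end network; all names, thresholds, and data are assumed.

```python
import numpy as np

def coherence_gate(embeddings, memory_trace, threshold=0.3):
    """embeddings: (channels, time); memory_trace: (time,). Keep channels coherent with memory."""
    emb_c = embeddings - embeddings.mean(axis=1, keepdims=True)
    mem_c = memory_trace - memory_trace.mean()
    corr = emb_c @ mem_c / (np.linalg.norm(emb_c, axis=1) * np.linalg.norm(mem_c) + 1e-12)
    gate = (corr > threshold).astype(float)            # hard gate; a soft (sigmoid) gate also works
    return embeddings * gate[:, None], corr

rng = np.random.default_rng(0)
memory = np.sin(np.linspace(0, 8 * np.pi, 200))        # toy stored activity pattern of the target
target_like = memory + 0.3 * rng.standard_normal(200)  # one channel that follows the target
background = rng.standard_normal((15, 200))            # channels unrelated to the target
embeddings = np.vstack([target_like, background])
gated, corr = coherence_gate(embeddings, memory)
print("channels kept:", int((corr > 0.3).sum()), "of", embeddings.shape[0])
```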
Collapse
Affiliation(s)
- Ashwin Bellur
- Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA
| | - Karan Thakkar
- Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA
| | - Mounya Elhilali
- Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA
| |
Collapse
|
92
|
Lu Y, Jin P, Ding N, Tian X. Delta-band neural tracking primarily reflects rule-based chunking instead of semantic relatedness between words. Cereb Cortex 2023; 33:4448-4458. [PMID: 36124831 PMCID: PMC10110438 DOI: 10.1093/cercor/bhac354] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2022] [Revised: 08/12/2022] [Accepted: 08/13/2022] [Indexed: 11/14/2022] Open
Abstract
It is debated whether cortical responses matching the time scales of phrases and sentences mediate the mental construction of the syntactic chunks or are simply caused by the semantic properties of words. Here, we investigate to what extent delta-band neural responses to speech can be explained by semantic relatedness between words. To dissociate the contribution of semantic relatedness from sentential structures, participants listened to sentence sequences and paired-word sequences in which semantically related words repeated at 1 Hz. Semantic relatedness in the 2 types of sequences was quantified using a word2vec model that captured the semantic relation between words without considering sentential structure. The word2vec model predicted comparable 1-Hz responses with paired-word sequences and sentence sequences. However, empirical neural activity, recorded using magnetoencephalography, showed a weaker 1-Hz response to paired-word sequences than to sentence sequences in a word-level task that did not require sentential processing. Furthermore, when listeners applied a task-related rule to parse paired-word sequences into multi-word chunks, the 1-Hz response was stronger than in the word-level task on the same sequences. Our results suggest that cortical activity tracks multi-word chunks constructed by either syntactic rules or task-related rules, whereas the semantic relatedness between words contributes only in a minor way.
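For readers unfamiliar with how word-embedding relatedness is quantified, a toy sketch follows: adjacent-word relatedness as the cosine similarity of embedding vectors. The study used a trained word2vec model; the random vectors, vocabulary, and helper names below are placeholders.

```python
import numpy as np

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(42)
vocab = ["new", "house", "old", "tree"]
vectors = {w: rng.standard_normal(300) for w in vocab}   # stand-ins for trained word2vec vectors

sequence = ["new", "house", "old", "tree"]               # toy word sequence
pairwise = [cosine_similarity(vectors[a], vectors[b])
            for a, b in zip(sequence[:-1], sequence[1:])]
print("adjacent-word relatedness:", [round(s, 2) for s in pairwise])
```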
Collapse
Affiliation(s)
- Yuhan Lu
- Shanghai Key Laboratory of Brain Functional Genomics (Ministry of Education), School of Psychology and Cognitive Science, East China Normal University, Shanghai 200062, China
- NYU-ECNU Institute of Brain and Cognitive Science at NYU Shanghai, Shanghai 200062, China
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China
| | - Peiqing Jin
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China
| | - Nai Ding
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China
- Research Center for Applied Mathematics and Machine Intelligence, Research Institute of Basic Theories, Zhejiang Lab, Hangzhou 311121, China
| | - Xing Tian
- Shanghai Key Laboratory of Brain Functional Genomics (Ministry of Education), School of Psychology and Cognitive Science, East China Normal University, Shanghai 200062, China
- NYU-ECNU Institute of Brain and Cognitive Science at NYU Shanghai, Shanghai 200062, China
- Division of Arts and Sciences, New York University Shanghai
| |
Collapse
|
93
|
Acoustic correlates of the syllabic rhythm of speech: Modulation spectrum or local features of the temporal envelope. Neurosci Biobehav Rev 2023; 147:105111. [PMID: 36822385 DOI: 10.1016/j.neubiorev.2023.105111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2022] [Revised: 12/04/2022] [Accepted: 02/19/2023] [Indexed: 02/25/2023]
Abstract
The syllable is a perceptually salient unit in speech. Since both the syllable and its acoustic correlate, i.e., the speech envelope, have a preferred range of rhythmicity between 4 and 8 Hz, it is hypothesized that theta-band neural oscillations play a major role in extracting syllables based on the envelope. A literature survey, however, reveals inconsistent evidence about the relationship between speech envelope and syllables, and the current study revisits this question by analyzing large speech corpora. It is shown that the center frequency of speech envelope, characterized by the modulation spectrum, reliably correlates with the rate of syllables only when the analysis is pooled over minutes of speech recordings. In contrast, in the time domain, a component of the speech envelope is reliably phase-locked to syllable onsets. Based on a speaker-independent model, the timing of syllable onsets explains about 24% variance of the speech envelope. These results indicate that local features in the speech envelope, instead of the modulation spectrum, are a more reliable acoustic correlate of syllables.
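A minimal sketch of the modulation-spectrum computation discussed here, assuming a Hilbert envelope followed by a Welch spectrum of the downsampled envelope; the toy signal and all parameters are illustrative, not the corpora or exact method analysed in the paper.

```python
import numpy as np
from scipy.signal import hilbert, welch, resample_poly

def modulation_spectrum(audio, fs, env_fs=100):
    envelope = np.abs(hilbert(audio))                        # wide-band temporal envelope
    envelope = resample_poly(envelope, up=1, down=fs // env_fs)
    freqs, power = welch(envelope - envelope.mean(), fs=env_fs, nperseg=env_fs * 4)
    return freqs, power

fs = 16000
t = np.arange(0, 60, 1 / fs)
audio = np.random.randn(t.size) * (1 + np.sin(2 * np.pi * 4.8 * t))  # toy signal modulated near 5 Hz
freqs, power = modulation_spectrum(audio, fs)
band = (freqs >= 1) & (freqs <= 16)
peak = freqs[band][np.argmax(power[band])]
print(f"peak modulation frequency: {peak:.2f} Hz")
```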
Collapse
|
94
|
Kösem A, Dai B, McQueen JM, Hagoort P. Neural tracking of speech envelope does not unequivocally reflect intelligibility. Neuroimage 2023; 272:120040. [PMID: 36935084 DOI: 10.1016/j.neuroimage.2023.120040] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2022] [Revised: 03/13/2023] [Accepted: 03/15/2023] [Indexed: 03/19/2023] Open
Abstract
During listening, brain activity tracks the rhythmic structures of speech signals. Here, we directly dissociated the contribution of neural envelope tracking in the processing of speech acoustic cues from that related to linguistic processing. We examined the neural changes associated with the comprehension of Noise-Vocoded (NV) speech using magnetoencephalography (MEG). Participants listened to NV sentences in a 3-phase training paradigm: (1) pre-training, where NV stimuli were barely comprehended, (2) training, with exposure to the original clear version of the speech stimulus, and (3) post-training, where the same stimuli gained intelligibility from the training phase. Using this paradigm, we tested whether the neural response to a speech signal was modulated by its intelligibility without any change in its acoustic structure. To test the influence of spectral degradation on neural envelope tracking independently of training, participants listened to two types of NV sentences (4-band and 2-band NV speech), but were only trained to understand 4-band NV speech. Significant changes in neural tracking were observed in the delta range in relation to the acoustic degradation of speech. However, we failed to find a direct effect of intelligibility on the neural tracking of the speech envelope in both theta and delta ranges, in both auditory regions-of-interest and whole-brain sensor-space analyses. This suggests that acoustics greatly influence the neural tracking response to the speech envelope, and that caution needs to be taken when choosing the control signals for speech-brain tracking analyses, considering that a slight change in acoustic parameters can have strong effects on the neural tracking response.
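The speech-brain coherence measure at issue can be sketched generically as the magnitude-squared coherence between a sensor time series and the speech envelope, averaged within delta and theta bands. The simulated signals below are placeholders, and the sketch is not the authors' MEG pipeline.

```python
import numpy as np
from scipy.signal import coherence

fs = 200
n = fs * 120                                          # two minutes of toy data
envelope = np.abs(np.random.randn(n))                 # stand-in for a speech envelope
meg = envelope + np.random.randn(n)                   # sensor partly tracking the envelope

freqs, coh = coherence(meg, envelope, fs=fs, nperseg=fs * 4)
delta = coh[(freqs >= 1) & (freqs <= 4)].mean()       # delta band: 1-4 Hz
theta = coh[(freqs > 4) & (freqs <= 8)].mean()        # theta band: 4-8 Hz
print(f"delta coherence: {delta:.2f}, theta coherence: {theta:.2f}")
```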
Collapse
Affiliation(s)
- Anne Kösem
- Max Planck Institute for Psycholinguistics, 6500 AH Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6500 HB Nijmegen, The Netherlands; Lyon Neuroscience Research Center (CRNL), CoPhy Team, INSERM U1028, 69500 Bron, France.
| | - Bohan Dai
- Max Planck Institute for Psycholinguistics, 6500 AH Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6500 HB Nijmegen, The Netherlands
| | - James M McQueen
- Max Planck Institute for Psycholinguistics, 6500 AH Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6500 HB Nijmegen, The Netherlands
| | - Peter Hagoort
- Max Planck Institute for Psycholinguistics, 6500 AH Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6500 HB Nijmegen, The Netherlands
| |
Collapse
|
95
|
Lubinus C, Keitel A, Obleser J, Poeppel D, Rimmele JM. Explaining flexible continuous speech comprehension from individual motor rhythms. Proc Biol Sci 2023; 290:20222410. [PMID: 36855868 PMCID: PMC9975658 DOI: 10.1098/rspb.2022.2410] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/02/2023] Open
Abstract
When speech is too fast, the tracking of the acoustic signal along the auditory pathway deteriorates, leading to suboptimal speech segmentation and decoding of speech information. Thus, speech comprehension is limited by the temporal constraints of the auditory system. Here we ask whether individual differences in auditory-motor coupling strength in part shape these temporal constraints. In two behavioural experiments, we characterize individual differences in the comprehension of naturalistic speech as a function of the individual synchronization between the auditory and motor systems and the preferred frequencies of the systems. As expected, speech comprehension declined at higher speech rates. Importantly, however, both higher auditory-motor synchronization and higher spontaneous speech motor production rates were predictive of better speech-comprehension performance. Furthermore, performance increased with higher working memory capacity (digit span) and higher linguistic, model-based sentence predictability, particularly so at higher speech rates and for individuals with high auditory-motor synchronization. The data provide evidence for a model of speech comprehension in which individual flexibility of not only the motor system but also auditory-motor synchronization may play a modulatory role.
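As an illustration of the kind of model such behavioural predictors enter, here is a hedged sketch of an ordinary least-squares regression of comprehension on speech rate, auditory-motor synchronization (with their interaction), and digit span. The column names and simulated data are assumptions, not the authors' analysis.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "rate": rng.uniform(8, 16, n),                 # syllables per second (compressed speech)
    "sync": rng.normal(0, 1, n),                   # auditory-motor synchronization (z-scored)
    "digit_span": rng.integers(4, 10, n),          # working memory capacity
})
# Simulated outcome: comprehension drops with rate, less so for strong synchronizers
df["comprehension"] = (0.9 - 0.03 * df["rate"] + 0.02 * df["sync"] * df["rate"] / 10
                       + 0.01 * df["digit_span"] + rng.normal(0, 0.05, n))

model = smf.ols("comprehension ~ rate * sync + digit_span", data=df).fit()
print(model.params.round(3))
```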
Collapse
Affiliation(s)
- Christina Lubinus
- Department of Neuroscience and Department of Cognitive Neuropsychology, Max-Planck-Institute for Empirical Aesthetics, 60322 Frankfurt am Main, Germany
| | - Anne Keitel
- Psychology, University of Dundee, Dundee DD1 4HN, UK
| | - Jonas Obleser
- Department of Psychology, University of Lübeck, Lübeck, Germany
- Center for Brain, Behavior, and Metabolism, University of Lübeck, Lübeck, Germany
| | - David Poeppel
- Department of Psychology, New York University, New York, NY, USA
- Max Planck NYU Center for Language, Music, and Emotion, New York, NY, USA
- Ernst Strüngmann Institute for Neuroscience (in Cooperation with Max Planck Society), Frankfurt am Main, Germany
| | - Johanna M. Rimmele
- Department of Neuroscience and Department of Cognitive Neuropsychology, Max-Planck-Institute for Empirical Aesthetics, 60322 Frankfurt am Main, Germany
- Max Planck NYU Center for Language, Music, and Emotion, New York, NY, USA
| |
Collapse
|
96
|
Wohltjen S, Toth B, Boncz A, Wheatley T. Synchrony to a beat predicts synchrony with other minds. Sci Rep 2023; 13:3591. [PMID: 36869056 PMCID: PMC9984464 DOI: 10.1038/s41598-023-29776-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2022] [Accepted: 02/10/2023] [Indexed: 03/05/2023] Open
Abstract
Synchrony has been used to describe simple beat entrainment as well as correlated mental processes between people, leading some to question whether the term conflates distinct phenomena. Here we ask whether simple synchrony (beat entrainment) predicts more complex attentional synchrony, consistent with a common mechanism. While eye-tracked, participants listened to regularly spaced tones and indicated changes in volume. Across multiple sessions, we found a reliable individual difference: some people entrained their attention more than others, as reflected in beat-matched pupil dilations that predicted performance. In a second study, eye-tracked participants completed the beat task and then listened to a storyteller, who had been previously recorded while eye-tracked. An individual's tendency to entrain to a beat predicted how strongly their pupils synchronized with those of the storyteller, a corollary of shared attention. The tendency to synchronize is a stable individual difference that predicts attentional synchrony across contexts and complexity.
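A minimal sketch of quantifying listener-storyteller pupil synchrony as the correlation of z-scored pupil traces follows; the smoothing kernel and simulated traces are placeholders, and the actual study's preprocessing differs.

```python
import numpy as np
from scipy.stats import pearsonr

def zscore(x):
    return (x - x.mean()) / x.std()

rng = np.random.default_rng(7)
smooth = np.ones(50) / 50                                    # crude smoothing kernel (~1 s at 50 Hz)
storyteller = np.convolve(rng.standard_normal(3000), smooth, mode="same")
listener = 0.5 * storyteller + 0.5 * np.convolve(rng.standard_normal(3000), smooth, mode="same")

r, p = pearsonr(zscore(listener), zscore(storyteller))
print(f"pupil synchrony: r = {r:.2f} (p = {p:.1e})")
```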
Collapse
Affiliation(s)
- Sophie Wohltjen
- Psychological and Brain Sciences Department, Dartmouth College, 6207 Moore Hall, Hanover, NH, 03755, USA.
- Psychology Department, University of Wisconsin, 1202 West Johnson St., Madison, WI, 53706, USA.
| | - Brigitta Toth
- Psychological and Brain Sciences Department, Dartmouth College, 6207 Moore Hall, Hanover, NH, 03755, USA
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Magyar Tudósok Körútja 2, Budapest, 1117, Hungary
| | - Adam Boncz
- Psychological and Brain Sciences Department, Dartmouth College, 6207 Moore Hall, Hanover, NH, 03755, USA
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Magyar Tudósok Körútja 2, Budapest, 1117, Hungary
| | - Thalia Wheatley
- Psychological and Brain Sciences Department, Dartmouth College, 6207 Moore Hall, Hanover, NH, 03755, USA
- Santa Fe Institute, Santa Fe, NM, USA
| |
Collapse
|
97
|
Giroud J, Lerousseau JP, Pellegrino F, Morillon B. The channel capacity of multilevel linguistic features constrains speech comprehension. Cognition 2023; 232:105345. [PMID: 36462227 DOI: 10.1016/j.cognition.2022.105345] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2022] [Revised: 09/28/2022] [Accepted: 11/22/2022] [Indexed: 12/05/2022]
Abstract
Humans are expert at processing speech but how this feat is accomplished remains a major question in cognitive neuroscience. Capitalizing on the concept of channel capacity, we developed a unified measurement framework to investigate the respective influence of seven acoustic and linguistic features on speech comprehension, encompassing acoustic, sub-lexical, lexical and supra-lexical levels of description. We show that comprehension is independently impacted by all these features, but at varying degrees and with a clear dominance of the syllabic rate. Comparing comprehension of French words and sentences further reveals that when supra-lexical contextual information is present, the impact of all other features is dramatically reduced. Finally, we estimated the channel capacity associated with each linguistic feature and compared them with their generic distribution in natural speech. Our data reveal that while acoustic modulation, syllabic and phonemic rates unfold respectively at 5, 5, and 12 Hz in natural speech, they are associated with independent processing bottlenecks whose channel capacities are 15, 15 and 35 Hz, respectively, as suggested by neurophysiological theories. They moreover point towards supra-lexical contextual information as the feature limiting the flow of natural speech. Overall, this study reveals how multilevel linguistic features constrain speech comprehension.
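A back-of-the-envelope reading of these capacity estimates: if a feature unfolds at rate r in natural speech and its processing bottleneck sits at capacity c, comprehension should begin to break down once speech is compressed by more than roughly c / r. The short computation below uses only the rates quoted in the abstract; the "tolerated compression" framing is an interpretation, not a claim from the paper.

```python
# Rates (Hz) quoted in the abstract for natural speech and estimated channel capacities
natural_rate = {"acoustic modulation": 5.0, "syllables": 5.0, "phonemes": 12.0}
capacity     = {"acoustic modulation": 15.0, "syllables": 15.0, "phonemes": 35.0}

for feature in natural_rate:
    max_compression = capacity[feature] / natural_rate[feature]
    print(f"{feature}: tolerates ~{max_compression:.1f}x time compression")
# -> roughly 3x for modulations and syllables, ~2.9x for phonemes
```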
Collapse
Affiliation(s)
- Jérémy Giroud
- Aix Marseille Univ, Inserm, INS, Inst Neurosci Syst, Marseille, France.
| | | | - François Pellegrino
- Laboratoire Dynamique du Langage UMR 5596, CNRS, University of Lyon, 14 Avenue Berthelot, 69007 Lyon, France
| | - Benjamin Morillon
- Aix Marseille Univ, Inserm, INS, Inst Neurosci Syst, Marseille, France
| |
Collapse
|
98
|
He S, Skidmore J, Koch B, Chatterjee M, Carter BL, Yuan Y. Relationships Between the Auditory Nerve Sensitivity to Amplitude Modulation, Perceptual Amplitude Modulation Rate Discrimination Sensitivity, and Speech Perception Performance in Postlingually Deafened Adult Cochlear Implant Users. Ear Hear 2023; 44:371-384. [PMID: 36342278 PMCID: PMC9957802 DOI: 10.1097/aud.0000000000001289] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
OBJECTIVE: This study assessed the relationships between the salience of amplitude modulation (AM) cues encoded at the auditory nerve (AN), perceptual sensitivity to changes in AM rate (i.e., AM rate discrimination threshold, AMRDT), and speech perception scores in postlingually deafened adult cochlear implant (CI) users.
DESIGN: Study participants were 18 postlingually deafened adults with Cochlear Nucleus devices, including five bilaterally implanted patients. For each of 23 implanted ears, neural encoding of AM cues at 20 Hz at the AN was evaluated at seven electrode locations across the electrode array using electrophysiological measures of the electrically evoked compound action potential (eCAP). The salience of AM neural encoding was quantified by the Modulated Response Amplitude Ratio (MRAR). Psychophysical measures of AMRDT for 20 Hz modulation were evaluated in 16 ears using a three-alternative, forced-choice procedure, targeting 79.4% correct on the psychometric function. AMRDT was measured at up to five electrode locations for each test ear, including the electrode pair that showed the largest difference in the MRAR. Consonant-Nucleus-Consonant (CNC) word scores presented in quiet and in speech-shaped noise at a signal to noise ratio (SNR) of +10 dB were measured in all 23 implanted ears. Simulation tests were used to assess the variations in correlation results when using the MRAR and AMRDT measured at only one electrode location in each participant to correlate with CNC word scores. Linear Mixed Models (LMMs) were used to evaluate the relationship between MRARs/AMRDTs measured at individual electrode locations and CNC word scores. Spearman Rank correlation tests were used to evaluate the strength of association between CNC word scores measured in quiet and in noise with (1) the variances in MRARs and AMRDTs, and (2) the averaged MRAR or AMRDT across multiple electrodes tested for each participant.
RESULTS: There was no association between the MRAR and AMRDT. Using the MRAR and AMRDT measured at only one, randomly selected electrode location to assess their associations with CNC word scores could lead to opposite conclusions. Both the results of LMMs and Spearman Rank correlation tests showed that CNC word scores measured in quiet or at 10 dB SNR were not significantly correlated with the MRAR or AMRDT. In addition, the results of Spearman Rank correlation tests showed that the variances in MRARs and AMRDTs were not significantly correlated with CNC word scores measured in quiet or in noise.
CONCLUSIONS: The difference in AN sensitivity to AM cues is not the primary factor accounting for the variation in AMRDTs measured at different stimulation sites within individual CI users. The AN sensitivity to AM per se may not be a crucial factor for CNC word perception in quiet or at 10 dB SNR in postlingually deafened adult CI users. Using electrophysiological or psychophysical results measured at only one electrode location to correlate with speech perception scores in CI users can lead to inaccurate, if not wrong, conclusions.
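For illustration, the across-ear association reported here can be sketched as a Spearman rank correlation between a per-ear electrophysiological summary (e.g., an MRAR averaged over tested electrodes) and CNC word scores; the simulated values below are placeholders, not the study's measurements.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
n_ears = 23
mean_mrar = rng.normal(1.0, 0.2, n_ears)              # per-ear averaged AM response ratio (toy values)
cnc_quiet = rng.normal(60, 15, n_ears).clip(0, 100)   # CNC word scores (% correct), unrelated here

rho, p = spearmanr(mean_mrar, cnc_quiet)
print(f"Spearman rho = {rho:.2f}, p = {p:.2f}")       # near zero for unrelated simulated data
```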
Collapse
Affiliation(s)
- Shuman He
- Department of Otolaryngology – Head and Neck Surgery, College of Medicine, The Ohio State University, 915 Olentangy River Road, Columbus, OH 43212
- Department of Audiology, Nationwide Children’s Hospital, 700 Children’s Drive, Columbus, OH 43205
| | - Jeffrey Skidmore
- Department of Otolaryngology – Head and Neck Surgery, College of Medicine, The Ohio State University, 915 Olentangy River Road, Columbus, OH 43212
| | - Brandon Koch
- Division of Biostatistics, College of Public Health, The Ohio State University, 1841 Neil Avenue, Columbus, OH 43210
| | - Monita Chatterjee
- Boys Town National Research Hospital, 555 N 30 Street, Omaha, NE 68131
| | - Brittney L. Carter
- Department of Otolaryngology – Head and Neck Surgery, College of Medicine, The Ohio State University, 915 Olentangy River Road, Columbus, OH 43212
| | - Yi Yuan
- Department of Otolaryngology – Head and Neck Surgery, College of Medicine, The Ohio State University, 915 Olentangy River Road, Columbus, OH 43212
| |
Collapse
|
99
|
Chen YP, Schmidt F, Keitel A, Rösch S, Hauswald A, Weisz N. Speech intelligibility changes the temporal evolution of neural speech tracking. Neuroimage 2023; 268:119894. [PMID: 36693596 DOI: 10.1016/j.neuroimage.2023.119894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2022] [Revised: 12/13/2022] [Accepted: 01/20/2023] [Indexed: 01/22/2023] Open
Abstract
Listening to speech with poor signal quality is challenging. Neural speech tracking of degraded speech has been used to advance the understanding of how brain processes and speech intelligibility are interrelated. However, the temporal dynamics of neural speech tracking and their relation to speech intelligibility are not clear. In the present MEG study, we exploited temporal response functions (TRFs), which have been used to describe the time course of speech tracking on a gradient from intelligible to unintelligible degraded speech. In addition, we used inter-related facets of neural speech tracking (e.g., speech envelope reconstruction, speech-brain coherence, and components of broadband coherence spectra) to corroborate our TRF findings. Our TRF analysis yielded marked temporally differential effects of vocoding: ∼50-110 ms (M50TRF), ∼175-230 ms (M200TRF), and ∼315-380 ms (M350TRF). Reduction of intelligibility went along with large increases of the early peak responses M50TRF, but strongly reduced responses in M200TRF. In the late responses M350TRF, the maximum response occurred for degraded speech that was still comprehensible and then declined with reduced intelligibility. Furthermore, we related the TRF components to our other neural "tracking" measures and found that M50TRF and M200TRF play a differential role in the shifting center frequency of the broadband coherence spectra. Overall, our study highlights the importance of time-resolved computation of neural speech tracking and decomposition of coherence spectra and provides a better understanding of degraded speech processing.
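The TRF estimation at the heart of this analysis can be sketched as ridge regression on time-lagged envelope features, the general logic behind toolboxes such as mTRF; the parameters, toy kernel, and simulated data below are illustrative, not the authors' MEG pipeline.

```python
import numpy as np

def estimate_trf(stimulus, response, fs, tmin=-0.05, tmax=0.40, lam=1e2):
    """Return lags (s) and ridge-regression TRF weights mapping stimulus -> response."""
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    # Design matrix: one column per time lag of the stimulus envelope
    X = np.column_stack([np.roll(stimulus, lag) for lag in lags])
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ response)
    return lags / fs, w

fs = 100
t = np.arange(0, 120, 1 / fs)
envelope = np.abs(np.random.randn(t.size))                               # toy stimulus envelope
true_kernel = np.exp(-((np.arange(0, 0.4, 1 / fs) - 0.1) ** 2) / 0.002)  # response peaking near 100 ms
meg = np.convolve(envelope, true_kernel, mode="full")[: t.size] + np.random.randn(t.size)

lags_s, trf = estimate_trf(envelope, meg, fs)
print(f"estimated TRF peaks at {lags_s[np.argmax(trf)] * 1000:.0f} ms")
```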
Collapse
Affiliation(s)
- Ya-Ping Chen
- Centre for Cognitive Neuroscience, University of Salzburg, 5020 Salzburg, Austria; Department of Psychology, University of Salzburg, 5020 Salzburg, Austria.
| | - Fabian Schmidt
- Centre for Cognitive Neuroscience, University of Salzburg, 5020 Salzburg, Austria; Department of Psychology, University of Salzburg, 5020 Salzburg, Austria
| | - Anne Keitel
- Psychology, School of Social Sciences, University of Dundee, DD1 4HN Dundee, UK
| | - Sebastian Rösch
- Department of Otorhinolaryngology, Paracelsus Medical University, 5020 Salzburg, Austria
| | - Anne Hauswald
- Centre for Cognitive Neuroscience, University of Salzburg, 5020 Salzburg, Austria; Department of Psychology, University of Salzburg, 5020 Salzburg, Austria
| | - Nathan Weisz
- Centre for Cognitive Neuroscience, University of Salzburg, 5020 Salzburg, Austria; Department of Psychology, University of Salzburg, 5020 Salzburg, Austria; Neuroscience Institute, Christian Doppler University Hospital, Paracelsus Medical University, 5020 Salzburg, Austria
| |
Collapse
|
100
|
Ding N, Gao J, Wang J, Sun W, Fang M, Liu X, Zhao H. Speech recognition in echoic environments and the effect of aging and hearing impairment. Hear Res 2023; 431:108725. [PMID: 36931021 DOI: 10.1016/j.heares.2023.108725] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/30/2022] [Revised: 02/12/2023] [Accepted: 02/23/2023] [Indexed: 03/01/2023]
Abstract
Temporal modulations provide critical cues for speech recognition. When the temporal modulations are distorted by, e.g., reverberations, speech intelligibility drops, and the drop in speech intelligibility can be explained by the amount of distortions to the speech modulation spectrum, i.e., the spectrum of temporal modulations. Here, we test a condition in which speech is contaminated by a single echo. Speech is delayed by either 0.125 s or 0.25 s to create an echo, and these two conditions notch out the temporal modulations at 4 or 2 Hz, respectively. We evaluate how well young and older listeners can recognize such echoic speech. For young listeners, the speech recognition rate is not influenced by the echo, even when they are exposed to the first echoic sentence. For older listeners, the speech recognition rate drops to less than 60% when listening to the first echoic sentence, but rapidly recovers to above 75% with exposure to a few sentences. Further analyses reveal that both age and the hearing threshold influence the recognition of echoic speech for the older listeners. These results show that the recognition of echoic speech cannot be fully explained by distortions to the modulation spectrum, and suggest that the auditory system has mechanisms to effectively compensate for the influence of single echoes.
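A worked check of the delay-to-notch relationship implied here: adding an echo with delay tau multiplies the modulation spectrum by |1 + exp(-i 2 pi f tau)|, whose first null lies at f = 1/(2 tau), so a 0.125 s echo notches modulations near 4 Hz and a 0.25 s echo near 2 Hz. A short numerical confirmation follows.

```python
import numpy as np

for tau in (0.125, 0.25):                             # echo delays in seconds
    f = np.linspace(0.5, 5.0, 2000)                   # modulation frequencies (Hz)
    gain = np.abs(1 + np.exp(-2j * np.pi * f * tau))  # comb-filter gain on the modulation spectrum
    print(f"tau = {tau:.3f} s -> first notch near {f[np.argmin(gain)]:.2f} Hz "
          f"(analytic 1/(2*tau) = {1 / (2 * tau):.1f} Hz)")
```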
Collapse
Affiliation(s)
- Nai Ding
- College of Biomedical Engineering and Instrument Science, Department of Nursing, The Second Affiliated Hospital of Zhejiang University School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
| | - Jiaxin Gao
- College of Biomedical Engineering and Instrument Science, Department of Nursing, The Second Affiliated Hospital of Zhejiang University School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
| | - Jing Wang
- College of Biomedical Engineering and Instrument Science, Department of Nursing, The Second Affiliated Hospital of Zhejiang University School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
| | - Wenhui Sun
- Research Center for Applied Mathematics and Machine Intelligence, Research Institute of Basic Theories, Zhejiang Lab, Hangzhou, Zhejiang, China
| | - Mingxuan Fang
- College of Biomedical Engineering and Instrument Science, Department of Nursing, The Second Affiliated Hospital of Zhejiang University School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
| | - Xiaoling Liu
- College of Biomedical Engineering and Instrument Science, Department of Nursing, The Second Affiliated Hospital of Zhejiang University School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
| | - Hua Zhao
- College of Biomedical Engineering and Instrument Science, Department of Nursing, The Second Affiliated Hospital of Zhejiang University School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China.
| |
Collapse
|