1
Guo YJ, Hsieh IH. Enhanced Delta Band Neural Tracking of Degraded Fundamental Frequency Speech in Noisy Environments. J Cogn Neurosci 2025; 37:1216-1237. [PMID: 39869347] [DOI: 10.1162/jocn_a_02302]
Abstract
Pitch variation of the fundamental frequency (F0) is critical to speech understanding, especially in noisy environments. Degrading the F0 contour reduces behaviorally measured speech intelligibility, posing greater challenges for tonal languages like Mandarin Chinese where the F0 pattern determines semantic meaning. However, neural tracking of Mandarin speech with degraded F0 information in noisy environments remains unclear. This study investigated neural envelope tracking of continuous Mandarin speech with three F0-flattening levels (original, flat-tone, and flat-all) under various signal-to-noise ratios (0, -9, and -12 dB). F0 contours were flattened at the word level for flat-tone and at the sentence level for flat-all Mandarin speech. Electroencephalography responses were indexed by the temporal response function in the delta (<4 Hz) and theta (4-8 Hz) frequency bands. Results show that delta-band envelope tracking is modulated by the degree of F0 flattening in a nonmonotonic manner. Notably, flat-tone Mandarin speech elicited the strongest envelope tracking compared with both original and flat-all speech, despite reduced F0 information. In contrast, the theta band, which primarily encodes speech signal-to-noise level, was not affected by F0 changes. In addition, listeners with better pitch-related music skills exhibited more efficient neural envelope speech tracking, despite being musically naive. These findings indicate that neural envelope tracking in the delta (but not theta) band is highly specific to F0 pitch variation and highlight the role of intrinsic musical skills for speech-in-noise benefits.
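The envelope-tracking measure referenced here, the temporal response function (TRF), is commonly estimated as a regularized linear mapping from the stimulus envelope to the EEG over a window of time lags. The sketch below illustrates that general idea with ridge regression on simulated data; the sampling rate, lag window, regularization value, and variable names are illustrative assumptions, not details taken from the study.

```python
# Hedged sketch of a temporal response function (TRF) estimate: ridge regression
# from a speech envelope onto one EEG channel over a window of time lags.
import numpy as np

def estimate_trf(envelope, eeg, fs, tmin=-0.1, tmax=0.4, lam=1e2):
    """Return lag times and TRF weights mapping the stimulus envelope to the EEG."""
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    n = len(envelope)
    # Build a time-lagged design matrix: one column per lag.
    X = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = envelope[:n - lag]
        else:
            X[:n + lag, j] = envelope[-lag:]
    # Ridge solution: w = (X'X + lam*I)^-1 X'y
    w = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ eeg)
    return lags / fs, w

# Toy usage: a smoothed copy of the envelope plus noise stands in for the "EEG".
fs = 128
rng = np.random.default_rng(0)
env = np.abs(rng.standard_normal(fs * 60))
eeg = np.convolve(env, np.hanning(20), mode="same") + rng.standard_normal(fs * 60)
lag_s, trf = estimate_trf(env, eeg, fs)
```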
Affiliation(s)
- Yu-Jyun Guo
- National Central University, Taoyuan City, Taiwan
- I-Hui Hsieh
- National Central University, Taoyuan City, Taiwan

2
Mishra SK, Aryal S, Patro C, Fu QJ. Extended High-Frequency Hearing Loss and Suprathreshold Auditory Processing: The Moderating Role of Auditory Working Memory. Ear Hear 2025:00003446-990000000-00438. [PMID: 40405354] [DOI: 10.1097/aud.0000000000001677]
Abstract
OBJECTIVES Natural sounds, including speech, contain temporal fluctuations, and hearing loss influences the coding of these temporal features. However, how subclinical hearing loss may influence temporal variations remains unclear. In listeners with normal audiograms, hearing loss above 8 kHz is indicative of basal cochlear damage and may signal the onset of cochlear dysfunction. This study examined a conceptual framework to investigate the relationship between extended high-frequency hearing and suprathreshold auditory processing, particularly focusing on how cognitive factors, such as working memory, moderate these interactions. DESIGN Frequency modulation difference limens to slow (2 Hz) and fast (20 Hz) modulations, backward masking thresholds, and digit span measures were obtained in 44 normal-hearing listeners with varying degrees of extended high-frequency hearing thresholds. RESULTS Extended high-frequency hearing thresholds alone were not directly associated with frequency modulation difference limens or backward masking thresholds. However, working memory capacity, particularly as measured by the backward digit span, moderated the relationship between extended high-frequency thresholds and suprathreshold auditory performance. Among individuals with lower working memory capacity, elevated extended high-frequency thresholds were associated with reduced sensitivity to fast-rate frequency modulations and higher backward masking thresholds. Importantly, this moderating effect was task-specific, as it was not observed for slow-rate modulations. CONCLUSIONS The impact of elevated extended high-frequency thresholds on suprathreshold auditory processing is influenced by working memory capacity. Individuals with reduced cognitive capacity are particularly vulnerable to the perceptual effects of subclinical cochlear damage. This suggests that cognitive resources act as a compensatory mechanism, helping to mitigate the effects of subclinical deficits, especially in tasks that are temporally challenging.
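The moderation analysis described corresponds to testing an interaction term in a regression of suprathreshold performance on extended high-frequency (EHF) thresholds and working memory, followed by simple-slope probing. The sketch below illustrates that logic on simulated data; the variable names, coefficients, and the ±1 SD probing points are illustrative assumptions, not the study's actual model.

```python
# Hedged sketch of a moderation (interaction) analysis: does working memory (WM)
# moderate the link between EHF thresholds and a suprathreshold outcome?
import numpy as np

rng = np.random.default_rng(1)
n = 44
ehf = rng.normal(20, 10, n)          # EHF threshold (dB HL), hypothetical
wm = rng.normal(6, 1.5, n)           # backward digit span, hypothetical
# Outcome worsens with EHF mainly when WM is low (a built-in interaction).
y = (0.02 * ehf - 0.3 * wm
     - 0.015 * (ehf - ehf.mean()) * (wm - wm.mean())
     + rng.normal(0, 0.5, n))

# Center predictors and fit y ~ ehf_c + wm_c + ehf_c:wm_c by least squares.
ehf_c, wm_c = ehf - ehf.mean(), wm - wm.mean()
X = np.column_stack([np.ones(n), ehf_c, wm_c, ehf_c * wm_c])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Simple slopes of EHF at low vs high WM (±1 SD), the usual way to probe moderation.
for label, w in [("low WM", -wm.std()), ("high WM", wm.std())]:
    print(label, "slope of EHF:", beta[1] + beta[3] * w)
```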
Affiliation(s)
- Srikanta K Mishra
- Department of Speech, Language and Hearing Sciences, The University of Texas at Austin, Austin, Texas, USA
- Sajana Aryal
- Department of Speech, Language and Hearing Sciences, The University of Texas at Austin, Austin, Texas, USA
- Chhayakanta Patro
- Department of Speech Language Pathology & Audiology, Towson University, Towson, Maryland, USA
- Qian-Jie Fu
- Department of Head and Neck Surgery, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, California, USA

3
Descamps M, Grossard C, Pellerin H, Lechevalier C, Xavier J, Matos J, Vonthron F, Grosmaitre C, Habib M, Falissard B, Cohen D. Rhythm training improves word-reading in children with dyslexia. Sci Rep 2025; 15:17631. [PMID: 40399331] [PMCID: PMC12095514] [DOI: 10.1038/s41598-025-02485-y]
Abstract
Specific learning disorder with reading deficit (SLD-reading), or dyslexia, is one of the most common neurodevelopmental disorders. Intensive training with reading specialists is recommended, but delayed access to care is common. In this study, we tested an innovative, user-friendly, and accessible intervention, the medical device Mila-Learn, a video game based on cognitive and rhythmic training to improve phonological and reading ability. This randomized placebo-controlled trial (ClinicalTrials.gov NCT05154721) included children aged 7 to 11 years with SLD-reading. The children were in 2nd to 5th grade and had been in school for more than 3 years. The exclusion criteria were reading or writing remediation during the past 12 months, previous use of Mila-Learn, and severe chronic illness. The patients, who were all native French speakers, were recruited from throughout France and were randomly assigned to Mila-Learn or a matched-placebo game for an 8-week training. The primary variable was nonword decoding. The secondary variables included phonological skills, 2-min word-reading accuracy and speed, grapheme-to-phoneme conversion skills, and self-esteem. Between September 2021 and April 2023, 151 children were assigned to Mila-Learn (n = 75; male = 36; female = 39) or the placebo (n = 76; male = 42; female = 34). We registered 39 adverse events; only one was due to the protocol, and it occurred in the placebo group. We found no differences between the groups in nonword decoding in the intention-to-treat (N = 151; p = 0.39) or per-protocol analysis (N = 93; p = 0.21). However, the per-protocol analysis revealed the superiority of Mila-Learn over the placebo by 5.05 score points (95% CI 0.21; 10.3, p < 0.05) for 2-min word-reading accuracy and by 5.44 score points (95% CI 0.57; 10.99, p < 0.05) for 2-min word-reading speed. We found no other significant effects. Mila-Learn is safe for children with SLD-reading, who might benefit from this medical device. Study registration: ClinicalTrials.gov NCT05154721, 13/12/2021.
Affiliation(s)
- Charline Grossard
- Institut des Systemes Intelligents et Robotiques, Sorbonne Université, Paris, France
- Service de Psychiatrie de l'Enfant et de l'Adolescent, APHP, Groupe Hospitalier Pitié-Salpêtrière, Paris, France
- Hugues Pellerin
- Service de Psychiatrie de l'Enfant et de l'Adolescent, APHP, Groupe Hospitalier Pitié-Salpêtrière, Paris, France
- Claire Lechevalier
- Pôle Universitaire de Psychiatrie de l'Enfant et de l'Adolescent, Centre Hospitalier Spécialisé Henri Laborit, Poitiers, France
- CNRS UMR 7295, Equipe CoCliCo, Centre de Recherches sur la Cognition et l'Apprentissage, Poitiers, France
- Jean Xavier
- Pôle Universitaire de Psychiatrie de l'Enfant et de l'Adolescent, Centre Hospitalier Spécialisé Henri Laborit, Poitiers, France
- CNRS UMR 7295, Equipe CoCliCo, Centre de Recherches sur la Cognition et l'Apprentissage, Poitiers, France
- Joana Matos
- Service de Psychiatrie de l'Enfant et de l'Adolescent, APHP, Groupe Hospitalier Pitié-Salpêtrière, Paris, France
- Catherine Grosmaitre
- Reference Centre of Language and Learning Disorders, Hôpital Necker-Enfants Malades, APHP, Paris, France
- Laboratoire DysCo, Université de Nanterre, Nanterre, Paris, France
- Michel Habib
- Cognitive Neuroscience Laboratory, Neurodys Institute, Aix-Marseille University, UMR 7291, Marseille, France
- Bruno Falissard
- Faculté de médecine Paris-Saclay, Le Kremlin-Bicêtre, France
- Centre de recherche en épidémiologie et santé des populations de l'INSERM, Paris, France
- David Cohen
- Institut des Systemes Intelligents et Robotiques, Sorbonne Université, Paris, France
- Service de Psychiatrie de l'Enfant et de l'Adolescent, APHP, Groupe Hospitalier Pitié-Salpêtrière, Paris, France

4
Auksztulewicz R, Ödül OB, Helbling S, Böke A, Cappotto D, Luo D, Schnupp J, Melloni L. "What" and "When" Predictions Jointly Modulate Speech Processing. J Neurosci 2025; 45:e1049242025. [PMID: 40216546] [PMCID: PMC12079732] [DOI: 10.1523/jneurosci.1049-24.2025]
Abstract
Adaptive behavior rests on predictions based on statistical regularities in the environment. Such regularities pertain to stimulus contents ("what") and timing ("when"), and both interactively modulate sensory processing. In speech streams, predictions can be formed at multiple hierarchical levels of contents (e.g., syllables vs words) and timing (faster vs slower time scales). Whether and how these hierarchies map onto each other remains unknown. Under one hypothesis, neural hierarchies may link "what" and "when" predictions within sensory processing areas: with lower versus higher cortical regions mediating interactions for smaller versus larger units (syllables vs words). Alternatively, interactions between "what" and "when" regularities might rest on a generic, sensory-independent mechanism. To address these questions, we manipulated "what" and "when" regularities at two levels-single syllables and disyllabic pseudowords-while recording neural activity using magnetoencephalography (MEG) in healthy volunteers (N = 22). We studied how neural responses to syllable and/or pseudoword deviants are modulated by "when" regularity. "When" regularity modulated "what" mismatch responses with hierarchical specificity, such that responses to deviant pseudowords (vs syllables) were amplified by temporal regularity at slower (vs faster) time scales. However, both these interactive effects were source-localized to the same regions, including frontal and parietal cortices. Effective connectivity analysis showed that the integration of "what" and "when" regularity selectively modulated connectivity within regions, consistent with gain effects. This suggests that the brain integrates "what" and "when" predictions that are congruent with respect to their hierarchical level, but this integration is mediated by a shared and distributed cortical network.
Affiliation(s)
- Ryszard Auksztulewicz
- Department of Neuropsychology and Psychopharmacology, Maastricht University, Maastricht 6229 ER, The Netherlands
- Centre for Cognitive Neuroscience Berlin, Freie Universität Berlin, Berlin 14195, Germany
- Ozan Bahattin Ödül
- Department of Brain and Behavioral Sciences, Università di Pavia, Pavia 27100, Italy
- Saskia Helbling
- Ernst Strungmann Institute, Frankfurt am Main 60528, Germany
- Ana Böke
- Centre for Cognitive Neuroscience Berlin, Freie Universität Berlin, Berlin 14195, Germany
- Drew Cappotto
- Ear Institute, University College London, London WC1X 8EE, United Kingdom
- Dan Luo
- Department of Otorhinolaryngology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104
- Jan Schnupp
- Gerald Choa Neuroscience Institute, Chinese University of Hong Kong, Hong Kong SAR, Hong Kong
- Department of Otorhinolaryngology, Head and Neck Surgery, Chinese University of Hong Kong, Hong Kong SAR, Hong Kong
- Lucía Melloni
- Research Group Neural Circuits, Consciousness and Cognition, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main 60322, Germany
- Predictive Brain Department, Research Center One Health Ruhr, Faculty of Psychology, University Alliance Ruhr, Ruhr University Bochum, Bochum 44801, Germany

5
Osorio S, Assaneo MF. Anatomically distinct cortical tracking of music and speech by slow (1-8 Hz) and fast (70-120 Hz) oscillatory activity. PLoS One 2025; 20:e0320519. [PMID: 40341725] [PMCID: PMC12061428] [DOI: 10.1371/journal.pone.0320519]
Abstract
Music and speech encode hierarchically organized structural complexity at the service of human expressiveness and communication. Previous research has shown that populations of neurons in auditory regions track the envelope of acoustic signals within the range of slow and fast oscillatory activity. However, the extent to which cortical tracking is influenced by the interplay between stimulus type, frequency band, and brain anatomy remains an open question. In this study, we reanalyzed intracranial recordings from thirty subjects implanted with electrocorticography (ECoG) grids in the left cerebral hemisphere, drawn from an existing open-access ECoG database. Participants passively watched a movie where visual scenes were accompanied by either music or speech stimuli. Cross-correlation between brain activity and the envelope of music and speech signals, along with density-based clustering analyses and linear mixed-effects modeling, revealed both anatomically overlapping and functionally distinct mapping of the tracking effect as a function of stimulus type and frequency band. We observed widespread left-hemisphere tracking of music and speech signals in the Slow Frequency Band (SFB, the low-frequency signal band-pass filtered between 1 and 8 Hz), with near-zero temporal lags. In contrast, cortical tracking in the High Frequency Band (HFB, the envelope of the signal band-pass filtered between 70 and 120 Hz) was higher during speech perception, was more densely concentrated in classical language processing areas, and showed a frontal-to-temporal gradient in lag values that was not observed during perception of musical stimuli. Our results highlight a complex interaction between cortical region and frequency band that shapes temporal dynamics during processing of naturalistic music and speech signals.
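The two tracking signals described, a slow-frequency band (1-8 Hz band-pass of the recording) and the envelope of a high-frequency band (70-120 Hz), are each typically cross-correlated with the stimulus envelope to estimate tracking strength and lag. A minimal sketch of that pipeline on simulated data is shown below; the sampling rate, filter orders, and lag range are illustrative assumptions rather than the study's exact settings.

```python
# Hedged sketch: build SFB and HFB-envelope signals from a recording and
# cross-correlate each with the stimulus envelope to find the peak lag.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert, correlate, correlation_lags

def bandpass(x, lo, hi, fs, order=4):
    sos = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band", output="sos")
    return sosfiltfilt(sos, x)

def peak_lag(x, y, fs, max_lag_s=0.5):
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    r = correlate(y, x, mode="full") / len(x)
    lags = correlation_lags(len(y), len(x), mode="full") / fs
    keep = np.abs(lags) <= max_lag_s
    return lags[keep][np.argmax(r[keep])], r[keep].max()

fs = 1000
rng = np.random.default_rng(2)
stim_env = np.abs(bandpass(rng.standard_normal(fs * 30), 1, 8, fs))   # toy stimulus envelope
ecog = rng.standard_normal(fs * 30)                                    # toy recording

sfb = bandpass(ecog, 1, 8, fs)                           # slow-frequency band signal
hfb_env = np.abs(hilbert(bandpass(ecog, 70, 120, fs)))   # high-frequency band envelope
print(peak_lag(stim_env, sfb, fs))
print(peak_lag(stim_env, hfb_env, fs))
```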
Affiliation(s)
- Sergio Osorio
- Department of Neurology, Harvard Medical School, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Boston, Massachusetts, United States of America

6
Regev J, Oxenham AJ, Relaño-Iborra H, Zaar J, Dau T. Evaluating the role of age on speech-in-noise perception based primarily on temporal envelope information. Hear Res 2025; 460:109236. [PMID: 40086130] [DOI: 10.1016/j.heares.2025.109236]
Abstract
Acoustic amplitude modulation (AM) patterns carry important information, particularly in speech. AM masking, influenced by frequency selectivity in the modulation domain, is considered a crucial factor for speech intelligibility in noisy environments. Based on recent evidence suggesting an age-related decline in AM frequency selectivity, this study investigated whether increased AM masking in older listeners is associated with reduced speech intelligibility. Speech reception thresholds (SRTs) were measured using tone-vocoded speech and maskers with no modulation, broadband AM, or narrowband AM at varying modulation frequencies. AM masked thresholds were assessed for a 4-Hz target modulation frequency. The study included young (N = 14, 19-25 years) and older (N = 14, 57-79 years) listeners with normal hearing. It was hypothesized that SRTs would be higher for the older group with modulated maskers and that the age-related increase in SRT would depend on the masker's modulation frequency content. The speech intelligibility results showed that maskers with broadband AM produced higher SRTs than unmodulated maskers. However, SRTs varied little with masker-modulation center frequency across the range tested (2-32 Hz). While older listeners exhibited lower AM frequency selectivity than young listeners, they did not consistently exhibit higher SRTs than their young counterparts across maskers. However, there was a trend for the effect of age to be greater for maskers with broadband AM than for unmodulated maskers. Overall, despite supportive trends, the results do not conclusively demonstrate that older listeners are more susceptible than young listeners to AM masking of speech.
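A minimal sketch of how maskers like those described can be constructed, assuming a noise carrier multiplied by either a broadband (low-pass noise) modulator or a narrowband modulator centered on a chosen modulation frequency. The sampling rates, modulation depth, and bandwidths are illustrative choices rather than the study's exact parameters.

```python
# Hedged sketch: an unmodulated noise masker plus broadband-AM and narrowband-AM
# versions. Modulators are built at a low rate and interpolated to the audio rate.
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs, fs_mod, dur = 44100, 200, 2.0
rng = np.random.default_rng(3)
n, n_mod = int(fs * dur), int(fs_mod * dur)
carrier = rng.standard_normal(n)                       # unmodulated noise masker

def to_audio_rate(m):
    m = m / np.max(np.abs(m))                          # normalize the modulator
    return np.interp(np.arange(n) / fs, np.arange(n_mod) / fs_mod, m)

# Broadband AM: low-pass noise modulator (here below 32 Hz).
sos_lp = butter(4, 32 / (fs_mod / 2), btype="low", output="sos")
m_broad = to_audio_rate(sosfiltfilt(sos_lp, rng.standard_normal(n_mod)))
masker_broad = carrier * (1 + 0.9 * m_broad)

# Narrowband AM: one-octave-wide noise modulator centered on 4 Hz.
f0 = 4.0
sos_bp = butter(4, [f0 / 2**0.5 / (fs_mod / 2), f0 * 2**0.5 / (fs_mod / 2)],
                btype="band", output="sos")
m_narrow = to_audio_rate(sosfiltfilt(sos_bp, rng.standard_normal(n_mod)))
masker_narrow = carrier * (1 + 0.9 * m_narrow)
```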
Affiliation(s)
- Jonathan Regev
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Andrew J Oxenham
- Auditory Perception and Cognition Laboratory, Department of Psychology, University of Minnesota, MN, 55455 Minneapolis, USA
- Helia Relaño-Iborra
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark; Eriksholm Research Centre, 3070 Snekkersten, Denmark; Department of Biomedical Engineering, University of Rochester, NY, 14642 Rochester, USA
- Johannes Zaar
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark; Eriksholm Research Centre, 3070 Snekkersten, Denmark
- Torsten Dau
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark

7
Wang Y, Wu D, Ding N, Zou J, Lu Y, Ma Y, Zhang X, Yu W, Wang K. Linear phase property of speech envelope tracking response in Heschl's gyrus and superior temporal gyrus. Cortex 2025; 186:1-10. [PMID: 40138746] [DOI: 10.1016/j.cortex.2025.02.015]
Abstract
Understanding how the brain tracks speech during listening remains a challenge. The phase resetting hypothesis proposes that the envelope-tracking response is generated by resetting the phase of intrinsic nonlinear neural oscillations, whereas the evoked response hypothesis proposes that the envelope-tracking response is the linear superposition of transient responses evoked by a sequence of acoustic events in speech. Recent studies have demonstrated a linear phase property of the envelope-tracking response, supporting the evoked response hypothesis. However, the cortical regions aligning with the evoked response hypothesis remain unclear. To address this question, we directly recorded from the cortex using stereo-electroencephalography (SEEG) in nineteen epilepsy patients as they listened to natural speech, and we investigated whether the phase lag between the speech envelope and neural activity changes linearly across frequency. We found that the linear phase property of low-frequency (LF) (0.5-40 Hz) envelope tracking was widely observed in Heschl's gyrus (HG) and superior temporal gyrus (STG), with additional sparser distribution in the insula, postcentral gyrus, and precentral gyrus. Furthermore, the latency of LF envelope-tracking responses derived from the phase-frequency curve exhibited an increasing gradient along HG and in the posterior-to-anterior direction in STG. Our findings suggest that the auditory cortex can track the speech envelope in line with the evoked response hypothesis.
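The latency estimate mentioned here follows from the linear-phase idea: if the neural response is essentially a delayed copy of the envelope, the cross-spectral phase grows linearly with frequency and the slope of that phase-frequency curve gives the delay. The sketch below illustrates this on simulated data; the sampling rate, band limits, and Welch parameters are illustrative assumptions.

```python
# Hedged sketch of the linear-phase logic: a delayed copy of the envelope produces a
# phase lag that grows linearly with frequency (delay = -slope / (2*pi)).
import numpy as np
from scipy.signal import csd

fs = 200
rng = np.random.default_rng(4)
env = rng.standard_normal(fs * 120)
delay_s = 0.08                                   # ground-truth latency of 80 ms
shift = int(delay_s * fs)
neural = np.r_[np.zeros(shift), env[:-shift]] + 0.5 * rng.standard_normal(env.size)

f, Pxy = csd(env, neural, fs=fs, nperseg=fs * 4)       # cross-spectral density
band = (f >= 0.5) & (f <= 40)
phase = np.unwrap(np.angle(Pxy[band]))
slope = np.polyfit(f[band], phase, 1)[0]               # radians per Hz
print("estimated latency (s):", -slope / (2 * np.pi))  # close to 0.08
```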
Affiliation(s)
- Yaoyao Wang
- Research Center for Intelligent Computing Infrastructure Innovation, Zhejiang Lab, Hangzhou, 311121, China
- Dengchang Wu
- Epilepsy Center, Department of Neurology, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, 310009, China
- Nai Ding
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, 310027, China
- Jiajie Zou
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, 310027, China
- Yuhan Lu
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, 310027, China
- Yuehui Ma
- Epilepsy Center, Department of Neurosurgery, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, 310009, China
- Xing Zhang
- Epilepsy Center, Department of Neurology, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, 310009, China
- Wenyuan Yu
- Research Center for Life Sciences Computing, Zhejiang Lab, Hangzhou, 311121, China; Mental Health Education, Consultation and Research Center, Shenzhen Polytechnic University, Shenzhen, 518055, China
- Kang Wang
- Epilepsy Center, Department of Neurology, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, 310009, China

8
Lorenz A, Mercier M, Trébuchon A, Bartolomei F, Schön D, Morillon B. Corollary discharge signals during production are domain general: An intracerebral EEG case study with a professional musician. Cortex 2025; 186:11-23. [PMID: 40147418] [DOI: 10.1016/j.cortex.2025.02.013]
Abstract
As measured by event-related potentials, self-produced sounds elicit an overall reduced response in the auditory cortex compared to identical externally presented stimuli. This study examines this modulatory effect with high-precision recordings in naturalistic settings and explores whether it is domain-general across speech and music. Using stereotactic EEG with a professional musician undergoing presurgical epilepsy evaluation, we recorded auditory cortical activity during music and speech production and perception tasks. Compared to externally presented sounds, self-produced sounds induce modulations of activity in the auditory cortex that vary across frequency and spatial location but are consistent across cognitive domains (speech/music) and different stimuli. Self-produced music and speech were associated with widespread low-frequency (4-8 Hz) suppression, mid-frequency (8-80 Hz) enhancement, and decreased encoding of acoustic features. These findings reveal the domain-general nature of motor-driven corollary discharge modulatory signals and their frequency-specific effects in auditory regions.
Affiliation(s)
- Anna Lorenz
- Aix Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France; Vanderbilt Memory & Alzheimer's Center, Vanderbilt University Medical Center, Nashville, TN, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Manuel Mercier
- Aix Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France
- Agnès Trébuchon
- Aix Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France; APHM, Clinical Neurophysiology, Timone Hospital, Marseille, France
- Fabrice Bartolomei
- Aix Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France; APHM, Clinical Neurophysiology, Timone Hospital, Marseille, France
- Daniele Schön
- Aix Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France
- Benjamin Morillon
- Aix Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France

9
Harding EE, Kim JC, Demos AP, Roman IR, Tichko P, Palmer C, Large EW. Musical neurodynamics. Nat Rev Neurosci 2025; 26:293-307. [PMID: 40102614] [DOI: 10.1038/s41583-025-00915-4]
Abstract
A great deal of research in the neuroscience of music suggests that neural oscillations synchronize with musical stimuli. Although neural synchronization is a well-studied mechanism underpinning expectation, it has even more far-reaching implications for music. In this Perspective, we survey the literature on the neuroscience of music, including pitch, harmony, melody, tonality, rhythm, metre, groove and affect. We describe how fundamental dynamical principles based on known neural mechanisms can explain basic aspects of music perception and performance, as summarized in neural resonance theory. Building on principles such as resonance, stability, attunement and strong anticipation, we propose that people anticipate musical events not through predictive neural models, but because brain-body dynamics physically embody musical structure. The interaction of certain kinds of sounds with ongoing pattern-forming dynamics results in patterns of perception, action and coordination that we collectively experience as music. Statistically universal structures may have arisen in music because they correspond to stable states of complex, pattern-forming dynamical systems. This analysis of empirical findings from the perspective of neurodynamic principles sheds new light on the neuroscience of music and what makes music powerful.
Affiliation(s)
- Eleanor E Harding
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Center for Language and Cognition, University of Groningen, Groningen, The Netherlands
- Ji Chul Kim
- Department of Psychological Sciences, University of Connecticut, Storrs, CT, USA
- Alexander P Demos
- Department of Psychology, University of Illinois Chicago, Chicago, IL, USA
- Iran R Roman
- School of Electronic Engineering and Computer Science, Queen Mary University of London, London, UK
- Parker Tichko
- Department of Psychological Sciences, University of Connecticut, Storrs, CT, USA
- Caroline Palmer
- Department of Psychology, McGill University, Montreal, Quebec, Canada
- Edward W Large
- Department of Psychological Sciences, University of Connecticut, Storrs, CT, USA
- Department of Physics, University of Connecticut, Storrs, CT, USA

10
Preisig BC, Meyer M. Predictive coding and dimension-selective attention enhance the lateralization of spoken language processing. Neurosci Biobehav Rev 2025; 172:106111. [PMID: 40118260] [DOI: 10.1016/j.neubiorev.2025.106111]
Abstract
Hemispheric lateralization in speech and language processing exemplifies functional brain specialization. Seminal work in patients with left hemisphere damage highlighted the left-hemispheric dominance in language functions. However, speech processing is not confined to the left hemisphere. Hence, some researchers associate lateralization with auditory processing asymmetries: slow temporal and fine spectral acoustic information is preferentially processed in right auditory regions, while faster temporal information is primarily handled by left auditory regions. Other scholars posit that lateralization relates more to linguistic processing, particularly for speech and speech-like stimuli. We argue that these seemingly distinct accounts are interdependent. Linguistic analysis of speech relies on top-down processes, such as predictive coding and dimension-selective auditory attention, which enhance lateralized processing by engaging left-lateralized sensorimotor networks. Our review highlights that lateralization is weaker for simple sounds, stronger for speech-like sounds, and strongest for meaningful speech. Evidence shows that predictive speech processing and selective attention enhance lateralization. We illustrate that these top-down processes rely on left-lateralized sensorimotor networks and provide insights into the role of these networks in speech processing.
Affiliation(s)
- Basil C Preisig
- The Institute for the Interdisciplinary Study of Language Evolution, Evolutionary Neuroscience of Language, University of Zurich, Switzerland; Zurich Center for Linguistics, University of Zurich, Switzerland; Neuroscience Center Zurich, University of Zurich and Eidgenössische Technische Hochschule Zurich, Switzerland
- Martin Meyer
- The Institute for the Interdisciplinary Study of Language Evolution, Evolutionary Neuroscience of Language, University of Zurich, Switzerland; Zurich Center for Linguistics, University of Zurich, Switzerland; Neuroscience Center Zurich, University of Zurich and Eidgenössische Technische Hochschule Zurich, Switzerland

11
Bruder C, Larrouy-Maestri P. CoVox: A dataset of contrasting vocalizations. Behav Res Methods 2025; 57:142. [PMID: 40216652] [PMCID: PMC11991967] [DOI: 10.3758/s13428-025-02664-9]
Abstract
The human voice is remarkably versatile and can vary greatly in sound depending on how it is used. An increasing number of studies have addressed the differences and similarities between the singing and the speaking voice. However, finding stimulus material that is both well controlled and ecologically valid is challenging, and most datasets lack variability in terms of vocal styles performed by the same voice. Here, we describe a curated stimulus set of vocalizations in which 22 female singers performed the same melody excerpts in three contrasting singing styles (as a lullaby, as a pop song, and as an opera aria) and spoke the text aloud in two speaking styles (as if speaking to an adult or to an infant). All productions were made with the songs' original lyrics, in Brazilian Portuguese, and with a /lu/ sound. This ecologically valid dataset of 1320 vocalizations was validated through a forced-choice lab experiment (N = 25 for each stimulus) in which lay listeners could recognize the intended vocalization style with high accuracy (proportion of correct recognition above 69% for all styles). We also provide an acoustic characterization of the stimuli, depicting clear and contrasting acoustic profiles depending on the style of vocalization. All recordings are made freely available under a Creative Commons license and can be downloaded at https://osf.io/cgexn/.
Affiliation(s)
- Camila Bruder
- Max Planck Institute for Empirical Aesthetics, Grüneburgweg 14, 60322, Frankfurt am Main, Germany
- Pauline Larrouy-Maestri
- Max Planck Institute for Empirical Aesthetics, Grüneburgweg 14, 60322, Frankfurt am Main, Germany
- Center for Language, Music, and Emotion (CLaME), New York, NY, USA

12
Asthagiri A, Loui P. From Lab to Concert Hall: Effects of Live Performance on Neural Entrainment and Engagement. bioRxiv 2025:2025.04.03.646931. [Preprint] [PMID: 40236171] [PMCID: PMC11996556] [DOI: 10.1101/2025.04.03.646931]
Abstract
Neural entrainment to acoustic rhythms underlies intelligibility in speech as well as sensorimotor responses to music. This property of neural dynamics, where cortical oscillations align in phase and frequency with a periodic stimulus, is well-studied in the context of sensory encoding and perception. However, little is known about how affective components in naturalistic music influence neural entrainment. The present study investigates the effect of live versus recorded music on neural entrainment and tracking using phase-based and linear modeling approaches. Twenty-one participants listened to 2 live and 2 recorded performances of fast and slow movements of solo violin while their EEG data were collected with a mobile system. Participants made behavioral ratings of engagement, spontaneity, pleasure, investment, focus, and distraction after each trial. Live performances were rated as more engaging, pleasurable, and spontaneous than recorded performances. Live trials showed significantly higher acoustic-EEG phase-locking than recorded trials in the frequency range associated with the note rate of the fast excerpts. Furthermore, the effect of liveness on phase-locking was strongest in individuals who reported the greatest increases in pleasure and engagement for live over recorded trials. Finally, forward linear mapping revealed stronger neural tracking of spectral over amplitude-related acoustic features and a sensitivity to tempo in neural tracking. Altogether, results suggest that experiencing music live strengthens cerebro-acoustic relationships by enhancing rhythmically driven neural entrainment alongside perceived pleasure and engagement.
Significance Statement
Neural oscillations entrain to rhythms in naturalistic acoustic stimuli, including speech and music. The rhythmic structure of music impacts the timescale of neural entrainment and facilitates the pleasurable urge to move with music, but less is known about how the live experience of music affects neural entrainment. Here, we measure phase-locking and neural tracking between listeners' EEG activity and naturalistic acoustics during live and recorded solo violin performances, demonstrating that neural-acoustic interactions are driven by musical rhythms and strengthened by the perception of liveness. Together, the study provides insight into neural mechanisms underlying the pleasure of live music, suggesting that the social and affective experience of liveness alters the strength of neural entrainment.
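Acoustic-EEG phase-locking of the kind reported here is often quantified with a phase-locking value (PLV): band-pass both signals around the rate of interest, extract Hilbert phases, and average the unit phase-difference vectors. The sketch below illustrates this with simulated signals; the band edges, sampling rate, and toy "note rate" are illustrative assumptions, not the preprint's parameters.

```python
# Hedged sketch of a cerebro-acoustic phase-locking value (PLV) around a note rate.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def plv(eeg, env, fs, lo, hi):
    sos = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band", output="sos")
    ph_eeg = np.angle(hilbert(sosfiltfilt(sos, eeg)))
    ph_env = np.angle(hilbert(sosfiltfilt(sos, env)))
    return np.abs(np.mean(np.exp(1j * (ph_eeg - ph_env))))

fs = 250
rng = np.random.default_rng(5)
env = np.sin(2 * np.pi * 6 * np.arange(fs * 60) / fs)          # toy ~6 Hz "note rate"
eeg = 0.3 * np.roll(env, int(0.1 * fs)) + rng.standard_normal(env.size)
print("PLV in a 4-8 Hz band:", plv(eeg, env, fs, 4, 8))
```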

13
te Rietmolen N, Strijkers K, Morillon B. Moving rhythmically can facilitate naturalistic speech perception in a noisy environment. Proc Biol Sci 2025; 292:20250354. [PMID: 40199360] [PMCID: PMC11978457] [DOI: 10.1098/rspb.2025.0354]
Abstract
The motor system is known to process temporal information, and moving rhythmically while listening to a melody can improve auditory processing. In three interrelated behavioural experiments, we demonstrate that this effect translates to speech processing. Motor priming improves the efficiency of subsequent naturalistic speech-in-noise processing under specific conditions. (i) Moving rhythmically at the lexical rate (~1.8 Hz) significantly improves subsequent speech processing compared to moving at other rates, such as the phrasal or syllabic rates. (ii) The impact of such rhythmic motor priming is not influenced by whether it is self-generated or triggered by an auditory beat. (iii) Overt lexical vocalization, regardless of its semantic content, also enhances the efficiency of subsequent speech processing. These findings provide evidence for the functional role of the motor system in processing the temporal dynamics of naturalistic speech.
Affiliation(s)
- Noémie te Rietmolen
- Institute for Language, Communication, and the Brain (ILCB), Aix-Marseille Université, Marseille, France
- Kristof Strijkers
- Laboratoire Parole et Langage (LPL), Aix-Marseille Université & CNRS, Aix-en-Provence, France
- Benjamin Morillon
- INSERM, Institut de Neurosciences des Systèmes (INS), Aix Marseille Université, Marseille, France

14
Gao Z, Oxenham AJ. Adaptation to sentences and melodies when making judgments along a voice-nonvoice continuum. Atten Percept Psychophys 2025; 87:1022-1032. [PMID: 40000570] [DOI: 10.3758/s13414-025-03030-9]
Abstract
Adaptation to constant or repetitive sensory signals serves to improve detection of novel events in the environment and to encode incoming information more efficiently. Within the auditory modality, contrastive adaptation effects have been observed within a number of categories, including voice and musical instrument type. A recent study found contrastive perceptual shifts between voice and instrument categories following repetitive presentation of adaptors consisting of either vowels or instrument tones. The current study tested the generalizability of adaptation along a voice-instrument continuum, using more ecologically valid adaptors. Participants were presented with an adaptor followed by an ambiguous voice-instrument target, created by generating a 10-step morphed continuum between pairs of vowel and instrument sounds. Listeners' categorization of the target sounds was shifted contrastively by a spoken sentence or instrumental melody adaptor, regardless of whether the adaptor and the target shared the same speaker gender or similar pitch range (Experiment 1). However, no significant contrastive adaptation was observed when nonspeech vocalizations or nonpitched percussion sounds were used as the adaptors (Experiment 2). The results suggest that adaptation between voice and nonvoice categories does not rely on exact repetition of simple stimuli, nor does it solely reflect the result of a sound being categorized as being human or nonhuman sourced. The outcomes suggest future directions for determining the precise spectro-temporal properties of sounds that induce these voice-instrument contrastive adaptation effects.
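As a rough illustration of the continuum idea only (not the morphing procedure used in the study), one crude way to build intermediate steps between two sounds is to interpolate their short-time magnitude spectra and resynthesize with one endpoint's phase. The toy sounds, STFT settings, and number of steps below are illustrative assumptions.

```python
# Hedged, crude stand-in for a 10-step continuum between two sounds: interpolate
# short-time magnitude spectra and resynthesize with one endpoint's phase.
import numpy as np
from scipy.signal import stft, istft

def morph_continuum(sound_a, sound_b, fs, n_steps=10):
    f, t, Za = stft(sound_a, fs=fs, nperseg=1024)
    _, _, Zb = stft(sound_b, fs=fs, nperseg=1024)
    phase = np.angle(Za)                      # reuse one endpoint's phase throughout
    steps = []
    for a in np.linspace(0, 1, n_steps):
        mag = (1 - a) * np.abs(Za) + a * np.abs(Zb)
        _, y = istft(mag * np.exp(1j * phase), fs=fs, nperseg=1024)
        steps.append(y)
    return steps

fs = 16000
t = np.arange(fs) / fs
vowel_like = np.sin(2 * np.pi * 220 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))  # toy stand-in
tone_like = np.sin(2 * np.pi * 440 * t)                                           # toy stand-in
continuum = morph_continuum(vowel_like, tone_like, fs)
```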
Affiliation(s)
- Zi Gao
- Department of Psychology, University of Minnesota-Twin Cities, 75 E River Rd, Minneapolis, MN, 55455, USA
- Andrew J Oxenham
- Department of Psychology, University of Minnesota-Twin Cities, 75 E River Rd, Minneapolis, MN, 55455, USA

15
Keitel A, Pelofi C, Guan X, Watson E, Wight L, Allen S, Mencke I, Keitel C, Rimmele J. Cortical and behavioral tracking of rhythm in music: Effects of pitch predictability, enjoyment, and expertise. Ann N Y Acad Sci 2025; 1546:120-135. [PMID: 40101105] [PMCID: PMC11998481] [DOI: 10.1111/nyas.15315]
Abstract
The cortical tracking of stimulus features is a crucial neural requisite of how we process continuous music. We here tested whether cortical tracking of the beat, typically related to rhythm processing, is modulated by pitch predictability and other top-down factors. Participants listened to tonal (high pitch predictability) and atonal (low pitch predictability) music while undergoing electroencephalography. We analyzed their cortical tracking of the acoustic envelope. Cortical envelope tracking was stronger while listening to atonal music, potentially reflecting listeners' violated pitch expectations and increased attention allocation. Envelope tracking was also stronger with more expertise and enjoyment. Furthermore, we showed cortical tracking of pitch surprisal (using IDyOM), which suggests that listeners' expectations match those computed by the IDyOM model, with higher surprisal for atonal music. Behaviorally, we measured participants' ability to finger-tap to the beat of tonal and atonal sequences in two experiments. Finger-tapping performance was better in the tonal condition, indicating a positive effect of pitch predictability on behavioral rhythm processing. Cortical envelope tracking predicted tapping performance for tonal music, as did pitch-surprisal tracking for atonal music, indicating that high and low predictability might impose different processing regimes. Taken together, our results show various ways that top-down factors impact musical rhythm processing.
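Pitch surprisal of the kind computed with IDyOM is, at its core, -log2 of the predicted probability of each note given its context. The sketch below illustrates that quantity with a much simpler bigram model fit to the whole sequence (IDyOM uses richer, variable-order, online models), so it is only an illustration of what is being tracked; the toy melodies and smoothing scheme are illustrative assumptions.

```python
# Hedged sketch of note-by-note pitch surprisal with a simple bigram model
# (add-one smoothing). Not IDyOM; just the same quantity in miniature.
import numpy as np
from collections import defaultdict

def bigram_surprisal(pitches, alphabet):
    counts = defaultdict(lambda: defaultdict(int))
    for prev, cur in zip(pitches[:-1], pitches[1:]):
        counts[prev][cur] += 1
    surprisal = []
    for prev, cur in zip(pitches[:-1], pitches[1:]):
        total = sum(counts[prev].values()) + len(alphabet)      # add-one smoothing
        p = (counts[prev][cur] + 1) / total
        surprisal.append(-np.log2(p))
    return np.array(surprisal)

# Toy melodies as MIDI pitches: a repetitive, tonal-like line vs a more irregular one.
tonal = [60, 62, 64, 65, 64, 62, 60, 62, 64, 65, 64, 62, 60]
atonal = [60, 71, 63, 68, 61, 72, 66, 59, 70, 62, 73, 65, 58]
alphabet = set(tonal + atonal)
print(bigram_surprisal(tonal, alphabet).mean(), bigram_surprisal(atonal, alphabet).mean())
```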
Affiliation(s)
- Anne Keitel
- Department of Psychology, University of Dundee, Dundee, UK
- Claire Pelofi
- Department of Psychology, New York University, New York, New York, USA
- Max Planck NYU Center for Language, Music, and Emotion, New York, New York, USA
- Xinyi Guan
- Max Planck NYU Center for Language, Music, and Emotion, New York, New York, USA
- Digital and Cognitive Musicology Lab, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Emily Watson
- Department of Psychology, University of Dundee, Dundee, UK
- Lucy Wight
- Department of Psychology, University of Dundee, Dundee, UK
- School of Psychology, Aston University, Birmingham, UK
- Sarah Allen
- Department of Psychology, University of Dundee, Dundee, UK
- Iris Mencke
- Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany
- Department of Music, Max Planck Institute for Empirical Aesthetics, Frankfurt, Germany
- Johanna Rimmele
- Max Planck NYU Center for Language, Music, and Emotion, New York, New York, USA
- Department of Cognitive Neuropsychology, Max Planck Institute for Empirical Aesthetics, Frankfurt, Germany

16
Fernández-Merino L, Lizarazu M, Molinaro N, Kalashnikova M. Temporal Structure of Music Improves the Cortical Encoding of Speech. Hum Brain Mapp 2025; 46:e70199. [PMID: 40129256] [PMCID: PMC11933723] [DOI: 10.1002/hbm.70199]
Abstract
Long- and short-term musical training has been proposed to improve the efficiency of cortical tracking of speech, which refers to the synchronization of brain oscillations and the acoustic temporal structure of external stimuli. Here, we study how musical sequences with different rhythm structures can guide the temporal dynamics of auditory oscillations synchronized with the speech envelope. For this purpose, we investigated the effects of prior exposure to rhythmically structured musical sequences on cortical tracking of speech in Basque-Spanish bilingual adults (Experiment 1; N = 33, 22 female, Mean age = 25 years). We presented participants with sentences in Basque and Spanish preceded by musical sequences that differed in their rhythmical structure. The rhythmical structure of the musical sequences was created to (1) reflect and match the syllabic structure of the sentences, (2) reflect a regular rhythm but not match the syllabic structure of the sentences, and (3) follow an irregular rhythm. Participants' brain responses were recorded using electroencephalography, and speech-brain coherence in the delta and theta bands was calculated. Results showed stronger speech-brain coherence in the delta band in the first condition, but only for Spanish stimuli. A follow-up experiment including a subset of the initial sample (Experiment 2; N = 20) was conducted to investigate whether language-specific stimuli properties influenced the Basque results. Similar to Experiment 1, we found stronger speech-brain coherence in the delta and theta bands when the sentences were preceded by musical sequences that matched their syllabic structure. These results suggest that not only the regularity in music is crucial for influencing cortical tracking of speech, but so is adjusting this regularity to optimally reflect the rhythmic characteristics of listeners' native language(s). Despite finding some language-specific differences across frequencies, we showed that rhythm, inherent in musical signals, guides the adaptation of brain oscillations, by adapting the temporal dynamics of the oscillatory activity to the rhythmic scaffolding of the musical signal.
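Speech-brain coherence as used here is typically the magnitude-squared coherence between the EEG and the speech envelope, averaged within delta and theta bands. A minimal sketch on simulated data follows; the band edges, sampling rate, and Welch segment length are illustrative assumptions rather than the study's analysis settings.

```python
# Hedged sketch of speech-brain coherence in delta and theta bands.
import numpy as np
from scipy.signal import coherence

fs = 100
rng = np.random.default_rng(6)
envelope = rng.standard_normal(fs * 300)                       # toy speech envelope
eeg = 0.4 * np.roll(envelope, int(0.12 * fs)) + rng.standard_normal(envelope.size)

f, coh = coherence(envelope, eeg, fs=fs, nperseg=fs * 10)      # magnitude-squared coherence
delta = coh[(f >= 0.5) & (f < 4)].mean()
theta = coh[(f >= 4) & (f < 8)].mean()
print(f"delta coherence: {delta:.2f}, theta coherence: {theta:.2f}")
```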
Affiliation(s)
- Laura Fernández-Merino
- Basque Center on Cognition, Brain and Language, San Sebastian, Spain
- University of the Basque Country (Universidad del País Vasco/Euskal Herriko Unibertsitatea), San Sebastian, Spain
- Mikel Lizarazu
- Basque Center on Cognition, Brain and Language, San Sebastian, Spain
- Nicola Molinaro
- Basque Center on Cognition, Brain and Language, San Sebastian, Spain
- Ikerbasque, Basque Foundation for Science, Bilbao, Spain
- Marina Kalashnikova
- Basque Center on Cognition, Brain and Language, San Sebastian, Spain
- Ikerbasque, Basque Foundation for Science, Bilbao, Spain

17
Borjigin A, Dennison S, Thakkar T, Kan A, Litovsky R. Best Cochlear Locations for Delivering Interaural Timing Cues in Electric Hearing. Research Square 2025:rs.3.rs-5640022. [Preprint] [PMID: 40166036] [PMCID: PMC11957186] [DOI: 10.21203/rs.3.rs-5640022/v1]
Abstract
Growing numbers of children and adults who are deaf are eligible to receive cochlear implants (CI), which provide access to everyday sound. CIs in both ears (bilateral CIs or BiCIs) are becoming standard of care in many countries. However, their effectiveness is limited because they do not adequately restore the acoustic cues essential for sound localization, particularly interaural time differences (ITDs) at low frequencies. The cochlea, the auditory sensory organ, typically transmits ITDs more effectively at the apical region, which is specifically "tuned" to low frequencies. We hypothesized that effective restoration of robust ITD perception through electrical stimulation with BiCIs depends on targeting cochlear locations that transmit information most effectively. Importantly, we show that these locations can occur anywhere along the cochlea, even on the opposite end of the frequency map from where ITD cues are most dominantly encoded in an acoustic hearing system.

18
Runfola C, Neri M, Schön D, Morillon B, Trébuchon A, Rabuffo G, Sorrentino P, Jirsa V. Complexity in speech and music listening via neural manifold flows. Netw Neurosci 2025; 9:146-158. [PMID: 40161989] [PMCID: PMC11949541] [DOI: 10.1162/netn_a_00422]
Abstract
Understanding the complex neural mechanisms underlying speech and music perception remains a multifaceted challenge. In this study, we investigated neural dynamics using human intracranial recordings. Employing a novel approach based on low-dimensional reduction techniques, the Manifold Density Flow (MDF), we quantified the complexity of brain dynamics during naturalistic speech and music listening and during resting state. Our results reveal higher complexity in patterns of interdependence between different brain regions during speech and music listening compared with rest, suggesting that the cognitive demands of speech and music listening drive the brain dynamics toward states not observed during rest. Moreover, speech listening has more complexity than music, highlighting the nuanced differences in cognitive demands between these two auditory domains. Additionally, we validated the efficacy of the MDF method through experimentation on a toy model and compared its effectiveness in capturing the complexity of brain dynamics induced by cognitive tasks with another established technique in the literature. Overall, our findings provide a new method to quantify the complexity of brain activity by studying its temporal evolution on a low-dimensional manifold, suggesting insights that are invisible to traditional methodologies in the contexts of speech and music perception.
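As a generic illustration only (not the authors' Manifold Density Flow), one simple way to study brain dynamics on a low-dimensional manifold is to project multichannel activity onto its first principal components and quantify how broadly the resulting trajectory spreads, for example with a histogram entropy. All parameters and the simulated data below are illustrative assumptions.

```python
# Hedged, generic stand-in: PCA projection of multichannel activity followed by an
# occupancy-entropy measure of how widely the low-dimensional trajectory spreads.
import numpy as np

def trajectory_entropy(data, n_bins=20):
    """data: (time, channels). Entropy of the occupancy of the first two PCs."""
    X = data - data.mean(axis=0)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    pcs = X @ Vt[:2].T                                  # low-dimensional trajectory
    hist, _, _ = np.histogram2d(pcs[:, 0], pcs[:, 1], bins=n_bins)
    p = hist.ravel() / hist.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

rng = np.random.default_rng(7)
rest = rng.standard_normal((5000, 32)) @ rng.standard_normal((32, 32)) * 0.1
task = rest + np.sin(np.linspace(0, 60, 5000))[:, None] * rng.standard_normal(32)
print(trajectory_entropy(rest), trajectory_entropy(task))
```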
Affiliation(s)
- Claudio Runfola
- Aix-Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France
- Matteo Neri
- Aix-Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France
- Aix-Marseille Université, CNRS, INT, Institut de Neurosciences de la Timone, Marseille, France
- Daniele Schön
- Aix-Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France
- Benjamin Morillon
- Aix-Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France
- Agnès Trébuchon
- Aix-Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France
- Giovanni Rabuffo
- Aix-Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France
- Pierpaolo Sorrentino
- Aix-Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France
- Viktor Jirsa
- Aix-Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France

19
Chen L, Jin Y, Ge Z, Li L, Lu L. The Less Meaningful the Understanding, the Faster the Feeling: Speech Comprehension Changes Perceptual Speech Tempo. Cogn Sci 2025; 49:e70037. [PMID: 39898859] [DOI: 10.1111/cogs.70037]
Abstract
The perception of speech tempo is influenced by both the acoustic properties of speech and the cognitive state of the listener. However, there is a lack of research on how speech comprehension affects the perception of speech tempo. This study aims to disentangle the impact of speech comprehension on the perception of speech tempo by manipulating linguistic structures and measuring perceptual speech tempo at explicit and implicit levels. Three experiments were conducted to explore these relationships. In Experiment 1, two explicit speech tasks revealed that listeners tend to overestimate the speech tempo of sentences with low comprehensibility, although this effect decreased with repeated exposure to the speech. Experiment 2, utilizing an implicit speech tempo task, replicated the main findings of Experiment 1. Furthermore, the results from the drift-diffusion model eliminated the possibility that participants' responses were based on the type of sentence. In Experiment 3, non-native Chinese speakers with varying levels of language proficiency completed the implicit speech tempo task. The results showed that non-native Chinese speakers exhibited distinct behavioral patterns compared to native Chinese speakers, as they did not perceive differences in speech tempo between high and low comprehensibility conditions. These findings highlight the intricate relationship between the perception of speech tempo and the comprehensibility of processed speech.
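The drift-diffusion model mentioned for Experiment 2 describes decisions as noisy evidence accumulation between two bounds, with the drift rate capturing how strongly the stimulus pushes toward one response. The simulation sketch below illustrates the basic generative model; the parameter values are illustrative assumptions, not fitted estimates from the study.

```python
# Hedged sketch of a basic drift-diffusion model (DDM): evidence accumulates with
# drift v and noise until it hits the upper bound a or the lower bound 0.
import numpy as np

def simulate_ddm(v, a=1.0, z=0.5, dt=0.001, s=1.0, t0=0.3, max_t=3.0, rng=None):
    rng = rng or np.random.default_rng()
    x, t = z * a, 0.0
    while 0 < x < a and t < max_t:
        x += v * dt + s * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return ("upper" if x >= a else "lower"), t0 + t

rng = np.random.default_rng(8)
# Higher drift toward the upper bound -> faster, more consistent "upper" responses.
trials_high = [simulate_ddm(1.5, rng=rng) for _ in range(500)]
trials_low = [simulate_ddm(0.3, rng=rng) for _ in range(500)]
for name, trials in [("high drift", trials_high), ("low drift", trials_low)]:
    rts = np.array([t for _, t in trials])
    p_upper = np.mean([resp == "upper" for resp, _ in trials])
    print(name, "P(upper) =", round(p_upper, 2), "mean RT =", round(rts.mean(), 2))
```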
Affiliation(s)
- Liangjie Chen
- Fuzhou School of Administration, Fuzhou Provincial Party School of the Communist Party of China
- School of Psychological and Cognitive Sciences, Beijing Key Laboratory of Behavior and Mental Health, Peking University
- Yangping Jin
- Center for the Cognitive Science of Language, Beijing Language and Culture University
- Zhongshu Ge
- School of Psychological and Cognitive Sciences, Beijing Key Laboratory of Behavior and Mental Health, Peking University
- Liang Li
- School of Psychological and Cognitive Sciences, Beijing Key Laboratory of Behavior and Mental Health, Peking University
- Lingxi Lu
- Center for the Cognitive Science of Language, Beijing Language and Culture University

20
Llanos F, Stump T, Crowhurst M. Investigating the Neural Basis of the Loud-first Principle of the Iambic-Trochaic Law. J Cogn Neurosci 2025; 37:14-27. [PMID: 39231274] [DOI: 10.1162/jocn_a_02241]
Abstract
The perception of rhythmic patterns is crucial for the recognition of words in spoken languages, yet it remains unclear how these patterns are represented in the brain. Here, we tested the hypothesis that rhythmic patterns are encoded by neural activity phase-locked to the temporal modulation of these patterns in the speech signal. To test this hypothesis, we analyzed EEGs evoked with long sequences of alternating syllables acoustically manipulated to be perceived as a series of different rhythmic groupings in English. We found that the magnitude of the EEG at the syllable and grouping rates of each sequence was significantly higher than the noise baseline, indicating that the neural parsing of syllables and rhythmic groupings operates at different timescales. Distributional differences between the scalp topographies associated with each timescale suggests a further mechanistic dissociation between the neural segmentation of syllables and groupings. In addition, we observed that the neural tracking of louder syllables, which in trochaic languages like English are associated with the beginning of rhythmic groupings, was more robust than the neural tracking of softer syllables. The results of further bootstrapping and brain-behavior analyses indicate that the perception of rhythmic patterns is modulated by the magnitude of grouping alternations in the neural signal. These findings suggest that the temporal coding of rhythmic patterns in stress-based languages like English is supported by temporal regularities that are linguistically relevant in the speech signal.
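Comparing the EEG magnitude at the syllable and grouping rates against a noise baseline, as described here, is commonly done by dividing the spectral amplitude at the target frequency by the mean amplitude of neighboring bins. The sketch below illustrates that computation on simulated data; the rates, recording length, and neighborhood size are illustrative assumptions.

```python
# Hedged sketch of frequency-tagged SNR: spectral amplitude at a target rate relative
# to the mean amplitude of surrounding frequency bins.
import numpy as np

fs, dur = 250, 120
t = np.arange(fs * dur) / fs
rng = np.random.default_rng(9)
syll_rate, group_rate = 4.0, 2.0          # e.g., syllables at 4 Hz, groupings at 2 Hz
eeg = (0.6 * np.sin(2 * np.pi * syll_rate * t)
       + 0.3 * np.sin(2 * np.pi * group_rate * t)
       + rng.standard_normal(t.size))

spec = np.abs(np.fft.rfft(eeg)) / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)

def snr_at(f0, n_neighbors=10, skip=1):
    i = np.argmin(np.abs(freqs - f0))
    neighbors = np.r_[spec[i - skip - n_neighbors:i - skip],
                      spec[i + skip + 1:i + skip + 1 + n_neighbors]]
    return spec[i] / neighbors.mean()

print("SNR at grouping rate:", snr_at(group_rate), "SNR at syllable rate:", snr_at(syll_rate))
```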

21
Lenc T, Lenoir C, Keller PE, Polak R, Mulders D, Nozaradan S. Measuring self-similarity in empirical signals to understand musical beat perception. Eur J Neurosci 2025; 61:e16637. [PMID: 39853878] [PMCID: PMC11760665] [DOI: 10.1111/ejn.16637]
Abstract
Experiencing music often entails the perception of a periodic beat. Despite being a widespread phenomenon across cultures, the nature and neural underpinnings of beat perception remain largely unknown. In the last decade, there has been a growing interest in developing methods to probe these processes, particularly to measure the extent to which beat-related information is contained in behavioral and neural responses. Here, we propose a theoretical framework and practical implementation of an analytic approach to capture beat-related periodicity in empirical signals using frequency-tagging. We highlight its sensitivity in measuring the extent to which the periodicity of a perceived beat is represented in a range of continuous time-varying signals with minimal assumptions. We also discuss a limitation of this approach with respect to its specificity when restricted to measuring beat-related periodicity only from the magnitude spectrum of a signal and introduce a novel extension of the approach based on autocorrelation to overcome this issue. We test the new autocorrelation-based method using simulated signals and by re-analyzing previously published data and show how it can be used to process measurements of brain activity as captured with surface EEG in adults and infants in response to rhythmic inputs. Taken together, the theoretical framework and related methodological advances confirm and elaborate the frequency-tagging approach as a promising window into the processes underlying beat perception and, more generally, temporally coordinated behaviors.
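A rough illustration of the autocorrelation-based idea is sketched below: beat-related periodicity is summarized by comparing autocorrelation values at lags that are multiples of the putative beat period against all other lags. The z-score summary, the simulated signal, and the lag tolerance are simplifications assumed for illustration, not the authors' implementation.

```python
import numpy as np

fs, dur, beat_period = 100.0, 30.0, 0.8       # illustrative sampling rate (Hz), duration and beat period (s)
t = np.arange(0, dur, 1 / fs)
rng = np.random.default_rng(2)
signal = np.cos(2 * np.pi * t / beat_period) + 0.5 * rng.standard_normal(t.size)

x = signal - signal.mean()
ac = np.correlate(x, x, mode="full")[x.size - 1:]      # autocorrelation, non-negative lags
ac /= ac[0]                                            # normalise so that lag 0 equals 1
lags = np.arange(ac.size) / fs

# Lags that fall on (approximate) multiples of the beat period vs. all other lags.
rem = lags % beat_period
on_beat = (np.minimum(rem, beat_period - rem) < 0.5 / fs) & (lags > 0) & (lags < dur / 2)
off_beat = ~on_beat & (lags > 0) & (lags < dur / 2)

z = (ac[on_beat].mean() - ac[off_beat].mean()) / ac[off_beat].std()
print(f"beat-related periodicity (z-score of beat-lag autocorrelation): {z:.2f}")
```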
Collapse
Affiliation(s)
- Tomas Lenc
- Institute of Neuroscience (IONS), UCLouvain, Brussels, Belgium
- Basque Center on Cognition, Brain and Language (BCBL), Donostia-San Sebastian, Spain
| | - Cédric Lenoir
- Institute of Neuroscience (IONS), UCLouvain, Brussels, Belgium
| | - Peter E. Keller
- MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, Australia
- Center for Music in the Brain & Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
| | - Rainer Polak
- RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway
- Department of Musicology, University of Oslo, Oslo, Norway
| | - Dounia Mulders
- Institute of Neuroscience (IONS), UCLouvain, Brussels, Belgium
- Computational and Biological Learning Unit, Department of Engineering, University of Cambridge, Cambridge, UK
- Institute for Information and Communication Technologies, Electronics and Applied Mathematics, UCLouvain, Louvain-la-Neuve, Belgium
- Department of Brain and Cognitive Sciences and McGovern Institute, Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, USA
| | - Sylvie Nozaradan
- Institute of Neuroscience (IONS), UCLouvain, Brussels, Belgium
- International Laboratory for Brain, Music and Sound Research (BRAMS), Montreal, Canada
| |
Collapse
|
22
|
Marrufo-Pérez MI, Lopez-Poveda EA. Speech Recognition and Noise Adaptation in Realistic Noises. Trends Hear 2025; 29:23312165251343457. [PMID: 40370075 PMCID: PMC12081978 DOI: 10.1177/23312165251343457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2024] [Revised: 04/29/2025] [Accepted: 05/01/2025] [Indexed: 05/16/2025] Open
Abstract
The recognition of isolated words in noise improves as words are delayed from the noise onset. This phenomenon, known as adaptation to noise, has been mostly investigated using synthetic noises. The aim here was to investigate whether adaptation occurs for realistic noises and to what extent it depends on the spectrum and level fluctuations of the noise. Forty-nine different realistic and synthetic noises were analyzed and classified according to how much they fluctuated in level over time and how much their spectra differed from the speech spectrum. Six representative noises were chosen that covered the observed range of level fluctuations and spectral differences but could still mask speech. For the six noises, speech reception thresholds (SRTs) were measured for natural and tone-vocoded words delayed 50 (early condition) and 800 ms (late condition) from the noise onset. Adaptation was calculated as the SRT improvement in the late relative to the early condition. Twenty-two adults with normal hearing participated in the experiments. For natural words, adaptation was small overall (mean = 0.5 dB) and similar across the six noises. For vocoded words, significant adaptation occurred for all six noises (mean = 1.3 dB) and was not statistically different across noises. For the tested noises, the amount of adaptation was independent of the spectrum and level fluctuations of the noise. The results suggest that adaptation in speech recognition can occur in realistic noisy environments.
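The classification of noises by level fluctuation and spectral difference from speech could, for instance, be approximated with descriptors like the ones below. Both metrics (frame-level standard deviation in dB and mean log-spectral distance) and the synthetic signals are illustrative assumptions rather than the measures used in the study.

```python
import numpy as np
from scipy.signal import welch

fs = 16000
rng = np.random.default_rng(3)
speech_shaped = rng.standard_normal(fs * 5)            # stand-in for a speech-shaped reference
modulation = 1 + 0.5 * np.sin(2 * np.pi * 4 * np.arange(fs * 5) / fs)
noise = rng.standard_normal(fs * 5) * modulation       # a noise with 4-Hz level fluctuations

def level_fluctuation_db(x, fs, win=0.05):
    """Standard deviation of short-term (50-ms) frame levels, in dB."""
    n = int(win * fs)
    frames = x[: len(x) // n * n].reshape(-1, n)
    levels_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    return levels_db.std()

def spectral_distance_db(x, y, fs):
    """Mean absolute difference between long-term log power spectra, in dB."""
    f, px = welch(x, fs, nperseg=1024)
    _, py = welch(y, fs, nperseg=1024)
    return np.mean(np.abs(10 * np.log10(px + 1e-12) - 10 * np.log10(py + 1e-12)))

print(f"level fluctuation: {level_fluctuation_db(noise, fs):.1f} dB")
print(f"spectral distance from the reference: {spectral_distance_db(noise, speech_shaped, fs):.1f} dB")
```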
Collapse
Affiliation(s)
- Miriam I. Marrufo-Pérez
- Instituto de Neurociencias de Castilla y León (INCYL), Universidad de Salamanca, Salamanca, Spain
- Instituto de Investigación Biomédica de Salamanca (IBSAL), Universidad de Salamanca, Salamanca, Spain
| | - Enrique A. Lopez-Poveda
- Instituto de Neurociencias de Castilla y León (INCYL), Universidad de Salamanca, Salamanca, Spain
- Instituto de Investigación Biomédica de Salamanca (IBSAL), Universidad de Salamanca, Salamanca, Spain
- Departamento de Cirugía, Facultad de Medicina, Universidad de Salamanca, Salamanca, Spain
| |
Collapse
|
23
|
Lu Y, Wu Y, Zeng D, Chen C, Bian P, Xu B. Music perception and its correlation with auditory speech perception in pediatric Mandarin-speaking cochlear implant users. Acta Otolaryngol 2025; 145:51-58. [PMID: 39668767 DOI: 10.1080/00016489.2024.2437553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2024] [Revised: 11/21/2024] [Accepted: 11/27/2024] [Indexed: 12/14/2024]
Abstract
BACKGROUND Cochlear implants (CI) help patients with sensorineural hearing loss regain the perception of sound. The ability to recognize music pitch may be crucial for recognizing and producing Mandarin speech. AIMS/OBJECTIVES This study aims to identify possible factors influencing music perception and to examine correlations between music perception and auditory speech abilities among prelingually deaf pediatric Mandarin-speaking CI users. MATERIAL AND METHODS Music perception of 24 pediatric CI users and 12 normal hearing children was measured using the MuSIC test. Auditory speech perception of the 24 CI users was also measured and analyzed with their music perception results. RESULTS Pediatric CI users performed worse than normal hearing children in pitch, rhythm and melody discrimination tests (p < .05). A significant difference in the pitch and melody discrimination tests was found between children implanted before and after 5 years of age. There were significant correlations between perception of consonants, tones, and speech in a noisy environment and perception of music pitch and melody. CONCLUSION AND SIGNIFICANCE Prelingually deaf pediatric CI users who received implantation before the age of five perform better in music perception tests. Pediatric CI users with better music perception show better auditory speech perception of Mandarin.
Collapse
Affiliation(s)
- Yunyi Lu
- Department of Otorhinolaryngology, Lanzhou University Second Hospital, Lanzhou, China
| | - Yutong Wu
- Department of Otorhinolaryngology, Lanzhou University Second Hospital, Lanzhou, China
| | - Dong Zeng
- Department of Otorhinolaryngology, Lanzhou University Second Hospital, Lanzhou, China
| | - Chi Chen
- Department of Otorhinolaryngology, Lanzhou University Second Hospital, Lanzhou, China
| | - Panpan Bian
- Department of Otorhinolaryngology, Lanzhou University Second Hospital, Lanzhou, China
| | - Baicheng Xu
- Department of Otorhinolaryngology, Lanzhou University Second Hospital, Lanzhou, China
| |
Collapse
|
24
|
Borjigin A, Dennison SR, Thakkar T, Kan A, Litovsky RY. Best Cochlear Locations for Delivering Interaural Timing Cues in Electric Hearing. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.12.27.627652. [PMID: 39763970 PMCID: PMC11703218 DOI: 10.1101/2024.12.27.627652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/12/2025]
Abstract
Growing numbers of children and adults who are deaf are eligible to receive cochlear implants (CI), which provide access to everyday sound. CIs in both ears (bilateral CIs or BiCIs) are becoming standard of care in many countries. However, their effectiveness is limited because they do not adequately restore the acoustic cues essential for sound localization, particularly interaural time differences (ITDs) at low frequencies. The cochlea, the auditory sensory organ, typically transmits ITDs more effectively at the apical region, which is specifically "tuned" to low frequencies. We hypothesized that effective restoration of robust ITD perception through electrical stimulation with BiCIs depends on targeting cochlear locations that transmit information most effectively. Importantly, we show that these locations can occur anywhere along the cochlea, even on the opposite end of the frequency map from where ITD cues are most dominantly encoded in an acoustic hearing system.
Collapse
Affiliation(s)
| | | | - Tanvi Thakkar
- University of Wisconsin-La Crosse (La Crosse, WI, USA)
| | - Alan Kan
- Macquarie University (Sydney, NSW, Australia)
| | | |
Collapse
|
25
|
Barbaresi M, Nardo D, Fagioli S. Physiological Entrainment: A Key Mind-Body Mechanism for Cognitive, Motor and Affective Functioning, and Well-Being. Brain Sci 2024; 15:3. [PMID: 39851371 PMCID: PMC11763407 DOI: 10.3390/brainsci15010003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2024] [Revised: 12/13/2024] [Accepted: 12/21/2024] [Indexed: 01/26/2025] Open
Abstract
BACKGROUND The human sensorimotor system can naturally synchronize with environmental rhythms, such as light pulses or sound beats. Several studies have shown that different styles and tempos of music, or other rhythmic stimuli, have an impact on physiological rhythms, including electrocortical brain activity, heart rate, and motor coordination. Such synchronization, also known as the "entrainment effect", has been identified as a crucial mechanism impacting cognitive, motor, and affective functioning. OBJECTIVES This review examines theoretical and empirical contributions to the literature on entrainment, with a particular focus on the physiological mechanisms underlying this phenomenon and its role in cognitive, motor, and affective functions. We also address the inconsistent terminology used in the literature and evaluate the range of measurement approaches used to assess entrainment phenomena. Finally, we propose a definition of "physiological entrainment" that emphasizes its role as a fundamental mechanism that encompasses rhythmic interactions between the body and its environment, to support information processing across bodily systems and to sustain adaptive motor responses. METHODS We reviewed the recent literature through the lens of the "embodied cognition" framework, offering a unified perspective on the phenomenon of physiological entrainment. RESULTS Evidence from the current literature suggests that physiological entrainment produces measurable effects, especially on neural oscillations, heart rate variability, and motor synchronization. Ultimately, such physiological changes can impact cognitive processing, affective functioning, and motor coordination. CONCLUSIONS Physiological entrainment emerges as a fundamental mechanism underlying the mind-body connection. Entrainment-based interventions may be used to promote well-being by enhancing cognitive, motor, and affective functions, suggesting potential rehabilitative approaches for enhancing mental health.
Collapse
Affiliation(s)
| | - Davide Nardo
- Department of Education, “Roma Tre” University, 00185 Rome, Italy; (M.B.); (S.F.)
| | | |
Collapse
|
26
|
Giroud J, Trébuchon A, Mercier M, Davis MH, Morillon B. The human auditory cortex concurrently tracks syllabic and phonemic timescales via acoustic spectral flux. SCIENCE ADVANCES 2024; 10:eado8915. [PMID: 39705351 DOI: 10.1126/sciadv.ado8915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/26/2024] [Accepted: 11/15/2024] [Indexed: 12/22/2024]
Abstract
Dynamical theories of speech processing propose that the auditory cortex parses acoustic information in parallel at the syllabic and phonemic timescales. We developed a paradigm to independently manipulate both linguistic timescales, and acquired intracranial recordings from 11 patients with epilepsy while they listened to French sentences. Our results indicate that (i) syllabic and phonemic timescales are both reflected in the acoustic spectral flux; (ii) during comprehension, the auditory cortex tracks the syllabic timescale in the theta range, while neural activity in the alpha-beta range phase locks to the phonemic timescale; (iii) these neural dynamics occur simultaneously and share a joint spatial location; (iv) the spectral flux embeds two timescales-in the theta and low-beta ranges-across 17 natural languages. These findings help us understand how the human brain extracts acoustic information from the continuous speech signal at multiple timescales simultaneously, a prerequisite for subsequent linguistic processing.
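Acoustic spectral flux, the feature highlighted above, can be computed along the lines of the following sketch: frame-wise increases in spectral magnitude are summed, and the modulation spectrum of that flux time series is inspected for syllabic- and phonemic-rate peaks. The STFT parameters and the white-noise stand-in for speech are assumptions for illustration only.

```python
import numpy as np
from scipy.signal import stft

fs = 16000
rng = np.random.default_rng(4)
audio = rng.standard_normal(fs * 10)                   # stand-in for a mono speech recording

f, frame_times, Z = stft(audio, fs=fs, nperseg=512, noverlap=384)   # 128-sample (8-ms) hop
mag = np.abs(Z)
flux = np.sum(np.clip(np.diff(mag, axis=1), 0, None), axis=0)       # positive spectral change per frame

frame_rate = 1.0 / (frame_times[1] - frame_times[0])
mod_spec = np.abs(np.fft.rfft(flux - flux.mean())) / flux.size
mod_freqs = np.fft.rfftfreq(flux.size, 1 / frame_rate)
peak = mod_freqs[np.argmax(mod_spec[1:]) + 1]
print(f"dominant modulation rate of the spectral flux: {peak:.2f} Hz")
```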
Collapse
Affiliation(s)
- Jérémy Giroud
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
| | - Agnès Trébuchon
- Aix Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France
- APHM, Clinical Neurophysiology, Timone Hospital, Marseille, France
| | - Manuel Mercier
- Aix Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France
| | - Matthew H Davis
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
| | - Benjamin Morillon
- Aix Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France
| |
Collapse
|
27
|
Peterson RE, Choudhri A, Mitelut C, Tanelus A, Capo-Battaglia A, Williams AH, Schneider DM, Sanes DH. Unsupervised discovery of family specific vocal usage in the Mongolian gerbil. eLife 2024; 12:RP89892. [PMID: 39680425 PMCID: PMC11649239 DOI: 10.7554/elife.89892] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2024] Open
Abstract
In nature, animal vocalizations can provide crucial information about identity, including kinship and hierarchy. However, lab-based vocal behavior is typically studied during brief interactions between animals with no prior social relationship, and under environmental conditions with limited ethological relevance. Here, we address this gap by establishing long-term acoustic recordings from Mongolian gerbil families, a core social group that uses an array of sonic and ultrasonic vocalizations. Three separate gerbil families were transferred to an enlarged environment and continuous 20-day audio recordings were obtained. Using a variational autoencoder (VAE) to quantify 583,237 vocalizations, we show that gerbils exhibit a more elaborate vocal repertoire than has been previously reported and that vocal repertoire usage differs significantly by family. By performing Gaussian mixture model clustering on the VAE latent space, we show that families preferentially use characteristic sets of vocal clusters and that these usage preferences remain stable over weeks. Furthermore, gerbils displayed family-specific transitions between vocal clusters. Since gerbils live naturally as extended families in complex underground burrows that are adjacent to other families, these results suggest the presence of a vocal dialect which could be exploited by animals to represent kinship. These findings position the Mongolian gerbil as a compelling animal model to study the neural basis of vocal communication and demonstrate the potential for using unsupervised machine learning with uninterrupted acoustic recordings to gain insights into naturalistic animal behavior.
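The clustering step described above (Gaussian mixture model clustering of latent embeddings followed by per-family usage profiles) can be sketched as follows. The two-dimensional toy "latents" and family labels are synthetic placeholders; the study applied this to VAE embeddings of over half a million calls.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
# Toy 2-D "latents": three loose clusters of simulated vocalization embeddings.
latents = np.vstack([rng.normal(loc, 0.3, size=(300, 2)) for loc in ([0, 0], [2, 2], [0, 3])])
family = rng.integers(0, 3, size=latents.shape[0])     # hypothetical family label per call

gmm = GaussianMixture(n_components=3, random_state=0).fit(latents)
clusters = gmm.predict(latents)

# Vocal-cluster usage profile per family (rows sum to 1); family-specific repertoires would
# show up as rows with clearly different shapes.
usage = np.zeros((3, 3))
for fam in range(3):
    counts = np.bincount(clusters[family == fam], minlength=3)
    usage[fam] = counts / counts.sum()
print(np.round(usage, 2))
```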
Collapse
Affiliation(s)
- Ralph E Peterson
- Center for Neural Science, New York University, New York, United States
- Center for Computational Neuroscience, Flatiron Institute, New York, United States
| | | | - Catalin Mitelut
- Center for Neural Science, New York University, New York, United States
| | - Aramis Tanelus
- Center for Neural Science, New York University, New York, United States
- Center for Computational Neuroscience, Flatiron Institute, New York, United States
| | | | - Alex H Williams
- Center for Neural Science, New York University, New York, United States
- Center for Computational Neuroscience, Flatiron Institute, New York, United States
| | | | - Dan H Sanes
- Center for Neural Science, New York University, New York, United States
- Department of Psychology, New York University, New York, United States
- Neuroscience Institute, New York University School of Medicine, New York, United States
- Department of Biology, New York University, New York, United States
| |
Collapse
|
28
|
Karunathilake ID, Brodbeck C, Bhattasali S, Resnik P, Simon JZ. Neural Dynamics of the Processing of Speech Features: Evidence for a Progression of Features from Acoustic to Sentential Processing. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.02.578603. [PMID: 38352332 PMCID: PMC10862830 DOI: 10.1101/2024.02.02.578603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 02/22/2024]
Abstract
When we listen to speech, our brain's neurophysiological responses "track" its acoustic features, but it is less well understood how these auditory responses are enhanced by linguistic content. Here, we recorded magnetoencephalography (MEG) responses while subjects listened to four types of continuous-speech-like passages: speech-envelope modulated noise, English-like non-words, scrambled words, and a narrative passage. Temporal response function (TRF) analysis provides strong neural evidence for the emergent features of speech processing in cortex, from acoustics to higher-level linguistics, as incremental steps in neural speech processing. Critically, we show a stepwise hierarchical progression toward higher-order features over time, reflected in both bottom-up (early) and top-down (late) processing stages. Linguistically driven top-down mechanisms take the form of late N400-like responses, suggesting a central role of predictive coding mechanisms at multiple levels. As expected, the neural processing of lower-level acoustic feature responses is bilateral or right lateralized, with left lateralization emerging only for lexical-semantic features. Finally, our results identify potential neural markers, late linguistic-level responses derived from TRF components modulated by linguistic content, suggesting that these markers index speech comprehension rather than mere speech perception.
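A temporal response function of the kind used here is, at its core, a regularized regression of the neural signal onto time-lagged stimulus features. The sketch below illustrates that with simulated data; the lag range, ridge parameter, and simulated kernel are assumptions, not the study's analysis settings.

```python
import numpy as np

fs = 100                                               # Hz
rng = np.random.default_rng(6)
stim = rng.standard_normal(fs * 120)                   # e.g. a speech-envelope feature
true_trf = np.exp(-np.arange(30) / 8) * np.sin(np.arange(30) / 3)
meg = np.convolve(stim, true_trf)[: stim.size] + rng.standard_normal(stim.size)

lags = np.arange(30)                                   # 0-290 ms of stimulus-to-response lags
X = np.stack([np.roll(stim, lag) for lag in lags], axis=1)
X[: lags.max()] = 0                                    # discard wrap-around samples

lam = 1e2                                              # ridge regularization strength
trf = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ meg)
peak_ms = lags[np.argmax(np.abs(trf))] * 1000 / fs
print(f"estimated TRF peak at {peak_ms:.0f} ms")
```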
Collapse
Affiliation(s)
| | - Christian Brodbeck
- Department of Computing and Software, McMaster University, Hamilton, ON, Canada
| | - Shohini Bhattasali
- Department of Language Studies, University of Toronto, Scarborough, Canada
| | - Philip Resnik
- Department of Linguistics, and Institute for Advanced Computer Studies, University of Maryland, College Park, MD, 20742
| | - Jonathan Z. Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, 20742
- Department of Biology, University of Maryland, College Park, MD, USA
- Institute for Systems Research, University of Maryland, College Park, MD, 20742
| |
Collapse
|
29
|
Nishiyama R, Nonaka T. Can a human sing with an unseen artificial partner? Coordination dynamics when singing with an unseen human or artificial partner. Front Robot AI 2024; 11:1463477. [PMID: 39717549 PMCID: PMC11663750 DOI: 10.3389/frobt.2024.1463477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2024] [Accepted: 11/20/2024] [Indexed: 12/25/2024] Open
Abstract
This study investigated whether a singer's coordination patterns differ when singing with an unseen human partner versus an unseen artificial partner (VOCALOID 6 voice synthesis software). We used cross-correlation analysis to compare the correlation of the amplitude envelope time series between the partner's and the participant's singing voices. We also conducted a Granger causality test to determine whether the past amplitude envelope of the partner helps predict the future amplitude envelope of the participants, or if the reverse is true. We found more pronounced characteristics of anticipatory synchronization and increased similarity in the unfolding dynamics of the amplitude envelopes in the human-partner condition compared to the artificial-partner condition, despite the tempo fluctuations in the human-partner condition. The results suggested that subtle qualities of the human singing voice, possibly stemming from intrinsic dynamics of the human body, may contain information that enables human agents to align their singing behavior dynamics with a human partner.
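The envelope cross-correlation analysis can be illustrated as below: extract a smoothed amplitude envelope from each voice, cross-correlate, and read off the lag of maximal correlation (with this lag convention, a negative value means the participant's envelope leads the partner's). Envelope extraction settings and the synthetic voices are assumptions for illustration.

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt, correlate, correlation_lags

fs = 1000
rng = np.random.default_rng(7)
partner = rng.standard_normal(fs * 20)
# Simulated participant whose acoustics run ~50 ms ahead of the partner, plus independent noise.
participant = np.roll(partner, -int(0.05 * fs)) + 0.3 * rng.standard_normal(fs * 20)

def envelope(x, fs, cutoff=10.0):
    """Smoothed amplitude envelope: magnitude of the analytic signal, low-pass filtered."""
    b, a = butter(4, cutoff / (fs / 2))
    return filtfilt(b, a, np.abs(hilbert(x)))

e_partner = envelope(partner, fs)
e_partic = envelope(participant, fs)
e_partner -= e_partner.mean()
e_partic -= e_partic.mean()

xc = correlate(e_partic, e_partner, mode="full") / (np.linalg.norm(e_partner) * np.linalg.norm(e_partic))
lags_s = correlation_lags(e_partic.size, e_partner.size, mode="full") / fs
k = int(np.argmax(xc))
# Negative peak lag: the participant's envelope anticipates the partner's.
print(f"peak normalised correlation {xc[k]:.2f} at lag {lags_s[k] * 1000:.0f} ms")
```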
Collapse
Affiliation(s)
| | - Tetsushi Nonaka
- Graduate School of Human Development and Environment, Kobe University, Kobe, Japan
| |
Collapse
|
30
|
Bolt E, Kliestenec K, Giroud N. Hearing and cognitive decline in aging differentially impact neural tracking of context-supported versus random speech across linguistic timescales. PLoS One 2024; 19:e0313854. [PMID: 39642146 PMCID: PMC11623803 DOI: 10.1371/journal.pone.0313854] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2024] [Accepted: 10/31/2024] [Indexed: 12/08/2024] Open
Abstract
Cognitive decline and hearing loss are common in older adults and often co-occur, yet they are typically investigated separately; both affect the neural processing of speech. This study investigated the interaction between cognitive decline, hearing loss, and contextual cues in speech processing. Participants aged 60 years and older were assessed for cognitive decline using the Montreal Cognitive Assessment and for hearing ability using a four-frequency pure tone average. They listened to in-house-designed matrix-style sentences that either provided supportive context or were random, while we recorded their electroencephalography. Neurophysiological responses were analyzed through auditory evoked potentials and speech tracking at different linguistic timescales (i.e., phrase, word, syllable and phoneme rate) using phase-locking values. The results showed that cognitive decline was associated with decreased response accuracy in a speech recognition task. Cognitive decline significantly impacted the P2 component of auditory evoked potentials, while hearing loss influenced speech tracking at the word and phoneme rates, but not at the phrase or syllable rates. Contextual cues enhanced speech tracking at the syllable rate. These findings suggest that cognitive decline and hearing loss differentially affect the neural mechanisms underlying speech processing, with contextual cues playing a significant role in enhancing syllable rate tracking. This study emphasises the importance of considering both cognitive and auditory factors when studying speech processing in older people and highlights the need for further research to investigate the interplay between cognitive decline, hearing loss and contextual cues in speech processing.
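Speech tracking via phase-locking values, as used above, amounts to band-limiting both signals around a linguistic rate, extracting instantaneous phase, and averaging the phase-difference vectors. The sketch below shows this for a single hypothetical word-rate rhythm; the filter settings and signals are illustrative assumptions.

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

fs, rate = 250.0, 2.0                                  # sampling rate and hypothetical word rate (Hz)
t = np.arange(0, 120, 1 / fs)
rng = np.random.default_rng(8)
stimulus = np.sin(2 * np.pi * rate * t)                # idealised word-rate rhythm of the sentences
eeg = 0.5 * np.sin(2 * np.pi * rate * t + 0.6) + rng.standard_normal(t.size)

def band_phase(x, fs, f0, bw=0.5):
    """Instantaneous phase in a narrow band around f0."""
    sos = butter(3, [(f0 - bw) / (fs / 2), (f0 + bw) / (fs / 2)], btype="band", output="sos")
    return np.angle(hilbert(sosfiltfilt(sos, x)))

dphi = band_phase(eeg, fs, rate) - band_phase(stimulus, fs, rate)
plv = np.abs(np.mean(np.exp(1j * dphi)))               # 0 = no phase locking, 1 = perfect locking
print(f"PLV at the {rate:.0f}-Hz word rate: {plv:.2f}")
```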
Collapse
Affiliation(s)
- Elena Bolt
- Computational Neuroscience of Speech and Hearing, Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
- International Max Planck Research School on the Life Course (IMPRS LIFE), University of Zurich, Zurich, Switzerland
| | - Katarina Kliestenec
- Computational Neuroscience of Speech and Hearing, Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
| | - Nathalie Giroud
- Computational Neuroscience of Speech and Hearing, Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
- International Max Planck Research School on the Life Course (IMPRS LIFE), University of Zurich, Zurich, Switzerland
- Language & Medicine Centre Zurich, Competence Centre of Medical Faculty and Faculty of Arts and Sciences, University of Zurich, Zurich, Switzerland
| |
Collapse
|
31
|
Oderbolz C, Stark E, Sauppe S, Meyer M. Concurrent processing of the prosodic hierarchy is supported by cortical entrainment and phase-amplitude coupling. Cereb Cortex 2024; 34:bhae479. [PMID: 39704246 DOI: 10.1093/cercor/bhae479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2024] [Revised: 10/30/2024] [Accepted: 11/28/2024] [Indexed: 12/21/2024] Open
Abstract
Models of phonology posit a hierarchy of prosodic units that is relatively independent from syntactic structure, requiring its own parsing. It remains unexplored how this prosodic hierarchy is represented in the brain. We investigated this foundational question by means of an electroencephalography (EEG) study. Thirty young adults listened to German sentences containing manipulations at different levels of the prosodic hierarchy. Evaluating speech-to-brain cortical entrainment and phase-amplitude coupling revealed that prosody's hierarchical structure is maintained at the neural level during spoken language comprehension. The faithfulness of this tracking varied as a function of the hierarchy's degree of intactness as well as systematic interindividual differences in audio-motor synchronization abilities. The results underscore the role of complex oscillatory mechanisms in configuring the continuous and hierarchical nature of the speech signal and situate prosody as a structure indispensable from theoretical perspectives on spoken language comprehension in the brain.
Collapse
Affiliation(s)
- Chantal Oderbolz
- Institute for the Interdisciplinary Study of Language Evolution, University of Zurich, Affolternstrasse 56, 8050 Zürich, Switzerland
- Department of Neuroscience, Georgetown University Medical Center, 3970 Reservoir Rd NW, Washington D.C. 20057, United States
| | - Elisabeth Stark
- Zurich Center for Linguistics, University of Zurich, Andreasstrasse 15, 8050 Zürich, Switzerland
- Institute of Romance Studies, University of Zurich, Zürichbergstrasse 8, 8032 Zürich, Switzerland
| | - Sebastian Sauppe
- Department of Psychology, University of Zurich, Binzmühlestrasse 14, 8050 Zürich, Switzerland
| | - Martin Meyer
- Institute for the Interdisciplinary Study of Language Evolution, University of Zurich, Affolternstrasse 56, 8050 Zürich, Switzerland
| |
Collapse
|
32
|
Kaland C, Swerts M. The Attractiveness of Average Speech Rhythms: Revisiting the Average Effect From a Crosslinguistic Perspective. LANGUAGE AND SPEECH 2024; 67:1054-1074. [PMID: 38156473 DOI: 10.1177/00238309231217689] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/30/2023]
Abstract
The current study investigates the average effect: the tendency for humans to prefer an averaged exemplar (of a face, bird, wristwatch, car, and so on) over an individual instance. The effect holds across cultures, despite varying conceptualizations of attractiveness. While much research has been conducted on the average effect in visual perception, much less is known about the extent to which this effect applies to language and speech. This study investigates the attractiveness of average speech rhythms in Dutch and Mandarin Chinese, two typologically different languages. This was tested in a series of perception experiments in either language in which native listeners chose the most attractive one from a pair of acoustically manipulated rhythms. For each language, two experiments were carried out to control for the potential influence of the acoustic manipulation on the average effect. The results confirm the average effect in both languages, and they do not exclude individual variation in the listeners' perception of attractiveness. The outcomes provide a new crosslinguistic perspective and give rise to alternative explanations of the average effect.
Collapse
Affiliation(s)
| | - Marc Swerts
- Department of Communication and Cognition, Tilburg University, The Netherlands
| |
Collapse
|
33
|
Naeije G, Niesen M, Vander Ghinst M, Bourguignon M. Simultaneous EEG recording of cortical tracking of speech and movement kinematics. Neuroscience 2024; 561:1-10. [PMID: 39395635 DOI: 10.1016/j.neuroscience.2024.10.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2024] [Revised: 09/23/2024] [Accepted: 10/06/2024] [Indexed: 10/14/2024]
Abstract
RATIONALE Cortical activity is coupled with streams of sensory stimulation. The coupling with the temporal envelope of heard speech is known as the cortical tracking of speech (CTS), and that with movement kinematics is known as the corticokinematic coupling (CKC). Simultaneous measurement of both couplings is desirable in clinical settings, but it is unknown whether the inherent dual-tasking condition has an impact on CTS or CKC. AIM We aim to determine whether and how CTS and CKC levels are affected when recorded simultaneously. METHODS Twenty-three healthy young adults underwent 64-channel EEG recordings while listening to stories and while performing repetitive finger-tapping movements in 3 conditions: separately (audio- or tapping-only) or simultaneously (audio-tapping). CTS and CKC values were estimated using coherence analysis between each EEG signal and speech temporal envelope (CTS) or finger acceleration (CKC). CTS was also estimated as the reconstruction accuracy of a decoding model. RESULTS Across recordings, CTS assessed with reconstruction accuracy was significant in 85 % of the subjects at phrasal frequency (0.5 Hz) and in 68 % at syllabic frequencies (4-8 Hz), and CKC was significant in over 85 % of the subjects at movement frequency and its first harmonic. Comparing CTS and CKC values evaluated in separate recordings to those in simultaneous recordings revealed no significant difference and moderate-to-high levels of correlation. CONCLUSION Despite the subtle behavioral effects, CTS and CKC are not evidently altered by the dual-task setting inherent to recording them simultaneously and can be evaluated simultaneously using EEG in clinical settings.
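The coherence computation underlying CTS and CKC can be sketched in a few lines: magnitude-squared coherence between an EEG channel and a reference signal (speech envelope for CTS, finger acceleration for CKC). The simulated signals and the `nperseg` choice below are assumptions for illustration.

```python
import numpy as np
from scipy.signal import coherence

fs = 200
rng = np.random.default_rng(9)
t = np.arange(0, 300, 1 / fs)
reference = np.sin(2 * np.pi * 0.5 * t) + 0.3 * rng.standard_normal(t.size)   # e.g. phrasal-rate envelope
eeg = 0.4 * reference + rng.standard_normal(t.size)                           # one simulated EEG channel

f, coh = coherence(eeg, reference, fs=fs, nperseg=int(fs * 10))               # 0.1-Hz frequency resolution
print(f"coherence at 0.5 Hz: {coh[np.argmin(np.abs(f - 0.5))]:.2f}")
```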
Collapse
Affiliation(s)
- Gilles Naeije
- Laboratoire de Neuroanatomie et Neuroimagerie Translationnelles, UNI - ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels, Belgium; Centre de Référence Neuromusculaire, Department of Neurology, HUB Hôpital Erasme, Université libre de Bruxelles (ULB), Brussels, Belgium.
| | - Maxime Niesen
- Laboratoire de Neuroanatomie et Neuroimagerie Translationnelles, UNI - ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels, Belgium; Service d'ORL et de chirurgie cervico-faciale, HUB Hôpital Erasme, Université libre de Bruxelles (ULB), Brussels, Belgium
| | - Marc Vander Ghinst
- Laboratoire de Neuroanatomie et Neuroimagerie Translationnelles, UNI - ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels, Belgium; Service d'ORL et de chirurgie cervico-faciale, HUB Hôpital Erasme, Université libre de Bruxelles (ULB), Brussels, Belgium
| | - Mathieu Bourguignon
- Laboratoire de Neuroanatomie et Neuroimagerie Translationnelles, UNI - ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels, Belgium; Laboratory of Neurophysiology and Movement Biomechanics, UNI - ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels, Belgium
| |
Collapse
|
34
|
Gastaldon S, Busan P, Molinaro N, Lizarazu M. Cortical Tracking of Speech Is Reduced in Adults Who Stutter When Listening for Speaking. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2024; 67:4339-4357. [PMID: 39437265 DOI: 10.1044/2024_jslhr-24-00227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/25/2024]
Abstract
PURPOSE The purpose of this study was to investigate cortical tracking of speech (CTS) in adults who stutter (AWS) compared to typically fluent adults (TFAs) to test the involvement of the speech-motor network in tracking rhythmic speech information. METHOD Participants' electroencephalogram was recorded while they simply listened to sentences (listening only) or completed them by naming a picture (listening for speaking), thus manipulating the upcoming involvement of speech production. We analyzed speech-brain coherence and brain connectivity during listening. RESULTS During the listening-for-speaking task, AWS exhibited reduced CTS in the 3- to 5-Hz range (theta), corresponding to the syllabic rhythm. The effect was localized in the left inferior parietal and right pre/supplementary motor regions. Connectivity analyses revealed that TFAs had stronger information transfer in the theta range in both tasks in fronto-temporo-parietal regions. When considering the whole sample of participants, increased connectivity from the right superior temporal cortex to the left sensorimotor cortex was correlated with faster naming times in the listening-for-speaking task. CONCLUSIONS Atypical speech-motor functioning in stuttering impacts speech perception, especially in situations requiring articulatory alertness. The involvement of frontal and (pre)motor regions in CTS in TFAs is highlighted. Further investigation is needed into speech perception in individuals with speech-motor deficits, especially when smooth transitioning between listening and speaking is required, such as in real-life conversational settings. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.27234885.
Collapse
Affiliation(s)
- Simone Gastaldon
- Department of Developmental and Social Psychology, University of Padua, Italy
- Padova Neuroscience Center, University of Padua, Italy
| | - Pierpaolo Busan
- Department of Medical, Surgical and Health Sciences, University of Trieste, Italy
| | - Nicola Molinaro
- Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain
- Ikerbasque, Basque Foundation for Science, Bilbao, Spain
| | - Mikel Lizarazu
- Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain
| |
Collapse
|
35
|
Grent-'t-Jong T, Dheerendra P, Fusar-Poli P, Gross J, Gumley AI, Krishnadas R, Muckli LF, Uhlhaas PJ. Entrainment of neural oscillations during language processing in Early-Stage schizophrenia. Neuroimage Clin 2024; 44:103695. [PMID: 39536523 PMCID: PMC11602575 DOI: 10.1016/j.nicl.2024.103695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2024] [Revised: 09/25/2024] [Accepted: 10/25/2024] [Indexed: 11/16/2024]
Abstract
BACKGROUND Impairments in language processing in schizophrenia (ScZ) are a central aspect of the disorder but the underlying pathophysiological mechanisms are unclear. In the current study, we tested the hypothesis that neural oscillations are impaired during speech tracking in early-stage ScZ and in participants at clinical high-risk for psychosis (CHR-P). METHOD Magnetoencephalography (MEG) was used in combination with source-reconstructed time series to examine delta and theta-band entrainment during continuous speech. Participants were presented with a 5-minute audio recording during which they attended to either the story or the word level. MEG-data were obtained from n = 22 CHR-P participants, n = 23 early-stage ScZ-patients, and n = 44 healthy controls (HC). Data were analysed with a Mutual Information (MI) approach to compute statistical dependence between the MEG and auditory signal, thus estimating individual speech-tracking ability. MEG-activity was reconstructed in a language network (bilateral inferior frontal cortex [F3T; Broca's], superior temporal areas [STS3, STS4; Wernicke's areas], and primary auditory cortex [bilateral HES; Heschl's gyrus]). MEG-data were correlated with clinical symptoms. RESULTS Theta-band entrainment in left Heschl's gyrus, averaged across groups, was significantly lower in the STORY compared to WORD condition (p = 0.022), and averaged over conditions, significantly lower in CHR-Ps (p = 0.045), but intact in early ScZ patients (p = 0.303), compared to controls. Correlation analyses between MEG data and symptoms indicated that lower theta-band tracking in CHR-Ps was linked to the severity of perceptual abnormalities (p = 0.018). CONCLUSION Our results show that CHR-P participants exhibit impairments in theta-band entrainment during speech tracking in left primary auditory cortex, while entrainment in higher-order speech processing areas was intact. Moreover, the severity of aberrant perceptual experiences in CHR-P participants correlated with deficits in theta-band entrainment. Together, these findings highlight the possibility that neural oscillations during language processing could reveal fundamental abnormalities in speech processing which may constitute candidate biomarkers for early detection and diagnosis of ScZ.
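The mutual-information index of speech tracking mentioned above can be illustrated, in a much-simplified form, by binning two signals and computing MI over the joint histogram. The binned estimator and synthetic signals below are assumptions; the study used band-limited MEG features rather than raw time series.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(10)
envelope = rng.standard_normal(20000)                  # stand-in for the speech envelope
meg = 0.6 * envelope + rng.standard_normal(20000)      # simulated tracking brain signal

def binned_mi(x, y, bins=16):
    """Mutual information (nats) between two continuous signals after quantile binning."""
    edges_x = np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1])
    edges_y = np.quantile(y, np.linspace(0, 1, bins + 1)[1:-1])
    return mutual_info_score(np.digitize(x, edges_x), np.digitize(y, edges_y))

print(f"MI(brain, envelope) = {binned_mi(meg, envelope):.3f} nats")
```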
Collapse
Affiliation(s)
- Tineke Grent-'t-Jong
- Department of Child and Adolescent Psychiatry, Charité Universitätsmedizin, Berlin, Germany
| | | | - Paolo Fusar-Poli
- Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy; Early Psychosis: Interventions and Clinical-detection (EPIC) Lab, Department of Psychosis Studies, King's College London, UK; Outreach and Support in South-London (OASIS) service, South London and Maudsley (SLaM) NHS Foundation Trust, UK; Department of Psychiatry and Psychotherapy, University Hospital, Ludwig-Maximilian-University (LMU), Munich, Germany
| | - Joachim Gross
- Institute for Biomagnetism and Biosignalanalysis, University of Muenster, Muenster, Germany
| | | | | | - Lars F Muckli
- School of Psychology and Neuroscience, University of Glasgow, UK
| | - Peter J Uhlhaas
- Department of Child and Adolescent Psychiatry, Charité Universitätsmedizin, Berlin, Germany; School of Psychology and Neuroscience, University of Glasgow, UK.
| |
Collapse
|
36
|
Clonan AC, Zhai X, Stevenson IH, Escabí MA. Interference of mid-level sound statistics underlie human speech recognition sensitivity in natural noise. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.13.579526. [PMID: 38405870 PMCID: PMC10888804 DOI: 10.1101/2024.02.13.579526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/27/2024]
Abstract
Recognizing speech in noise, such as in a busy restaurant, is an essential cognitive skill whose difficulty varies across environments and noise levels. Although there is growing evidence that the auditory system relies on statistical representations for perceiving [1-5] and coding [4,6-9] natural sounds, it is less clear how statistical cues and neural representations contribute to segregating speech in natural auditory scenes. We demonstrate that human listeners rely on mid-level statistics to segregate and recognize speech in environmental noise. Using natural backgrounds and variants with perturbed spectro-temporal statistics, we show that speech recognition accuracy at a fixed noise level varies extensively across natural backgrounds (0% to 100%). Furthermore, for each background the unique interference created by summary statistics can mask or unmask speech, thus hindering or improving speech recognition. To identify the neural coding strategy and statistical cues that influence accuracy, we developed generalized perceptual regression, a framework that links summary statistics from a neural model to word recognition accuracy. Whereas a peripheral cochlear model accounts for only 60% of perceptual variance, summary statistics from a mid-level auditory midbrain model accurately predict single-trial sensory judgments, accounting for more than 90% of the perceptual variance. Furthermore, perceptual weights from the regression framework identify which statistics and tuned neural filters are influential and how they impact recognition. Thus, perception of speech in natural backgrounds relies on a mid-level auditory representation involving interference of multiple summary statistics that impact recognition beneficially or detrimentally across natural background sounds.
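A bare-bones analogue of the perceptual-regression idea is sketched below: single-trial recognition outcomes are regressed (here logistically) onto summary-statistic features, and the fitted weights indicate which statistics help or hurt recognition. Feature values, weights, and trial counts are simulated placeholders, not the authors' framework.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(11)
n_trials, n_stats = 500, 8
X = rng.standard_normal((n_trials, n_stats))           # per-trial summary-statistic features (simulated)
true_w = np.array([1.5, -1.0, 0.0, 0.0, 0.8, 0.0, 0.0, 0.0])
p_correct = 1 / (1 + np.exp(-(X @ true_w)))
y = rng.binomial(1, p_correct)                         # simulated correct/incorrect word reports

model = LogisticRegression().fit(X, y)
print(f"in-sample accuracy: {model.score(X, y):.2f}")
print("perceptual weights:", model.coef_.round(2))     # which statistics help (+) or hurt (-) recognition
```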
Collapse
Affiliation(s)
- Alex C Clonan
- Electrical and Computer Engineering, University of Connecticut, Storrs, CT 06269
- Biomedical Engineering, University of Connecticut, Storrs, CT 06269
- Institute of Brain and Cognitive Sciences, University of Connecticut, Storrs, CT 06269
| | - Xiu Zhai
- Biomedical Engineering, Wentworth Institute of Technology, Boston, MA 02115
| | - Ian H Stevenson
- Biomedical Engineering, University of Connecticut, Storrs, CT 06269
- Psychological Sciences, University of Connecticut, Storrs, CT 06269
- Institute of Brain and Cognitive Sciences, University of Connecticut, Storrs, CT 06269
| | - Monty A Escabí
- Electrical and Computer Engineering, University of Connecticut, Storrs, CT 06269
- Psychological Sciences, University of Connecticut, Storrs, CT 06269
- Institute of Brain and Cognitive Sciences, University of Connecticut, Storrs, CT 06269
| |
Collapse
|
37
|
Townsend PH, Jones A, Patel AD, Race E. Rhythmic Temporal Cues Coordinate Cross-frequency Phase-amplitude Coupling during Memory Encoding. J Cogn Neurosci 2024; 36:2100-2116. [PMID: 38991125 DOI: 10.1162/jocn_a_02217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/13/2024]
Abstract
Accumulating evidence suggests that rhythmic temporal cues in the environment influence the encoding of information into long-term memory. Here, we test the hypothesis that these mnemonic effects of rhythm reflect the coupling of high-frequency (gamma) oscillations to entrained lower-frequency oscillations synchronized to the beat of the rhythm. In Study 1, we first test this hypothesis in the context of global effects of rhythm on memory, when memory is superior for visual stimuli presented in rhythmic compared with arrhythmic patterns at encoding [Jones, A., & Ward, E. V. Rhythmic temporal structure at encoding enhances recognition memory, Journal of Cognitive Neuroscience, 31, 1549-1562, 2019]. We found that rhythmic presentation of visual stimuli during encoding was associated with greater phase-amplitude coupling (PAC) between entrained low-frequency (delta) oscillations and higher-frequency (gamma) oscillations. In Study 2, we next investigated cross-frequency PAC in the context of local effects of rhythm on memory encoding, when memory is superior for visual stimuli presented in-synchrony compared with out-of-synchrony with a background auditory beat [Hickey, P., Merseal, H., Patel, A. D., & Race, E. Memory in time: Neural tracking of low-frequency rhythm dynamically modulates memory formation. Neuroimage, 213, 116693, 2020]. We found that the mnemonic effect of rhythm in this context was again associated with increased cross-frequency PAC between entrained low-frequency (delta) oscillations and higher-frequency (gamma) oscillations. Furthermore, the magnitude of gamma power modulations positively scaled with the subsequent memory benefit for in- versus out-of-synchrony stimuli. Together, these results suggest that the influence of rhythm on memory encoding may reflect the temporal coordination of higher-frequency gamma activity by entrained low-frequency oscillations.
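Cross-frequency phase-amplitude coupling of the kind quantified here is often summarized with a mean-vector-length index: low-frequency phase from one band, high-frequency amplitude from another, combined into a single coupling value. The sketch below uses a synthetic delta-gamma coupled signal; the band edges and normalization are illustrative choices, not necessarily the study's exact PAC measure.

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

fs = 500
t = np.arange(0, 120, 1 / fs)
rng = np.random.default_rng(12)
delta_phase = 2 * np.pi * 2 * t                        # a 2-Hz carrier, as for beat-rate entrainment
signal = (np.sin(delta_phase)
          + (1 + np.cos(delta_phase)) * 0.3 * np.sin(2 * np.pi * 40 * t)   # gamma amplitude tied to delta phase
          + 0.5 * rng.standard_normal(t.size))

def bandpass(x, lo, hi):
    sos = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band", output="sos")
    return sosfiltfilt(sos, x)

phase = np.angle(hilbert(bandpass(signal, 1, 4)))      # delta phase
amp = np.abs(hilbert(bandpass(signal, 30, 50)))        # gamma amplitude
pac = np.abs(np.mean(amp * np.exp(1j * phase))) / amp.mean()   # normalised mean vector length
print(f"delta-gamma PAC (normalised MVL): {pac:.3f}")
```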
Collapse
Affiliation(s)
- Paige Hickey Townsend
- Massachusetts General Hospital, Charlestown, MA
- Athinoula A. Martinos Center for Biomedical Imaging, Charlestown, MA
| | | | - Aniruddh D Patel
- Tufts University, Medford, MA
- Canadian Institute for Advanced Research
| | | |
Collapse
|
38
|
Zhang M, Riecke L, Bonte M. Cortical tracking of language structures: Modality-dependent and independent responses. Clin Neurophysiol 2024; 166:56-65. [PMID: 39111244 DOI: 10.1016/j.clinph.2024.07.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2023] [Revised: 04/18/2024] [Accepted: 07/20/2024] [Indexed: 09/15/2024]
Abstract
OBJECTIVES The mental parsing of linguistic hierarchy is crucial for language comprehension, and while there is growing interest in the cortical tracking of auditory speech, the neurophysiological substrates for tracking written language are still unclear. METHODS We recorded electroencephalographic (EEG) responses from participants exposed to auditory and visual streams of either random syllables or tri-syllabic real words. Using a frequency-tagging approach, we analyzed the neural representations of physically presented (i.e., syllables) and mentally constructed (i.e., words) linguistic units and compared them between the two sensory modalities. RESULTS We found that tracking syllables is partially modality dependent, with anterior and posterior scalp regions more involved in the tracking of spoken and written syllables, respectively. The cortical tracking of spoken and written words instead was found to involve a shared anterior region to a similar degree, suggesting a modality-independent process for word tracking. CONCLUSION Our study suggests that basic linguistic features are represented in a sensory modality-specific manner, while more abstract ones are modality-unspecific during the online processing of continuous language input. SIGNIFICANCE The current methodology may be utilized in future research to examine the development of reading skills, especially the deficiencies in fluent reading among those with dyslexia.
Collapse
Affiliation(s)
- Manli Zhang
- Maastricht Brain Imaging Center, Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, Netherlands.
| | - Lars Riecke
- Maastricht Brain Imaging Center, Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, Netherlands
| | - Milene Bonte
- Maastricht Brain Imaging Center, Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, Netherlands
| |
Collapse
|
39
|
Kasten FH, Busson Q, Zoefel B. Opposing neural processing modes alternate rhythmically during sustained auditory attention. Commun Biol 2024; 7:1125. [PMID: 39266696 PMCID: PMC11393317 DOI: 10.1038/s42003-024-06834-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2023] [Accepted: 09/03/2024] [Indexed: 09/14/2024] Open
Abstract
During continuous tasks, humans show spontaneous fluctuations in performance, putatively caused by varying attentional resources allocated to process external information. If neural resources are instead used to process other, presumably "internal" information, sensory input can be missed, which may explain an apparent dichotomy of "internal" versus "external" attention. In the current study, we extract presumed neural signatures of these attentional modes in human electroencephalography (EEG): neural entrainment and α-oscillations (~10-Hz), linked to the processing and suppression of sensory information, respectively. We test whether they exhibit structured fluctuations over time, while listeners attend to an ecologically relevant stimulus, like speech, and complete a task that requires full and continuous attention. Results show an antagonistic relation between neural entrainment to speech and spontaneous α-oscillations in two distinct brain networks: one specialized in the processing of external information, the other reminiscent of the dorsal attention network. These opposing neural modes undergo slow, periodic fluctuations around ~0.07 Hz and are related to the detection of auditory targets. Our study might have tapped into a general attentional mechanism that is conserved across species and has important implications for situations in which sustained attention to sensory information is critical.
Collapse
Affiliation(s)
- Florian H Kasten
- Department for Cognitive, Affective, Behavioral Neuroscience with Focus Neurostimulation, Institute of Psychology, University of Trier, Trier, Germany.
- Centre de Recherche Cerveau & Cognition, CNRS, Toulouse, France.
- Université Toulouse III Paul Sabatier, Toulouse, France.
| | | | - Benedikt Zoefel
- Centre de Recherche Cerveau & Cognition, CNRS, Toulouse, France.
- Université Toulouse III Paul Sabatier, Toulouse, France.
| |
Collapse
|
40
|
Peterson RE, Choudhri A, Mitelut C, Tanelus A, Capo-Battaglia A, Williams AH, Schneider DM, Sanes DH. Unsupervised discovery of family specific vocal usage in the Mongolian gerbil. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.03.11.532197. [PMID: 39282260 PMCID: PMC11398318 DOI: 10.1101/2023.03.11.532197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 09/21/2024]
Abstract
In nature, animal vocalizations can provide crucial information about identity, including kinship and hierarchy. However, lab-based vocal behavior is typically studied during brief interactions between animals with no prior social relationship, and under environmental conditions with limited ethological relevance. Here, we address this gap by establishing long-term acoustic recordings from Mongolian gerbil families, a core social group that uses an array of sonic and ultrasonic vocalizations. Three separate gerbil families were transferred to an enlarged environment and continuous 20-day audio recordings were obtained. Using a variational autoencoder (VAE) to quantify 583,237 vocalizations, we show that gerbils exhibit a more elaborate vocal repertoire than has been previously reported and that vocal repertoire usage differs significantly by family. By performing Gaussian mixture model clustering on the VAE latent space, we show that families preferentially use characteristic sets of vocal clusters and that these usage preferences remain stable over weeks. Furthermore, gerbils displayed family-specific transitions between vocal clusters. Since gerbils live naturally as extended families in complex underground burrows that are adjacent to other families, these results suggest the presence of a vocal dialect which could be exploited by animals to represent kinship. These findings position the Mongolian gerbil as a compelling animal model to study the neural basis of vocal communication and demonstrate the potential for using unsupervised machine learning with uninterrupted acoustic recordings to gain insights into naturalistic animal behavior.
Collapse
Affiliation(s)
- Ralph E. Peterson
- Center for Neural Science, New York University, New York, NY
- Center for Computational Neuroscience, Flatiron Institute, New York, NY
| | | | - Catalin Mitelut
- Center for Neural Science, New York University, New York, NY
| | - Aramis Tanelus
- Center for Neural Science, New York University, New York, NY
- Center for Computational Neuroscience, Flatiron Institute, New York, NY
| | | | - Alex H. Williams
- Center for Neural Science, New York University, New York, NY
- Center for Computational Neuroscience, Flatiron Institute, New York, NY
| | | | - Dan H. Sanes
- Center for Neural Science, New York University, New York, NY
- Department of Psychology, New York University, New York, NY
- Department of Biology, New York University, New York, NY
- Neuroscience Institute, New York University School of Medicine, New York, NY
| |
Collapse
|
41
|
He D, Buder EH, Bidelman GM. Cross-linguistic and acoustic-driven effects on multiscale neural synchrony to stress rhythms. BRAIN AND LANGUAGE 2024; 256:105463. [PMID: 39243486 PMCID: PMC11422791 DOI: 10.1016/j.bandl.2024.105463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/05/2023] [Revised: 09/01/2024] [Accepted: 09/03/2024] [Indexed: 09/09/2024]
Abstract
We investigated how neural oscillations code the hierarchical nature of stress rhythms in speech and how stress processing varies with language experience. By measuring phase synchrony of multilevel EEG-acoustic tracking and intra-brain cross-frequency coupling, we show the encoding of stress involves different neural signatures (delta rhythms = stress foot rate; theta rhythms = syllable rate), is stronger for amplitude vs. duration stress cues, and induces nested delta-theta coherence mirroring the stress-syllable hierarchy in speech. Only native English, but not Mandarin, speakers exhibited enhanced neural entrainment at central stress (2 Hz) and syllable (4 Hz) rates intrinsic to natural English. English individuals with superior cortical-stress tracking capabilities also displayed stronger neural hierarchical coherence, highlighting a nuanced interplay between internal nesting of brain rhythms and external entrainment rooted in language-specific speech rhythms. Our cross-language findings reveal brain-speech synchronization is not purely a "bottom-up" but benefits from "top-down" processing from listeners' language-specific experience.
Collapse
Affiliation(s)
- Deling He
- School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA; Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA
| | - Eugene H Buder
- School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA; Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA
| | - Gavin M Bidelman
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, USA; Program in Neuroscience, Indiana University, Bloomington, IN, USA; Cognitive Science Program, Indiana University, Bloomington, IN, USA.
| |
Collapse
|
42
|
Chalas N, Meyer L, Lo CW, Park H, Kluger DS, Abbasi O, Kayser C, Nitsch R, Gross J. Dissociating prosodic from syntactic delta activity during natural speech comprehension. Curr Biol 2024; 34:3537-3549.e5. [PMID: 39047734 DOI: 10.1016/j.cub.2024.06.072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2024] [Revised: 06/24/2024] [Accepted: 06/27/2024] [Indexed: 07/27/2024]
Abstract
Decoding human speech requires the brain to segment the incoming acoustic signal into meaningful linguistic units, ranging from syllables and words to phrases. Integrating these linguistic constituents into a coherent percept sets the root of compositional meaning and hence understanding. One important cue for segmentation in natural speech is prosodic cues, such as pauses, but their interplay with higher-level linguistic processing is still unknown. Here, we dissociate the neural tracking of prosodic pauses from the segmentation of multi-word chunks using magnetoencephalography (MEG). We find that manipulating the regularity of pauses disrupts slow speech-brain tracking bilaterally in auditory areas (below 2 Hz) and in turn increases left-lateralized coherence of higher-frequency auditory activity at speech onsets (around 25-45 Hz). Critically, we also find that multi-word chunks-defined as short, coherent bundles of inter-word dependencies-are processed through the rhythmic fluctuations of low-frequency activity (below 2 Hz) bilaterally and independently of prosodic cues. Importantly, low-frequency alignment at chunk onsets increases the accuracy of an encoding model in bilateral auditory and frontal areas while controlling for the effect of acoustics. Our findings provide novel insights into the neural basis of speech perception, demonstrating that both acoustic features (prosodic cues) and abstract linguistic processing at the multi-word timescale are underpinned independently by low-frequency electrophysiological brain activity in the delta frequency range.
Collapse
Affiliation(s)
- Nikos Chalas
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany; Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany; Institute for Translational Neuroscience, University of Münster, Münster, Germany.
| | - Lars Meyer
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
| | - Chia-Wen Lo
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
| | - Hyojin Park
- Centre for Human Brain Health (CHBH), School of Psychology, University of Birmingham, Birmingham, UK
| | - Daniel S Kluger
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany; Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
| | - Omid Abbasi
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
| | - Christoph Kayser
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, 33615 Bielefeld, Germany
| | - Robert Nitsch
- Institute for Translational Neuroscience, University of Münster, Münster, Germany
| | - Joachim Gross
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany; Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
| |
Collapse
|
43
|
Zhu M, Chen F, Shi C, Zhang Y. Amplitude envelope onset characteristics modulate phase locking for speech auditory-motor synchronization. Psychon Bull Rev 2024; 31:1661-1669. [PMID: 38227125 DOI: 10.3758/s13423-023-02446-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 12/18/2023] [Indexed: 01/17/2024]
Abstract
The spontaneous speech-to-speech synchronization (SSS) test has been shown to be an effective behavioral method to estimate cortical speech auditory-motor coupling strength through the phase-locking value (PLV) between auditory input and motor output. This study further investigated how amplitude envelope onset variations of the auditory speech signal may influence speech auditory-motor synchronization. Sixty Mandarin-speaking adults listened to a stream of randomly presented syllables at an increasing speed while concurrently whispering in synchrony with the rhythm of the auditory stimuli, whose onset consistency was manipulated across aspirated, unaspirated, and mixed conditions. The participants' PLVs for the three conditions in the SSS test were derived and compared. Results showed that syllable rise time affected speech auditory-motor synchronization in a bifurcated fashion. Specifically, PLVs were significantly higher in the temporally more consistent conditions (aspirated or unaspirated) than in the less consistent condition (mixed) for high synchronizers. In contrast, low synchronizers tended to be immune to onset consistency. Overall, these results show how syllable onset consistency in the rise time of the amplitude envelope can modulate the strength of speech auditory-motor coupling. This study supports the application of the SSS test to examine individual differences in the integration of perception and production systems, with implications for people with speech and language disorders who have difficulty processing speech onset characteristics such as rise time.
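In SSS-style paradigms, coupling strength is commonly summarized as a phase-locking value between the envelope of the heard syllable train and the envelope of the participant's concurrent whispering. The Python sketch below shows one standard way to compute such a PLV around the syllable rate; the 4.5 Hz band, sampling rate, and toy envelopes are assumptions rather than the authors' exact parameters.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 100  # envelope sampling rate (Hz); assumed

def narrowband_phase(env, center=4.5, half_bw=1.0):
    """Instantaneous phase of an envelope in a narrow band around the syllable rate."""
    b, a = butter(2, [(center - half_bw) / (fs / 2), (center + half_bw) / (fs / 2)],
                  btype="band")
    return np.angle(hilbert(filtfilt(b, a, env)))

# Toy envelopes standing in for the heard syllable train and the whispered output.
t = np.arange(0, 60, 1 / fs)
heard = 1 + np.cos(2 * np.pi * 4.5 * t)
produced = 1 + np.cos(2 * np.pi * 4.5 * t + 0.8) + 0.3 * np.random.randn(t.size)

dphi = narrowband_phase(heard) - narrowband_phase(produced)
plv = np.abs(np.mean(np.exp(1j * dphi)))
print(f"auditory-motor PLV: {plv:.2f}")  # near 1 = high synchronizer, near 0 = low
```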
Collapse
Affiliation(s)
- Min Zhu
- School of Foreign Languages, Hunan University, Changsha, China
| | - Fei Chen
- School of Foreign Languages, Hunan University, Changsha, China.
| | - Chenxin Shi
- School of Foreign Languages, Hunan University, Changsha, China
| | - Yang Zhang
- Department of Speech-Language-Hearing Sciences and Masonic Institute for the Developing Brain, The University of Minnesota, Twin Cities, MN, USA.
| |
Collapse
|
44
|
Piña Méndez Á, Taitz A, Palacios Rodríguez O, Rodríguez Leyva I, Assaneo MF. Speech's syllabic rhythm and articulatory features produced under different auditory feedback conditions identify Parkinsonism. Sci Rep 2024; 14:15787. [PMID: 38982177 PMCID: PMC11233651 DOI: 10.1038/s41598-024-65974-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2024] [Accepted: 06/25/2024] [Indexed: 07/11/2024] Open
Abstract
Diagnostic tests for Parkinsonism based on speech samples have shown promising results. Although abnormal auditory feedback integration during speech production and impaired rhythmic organization of speech are known in Parkinsonism, these aspects have not been incorporated into diagnostic tests. This study aimed to identify Parkinsonism using a novel speech behavioral test that involved rhythmically repeating syllables under different auditory feedback conditions. The study included 30 individuals with Parkinson's disease (PD) and 30 healthy subjects. Participants were asked to rhythmically repeat the PA-TA-KA syllable sequence, both whispering and speaking aloud under various listening conditions. The results showed that individuals with PD had difficulties in whispering and articulating under altered auditory feedback conditions, exhibited delayed speech onset, and demonstrated inconsistent rhythmic structure across trials compared to controls. These parameters were then fed into a supervised machine-learning algorithm to differentiate between the two groups. The algorithm achieved an accuracy of 85.4%, a sensitivity of 86.5%, and a specificity of 84.3%. This pilot study highlights the potential of the proposed behavioral paradigm as an objective and accessible (both in cost and time) test for identifying individuals with Parkinson's disease.
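The classification step described above can be approximated with a generic cross-validated classifier; the sketch below uses a linear support-vector machine on synthetic speech-timing features purely for illustration. The feature set, classifier choice, and data are assumptions rather than the authors' algorithm, but the accuracy/sensitivity/specificity bookkeeping matches the metrics reported.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)

# Toy feature matrix: rows = participants, columns = speech-timing features
# (e.g., speech-onset delay, articulation errors, rhythm inconsistency); assumed.
X_pd = rng.normal(loc=1.0, scale=1.0, size=(30, 4))  # Parkinson's disease group
X_hc = rng.normal(loc=0.0, scale=1.0, size=(30, 4))  # healthy controls
X = np.vstack([X_pd, X_hc])
y = np.array([1] * 30 + [0] * 30)  # 1 = PD, 0 = control

clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
y_hat = cross_val_predict(clf, X, y, cv=cv)

tn, fp, fn, tp = confusion_matrix(y, y_hat).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)  # proportion of PD participants correctly identified
specificity = tn / (tn + fp)  # proportion of controls correctly identified
print(f"accuracy={accuracy:.1%}, sensitivity={sensitivity:.1%}, specificity={specificity:.1%}")
```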
Collapse
Affiliation(s)
- Ángeles Piña Méndez
- Faculty of Psychology, Autonomous University of San Luis Potosí, San Luis Potosí, Mexico
| | | | | | | | - M Florencia Assaneo
- Institute of Neurobiology, National Autonomous University of Mexico, Querétaro, Mexico.
| |
Collapse
|
45
|
Ortiz-Barajas MC. Predicting language outcome at birth. Front Hum Neurosci 2024; 18:1370572. [PMID: 39036813 PMCID: PMC11258996 DOI: 10.3389/fnhum.2024.1370572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2024] [Accepted: 06/11/2024] [Indexed: 07/23/2024] Open
Abstract
Even though most children acquire language effortlessly, not all do. Language disorders are currently difficult to diagnose before 3-4 years of age, because diagnosis relies on behavioral criteria that are difficult to obtain early in life. Using electroencephalography, I investigated whether differences in newborns' neural activity when listening to sentences in their native language (French) and a rhythmically different unfamiliar language (English) relate to measures of later language development at 12 and 18 months. Here I show that activation differences in the theta band at birth predict language comprehension abilities at 12 and 18 months. These findings suggest that a neural measure of language discrimination at birth could be used for the early identification of infants at risk of developmental language disorders.
Collapse
|
46
|
Fletcher MD, Akis E, Verschuur CA, Perry SW. Improved tactile speech perception and noise robustness using audio-to-tactile sensory substitution with amplitude envelope expansion. Sci Rep 2024; 14:15029. [PMID: 38951556 PMCID: PMC11217272 DOI: 10.1038/s41598-024-65510-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2023] [Accepted: 06/20/2024] [Indexed: 07/03/2024] Open
Abstract
Recent advances in haptic technology could allow haptic hearing aids, which convert audio to tactile stimulation, to become viable for supporting people with hearing loss. A tactile vocoder strategy for audio-to-tactile conversion, which exploits these advances, has recently shown significant promise. In this strategy, the amplitude envelope is extracted from several audio frequency bands and used to modulate the amplitude of a set of vibro-tactile tones. The vocoder strategy allows good consonant discrimination, but vowel discrimination is poor and the strategy is susceptible to background noise. In the current study, we assessed whether multi-band amplitude envelope expansion can effectively enhance critical vowel features, such as formants, and improve speech extraction from noise. In 32 participants with normal touch perception, tactile-only phoneme discrimination with and without envelope expansion was assessed both in quiet and in background noise. Envelope expansion improved performance in quiet by 10.3% for vowels and by 5.9% for consonants. In noise, envelope expansion improved overall phoneme discrimination by 9.6%, with no difference in benefit between consonants and vowels. The tactile vocoder with envelope expansion can be deployed in real-time on a compact device and could substantially improve clinical outcomes for a new generation of haptic hearing aids.
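A rough sketch of multi-band envelope extraction with amplitude-envelope expansion, the core signal path of the tactile vocoder strategy described above; the band edges, expansion exponent, and toy input are assumed values for illustration and not the authors' implementation.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 16000  # audio sampling rate (Hz); assumed

def band_envelope(x, lo, hi, fs):
    """Amplitude envelope of one analysis band via bandpass filtering + Hilbert."""
    sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
    return np.abs(hilbert(sosfiltfilt(sos, x)))

def expand(env, exponent=2.0):
    """Amplitude-envelope expansion: exaggerates peaks (e.g., formant energy)
    relative to troughs and background noise. The exponent is an assumed value."""
    norm = env / (env.max() + 1e-12)
    return norm ** exponent

# Toy input standing in for a noisy speech snippet.
t = np.arange(0, 1.0, 1 / fs)
audio = np.sin(2 * np.pi * 500 * t) * (1 + np.sin(2 * np.pi * 4 * t)) \
        + 0.2 * np.random.randn(t.size)

# Assumed band edges for a small tactile-vocoder filterbank.
bands = [(100, 400), (400, 1000), (1000, 2500), (2500, 6000)]
envelopes = [expand(band_envelope(audio, lo, hi, fs)) for lo, hi in bands]

# Each expanded envelope would modulate the amplitude of one vibro-tactile carrier tone.
print([f"{env.mean():.3f}" for env in envelopes])
```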
Collapse
Affiliation(s)
- Mark D Fletcher
- University of Southampton Auditory Implant Service, University of Southampton, University Road, Southampton, SO17 1BJ, UK.
- Institute of Sound and Vibration Research, University of Southampton, University Road, Southampton, SO17 1BJ, UK.
| | - Esma Akis
- University of Southampton Auditory Implant Service, University of Southampton, University Road, Southampton, SO17 1BJ, UK
- Institute of Sound and Vibration Research, University of Southampton, University Road, Southampton, SO17 1BJ, UK
| | - Carl A Verschuur
- University of Southampton Auditory Implant Service, University of Southampton, University Road, Southampton, SO17 1BJ, UK
| | - Samuel W Perry
- University of Southampton Auditory Implant Service, University of Southampton, University Road, Southampton, SO17 1BJ, UK
- Institute of Sound and Vibration Research, University of Southampton, University Road, Southampton, SO17 1BJ, UK
| |
Collapse
|
47
|
Berthault E, Chen S, Falk S, Morillon B, Schön D. Auditory and motor priming of metric structure improves understanding of degraded speech. Cognition 2024; 248:105793. [PMID: 38636164 DOI: 10.1016/j.cognition.2024.105793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2023] [Revised: 03/07/2024] [Accepted: 04/09/2024] [Indexed: 04/20/2024]
Abstract
Speech comprehension is enhanced when preceded (or accompanied) by a congruent rhythmic prime reflecting the metrical sentence structure. Although these phenomena have been described for auditory and motor primes separately, their respective and synergistic contributions have not been addressed. In this experiment, participants performed a speech comprehension task on degraded speech signals that were preceded by a rhythmic prime that could be auditory, motor, or audiomotor. Both auditory and audiomotor rhythmic primes facilitated speech comprehension speed. While the presence of a purely motor prime (unpaced tapping) did not globally benefit speech comprehension, comprehension accuracy scaled with the regularity of motor tapping. To investigate inter-individual variability, participants also performed a Spontaneous Speech Synchronization test. The strength of the estimated perception-production coupling correlated positively with overall speech comprehension scores. These findings are discussed in the framework of the dynamic attending and active sensing theories.
Collapse
Affiliation(s)
- Emma Berthault
- Aix Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France.
| | - Sophie Chen
- Aix Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France.
| | - Simone Falk
- Department of Linguistics and Translation, University of Montreal, Canada; International Laboratory for Brain, Music and Sound Research, Montreal, Canada.
| | - Benjamin Morillon
- Aix Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France.
| | - Daniele Schön
- Aix Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France.
| |
Collapse
|
48
|
Albouy P, Mehr SA, Hoyer RS, Ginzburg J, Du Y, Zatorre RJ. Spectro-temporal acoustical markers differentiate speech from song across cultures. Nat Commun 2024; 15:4835. [PMID: 38844457 PMCID: PMC11156671 DOI: 10.1038/s41467-024-49040-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2023] [Accepted: 05/21/2024] [Indexed: 06/09/2024] Open
Abstract
Humans produce two forms of cognitively complex vocalizations: speech and song. It is debated whether these differ primarily on the basis of culturally specific, learned features or whether acoustical features can reliably distinguish them. We study the spectro-temporal modulation patterns of vocalizations produced by 369 people living in 21 urban, rural, and small-scale societies across six continents. Specific ranges of spectral and temporal modulations, overlapping within categories and across societies, significantly differentiate speech from song. Machine-learning classification shows that this effect is cross-culturally robust, with vocalizations reliably classified solely from their spectro-temporal features across all 21 societies. Listeners unfamiliar with the cultures classify these vocalizations using spectro-temporal cues similar to those used by the machine-learning algorithm. Finally, spectro-temporal features discriminate song from speech better than a broad range of other acoustical variables, suggesting that spectro-temporal modulation, a key feature of auditory neuronal tuning, accounts for a fundamental difference between these categories.
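One common way to obtain spectro-temporal modulation features of the kind analyzed above is a 2-D Fourier transform of a log-magnitude spectrogram; the sketch below illustrates this under assumed analysis parameters (window length, linear-frequency axis) and is not the authors' pipeline.

```python
import numpy as np
from scipy.signal import spectrogram

fs = 16000  # sampling rate (Hz); assumed

# Toy vocalization stand-in: a tone with slow frequency and amplitude modulation.
t = np.arange(0, 2.0, 1 / fs)
x = np.sin(2 * np.pi * (200 + 50 * np.sin(2 * np.pi * 3 * t)) * t) \
    * (1 + 0.8 * np.sin(2 * np.pi * 5 * t))

# Log-magnitude spectrogram (frequency x time).
f, tt, S = spectrogram(x, fs=fs, nperseg=512, noverlap=384)
logS = np.log(S + 1e-12)

# 2-D FFT of the (mean-removed) spectrogram gives the joint modulation spectrum:
# one axis indexes temporal modulations (Hz), the other spectral modulations
# (cycles/Hz here; cycles/octave would require a log-frequency axis).
M = np.abs(np.fft.fftshift(np.fft.fft2(logS - logS.mean())))
temporal_mod_axis = np.fft.fftshift(np.fft.fftfreq(logS.shape[1], d=tt[1] - tt[0]))
spectral_mod_axis = np.fft.fftshift(np.fft.fftfreq(logS.shape[0], d=f[1] - f[0]))

# Energy at slow temporal modulations tends to characterize song; faster rates, speech.
print(M.shape, temporal_mod_axis.min(), temporal_mod_axis.max())
```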
Collapse
Affiliation(s)
- Philippe Albouy
- CERVO Brain Research Centre, School of Psychology, Laval University, Québec City, QC, Canada.
- International Laboratory for Brain, Music and Sound Research (BRAMS), Montreal, QC, Canada.
- Centre for Research in Brain, Language and Music and Centre for Interdisciplinary Research in Music, Media, and Technology, Montréal, QC, Canada.
| | - Samuel A Mehr
- International Laboratory for Brain, Music and Sound Research (BRAMS), Montreal, QC, Canada
- School of Psychology, University of Auckland, Auckland, 1010, New Zealand
- Child Study Center, Yale University, New Haven, CT, 06511, USA
| | - Roxane S Hoyer
- CERVO Brain Research Centre, School of Psychology, Laval University, Québec City, QC, Canada
| | - Jérémie Ginzburg
- CERVO Brain Research Centre, School of Psychology, Laval University, Québec City, QC, Canada
- Lyon Neuroscience Research Center, CNRS, UMR5292, INSERM, U1028 - Université Claude Bernard Lyon 1, F-69000, Lyon, France
- Cognitive Neuroscience Unit, Montreal Neurological Institute, McGill University, Montreal, QC, Canada
| | - Yi Du
- Institute of Psychology, Chinese Academy of Sciences, Beijing, China
| | - Robert J Zatorre
- International Laboratory for Brain, Music and Sound Research (BRAMS), Montreal, QC, Canada.
- Centre for Research in Brain, Language and Music and Centre for Interdisciplinary Research in Music, Media, and Technology, Montréal, QC, Canada.
- Cognitive Neuroscience Unit, Montreal Neurological Institute, McGill University, Montreal, QC, Canada.
| |
Collapse
|
49
|
Ozaki Y, Tierney A, Pfordresher PQ, McBride JM, Benetos E, Proutskova P, Chiba G, Liu F, Jacoby N, Purdy SC, Opondo P, Fitch WT, Hegde S, Rocamora M, Thorne R, Nweke F, Sadaphal DP, Sadaphal PM, Hadavi S, Fujii S, Choo S, Naruse M, Ehara U, Sy L, Parselelo ML, Anglada-Tort M, Hansen NC, Haiduk F, Færøvik U, Magalhães V, Krzyżanowski W, Shcherbakova O, Hereld D, Barbosa BS, Varella MAC, van Tongeren M, Dessiatnitchenko P, Zar SZ, El Kahla I, Muslu O, Troy J, Lomsadze T, Kurdova D, Tsope C, Fredriksson D, Arabadjiev A, Sarbah JP, Arhine A, Meachair TÓ, Silva-Zurita J, Soto-Silva I, Millalonco NEM, Ambrazevičius R, Loui P, Ravignani A, Jadoul Y, Larrouy-Maestri P, Bruder C, Teyxokawa TP, Kuikuro U, Natsitsabui R, Sagarzazu NB, Raviv L, Zeng M, Varnosfaderani SD, Gómez-Cañón JS, Kolff K, der Nederlanden CVB, Chhatwal M, David RM, Setiawan IPG, Lekakul G, Borsan VN, Nguqu N, Savage PE. Globally, songs and instrumental melodies are slower and higher and use more stable pitches than speech: A Registered Report. SCIENCE ADVANCES 2024; 10:eadm9797. [PMID: 38748798 PMCID: PMC11095461 DOI: 10.1126/sciadv.adm9797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/15/2023] [Accepted: 04/19/2024] [Indexed: 05/19/2024]
Abstract
Both music and language are found in all known human societies, yet no studies have compared similarities and differences between song, speech, and instrumental music on a global scale. In this Registered Report, we analyzed two global datasets: (i) 300 annotated audio recordings representing matched sets of traditional songs, recited lyrics, conversational speech, and instrumental melodies from our 75 coauthors speaking 55 languages; and (ii) 418 previously published adult-directed song and speech recordings from 209 individuals speaking 16 languages. Of our six preregistered predictions, five were strongly supported: relative to speech, songs use (i) higher pitch, (ii) slower temporal rate, and (iii) more stable pitches, while both songs and speech use similar (iv) pitch interval size and (v) timbral brightness. Exploratory analyses suggest that features vary along a "musi-linguistic" continuum when instrumental melodies and recited lyrics are included. Our study provides strong empirical evidence of cross-cultural regularities in music and speech.
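The three supported pitch and rate predictions map onto simple descriptors that can be estimated from audio; the hedged sketch below uses librosa's pYIN pitch tracker and onset detector to compute median pitch, frame-to-frame pitch instability, and event rate for one recording. The file names, pitch range, and instability measure are illustrative assumptions, not the Registered Report's preregistered pipeline.

```python
import numpy as np
import librosa

def pitch_descriptors(path):
    """Median pitch, pitch stability, and event rate for one recording.
    Ranges and measures are illustrative, not the Registered Report pipeline."""
    y, sr = librosa.load(path, sr=None, mono=True)
    f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                                 fmax=librosa.note_to_hz("C6"), sr=sr)
    f0 = f0[voiced]                          # keep voiced frames only
    cents = 1200 * np.log2(f0 / 440.0)       # log-frequency scale (cents re A4)
    onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")
    duration = len(y) / sr
    return {
        "median_pitch_hz": float(np.median(f0)),
        "pitch_instability": float(np.mean(np.abs(np.diff(cents)))),  # lower = more stable
        "event_rate_hz": len(onsets) / duration,                      # lower = slower rate
    }

# Hypothetical file names: songs would be expected to show higher median pitch,
# lower pitch instability, and a slower event rate than speech.
# print(pitch_descriptors("song.wav"), pitch_descriptors("speech.wav"))
```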
Collapse
Affiliation(s)
- Yuto Ozaki
- Graduate School of Media and Governance, Keio University, Fujisawa, Kanagawa, Japan
| | - Adam Tierney
- Department of Psychological Sciences, Birkbeck, University of London, London, UK
| | - Peter Q. Pfordresher
- Department of Psychology, University at Buffalo, State University of New York, Buffalo, NY, USA
| | - John M. McBride
- Center for Algorithmic and Robotized Synthesis, Institute for Basic Science, Ulsan, South Korea
| | - Emmanouil Benetos
- School of Electronic Engineering and Computer Science, Queen Mary University of London, London, UK
| | - Polina Proutskova
- School of Electronic Engineering and Computer Science, Queen Mary University of London, London, UK
| | - Gakuto Chiba
- Graduate School of Media and Governance, Keio University, Fujisawa, Kanagawa, Japan
| | - Fang Liu
- School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK
| | - Nori Jacoby
- Computational Auditory Perception Group, Max-Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
| | - Suzanne C. Purdy
- School of Psychology, University of Auckland, Auckland, New Zealand
- Centre for Brain Research and Eisdell Moore Centre for Hearing and Balance Research, University of Auckland, Auckland, New Zealand
| | - Patricia Opondo
- School of Arts, Music Discipline, University of KwaZulu Natal, Durban, South Africa
| | - W. Tecumseh Fitch
- Department of Behavioral and Cognitive Biology, University of Vienna, Vienna, Austria
| | - Shantala Hegde
- Music Cognition Lab, Department of Clinical Psychology, National Institute of Mental Health and Neuro Sciences, Bangalore, Karnataka, India
| | - Martín Rocamora
- Universidad de la República, Montevideo, Uruguay
- Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
| | - Rob Thorne
- School of Music, Victoria University of Wellington, Wellington, New Zealand
| | - Florence Nweke
- Department of Creative Arts, University of Lagos, Lagos, Nigeria
- Department of Music, Mountain Top University, Ogun, Nigeria
| | - Dhwani P. Sadaphal
- Department of Behavioral and Cognitive Biology, University of Vienna, Vienna, Austria
| | | | - Shafagh Hadavi
- Graduate School of Media and Governance, Keio University, Fujisawa, Kanagawa, Japan
| | - Shinya Fujii
- Faculty of Environment and Information Studies, Keio University, Fujisawa, Kanagawa, Japan
| | - Sangbuem Choo
- Graduate School of Media and Governance, Keio University, Fujisawa, Kanagawa, Japan
| | - Marin Naruse
- Faculty of Policy Management, Keio University, Fujisawa, Kanagawa, Japan
| | | | - Latyr Sy
- Independent researcher, Tokyo, Japan
- Independent researcher, Dakar, Sénégal
| | - Mark Lenini Parselelo
- Memorial University of Newfoundland, St. John’s, NL, Canada
- Department of Music and Dance, Kenyatta University, Nairobi, Kenya
| | | | - Niels Chr. Hansen
- Aarhus Institute of Advanced Studies, Aarhus University, Aarhus, Denmark
- Centre of Excellence in Music, Mind, Body and Brain, University of Jyväskylä, Jyväskylä, Finland
- Interacting Minds Centre, School of Culture and Society, Aarhus University, Aarhus, Denmark
- Royal Academy of Music Aarhus/Aalborg, Aarhus, Denmark
| | - Felix Haiduk
- Department of Behavioral and Cognitive Biology, University of Vienna, Vienna, Austria
- Department of General Psychology, University of Padua, Padua, Italy
| | - Ulvhild Færøvik
- Institute of Biological and Medical Psychology, Department of Psychology, University of Bergen, Bergen, Norway
| | - Violeta Magalhães
- Centre of Linguistics of the University of Porto (CLUP), Porto, Portugal
- Faculty of Arts and Humanities of the University of Porto (FLUP), Porto, Portugal
- School of Education of the Polytechnic of Porto (ESE IPP), Porto, Portugal
| | - Wojciech Krzyżanowski
- Adam Mickiewicz University, Faculty of Art Studies, Musicology Institute, Poznań, Poland
| | | | - Diana Hereld
- Department of Psychiatry, UCLA Semel Institute for Neuroscience and Human Behavior, Los Angeles, CA, USA
| | | | | | | | | | - Su Zar Zar
- Headmistress, The Royal Music Academy, Yangon, Myanmar
| | - Iyadh El Kahla
- Department of Cultural Policy, University of Hildesheim, Hildesheim, Germany
| | - Olcay Muslu
- Centre for the Study of Higher Education, University of Kent, Canterbury, UK
- MIRAS, Centre for Cultural Sustainability, Istanbul, Turkey
| | - Jakelin Troy
- Director, Indigenous Research, Office of the Deputy Vice-Chancellor (Research); Department of Linguistics, Faculty of Arts and Social Sciences, The University of Sydney, Camperdown, NSW, Australia
| | - Teona Lomsadze
- International Research Center for Traditional Polyphony of the Tbilisi State Conservatoire, Tbilisi, Georgia
- Georgian Studies Fellow, University of Oxford, Oxford, UK
| | - Dilyana Kurdova
- South-West University Neofit Rilski, Blagoevgrad, Bulgaria
- Phoenix Perpeticum Foundation, Sofia, Bulgaria
| | | | | | - Aleksandar Arabadjiev
- Department of Folk Music Research and Ethnomusicology, University of Music and Performing Arts–MDW, Wien, Austria
| | | | - Adwoa Arhine
- Department of Music, University of Ghana, Accra, Ghana
| | - Tadhg Ó Meachair
- Department of Ethnomusicology and Folklore, Indiana University, Bloomington, IN, USA
| | - Javier Silva-Zurita
- Department of Humanities and Arts, University of Los Lagos, Osorno, Chile
- Millennium Nucleus on Musical and Sound Cultures (CMUS NCS 2022-16), Santiago, Chile
| | - Ignacio Soto-Silva
- Department of Humanities and Arts, University of Los Lagos, Osorno, Chile
- Millennium Nucleus on Musical and Sound Cultures (CMUS NCS 2022-16), Santiago, Chile
| | | | | | - Psyche Loui
- Music, Imaging and Neural Dynamics Lab, Northeastern University, Boston, MA, USA
| | - Andrea Ravignani
- Department of Human Neurosciences, Sapienza University of Rome, Rome, Italy
- Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands
- Center for Music in the Brain, Department of Clinical Medicine, Aarhus University, Aarhus, Denmark & The Royal Academy of Music Aarhus/Aalborg, Aarhus, Denmark
| | - Yannick Jadoul
- Department of Human Neurosciences, Sapienza University of Rome, Rome, Italy
- Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands
| | - Pauline Larrouy-Maestri
- Music Department, Max-Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- Max Planck—NYU Center for Language, Music, and Emotion (CLaME), New York, NY, USA
| | - Camila Bruder
- Music Department, Max-Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
| | - Tutushamum Puri Teyxokawa
- Txemim Puri Project–Puri Language Research, Vitalization and Teaching/Recording and Preservation of Puri History and Culture, Rio de Janeiro, Brasil
| | | | | | | | - Limor Raviv
- Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands
- cSCAN, University of Glasgow, Glasgow, UK
| | - Minyu Zeng
- Graduate School of Media and Governance, Keio University, Fujisawa, Kanagawa, Japan
- Rhode Island School of Design, Providence, RI, USA
| | - Shahaboddin Dabaghi Varnosfaderani
- Institute for English and American Studies (IEAS), Goethe University of Frankfurt am Main, Frankfurt am Main, Germany
- Cognitive and Developmental Psychology Unit, Centre, for Cognitive Science, University of Kaiserslautern-Landau (RPTU), Kaiserslautern, Germany
| | | | - Kayla Kolff
- Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
| | | | - Meyha Chhatwal
- Department of Psychology, University of Toronto Mississauga, Mississauga, ON, Canada
| | - Ryan Mark David
- Department of Psychology, University of Toronto Mississauga, Mississauga, ON, Canada
| | | | - Great Lekakul
- Faculty of Fine Arts, Chiang Mai University, Chiang Mai, Thailand
| | - Vanessa Nina Borsan
- Graduate School of Media and Governance, Keio University, Fujisawa, Kanagawa, Japan
- Université de Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France
| | - Nozuko Nguqu
- School of Arts, Music Discipline, University of KwaZulu Natal, Durban, South Africa
| | - Patrick E. Savage
- School of Psychology, University of Auckland, Auckland, New Zealand
- Faculty of Environment and Information Studies, Keio University, Fujisawa, Kanagawa, Japan
| |
Collapse
|
50
|
Chang A, Teng X, Assaneo MF, Poeppel D. The human auditory system uses amplitude modulation to distinguish music from speech. PLoS Biol 2024; 22:e3002631. [PMID: 38805517 PMCID: PMC11132470 DOI: 10.1371/journal.pbio.3002631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2023] [Accepted: 04/17/2024] [Indexed: 05/30/2024] Open
Abstract
Music and speech are complex and distinct auditory signals that are both foundational to the human experience. The mechanisms underpinning each domain are widely investigated. However, what perceptual mechanism transforms a sound into music or speech, and what basic acoustic information is required to distinguish between them, remain open questions. Here, we hypothesized that a sound's amplitude modulation (AM), an essential temporal acoustic feature driving the auditory system across processing levels, is critical for distinguishing music from speech. Specifically, in contrast to paradigms using naturalistic acoustic signals (which can be challenging to interpret), we used a noise-probing approach to untangle the auditory mechanism: if AM rate and regularity are critical for perceptually distinguishing music and speech, then judgments of artificially noise-synthesized, ambiguous audio signals should align with their AM parameters. Across 4 experiments (N = 335), signals with a higher peak AM frequency tend to be judged as speech, and those with a lower peak AM frequency as music. Interestingly, this principle is consistently used by all listeners for speech judgments, but only by musically sophisticated listeners for music. In addition, signals with more regular AM are judged as music over speech, and this feature is more critical for music judgment, regardless of musical sophistication. The data suggest that the auditory system can rely on an acoustic property as basic as AM to distinguish music from speech, a simple principle that invites both neurophysiological and evolutionary experiments and speculation.
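The key quantity in this account, the peak frequency of a signal's amplitude-modulation spectrum, can be estimated with a few lines of standard signal processing; the sketch below is a minimal illustration under assumed envelope-extraction and spectral-estimation settings, not the authors' stimulus-synthesis or analysis code.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert, welch

fs = 16000  # sampling rate (Hz); assumed

def am_peak_frequency(x, fs, env_fs=200):
    """Peak frequency (Hz) of the amplitude-modulation spectrum in the 1-32 Hz range."""
    env = np.abs(hilbert(x))                               # broadband amplitude envelope
    sos = butter(4, 50, btype="low", fs=fs, output="sos")  # smooth before downsampling
    env = sosfiltfilt(sos, env)[:: fs // env_fs]
    f, p = welch(env - env.mean(), fs=env_fs, nperseg=env_fs * 4)
    band = (f >= 1) & (f <= 32)
    return f[band][np.argmax(p[band])]

# Toy stimuli: ~5 Hz modulation (speech-like) vs. ~2 Hz modulation (music-like).
t = np.arange(0, 10, 1 / fs)
carrier = np.random.randn(t.size)
speech_like = carrier * (1 + np.sin(2 * np.pi * 5 * t))
music_like = carrier * (1 + np.sin(2 * np.pi * 2 * t))

print(am_peak_frequency(speech_like, fs), am_peak_frequency(music_like, fs))
```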
Collapse
Affiliation(s)
- Andrew Chang
- Department of Psychology, New York University, New York, New York, United States of America
| | - Xiangbin Teng
- Department of Psychology, Chinese University of Hong Kong, Hong Kong SAR, China
| | - M. Florencia Assaneo
- Instituto de Neurobiología, Universidad Nacional Autónoma de México, Juriquilla, Querétaro, México
| | - David Poeppel
- Department of Psychology, New York University, New York, New York, United States of America
- Ernst Struengmann Institute for Neuroscience, Frankfurt am Main, Germany
- Center for Language, Music, and Emotion (CLaME), New York University, New York, New York, United States of America
- Music and Audio Research Lab (MARL), New York University, New York, New York, United States of America
| |
Collapse
|