1
Gnanateja GN, Rupp K, Llanos F, Hect J, German JS, Teichert T, Abel TJ, Chandrasekaran B. Cortical processing of discrete prosodic patterns in continuous speech. Nat Commun 2025; 16:1947. PMID: 40032850; PMCID: PMC11876672; DOI: 10.1038/s41467-025-56779-w.
Abstract
Prosody has a vital function in speech, structuring a speaker's intended message for the listener. The superior temporal gyrus (STG) is considered a critical hub for prosody, but the role of earlier auditory regions such as Heschl's gyrus (HG), associated with pitch processing, remains unclear. Using intracerebral recordings in humans and a non-human primate model, we investigated prosody processing in narrative speech, focusing on pitch accents, abstract phonological units that signal word prominence and communicative intent. In humans, HG encoded pitch accents as abstract representations beyond spectrotemporal features, distinct from segmental speech processing, and outperformed STG in disambiguating pitch accents. Multivariate models confirmed HG's unique representation of pitch accent categories. In the non-human primate, pitch accents were not abstractly encoded despite robust spectrotemporal processing, highlighting the role of experience in shaping abstract representations. These findings emphasize a key role for HG in early prosodic abstraction and advance our understanding of human speech processing.
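The multivariate modeling is not specified in the abstract; as a minimal sketch of how pitch-accent categories might be decoded from intracranial responses, the Python below cross-validates a linear classifier on simulated per-event neural features. The array shapes, feature counts, and classifier choice are illustrative assumptions, not the authors' pipeline.

```python
# Hedged sketch: decode pitch-accent categories from per-event neural features.
# X and y are simulated placeholders for high-gamma responses and accent labels.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 64))   # 200 events x 64 assumed neural features
y = rng.integers(0, 4, size=200)     # 4 hypothetical pitch-accent categories

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X, y, cv=5)   # chance is ~0.25 for 4 classes
print(f"decoding accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```

With real recordings, above-chance accuracy from HG features would support the abstraction claim; with the simulated data here, accuracy should hover at chance.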
Affiliation(s)
- G Nike Gnanateja
- Speech Processing and Auditory Neuroscience Lab, Department of Communication Sciences and Disorders, University of Wisconsin-Madison, Madison, WI, USA
- Kyle Rupp
- Pediatric Brain Electrophysiology Laboratory, Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, PA, USA
- Fernando Llanos
- UT Austin Neurolinguistics Lab, Department of Linguistics, The University of Texas at Austin, Austin, TX, USA
- Jasmine Hect
- Pediatric Brain Electrophysiology Laboratory, Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, PA, USA
- James S German
- Aix-Marseille University, CNRS, LPL, Aix-en-Provence, France
- Tobias Teichert
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA, USA
- Taylor J Abel
- Pediatric Brain Electrophysiology Laboratory, Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, PA, USA
- Center for Neuroscience, University of Pittsburgh, Pittsburgh, PA, USA
- Bharath Chandrasekaran
- Center for Neuroscience, University of Pittsburgh, Pittsburgh, PA, USA
- Roxelyn and Richard Pepper Department of Communication Sciences & Disorders, Northwestern University, Evanston, IL, USA
- Knowles Hearing Center, Evanston, IL, 60208, USA
2
Liu Z, He Y, Li W, Cui S, Fu Z, Wang X. Amplifying Sound Intensity of Key Words in Discourse Promotes Memory in Female College Students. J Speech Lang Hear Res 2025; 68:16-25. PMID: 39637293; DOI: 10.1044/2024_jslhr-24-00386.
Abstract
PURPOSE The aim of this study was to determine whether amplifying key words in discourse helps listeners memorize them. METHOD We tested 135 participants' memory for key words in discourse after intensity amplification (0, 5, 7, 9, and 11 dB), and we measured physiological indicators of attention in another 30 participants. Adobe Audition was used to modulate the intensity of the key words, and E-Prime was used to present the speech stimuli and test memory accuracy. RESULTS Amplifying key-word intensity by 9 dB significantly enhanced memory, with no difference in self-reported naturalness between the 9-dB and nonamplified conditions. Heart rate and skin conductance level decreased in the 9-dB group, indicating that amplification promoted memory by enhancing attention. CONCLUSIONS Our results demonstrate that amplifying the intensity of key words by 9 dB is an effective strategy for promoting memory. This research provides a theoretical basis for optimizing the acoustic parameters of audio learning materials to achieve better teaching effects. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.27902643
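The authors modulated intensity with Adobe Audition; as a hedged illustration of the same operation, the sketch below applies a +9 dB gain to a known key-word segment in Python. The file name and word boundaries are placeholders, not study materials.

```python
# Sketch: amplify a key-word segment by 9 dB, i.e., a linear gain of 10**(9/20).
# "speech.wav" and the segment boundaries are hypothetical placeholders.
import numpy as np
import soundfile as sf

gain = 10.0 ** (9.0 / 20.0)                  # +9 dB is ~2.8x in amplitude

audio, sr = sf.read("speech.wav")            # float samples in [-1, 1]
start, end = int(1.20 * sr), int(1.65 * sr)  # assumed key-word boundaries (s)
audio[start:end] *= gain
audio = np.clip(audio, -1.0, 1.0)            # guard against clipping
sf.write("speech_keyword_plus9dB.wav", audio, sr)
```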
Affiliation(s)
- Zhenxu Liu
- School of Life Sciences, Central China Normal University, Wuhan, China
- Yajie He
- School of Life Sciences, Central China Normal University, Wuhan, China
- Wenhao Li
- Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan, China
- Sixing Cui
- School of Chinese Language and Literature, Central China Normal University, Wuhan, China
- Ziying Fu
- School of Life Sciences, Central China Normal University, Wuhan, China
- Xin Wang
- School of Life Sciences, Central China Normal University, Wuhan, China
3
Llanos F, Stump T, Crowhurst M. Investigating the Neural Basis of the Loud-first Principle of the Iambic-Trochaic Law. J Cogn Neurosci 2025; 37:14-27. PMID: 39231274; DOI: 10.1162/jocn_a_02241.
Abstract
The perception of rhythmic patterns is crucial for the recognition of words in spoken languages, yet it remains unclear how these patterns are represented in the brain. Here, we tested the hypothesis that rhythmic patterns are encoded by neural activity phase-locked to the temporal modulation of these patterns in the speech signal. To test this hypothesis, we analyzed EEGs evoked by long sequences of alternating syllables acoustically manipulated to be perceived as a series of different rhythmic groupings in English. We found that the magnitude of the EEG at the syllable and grouping rates of each sequence was significantly higher than the noise baseline, indicating that the neural parsing of syllables and rhythmic groupings operates at different timescales. Distributional differences between the scalp topographies associated with each timescale suggest a further mechanistic dissociation between the neural segmentation of syllables and groupings. In addition, we observed that the neural tracking of louder syllables, which in trochaic languages like English are associated with the beginning of rhythmic groupings, was more robust than the neural tracking of softer syllables. Further bootstrapping and brain-behavior analyses indicate that the perception of rhythmic patterns is modulated by the magnitude of grouping alternations in the neural signal. These findings suggest that the temporal coding of rhythmic patterns in stress-based languages like English is supported by linguistically relevant temporal regularities in the speech signal.
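The abstract does not give the exact spectral test; a common approach, sketched below under assumed rates and simulated data, is to compare the EEG magnitude at the syllable and grouping frequencies against neighboring FFT bins as a noise baseline.

```python
# Sketch: EEG magnitude at the syllable and grouping rates versus a noise
# baseline estimated from surrounding FFT bins. All values are assumptions.
import numpy as np

fs = 250.0                                   # assumed EEG sampling rate (Hz)
eeg = np.random.randn(int(60 * fs))          # placeholder: 60 s, one channel

spec = np.abs(np.fft.rfft(eeg))
freqs = np.fft.rfftfreq(eeg.size, d=1.0 / fs)

def snr_at(rate, n_neighbors=10):
    """Magnitude at `rate` divided by the mean of nearby bins."""
    i = int(np.argmin(np.abs(freqs - rate)))
    noise = np.r_[spec[i - n_neighbors:i - 1], spec[i + 2:i + n_neighbors + 1]]
    return spec[i] / noise.mean()

print("syllable-rate SNR:", snr_at(4.0))     # e.g., 4 Hz syllable rate
print("grouping-rate SNR:", snr_at(2.0))     # e.g., 2 Hz grouping rate
```

An SNR well above 1 at a given rate would indicate phase-locked tracking at that timescale; with the simulated noise here, both values should sit near 1.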
4
Duville MM, Alonso-Valerdi LM, Ibarra-Zarate DI. Improved emotion differentiation under reduced acoustic variability of speech in autism. BMC Med 2024; 22:121. PMID: 38486293; PMCID: PMC10941423; DOI: 10.1186/s12916-024-03341-y.
Abstract
BACKGROUND Socio-emotional impairments are among the diagnostic criteria for autism spectrum disorder (ASD), but existing evidence supports both altered and intact recognition of emotional prosody. Here, a Bayesian framework of perception is considered, in which oversampling of sensory evidence impairs perception in highly variable environments, whereas reliable hierarchical structure in spectral and temporal cues should foster emotion discrimination by autistic listeners. METHODS Event-related spectral perturbations (ERSPs) extracted from electroencephalographic (EEG) data indexed the perception of anger, disgust, fear, happiness, neutral, and sadness prosodies during listening to speech uttered by (a) human voices or (b) synthesized voices characterized by reduced volatility and variability of the acoustic environment. The assessment was extended to the visual domain by analyzing behavioral accuracy in a non-social task that emphasized the dynamics of precision weighting between bottom-up evidence and top-down inferences. Eighty children (mean age 9.7 years; standard deviation 1.8) volunteered, including 40 autistic children. Symptomatology was assessed at the time of the study via the Autism Diagnostic Observation Schedule, Second Edition, and parents' responses on the Autism Spectrum Rating Scales. A mixed within-between analysis of variance assessed the effects of group (autism versus typical development), voice, emotion, and their interactions, and a Bayesian analysis quantified the evidence in favor of the null hypothesis in cases of non-significance. Post hoc comparisons were corrected for multiple testing. RESULTS Autistic children showed impaired emotion differentiation when listening to speech uttered by human voices, which improved when the acoustic volatility and variability of the voices were reduced. Neural patterns diverged between autistic and neurotypical children, indicating different perceptual mechanisms. Consistently, behavioral measurements on the visual task showed that over-precision ascribed to environmental variability (sensory processing) weakened performance. Unlike autistic children, neurotypical children could differentiate the emotions induced by all voices. CONCLUSIONS This study outlines behavioral and neurophysiological mechanisms that underpin responses to sensory variability. These neurobiological insights into the processing of emotional prosody highlight the potential of acoustically modified emotional prosody to improve emotion differentiation by autistic listeners. TRIAL REGISTRATION BioMed Central ISRCTN Registry, ISRCTN18117434. Registered on September 20, 2020.
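ERSPs are typically obtained by time-frequency decomposition with baseline normalization; the sketch below shows one common route in MNE-Python on simulated epochs. The channel, frequency range, and baseline window are assumptions, not the authors' exact pipeline.

```python
# Hedged sketch: event-related spectral perturbations (ERSP) via Morlet
# wavelets in MNE-Python. The epochs are simulated placeholders.
import numpy as np
import mne

sfreq = 250.0
info = mne.create_info(["Cz"], sfreq, ch_types="eeg")
data = 1e-6 * np.random.randn(40, 1, int(2.0 * sfreq))   # 40 trials, 2 s each
epochs = mne.EpochsArray(data, info, tmin=-0.5)           # onset at t = 0

freqs = np.arange(4.0, 40.0, 2.0)
power = mne.time_frequency.tfr_morlet(
    epochs, freqs=freqs, n_cycles=freqs / 2.0, return_itc=False, average=True)
# Log-ratio baseline correction expresses power as change relative to the
# pre-stimulus interval, yielding ERSP-like values.
power.apply_baseline(baseline=(-0.5, 0.0), mode="logratio")
```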
Affiliation(s)
- Mathilde Marie Duville
- Escuela de Ingeniería y Ciencias, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501 Sur, Col. Tecnológico, Monterrey, N.L., 64700, México
- Luz María Alonso-Valerdi
- Escuela de Ingeniería y Ciencias, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501 Sur, Col. Tecnológico, Monterrey, N.L., 64700, México
- David I Ibarra-Zarate
- Escuela de Ingeniería y Ciencias, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501 Sur, Col. Tecnológico, Monterrey, N.L., 64700, México
5
Brodbeck C, Simon JZ. Cortical tracking of voice pitch in the presence of multiple speakers depends on selective attention. Front Neurosci 2022; 16:828546. PMID: 36003957; PMCID: PMC9393379; DOI: 10.3389/fnins.2022.828546.
Abstract
Voice pitch carries linguistic and non-linguistic information. Previous studies have described cortical tracking of voice pitch in clean speech, with responses reflecting both pitch strength and pitch value. However, pitch is also a powerful cue for auditory stream segregation, especially when competing streams differ in fundamental frequency, as is the case when multiple speakers talk simultaneously. We therefore investigated how cortical pitch tracking is affected by the presence of a second, task-irrelevant speaker. We analyzed human magnetoencephalography (MEG) responses to continuous narrative speech, presented either as a single talker in a quiet background or as a two-talker mixture of a male and a female speaker. In clean speech, voice pitch was associated with a right-dominant response peaking at a latency of around 100 ms, consistent with previous electroencephalography and electrocorticography results. The response tracked both the presence of pitch and the relative value of the speaker's fundamental frequency. In the two-talker mixture, the pitch of the attended speaker was tracked bilaterally, regardless of whether pitch was simultaneously present in the irrelevant speaker's speech. Pitch tracking of the irrelevant speaker was reduced: only the right hemisphere still significantly tracked the unattended speaker's pitch, and only during intervals in which no pitch was present in the attended talker's speech. Taken together, these results suggest that pitch-based segregation of multiple speakers, at least as measured by macroscopic cortical tracking, is not entirely automatic but depends strongly on selective attention.
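The pitch-tracking analysis can be framed as a forward (temporal response function) model; the sketch below fits one with mne.decoding.ReceptiveField on simulated data, where the predictor stands in for a pitch feature and the target for one neural channel. The regularization, delay window, and data are all assumptions, not the study's MEG pipeline.

```python
# Hedged sketch: a temporal response function (TRF) from a pitch predictor to
# one simulated neural channel.
import numpy as np
from mne.decoding import ReceptiveField

sfreq = 100.0
n_times = int(300 * sfreq)                    # 300 s of simulated recording
rng = np.random.default_rng(1)
pitch = rng.standard_normal((n_times, 1))     # stand-in pitch feature
neural = np.roll(pitch[:, 0], int(0.1 * sfreq)) + rng.standard_normal(n_times)

trf = ReceptiveField(tmin=0.0, tmax=0.4, sfreq=sfreq,
                     estimator=1.0,           # float = ridge alpha in MNE
                     scoring="corrcoef")
trf.fit(pitch, neural)                        # expect a peak near ~100 ms here
print(trf.coef_.shape)                        # (n_outputs, n_features, n_delays)
```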
Affiliation(s)
- Christian Brodbeck
- Department of Psychological Sciences, University of Connecticut, Storrs, CT, United States
- Institute for Systems Research, University of Maryland, College Park, College Park, MD, United States
- Jonathan Z. Simon
- Institute for Systems Research, University of Maryland, College Park, College Park, MD, United States
- Department of Electrical and Computer Engineering, University of Maryland, College Park, College Park, MD, United States
- Department of Biology, University of Maryland, College Park, College Park, MD, United States
6
Bachmann FL, MacDonald EN, Hjortkjær J. Neural Measures of Pitch Processing in EEG Responses to Running Speech. Front Neurosci 2022; 15:738408. PMID: 35002597; PMCID: PMC8729880; DOI: 10.3389/fnins.2021.738408.
Abstract
Linearized encoding models are increasingly employed to model cortical responses to running speech. Recent extensions to subcortical responses suggest clinical applications, potentially complementing the auditory brainstem responses (ABRs) and frequency-following responses (FFRs) that are current clinical standards. However, while it is well known that the auditory brainstem responds both to transient amplitude variations and to the stimulus periodicity that gives rise to pitch, these features co-vary in running speech. Here, we discuss the challenges in disentangling the features that drive the subcortical response to running speech. Cortical and subcortical electroencephalographic (EEG) responses to running speech from 19 normal-hearing listeners (12 female) were analyzed. Using forward regression models, we confirm that responses to the rectified broadband speech signal yield temporal response functions consistent with wave V of the ABR, as shown in previous work. Peak latency and amplitude of the speech-evoked brainstem response were correlated with standard click-evoked ABRs recorded at the vertex electrode (Cz). Similar responses could be obtained using the fundamental frequency (F0) of the speech signal as the model predictor. However, simulations indicated that dissociating responses to temporal fine structure at the F0 from responses to broadband amplitude variations is not possible, given the high covariance of the features and the poor signal-to-noise ratio (SNR) of subcortical EEG responses. In cortex, both simulations and data replicated previous findings that envelope tracking on frontal electrodes can be dissociated from responses to slow variations in F0 (relative pitch). Yet no association between subcortical F0 tracking and cortical responses to relative pitch could be detected. These results indicate that while subcortical speech responses are comparable to click-evoked ABRs, dissociating pitch-related processing in the auditory brainstem may be challenging with natural speech stimuli.
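The central caveat here is the covariance between the rectified broadband signal and the F0 fine structure; a quick way to see this, sketched below under assumed inputs, is to build both predictors from the same audio and correlate them. The file name and band edges are placeholders.

```python
# Sketch: covariance of two subcortical predictors, the half-wave rectified
# broadband speech signal and a rectified F0-band carrier. "speech.wav" is a
# hypothetical placeholder, and 70-300 Hz is an assumed F0 range.
import numpy as np
import soundfile as sf
from scipy.signal import butter, sosfiltfilt

audio, sr = sf.read("speech.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)                # collapse to mono

rect_broadband = np.maximum(audio, 0.0)       # half-wave rectified broadband
sos = butter(4, [70.0, 300.0], btype="bandpass", fs=sr, output="sos")
rect_f0_band = np.maximum(sosfiltfilt(sos, audio), 0.0)

r = np.corrcoef(rect_broadband, rect_f0_band)[0, 1]
print(f"predictor correlation: r = {r:.2f}")  # high r: TRFs hard to dissociate
```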
Affiliation(s)
- Florine L Bachmann
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Ewen N MacDonald
- Department of Systems Design Engineering, University of Waterloo, Waterloo, ON, Canada
- Jens Hjortkjær
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital - Amager and Hvidovre, Copenhagen, Denmark