1
|
Nora A, Rinkinen O, Renvall H, Service E, Arkkila E, Smolander S, Laasonen M, Salmelin R. Impaired Cortical Tracking of Speech in Children with Developmental Language Disorder. J Neurosci 2024; 44:e2048232024. [PMID: 38589232 PMCID: PMC11140678 DOI: 10.1523/jneurosci.2048-23.2024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2023] [Revised: 03/25/2024] [Accepted: 03/26/2024] [Indexed: 04/10/2024] Open
Abstract
In developmental language disorder (DLD), learning to comprehend and express oneself with spoken language is impaired, but the reason for this remains unknown. Using millisecond-scale magnetoencephalography recordings combined with machine learning models, we investigated whether the possible neural basis of this disruption lies in poor cortical tracking of speech. The stimuli were common spoken Finnish words (e.g., dog, car, hammer) and sounds with corresponding meanings (e.g., dog bark, car engine, hammering). In both children with DLD (10 boys and 7 girls) and typically developing (TD) control children (14 boys and 3 girls), aged 10-15 years, the cortical activation to spoken words was best modeled as time-locked to the unfolding speech input at ∼100 ms latency between sound and cortical activation. Amplitude envelope (amplitude changes) and spectrogram (detailed time-varying spectral content) of the spoken words, but not other sounds, were very successfully decoded based on time-locked brain responses in bilateral temporal areas; based on the cortical responses, the models could tell at ∼75-85% accuracy which of the two sounds had been presented to the participant. However, the cortical representation of the amplitude envelope information was poorer in children with DLD compared with TD children at longer latencies (at ∼200-300 ms lag). We interpret this effect as reflecting poorer retention of acoustic-phonetic information in short-term memory. This impaired tracking could potentially affect the processing and learning of words as well as continuous speech. The present results offer an explanation for the problems in language comprehension and acquisition in DLD.
Collapse
Affiliation(s)
- Anni Nora
- Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo FI-00076, Finland
- Aalto NeuroImaging (ANI), Aalto University, Espoo FI-00076, Finland
| | - Oona Rinkinen
- Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo FI-00076, Finland
- Aalto NeuroImaging (ANI), Aalto University, Espoo FI-00076, Finland
| | - Hanna Renvall
- Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo FI-00076, Finland
- Aalto NeuroImaging (ANI), Aalto University, Espoo FI-00076, Finland
- BioMag Laboratory, HUS Diagnostic Center, Helsinki University Hospital, Helsinki FI-00029, Finland
| | - Elisabet Service
- Department of Linguistics and Languages, Centre for Advanced Research in Experimental and Applied Linguistics (ARiEAL), McMaster University, Hamilton, Ontario L8S 4L8, Canada
- Department of Psychology and Logopedics, University of Helsinki, Helsinki FI-00014, Finland
| | - Eva Arkkila
- Department of Otorhinolaryngology and Phoniatrics, Head and Neck Center, Helsinki University Hospital and University of Helsinki, Helsinki FI-00014, Finland
| | - Sini Smolander
- Department of Otorhinolaryngology and Phoniatrics, Head and Neck Center, Helsinki University Hospital and University of Helsinki, Helsinki FI-00014, Finland
- Research Unit of Logopedics, University of Oulu, Oulu FI-90014, Finland
- Department of Logopedics, University of Eastern Finland, Joensuu FI-80101, Finland
| | - Marja Laasonen
- Department of Otorhinolaryngology and Phoniatrics, Head and Neck Center, Helsinki University Hospital and University of Helsinki, Helsinki FI-00014, Finland
- Department of Logopedics, University of Eastern Finland, Joensuu FI-80101, Finland
| | - Riitta Salmelin
- Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo FI-00076, Finland
- Aalto NeuroImaging (ANI), Aalto University, Espoo FI-00076, Finland
| |
Collapse
|
2
|
Dai B, Zhai Y, Long Y, Lu C. How the Listener's Attention Dynamically Switches Between Different Speakers During a Natural Conversation. Psychol Sci 2024:9567976241243367. [PMID: 38657276 DOI: 10.1177/09567976241243367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/26/2024] Open
Abstract
The neural mechanisms underpinning the dynamic switching of a listener's attention between speakers are not well understood. Here we addressed this issue in a natural conversation involving 21 triadic adult groups. Results showed that when the listener's attention dynamically switched between speakers, neural synchronization with the to-be-attended speaker was significantly enhanced, whereas that with the to-be-ignored speaker was significantly suppressed. Along with attention switching, semantic distances between sentences significantly increased in the to-be-ignored speech. Moreover, neural synchronization negatively correlated with the increase in semantic distance but not with acoustic change of the to-be-ignored speech. However, no difference in neural synchronization was found between the listener and the two speakers during the phase of sustained attention. These findings support the attenuation model of attention, indicating that both speech signals are processed beyond the basic physical level. Additionally, shifting attention imposes a cognitive burden, as demonstrated by the opposite fluctuations of interpersonal neural synchronization.
Collapse
Affiliation(s)
- Bohan Dai
- Max Planck Institute for Psycholinguistics
- Donders Institute for Brain, Cognition and Behaviour, Radboud University
| | - Yu Zhai
- State Key Laboratory of Cognitive Neuroscience and Learning and IDG/McGovern Institute for Brain Research, Beijing Normal University
| | - Yuhang Long
- Institute of Developmental Psychology, Faculty of Psychology, Beijing Normal University
| | - Chunming Lu
- State Key Laboratory of Cognitive Neuroscience and Learning and IDG/McGovern Institute for Brain Research, Beijing Normal University
| |
Collapse
|
3
|
Madsen J, Parra LC. Bidirectional brain-body interactions during natural story listening. Cell Rep 2024; 43:114081. [PMID: 38581682 DOI: 10.1016/j.celrep.2024.114081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2023] [Revised: 11/25/2023] [Accepted: 03/24/2024] [Indexed: 04/08/2024] Open
Abstract
Narratives can synchronize neural and physiological signals between individuals, but the relationship between these signals, and the underlying mechanism, is unclear. We hypothesized a top-down effect of cognition on arousal and predicted that auditory narratives will drive not only brain signals but also peripheral physiological signals. We find that auditory narratives entrained gaze variation, saccade initiation, pupil size, and heart rate. This is consistent with a top-down effect of cognition on autonomic function. We also hypothesized a bottom-up effect, whereby autonomic physiology affects arousal. Controlled breathing affected pupil size, and heart rate was entrained by controlled saccades. Additionally, fluctuations in heart rate preceded fluctuations of pupil size and brain signals. Gaze variation, pupil size, and heart rate were all associated with anterior-central brain signals. Together, these results suggest bidirectional causal effects between peripheral autonomic function and central brain circuits involved in the control of arousal.
Collapse
Affiliation(s)
- Jens Madsen
- Department of Biomedical Engineering, City College of New York, 85 St. Nicholas Terrace, New York, NY 10031, USA.
| | - Lucas C Parra
- Department of Biomedical Engineering, City College of New York, 85 St. Nicholas Terrace, New York, NY 10031, USA
| |
Collapse
|
4
|
Levy O, Hackmon SL, Zvilichovsky Y, Korisky A, Bidet-Caulet A, Schweitzer JB, Golumbic EZ. Neurophysiological Patterns of Attention and Distraction during Realistic Virtual-Reality Classroom Learning in Adults with and without ADHD. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.17.590012. [PMID: 38659916 PMCID: PMC11042341 DOI: 10.1101/2024.04.17.590012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/26/2024]
Abstract
Many people, and particularly those diagnosed with ADHD, report difficulties maintaining attention and proneness to distraction during classroom learning. However, the behavioral, neural and physiological basis of attention in realistic learning contexts is not well understood, since current clinical and scientific tools used for evaluating and quantifying the constructs of "distractibility" and "inattention", are removed from the real-life experience in organic classrooms. Here we introduce a novel Virtual Reality (VR) platform for studying students' brain activity and physiological responses as they immerse in realistic frontal classroom learning. Using this approach, we studied whether adults with and without ADHD (N=49) exhibit differences in neurophysiological metrics associated with sustained attention, such as speech-tracking of the teacher's voice, power of alpha-oscillations and levels of arousal, as well as responses to potential disturbances by background sound-events in the classroom. Under these ecological conditions, we find that adults with ADHD exhibit higher auditory neural response to background sounds relative to their control-peers, which also contributed to explaining variance in the severity of ADHD symptoms, together with higher power of alpha-oscillations and more frequent gaze-shifts around the classroom. These results are in-line with higher sensitivity to irrelevant stimuli in the environment and increased mind-wandering/boredom. At the same time, both groups exhibited similar learning outcomes and showed similar neural tracking of the teacher's speech. This suggests that in this context, attention may not operate as a zero-sum game and that allocating some resources to irrelevant stimuli does not always detract from performing the task at hand. Given the dire need for more objective, dimensional and ecologically-valid measures of attention and its real-life deficits, this work provides new insights into the neurophysiological manifestations of attention and distraction experienced in real-life contexts, while challenging some prevalent notions regarding the nature of attentional challenges experienced by those with ADHD.
Collapse
Affiliation(s)
- Orel Levy
- The Gonda Brain Research Center, Bar Ilan University, Ramat Gan, Israel
| | | | - Yair Zvilichovsky
- The Gonda Brain Research Center, Bar Ilan University, Ramat Gan, Israel
| | - Adi Korisky
- The Gonda Brain Research Center, Bar Ilan University, Ramat Gan, Israel
| | | | - Julie B. Schweitzer
- Department of Psychiatry and Behavioral Sciences, University of California, Davis, Sacramento, CA U.S.A
| | | |
Collapse
|
5
|
Keshavarzi M, Mandke K, Macfarlane A, Parvez L, Gabrielczyk F, Wilson A, Goswami U. Atypical beta-band effects in children with dyslexia in response to rhythmic audio-visual speech. Clin Neurophysiol 2024; 160:47-55. [PMID: 38387402 DOI: 10.1016/j.clinph.2024.02.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2023] [Revised: 01/31/2024] [Accepted: 02/06/2024] [Indexed: 02/24/2024]
Abstract
OBJECTIVE Previous studies have reported atypical delta phase in children with dyslexia, and that delta phase modulates the amplitude of the beta-band response via delta-beta phase-amplitude coupling (PAC). Accordingly, the atypical delta-band effects in children with dyslexia may imply related atypical beta-band effects, particularly regarding delta-beta PAC. Our primary objective was to explore beta-band oscillations in children with and without dyslexia, to explore potentially atypical effects in the beta band in dyslexic children. METHODS We collected EEG data during a rhythmic speech paradigm from 51 children (21 control; 30 dyslexia). We then assessed beta-band phase entrainment, beta-band angular velocity, beta-band power responses and delta-beta PAC. RESULTS We found significant beta-band phase entrainment for control children but not for dyslexic children. Furthermore, children with dyslexia exhibited significantly faster beta-band angular velocity and significantly greater beta-band power. Delta-beta PAC was comparable in both groups. CONCLUSION Atypical beta-band effects were observed in children with dyslexia. However, delta-beta PAC was comparable in both dyslexic and control children. SIGNIFICANCE These findings offer further insights into the neurophysiological basis of atypical rhythmic speech processing by children with dyslexia, suggesting the involvement of a wide range of frequency bands.
Collapse
Affiliation(s)
- Mahmoud Keshavarzi
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge CB2 3EB, United Kingdom.
| | - Kanad Mandke
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge CB2 3EB, United Kingdom
| | - Annabel Macfarlane
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge CB2 3EB, United Kingdom
| | - Lyla Parvez
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge CB2 3EB, United Kingdom
| | - Fiona Gabrielczyk
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge CB2 3EB, United Kingdom
| | - Angela Wilson
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge CB2 3EB, United Kingdom
| | - Usha Goswami
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge CB2 3EB, United Kingdom
| |
Collapse
|
6
|
Corsini A, Tomassini A, Pastore A, Delis I, Fadiga L, D'Ausilio A. Speech perception difficulty modulates theta-band encoding of articulatory synergies. J Neurophysiol 2024; 131:480-491. [PMID: 38323331 DOI: 10.1152/jn.00388.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2023] [Revised: 01/04/2024] [Accepted: 01/25/2024] [Indexed: 02/08/2024] Open
Abstract
The human brain tracks available speech acoustics and extrapolates missing information such as the speaker's articulatory patterns. However, the extent to which articulatory reconstruction supports speech perception remains unclear. This study explores the relationship between articulatory reconstruction and task difficulty. Participants listened to sentences and performed a speech-rhyming task. Real kinematic data of the speaker's vocal tract were recorded via electromagnetic articulography (EMA) and aligned to corresponding acoustic outputs. We extracted articulatory synergies from the EMA data with principal component analysis (PCA) and employed partial information decomposition (PID) to separate the electroencephalographic (EEG) encoding of acoustic and articulatory features into unique, redundant, and synergistic atoms of information. We median-split sentences into easy (ES) and hard (HS) based on participants' performance and found that greater task difficulty involved greater encoding of unique articulatory information in the theta band. We conclude that fine-grained articulatory reconstruction plays a complementary role in the encoding of speech acoustics, lending further support to the claim that motor processes support speech perception.NEW & NOTEWORTHY Top-down processes originating from the motor system contribute to speech perception through the reconstruction of the speaker's articulatory movement. This study investigates the role of such articulatory simulation under variable task difficulty. We show that more challenging listening tasks lead to increased encoding of articulatory kinematics in the theta band and suggest that, in such situations, fine-grained articulatory reconstruction complements acoustic encoding.
Collapse
Affiliation(s)
- Alessandro Corsini
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
- Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
| | - Alice Tomassini
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
- Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
| | - Aldo Pastore
- Laboratorio NEST, Scuola Normale Superiore, Pisa, Italy
| | - Ioannis Delis
- School of Biomedical Sciences, University of Leeds, Leeds, United Kingdom
| | - Luciano Fadiga
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
- Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
| | - Alessandro D'Ausilio
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
- Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
| |
Collapse
|
7
|
Ershaid H, Lizarazu M, McLaughlin D, Cooke M, Simantiraki O, Koutsogiannaki M, Lallier M. Contributions of listening effort and intelligibility to cortical tracking of speech in adverse listening conditions. Cortex 2024; 172:54-71. [PMID: 38215511 DOI: 10.1016/j.cortex.2023.11.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2023] [Revised: 09/05/2023] [Accepted: 11/14/2023] [Indexed: 01/14/2024]
Abstract
Cortical tracking of speech is vital for speech segmentation and is linked to speech intelligibility. However, there is no clear consensus as to whether reduced intelligibility leads to a decrease or an increase in cortical speech tracking, warranting further investigation of the factors influencing this relationship. One such factor is listening effort, defined as the cognitive resources necessary for speech comprehension, and reported to have a strong negative correlation with speech intelligibility. Yet, no studies have examined the relationship between speech intelligibility, listening effort, and cortical tracking of speech. The aim of the present study was thus to examine these factors in quiet and distinct adverse listening conditions. Forty-nine normal hearing adults listened to sentences produced casually, presented in quiet and two adverse listening conditions: cafeteria noise and reverberant speech. Electrophysiological responses were registered with electroencephalogram, and listening effort was estimated subjectively using self-reported scores and objectively using pupillometry. Results indicated varying impacts of adverse conditions on intelligibility, listening effort, and cortical tracking of speech, depending on the preservation of the speech temporal envelope. The more distorted envelope in the reverberant condition led to higher listening effort, as reflected in higher subjective scores, increased pupil diameter, and stronger cortical tracking of speech in the delta band. These findings suggest that using measures of listening effort in addition to those of intelligibility is useful for interpreting cortical tracking of speech results. Moreover, reading and phonological skills of participants were positively correlated with listening effort in the cafeteria condition, suggesting a special role of expert language skills in processing speech in this noisy condition. Implications for future research and theories linking atypical cortical tracking of speech and reading disorders are further discussed.
Collapse
Affiliation(s)
- Hadeel Ershaid
- Basque Center on Cognition, Brain and Language, San Sebastian, Spain.
| | - Mikel Lizarazu
- Basque Center on Cognition, Brain and Language, San Sebastian, Spain.
| | - Drew McLaughlin
- Basque Center on Cognition, Brain and Language, San Sebastian, Spain.
| | - Martin Cooke
- Ikerbasque, Basque Science Foundation, Bilbao, Spain.
| | | | | | - Marie Lallier
- Basque Center on Cognition, Brain and Language, San Sebastian, Spain; Ikerbasque, Basque Science Foundation, Bilbao, Spain.
| |
Collapse
|
8
|
Keshavarzi M, Choisdealbha ÁN, Attaheri A, Rocha S, Brusini P, Gibbon S, Boutris P, Mead N, Olawole-Scott H, Ahmed H, Flanagan S, Mandke K, Goswami U. Decoding speech information from EEG data with 4-, 7- and 11-month-old infants: Using convolutional neural network, mutual information-based and backward linear models. J Neurosci Methods 2024; 403:110036. [PMID: 38128783 DOI: 10.1016/j.jneumeth.2023.110036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2023] [Revised: 12/11/2023] [Accepted: 12/15/2023] [Indexed: 12/23/2023]
Abstract
BACKGROUND Computational models that successfully decode neural activity into speech are increasing in the adult literature, with convolutional neural networks (CNNs), backward linear models, and mutual information (MI) models all being applied to neural data in relation to speech input. This is not the case in the infant literature. NEW METHOD Three different computational models, two novel for infants, were applied to decode low-frequency speech envelope information. Previously-employed backward linear models were compared to novel CNN and MI-based models. Fifty infants provided EEG recordings when aged 4, 7, and 11 months, while listening passively to natural speech (sung or chanted nursery rhymes) presented by video with a female singer. RESULTS Each model computed speech information for these nursery rhymes in two different low-frequency bands, delta and theta, thought to provide different types of linguistic information. All three models demonstrated significant levels of performance for delta-band neural activity from 4 months of age, with two of three models also showing significant performance for theta-band activity. All models also demonstrated higher accuracy for the delta-band neural responses. None of the models showed developmental (age-related) effects. COMPARISONS WITH EXISTING METHODS The data demonstrate that the choice of algorithm used to decode speech envelope information from neural activity in the infant brain determines the developmental conclusions that can be drawn. CONCLUSIONS The modelling shows that better understanding of the strengths and weaknesses of each modelling approach is fundamental to improving our understanding of how the human brain builds a language system.
Collapse
Affiliation(s)
- Mahmoud Keshavarzi
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, UK.
| | - Áine Ní Choisdealbha
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, UK
| | - Adam Attaheri
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, UK
| | - Sinead Rocha
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, UK
| | - Perrine Brusini
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, UK
| | - Samuel Gibbon
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, UK
| | - Panagiotis Boutris
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, UK
| | - Natasha Mead
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, UK
| | - Helen Olawole-Scott
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, UK
| | - Henna Ahmed
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, UK
| | - Sheila Flanagan
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, UK
| | - Kanad Mandke
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, UK
| | - Usha Goswami
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, UK
| |
Collapse
|
9
|
Schüller A, Schilling A, Krauss P, Reichenbach T. The Early Subcortical Response at the Fundamental Frequency of Speech Is Temporally Separated from Later Cortical Contributions. J Cogn Neurosci 2024; 36:475-491. [PMID: 38165737 DOI: 10.1162/jocn_a_02103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2024]
Abstract
Most parts of speech are voiced, exhibiting a degree of periodicity with a fundamental frequency and many higher harmonics. Some neural populations respond to this temporal fine structure, in particular at the fundamental frequency. This frequency-following response to speech consists of both subcortical and cortical contributions and can be measured through EEG as well as through magnetoencephalography (MEG), although both differ in the aspects of neural activity that they capture: EEG is sensitive to both radial and tangential sources as well as to deep sources, whereas MEG is more restrained to the measurement of tangential and superficial neural activity. EEG responses to continuous speech have shown an early subcortical contribution, at a latency of around 9 msec, in agreement with MEG measurements in response to short speech tokens, whereas MEG responses to continuous speech have not yet revealed such an early component. Here, we analyze MEG responses to long segments of continuous speech. We find an early subcortical response at latencies of 4-11 msec, followed by later right-lateralized cortical activities at delays of 20-58 msec as well as potential subcortical activities. Our results show that the early subcortical component of the FFR to continuous speech can be measured from MEG in populations of participants and that its latency agrees with that measured with EEG. They furthermore show that the early subcortical component is temporally well separated from later cortical contributions, enabling an independent assessment of both components toward further aspects of speech processing.
Collapse
Affiliation(s)
| | | | - Patrick Krauss
- Friedrich-Alexander-Universität Erlangen-Nürnberg
- Universitätsklinikum Erlangen
| | | |
Collapse
|
10
|
Karunathilake IMD, Brodbeck C, Bhattasali S, Resnik P, Simon JZ. Neural Dynamics of the Processing of Speech Features: Evidence for a Progression of Features from Acoustic to Sentential Processing. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.02.578603. [PMID: 38352332 PMCID: PMC10862830 DOI: 10.1101/2024.02.02.578603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 02/22/2024]
Abstract
When we listen to speech, our brain's neurophysiological responses "track" its acoustic features, but it is less well understood how these auditory responses are modulated by linguistic content. Here, we recorded magnetoencephalography (MEG) responses while subjects listened to four types of continuous-speech-like passages: speech-envelope modulated noise, English-like non-words, scrambled words, and narrative passage. Temporal response function (TRF) analysis provides strong neural evidence for the emergent features of speech processing in cortex, from acoustics to higher-level linguistics, as incremental steps in neural speech processing. Critically, we show a stepwise hierarchical progression of progressively higher order features over time, reflected in both bottom-up (early) and top-down (late) processing stages. Linguistically driven top-down mechanisms take the form of late N400-like responses, suggesting a central role of predictive coding mechanisms at multiple levels. As expected, the neural processing of lower-level acoustic feature responses is bilateral or right lateralized, with left lateralization emerging only for lexical-semantic features. Finally, our results identify potential neural markers of the computations underlying speech perception and comprehension.
Collapse
Affiliation(s)
| | - Christian Brodbeck
- Department of Computing and Software, McMaster University, Hamilton, ON, Canada
| | - Shohini Bhattasali
- Department of Language Studies, University of Toronto, Scarborough, Canada
| | - Philip Resnik
- Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland, College Park, MD, USA
| | - Jonathan Z Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, USA
- Department of Biology, University of Maryland, College Park, MD, USA
- Institute for Systems Research, University of Maryland, College Park, MD, USA
| |
Collapse
|
11
|
Zoefel B, Kösem A. Neural tracking of continuous acoustics: properties, speech-specificity and open questions. Eur J Neurosci 2024; 59:394-414. [PMID: 38151889 DOI: 10.1111/ejn.16221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2023] [Revised: 11/17/2023] [Accepted: 11/22/2023] [Indexed: 12/29/2023]
Abstract
Human speech is a particularly relevant acoustic stimulus for our species, due to its role of information transmission during communication. Speech is inherently a dynamic signal, and a recent line of research focused on neural activity following the temporal structure of speech. We review findings that characterise neural dynamics in the processing of continuous acoustics and that allow us to compare these dynamics with temporal aspects in human speech. We highlight properties and constraints that both neural and speech dynamics have, suggesting that auditory neural systems are optimised to process human speech. We then discuss the speech-specificity of neural dynamics and their potential mechanistic origins and summarise open questions in the field.
Collapse
Affiliation(s)
- Benedikt Zoefel
- Centre de Recherche Cerveau et Cognition (CerCo), CNRS UMR 5549, Toulouse, France
- Université de Toulouse III Paul Sabatier, Toulouse, France
| | - Anne Kösem
- Lyon Neuroscience Research Center (CRNL), INSERM U1028, Bron, France
| |
Collapse
|
12
|
Smith TM, Shen Y, Williams CN, Kidd GR, McAuley JD. Contribution of speech rhythm to understanding speech in noisy conditions: Further test of a selective entrainment hypothesis. Atten Percept Psychophys 2024; 86:627-642. [PMID: 38012475 DOI: 10.3758/s13414-023-02815-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 11/03/2023] [Indexed: 11/29/2023]
Abstract
Previous work by McAuley et al. Attention, Perception, & Psychophysics, 82, 3222-3233, (2020), Attention, Perception & Psychophysics, 83, 2229-2240, (2021) showed that disruption of the natural rhythm of target (attended) speech worsens speech recognition in the presence of competing background speech or noise (a target-rhythm effect), while disruption of background speech rhythm improves target recognition (a background-rhythm effect). While these results were interpreted as support for the role of rhythmic regularities in facilitating target-speech recognition amidst competing backgrounds (in line with a selective entrainment hypothesis), questions remain about the factors that contribute to the target-rhythm effect. Experiment 1 ruled out the possibility that the target-rhythm effect relies on a decrease in intelligibility of the rhythm-altered keywords. Sentences from the Coordinate Response Measure (CRM) paradigm were presented with a background of speech-shaped noise, and the rhythm of the initial portion of these target sentences (the target rhythmic context) was altered while critically leaving the target Color and Number keywords intact. Results showed a target-rhythm effect, evidenced by poorer keyword recognition when the target rhythmic context was altered, despite the absence of rhythmic manipulation of the keywords. Experiment 2 examined the influence of the relative onset asynchrony between target and background keywords. Results showed a significant target-rhythm effect that was independent of the effect of target-background keyword onset asynchrony. Experiment 3 provided additional support for the selective entrainment hypothesis by replicating the target-rhythm effect with a set of speech materials that were less rhythmically constrained than the CRM sentences.
Collapse
Affiliation(s)
- Toni M Smith
- Department of Psychology, Michigan State University, East Lansing, MI, USA.
| | - Yi Shen
- Department of Speech and Hearing Sciences, University of Washington, Seattle, WA, USA
| | - Christina N Williams
- Department of Speech and Hearing Sciences, University of Washington, Seattle, WA, USA
| | - Gary R Kidd
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, USA
| | - J Devin McAuley
- Department of Psychology, Michigan State University, East Lansing, MI, USA
| |
Collapse
|
13
|
Guerra G, Tierney A, Tijms J, Vaessen A, Bonte M, Dick F. Attentional modulation of neural sound tracking in children with and without dyslexia. Dev Sci 2024; 27:e13420. [PMID: 37350014 DOI: 10.1111/desc.13420] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2022] [Revised: 04/09/2023] [Accepted: 05/26/2023] [Indexed: 06/24/2023]
Abstract
Auditory selective attention forms an important foundation of children's learning by enabling the prioritisation and encoding of relevant stimuli. It may also influence reading development, which relies on metalinguistic skills including the awareness of the sound structure of spoken language. Reports of attentional impairments and speech perception difficulties in noisy environments in dyslexic readers are also suggestive of the putative contribution of auditory attention to reading development. To date, it is unclear whether non-speech selective attention and its underlying neural mechanisms are impaired in children with dyslexia and to which extent these deficits relate to individual reading and speech perception abilities in suboptimal listening conditions. In this EEG study, we assessed non-speech sustained auditory selective attention in 106 7-to-12-year-old children with and without dyslexia. Children attended to one of two tone streams, detecting occasional sequence repeats in the attended stream, and performed a speech-in-speech perception task. Results show that when children directed their attention to one stream, inter-trial-phase-coherence at the attended rate increased in fronto-central sites; this, in turn, was associated with better target detection. Behavioural and neural indices of attention did not systematically differ as a function of dyslexia diagnosis. However, behavioural indices of attention did explain individual differences in reading fluency and speech-in-speech perception abilities: both these skills were impaired in dyslexic readers. Taken together, our results show that children with dyslexia do not show group-level auditory attention deficits but these deficits may represent a risk for developing reading impairments and problems with speech perception in complex acoustic environments. RESEARCH HIGHLIGHTS: Non-speech sustained auditory selective attention modulates EEG phase coherence in children with/without dyslexia Children with dyslexia show difficulties in speech-in-speech perception Attention relates to dyslexic readers' speech-in-speech perception and reading skills Dyslexia diagnosis is not linked to behavioural/EEG indices of auditory attention.
Collapse
Affiliation(s)
- Giada Guerra
- Centre for Brain and Cognitive Development, Birkbeck College, University of London, London, UK
- Maastricht Brain Imaging Center and Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, Netherlands
| | - Adam Tierney
- Centre for Brain and Cognitive Development, Birkbeck College, University of London, London, UK
| | - Jurgen Tijms
- RID, Amsterdam, Netherlands
- Rudolf Berlin Center, Department of Psychology, University of Amsterdam, Amsterdam, Netherlands
| | | | - Milene Bonte
- Maastricht Brain Imaging Center and Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, Netherlands
| | - Frederic Dick
- Division of Psychology & Language Sciences, UCL, London, UK
| |
Collapse
|
14
|
Ahmed F, Nidiffer AR, Lalor EC. The effect of gaze on EEG measures of multisensory integration in a cocktail party scenario. Front Hum Neurosci 2023; 17:1283206. [PMID: 38162285 PMCID: PMC10754997 DOI: 10.3389/fnhum.2023.1283206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2023] [Accepted: 11/20/2023] [Indexed: 01/03/2024] Open
Abstract
Seeing the speaker's face greatly improves our speech comprehension in noisy environments. This is due to the brain's ability to combine the auditory and the visual information around us, a process known as multisensory integration. Selective attention also strongly influences what we comprehend in scenarios with multiple speakers-an effect known as the cocktail-party phenomenon. However, the interaction between attention and multisensory integration is not fully understood, especially when it comes to natural, continuous speech. In a recent electroencephalography (EEG) study, we explored this issue and showed that multisensory integration is enhanced when an audiovisual speaker is attended compared to when that speaker is unattended. Here, we extend that work to investigate how this interaction varies depending on a person's gaze behavior, which affects the quality of the visual information they have access to. To do so, we recorded EEG from 31 healthy adults as they performed selective attention tasks in several paradigms involving two concurrently presented audiovisual speakers. We then modeled how the recorded EEG related to the audio speech (envelope) of the presented speakers. Crucially, we compared two classes of model - one that assumed underlying multisensory integration (AV) versus another that assumed two independent unisensory audio and visual processes (A+V). This comparison revealed evidence of strong attentional effects on multisensory integration when participants were looking directly at the face of an audiovisual speaker. This effect was not apparent when the speaker's face was in the peripheral vision of the participants. Overall, our findings suggest a strong influence of attention on multisensory integration when high fidelity visual (articulatory) speech information is available. More generally, this suggests that the interplay between attention and multisensory integration during natural audiovisual speech is dynamic and is adaptable based on the specific task and environment.
Collapse
Affiliation(s)
| | | | - Edmund C. Lalor
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, and Center for Visual Science, University of Rochester, Rochester, NY, United States
| |
Collapse
|
15
|
Karunathilake IMD, Kulasingham JP, Simon JZ. Neural tracking measures of speech intelligibility: Manipulating intelligibility while keeping acoustics unchanged. Proc Natl Acad Sci U S A 2023; 120:e2309166120. [PMID: 38032934 PMCID: PMC10710032 DOI: 10.1073/pnas.2309166120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2023] [Accepted: 10/21/2023] [Indexed: 12/02/2023] Open
Abstract
Neural speech tracking has advanced our understanding of how our brains rapidly map an acoustic speech signal onto linguistic representations and ultimately meaning. It remains unclear, however, how speech intelligibility is related to the corresponding neural responses. Many studies addressing this question vary the level of intelligibility by manipulating the acoustic waveform, but this makes it difficult to cleanly disentangle the effects of intelligibility from underlying acoustical confounds. Here, using magnetoencephalography recordings, we study neural measures of speech intelligibility by manipulating intelligibility while keeping the acoustics strictly unchanged. Acoustically identical degraded speech stimuli (three-band noise-vocoded, ~20 s duration) are presented twice, but the second presentation is preceded by the original (nondegraded) version of the speech. This intermediate priming, which generates a "pop-out" percept, substantially improves the intelligibility of the second degraded speech passage. We investigate how intelligibility and acoustical structure affect acoustic and linguistic neural representations using multivariate temporal response functions (mTRFs). As expected, behavioral results confirm that perceived speech clarity is improved by priming. mTRFs analysis reveals that auditory (speech envelope and envelope onset) neural representations are not affected by priming but only by the acoustics of the stimuli (bottom-up driven). Critically, our findings suggest that segmentation of sounds into words emerges with better speech intelligibility, and most strongly at the later (~400 ms latency) word processing stage, in prefrontal cortex, in line with engagement of top-down mechanisms associated with priming. Taken together, our results show that word representations may provide some objective measures of speech comprehension.
Collapse
Affiliation(s)
| | | | - Jonathan Z. Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD20742
- Department of Biology, University of Maryland, College Park, MD20742
- Institute for Systems Research, University of Maryland, College Park, MD20742
| |
Collapse
|
16
|
Ni G, Xu Z, Bai Y, Zheng Q, Zhao R, Wu Y, Ming D. EEG-based assessment of temporal fine structure and envelope effect in mandarin syllable and tone perception. Cereb Cortex 2023; 33:11287-11299. [PMID: 37804238 DOI: 10.1093/cercor/bhad366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2023] [Revised: 09/13/2023] [Accepted: 09/15/2023] [Indexed: 10/09/2023] Open
Abstract
In recent years, speech perception research has benefited from low-frequency rhythm entrainment tracking of the speech envelope. However, speech perception is still controversial regarding the role of speech envelope and temporal fine structure, especially in Mandarin. This study aimed to discuss the dependence of Mandarin syllables and tones perception on the speech envelope and the temporal fine structure. We recorded the electroencephalogram (EEG) of the subjects under three acoustic conditions using the sound chimerism analysis, including (i) the original speech, (ii) the speech envelope and the sinusoidal modulation, and (iii) the fine structure of time and the modulation of the non-speech (white noise) sound envelope. We found that syllable perception mainly depended on the speech envelope, while tone perception depended on the temporal fine structure. The delta bands were prominent, and the parietal and prefrontal lobes were the main activated brain areas, regardless of whether syllable or tone perception was involved. Finally, we decoded the spatiotemporal features of Mandarin perception from the microstate sequence. The spatiotemporal feature sequence of the EEG caused by speech material was found to be specific, suggesting a new perspective for the subsequent auditory brain-computer interface. These results provided a new scheme for the coding strategy of new hearing aids for native Mandarin speakers. HIGHLIGHTS
Collapse
Affiliation(s)
- Guangjian Ni
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072 China
- Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072 China
- Haihe Laboratory of Brain-Computer Interaction and Human-Machine Integration, Tianjin 300392 China
| | - Zihao Xu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072 China
- Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072 China
| | - Yanru Bai
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072 China
- Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072 China
| | - Qi Zheng
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072 China
| | - Ran Zhao
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072 China
- Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072 China
| | - Yubo Wu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072 China
| | - Dong Ming
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072 China
- Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072 China
- Haihe Laboratory of Brain-Computer Interaction and Human-Machine Integration, Tianjin 300392 China
| |
Collapse
|
17
|
Zhang X, Li J, Li Z, Hong B, Diao T, Ma X, Nolte G, Engel AK, Zhang D. Leading and following: Noise differently affects semantic and acoustic processing during naturalistic speech comprehension. Neuroimage 2023; 282:120404. [PMID: 37806465 DOI: 10.1016/j.neuroimage.2023.120404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2023] [Revised: 08/19/2023] [Accepted: 10/05/2023] [Indexed: 10/10/2023] Open
Abstract
Despite the distortion of speech signals caused by unavoidable noise in daily life, our ability to comprehend speech in noisy environments is relatively stable. However, the neural mechanisms underlying reliable speech-in-noise comprehension remain to be elucidated. The present study investigated the neural tracking of acoustic and semantic speech information during noisy naturalistic speech comprehension. Participants listened to narrative audio recordings mixed with spectrally matched stationary noise at three signal-to-ratio (SNR) levels (no noise, 3 dB, -3 dB), and 60-channel electroencephalography (EEG) signals were recorded. A temporal response function (TRF) method was employed to derive event-related-like responses to the continuous speech stream at both the acoustic and the semantic levels. Whereas the amplitude envelope of the naturalistic speech was taken as the acoustic feature, word entropy and word surprisal were extracted via the natural language processing method as two semantic features. Theta-band frontocentral TRF responses to the acoustic feature were observed at around 400 ms following speech fluctuation onset over all three SNR levels, and the response latencies were more delayed with increasing noise. Delta-band frontal TRF responses to the semantic feature of word entropy were observed at around 200 to 600 ms leading to speech fluctuation onset over all three SNR levels. The response latencies became more leading with increasing noise and decreasing speech comprehension and intelligibility. While the following responses to speech acoustics were consistent with previous studies, our study revealed the robustness of leading responses to speech semantics, which suggests a possible predictive mechanism at the semantic level for maintaining reliable speech comprehension in noisy environments.
Collapse
Affiliation(s)
- Xinmiao Zhang
- Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China; Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
| | - Jiawei Li
- Department of Education and Psychology, Freie Universität Berlin, Berlin 14195, Federal Republic of Germany
| | - Zhuoran Li
- Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China; Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
| | - Bo Hong
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China; Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China
| | - Tongxiang Diao
- Department of Otolaryngology, Head and Neck Surgery, Peking University, People's Hospital, Beijing 100044, China
| | - Xin Ma
- Department of Otolaryngology, Head and Neck Surgery, Peking University, People's Hospital, Beijing 100044, China
| | - Guido Nolte
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Federal Republic of Germany
| | - Andreas K Engel
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Federal Republic of Germany
| | - Dan Zhang
- Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China; Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China.
| |
Collapse
|
18
|
Vogeti S, Faramarzi M, Herrmann CS. Alpha transcranial alternating current stimulation modulates auditory perception. Brain Stimul 2023; 16:1646-1652. [PMID: 37949295 DOI: 10.1016/j.brs.2023.11.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2023] [Revised: 10/30/2023] [Accepted: 11/02/2023] [Indexed: 11/12/2023] Open
Abstract
BACKGROUND Studies using transcranial alternating current stimulation (tACS), a type of non-invasive brain stimulation, have demonstrated a relationship between the positive versus negative phase of both alpha and delta/theta oscillations with variable near-threshold auditory perception. These findings have not been directly compared before. Furthermore, as perception was better in the positive versus negative phase of two different frequencies, it is unclear whether changes in polarity (independent of a specific frequency) could also modulate auditory perception. OBJECTIVE We investigated whether auditory perception depends on the phase of alpha, delta/theta, or polarity alone. METHODS We stimulated participants with alpha, delta, and positive and negative direct current (DC) over temporal and central scalp sites while they identified near-threshold tones-in-noise. A Sham condition without tACS served as a control condition. A repeated-measures analysis of variance was used to assess differences in proportions of hits between conditions and polarities. Permutation-based circular-logistic regressions were used to assess the relationship between circular-predictors and single-trial behavioral responses. An exploratory analysis compared the full circular-logistic regression model to the intercept-only model. RESULTS Overall, there were a greater proportion of hits in the Alpha condition in comparison to Delta, DC, and Sham conditions. We also found an interaction between polarity and stimulation condition; post-hoc analyses revealed a greater proportion of hits in the positive versus negative phase of Alpha tACS. In contrast, no significant differences were found in the Delta, DC, or Sham conditions. The permutation-based circular-logistic regressions did not reveal a statistically significant difference between the obtained RMS of the sine and cosine coefficients and the mean of the surrogate distribution for any of the conditions. However, our exploratory analysis revealed that circular-predictors explained the behavioral data significantly better than an intercept-only model for the Alpha condition, and not the other three conditions. CONCLUSION These findings suggest that alpha tACS, and not delta nor polarity alone, modulates auditory perception.
Collapse
Affiliation(s)
- Sreekari Vogeti
- Experimental Psychology Lab, Department of Psychology, European Medical School, Cluster for Excellence "Hearing for All", Carl von Ossietzky University, Oldenburg, Germany
| | - Maryam Faramarzi
- Experimental Psychology Lab, Department of Psychology, European Medical School, Cluster for Excellence "Hearing for All", Carl von Ossietzky University, Oldenburg, Germany
| | - Christoph S Herrmann
- Experimental Psychology Lab, Department of Psychology, European Medical School, Cluster for Excellence "Hearing for All", Carl von Ossietzky University, Oldenburg, Germany; Neuroimaging Unit, European Medical School, Carl von Ossietzky University, Oldenburg, Germany; Research Center Neurosensory Science, Carl von Ossietzky University, Oldenburg, Germany.
| |
Collapse
|
19
|
Tan SHJ, Kalashnikova M, Di Liberto GM, Crosse MJ, Burnham D. Seeing a Talking Face Matters: Gaze Behavior and the Auditory-Visual Speech Benefit in Adults' Cortical Tracking of Infant-directed Speech. J Cogn Neurosci 2023; 35:1741-1759. [PMID: 37677057 DOI: 10.1162/jocn_a_02044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/09/2023]
Abstract
In face-to-face conversations, listeners gather visual speech information from a speaker's talking face that enhances their perception of the incoming auditory speech signal. This auditory-visual (AV) speech benefit is evident even in quiet environments but is stronger in situations that require greater listening effort such as when the speech signal itself deviates from listeners' expectations. One example is infant-directed speech (IDS) presented to adults. IDS has exaggerated acoustic properties that are easily discriminable from adult-directed speech (ADS). Although IDS is a speech register that adults typically use with infants, no previous neurophysiological study has directly examined whether adult listeners process IDS differently from ADS. To address this, the current study simultaneously recorded EEG and eye-tracking data from adult participants as they were presented with auditory-only (AO), visual-only, and AV recordings of IDS and ADS. Eye-tracking data were recorded because looking behavior to the speaker's eyes and mouth modulates the extent of AV speech benefit experienced. Analyses of cortical tracking accuracy revealed that cortical tracking of the speech envelope was significant in AO and AV modalities for IDS and ADS. However, the AV speech benefit [i.e., AV > (A + V)] was only present for IDS trials. Gaze behavior analyses indicated differences in looking behavior during IDS and ADS trials. Surprisingly, looking behavior to the speaker's eyes and mouth was not correlated with cortical tracking accuracy. Additional exploratory analyses indicated that attention to the whole display was negatively correlated with cortical tracking accuracy of AO and visual-only trials in IDS. Our results underscore the nuances involved in the relationship between neurophysiological AV speech benefit and looking behavior.
Collapse
Affiliation(s)
- Sok Hui Jessica Tan
- The MARCS Institute of Brain, Behaviour and Development, Western Sydney University, Australia
- Science of Learning in Education Centre, Office of Education Research, National Institute of Education, Nanyang Technological University, Singapore
| | - Marina Kalashnikova
- The Basque Center on Cognition, Brain and Language
- IKERBASQUE, Basque Foundation for Science
| | - Giovanni M Di Liberto
- ADAPT Centre, School of Computer Science and Statistics, Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Ireland
| | - Michael J Crosse
- SEGOTIA, Galway, Ireland
- Trinity Center for Biomedical Engineering, Department of Mechanical, Manufacturing & Biomedical Engineering, Trinity College Dublin, Dublin, Ireland
| | - Denis Burnham
- The MARCS Institute of Brain, Behaviour and Development, Western Sydney University, Australia
| |
Collapse
|
20
|
Guilleminot P, Graef C, Butters E, Reichenbach T. Audiotactile Stimulation Can Improve Syllable Discrimination through Multisensory Integration in the Theta Frequency Band. J Cogn Neurosci 2023; 35:1760-1772. [PMID: 37677062 DOI: 10.1162/jocn_a_02045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/09/2023]
Abstract
Syllables are an essential building block of speech. We recently showed that tactile stimuli linked to the perceptual centers of syllables in continuous speech can improve speech comprehension. The rate of syllables lies in the theta frequency range, between 4 and 8 Hz, and the behavioral effect appears linked to multisensory integration in this frequency band. Because this neural activity may be oscillatory, we hypothesized that a behavioral effect may also occur not only while but also after this activity has been evoked or entrained through vibrotactile pulses. Here, we show that audiotactile integration regarding the perception of single syllables, both on the neural and on the behavioral level, is consistent with this hypothesis. We first stimulated participants with a series of vibrotactile pulses and then presented them with a syllable in background noise. We show that, at a delay of 200 msec after the last vibrotactile pulse, audiotactile integration still occurred in the theta band and syllable discrimination was enhanced. Moreover, the dependence of both the neural multisensory integration as well as of the behavioral discrimination on the delay of the audio signal with respect to the last tactile pulse was consistent with a damped oscillation. In addition, the multisensory gain is correlated with the syllable discrimination score. Our results therefore evidence the role of the theta band in audiotactile integration and provide evidence that these effects may involve oscillatory activity that still persists after the tactile stimulation.
Collapse
|
21
|
Schüller A, Schilling A, Krauss P, Rampp S, Reichenbach T. Attentional Modulation of the Cortical Contribution to the Frequency-Following Response Evoked by Continuous Speech. J Neurosci 2023; 43:7429-7440. [PMID: 37793908 PMCID: PMC10621774 DOI: 10.1523/jneurosci.1247-23.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2023] [Revised: 09/07/2023] [Accepted: 09/21/2023] [Indexed: 10/06/2023] Open
Abstract
Selective attention to one of several competing speakers is required for comprehending a target speaker among other voices and for successful communication with them. It moreover has been found to involve the neural tracking of low-frequency speech rhythms in the auditory cortex. Effects of selective attention have also been found in subcortical neural activities, in particular regarding the frequency-following response related to the fundamental frequency of speech (speech-FFR). Recent investigations have, however, shown that the speech-FFR contains cortical contributions as well. It remains unclear whether these are also modulated by selective attention. Here we used magnetoencephalography to assess the attentional modulation of the cortical contributions to the speech-FFR. We presented both male and female participants with two competing speech signals and analyzed the cortical responses during attentional switching between the two speakers. Our findings revealed robust attentional modulation of the cortical contribution to the speech-FFR: the neural responses were higher when the speaker was attended than when they were ignored. We also found that, regardless of attention, a voice with a lower fundamental frequency elicited a larger cortical contribution to the speech-FFR than a voice with a higher fundamental frequency. Our results show that the attentional modulation of the speech-FFR does not only occur subcortically but extends to the auditory cortex as well.SIGNIFICANCE STATEMENT Understanding speech in noise requires attention to a target speaker. One of the speech features that a listener can use to identify a target voice among others and attend it is the fundamental frequency, together with its higher harmonics. The fundamental frequency arises from the opening and closing of the vocal folds and is tracked by high-frequency neural activity in the auditory brainstem and in the cortex. Previous investigations showed that the subcortical neural tracking is modulated by selective attention. Here we show that attention affects the cortical tracking of the fundamental frequency as well: it is stronger when a particular voice is attended than when it is ignored.
Collapse
Affiliation(s)
- Alina Schüller
- Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
| | - Achim Schilling
- Neuroscience Laboratory, University Hospital Erlangen, 91058 Erlangen, Germany
| | - Patrick Krauss
- Neuroscience Laboratory, University Hospital Erlangen, 91058 Erlangen, Germany
- Pattern Recognition Lab, Department Computer Science, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
| | - Stefan Rampp
- Department of Neurosurgery, University Hospital Erlangen, 91058 Erlangen, Germany
- Department of Neurosurgery, University Hospital Halle (Saale), 06120 Halle (Saale), Germany
- Department of Neuroradiology, University Hospital Erlangen, 91058 Erlangen, Germany
| | - Tobias Reichenbach
- Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
| |
Collapse
|
22
|
Karunathilake ID, Kulasingham JP, Simon JZ. Neural Tracking Measures of Speech Intelligibility: Manipulating Intelligibility while Keeping Acoustics Unchanged. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.18.541269. [PMID: 37292644 PMCID: PMC10245672 DOI: 10.1101/2023.05.18.541269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Neural speech tracking has advanced our understanding of how our brains rapidly map an acoustic speech signal onto linguistic representations and ultimately meaning. It remains unclear, however, how speech intelligibility is related to the corresponding neural responses. Many studies addressing this question vary the level of intelligibility by manipulating the acoustic waveform, but this makes it difficult to cleanly disentangle effects of intelligibility from underlying acoustical confounds. Here, using magnetoencephalography (MEG) recordings, we study neural measures of speech intelligibility by manipulating intelligibility while keeping the acoustics strictly unchanged. Acoustically identical degraded speech stimuli (three-band noise vocoded, ~20 s duration) are presented twice, but the second presentation is preceded by the original (non-degraded) version of the speech. This intermediate priming, which generates a 'pop-out' percept, substantially improves the intelligibility of the second degraded speech passage. We investigate how intelligibility and acoustical structure affects acoustic and linguistic neural representations using multivariate Temporal Response Functions (mTRFs). As expected, behavioral results confirm that perceived speech clarity is improved by priming. TRF analysis reveals that auditory (speech envelope and envelope onset) neural representations are not affected by priming, but only by the acoustics of the stimuli (bottom-up driven). Critically, our findings suggest that segmentation of sounds into words emerges with better speech intelligibility, and most strongly at the later (~400 ms latency) word processing stage, in prefrontal cortex (PFC), in line with engagement of top-down mechanisms associated with priming. Taken together, our results show that word representations may provide some objective measures of speech comprehension.
Collapse
Affiliation(s)
| | | | - Jonathan Z. Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, 20742, USA
- Department of Biology, University of Maryland, College Park, MD 20742, USA
- Institute for Systems Research, University of Maryland, College Park, MD 20742, USA
| |
Collapse
|
23
|
Rong P, Taylor A. A Vowel-Centric View Toward Characterizing Temporal Organization of Motor Speech Activities in Neurologically Impaired and Healthy Speakers. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2023; 66:3697-3720. [PMID: 37607386 DOI: 10.1044/2023_jslhr-23-00129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/24/2023]
Abstract
PURPOSE This study tested the hypotheses that (a) motor speech activities are temporally organized around the nuclei into vowel-centric units that hold both stability and flexibility and (b) such temporal organization is impacted by motor speech impairment. METHOD Thirteen individuals with amyotrophic lateral sclerosis and 10 healthy controls read a sentence 3 times at each of the following rates: habitual, fast, and slow. Articulatory gestures and phonatory event were assessed in two vowel-centric units, as operationally defined within and across the boundaries of two target words-cat and must-to accommodate common coda omission and coarticulation. Twelve absolute and relative timing measures centering on the nucleus were derived to characterize the temporal organization of each unit. These measures were evaluated in terms of (a) their relations with global duration across rate conditions and (b) between-groups differences for the habitual rate condition. RESULTS Both vowel-centric units remained stable in relative timing between the articulatory gestures approaching and moving away from the nucleus across rate conditions. Relative timing between the articulatory gestures and phonatory event at smaller temporal granularities varied with global duration, but in different ways for neurologically impaired and healthy speakers. Disease impacts on relative timing were only detected across word boundaries. All absolute timing measures revealed consistent temporal scaling effects and disease-related prolongations. CONCLUSIONS The findings provide preliminary support for vowel-centric temporal organization of motor speech activities. Such temporal organization holds some extent of both stability and flexibility, which may facilitate the parsing of syllabic events during auditory processing, while accommodating task-specific suprasegmental variations. The timing impairments in amyotrophic lateral sclerosis are likely attributed to the disease-imposed dynamic constraints, reducing the entrainment of the related motor speech activities to the underlying linguistic elements. These findings have potential implications in guiding the assessment and management of temporal speech deficits in ALS.
Collapse
Affiliation(s)
- Panying Rong
- Department of Speech-Language-Hearing: Sciences & Disorders, University of Kansas, Lawrence
| | - Ava Taylor
- Department of Speech-Language-Hearing: Sciences & Disorders, University of Kansas, Lawrence
| |
Collapse
|
24
|
Quique YM, Gnanateja GN, Dickey MW, Evans WS, Chandrasekaran B. Examining cortical tracking of the speech envelope in post-stroke aphasia. Front Hum Neurosci 2023; 17:1122480. [PMID: 37780966 PMCID: PMC10538638 DOI: 10.3389/fnhum.2023.1122480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2022] [Accepted: 08/28/2023] [Indexed: 10/03/2023] Open
Abstract
Introduction People with aphasia have been shown to benefit from rhythmic elements for language production during aphasia rehabilitation. However, it is unknown whether rhythmic processing is associated with such benefits. Cortical tracking of the speech envelope (CTenv) may provide a measure of encoding of speech rhythmic properties and serve as a predictor of candidacy for rhythm-based aphasia interventions. Methods Electroencephalography was used to capture electrophysiological responses while Spanish speakers with aphasia (n = 9) listened to a continuous speech narrative (audiobook). The Temporal Response Function was used to estimate CTenv in the delta (associated with word- and phrase-level properties), theta (syllable-level properties), and alpha bands (attention-related properties). CTenv estimates were used to predict aphasia severity, performance in rhythmic perception and production tasks, and treatment response in a sentence-level rhythm-based intervention. Results CTenv in delta and theta, but not alpha, predicted aphasia severity. Neither CTenv in delta, alpha, or theta bands predicted performance in rhythmic perception or production tasks. Some evidence supported that CTenv in theta could predict sentence-level learning in aphasia, but alpha and delta did not. Conclusion CTenv of the syllable-level properties was relatively preserved in individuals with less language impairment. In contrast, higher encoding of word- and phrase-level properties was relatively impaired and was predictive of more severe language impairments. CTenv and treatment response to sentence-level rhythm-based interventions need to be further investigated.
Collapse
Affiliation(s)
- Yina M. Quique
- Center for Education in Health Sciences, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
| | - G. Nike Gnanateja
- Department of Communication Sciences and Disorders, University of Wisconsin-Madison, Madison, WI, United States
| | - Michael Walsh Dickey
- VA Pittsburgh Healthcare System, Pittsburgh, PA, United States
- Department of Communication Sciences and Disorders, University of Pittsburgh, Pittsburgh, PA, United States
| | | | - Bharath Chandrasekaran
- Department of Communication Sciences and Disorders, University of Pittsburgh, Pittsburgh, PA, United States
- Roxelyn and Richard Pepper Department of Communication Science and Disorders, School of Communication. Northwestern University, Evanston, IL, United States
| |
Collapse
|
25
|
Sarmukadam K, Behroozmand R. Neural oscillations reveal disrupted functional connectivity associated with impaired speech auditory feedback control in post-stroke aphasia. Cortex 2023; 166:258-274. [PMID: 37437320 PMCID: PMC10527672 DOI: 10.1016/j.cortex.2023.05.015] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2023] [Revised: 05/11/2023] [Accepted: 05/24/2023] [Indexed: 07/14/2023]
Abstract
The oscillatory brain activities reflect neuro-computational processes that are critical for speech production and sensorimotor control. In the present study, we used neural oscillations in left-hemisphere stroke survivors with aphasia as a model to investigate network-level functional connectivity deficits associated with disrupted speech auditory feedback control. Electroencephalography signals were recorded from 40 post-stroke aphasia and 39 neurologically intact control participants while they performed speech vowel production and listening tasks under pitch-shifted altered auditory feedback (AAF) conditions. Using weighted phase-lag index, we calculated broadband (1-70 Hz) functional neural connectivity between electrode pairs covering the frontal, pre- and post-central, and parietal regions. Results revealed reduced fronto-central delta and theta band and centro-parietal low-beta band connectivity in left-hemisphere electrodes associated with diminished speech AAF compensation responses in post-stroke aphasia compared with controls. Lesion-mapping analysis demonstrated that stroke-induced damage to multi-modal brain networks within the inferior frontal gyrus, Rolandic operculum, inferior parietal lobule, angular gyrus, and supramarginal gyrus predicted the reduced functional neural connectivity within the delta and low-beta bands during both tasks in aphasia. These results provide evidence that disrupted neural connectivity due to left-hemisphere brain damage can result in network-wide dysfunctions associated with impaired sensorimotor integration mechanisms for speech auditory feedback control.
Collapse
Affiliation(s)
- Kimaya Sarmukadam
- Speech Neuroscience Lab, Department of Communication Sciences and Disorders, Arnold School of Public Health, University of South Carolina, Columbia, SC, United States.
| | - Roozbeh Behroozmand
- Speech Neuroscience Lab, Department of Communication Sciences and Disorders, Arnold School of Public Health, University of South Carolina, Columbia, SC, United States.
| |
Collapse
|
26
|
Ling Y, Xu C, Wen X, Li J, Gao J, Luo B. Cortical responses to auditory stimulation predict the prognosis of patients with disorders of consciousness. Clin Neurophysiol 2023; 153:11-20. [PMID: 37385110 DOI: 10.1016/j.clinph.2023.06.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2022] [Revised: 05/15/2023] [Accepted: 06/03/2023] [Indexed: 07/01/2023]
Abstract
OBJECTIVE This study aimed to assess the prognosis of patients with disorders of consciousness (DoC) using auditory stimulation with electroencephalogram (EEG) recordings. METHODS We enrolled 72 patients with DoC in the study, which involved subjecting patients to auditory stimulation while EEG responses were recorded. Coma Recovery Scale-Revised (CRS-R) scores and Glasgow Outcome Scale (GOS) were determined for each patient and followed up for three months. A frequency spectrum analysis was performed on the EEG recordings. Finally, the power spectral density (PSD) index was used to predict the prognosis of patients with DoC based on a support vector machine (SVM) model. RESULTS Power spectral analyses revealed that the cortical response to auditory stimulation showed a decreasing trend with decreasing consciousness levels. Auditory stimulation-induced changes in absolute PSD at the delta and theta bands were positively correlated with the CRS-R and GOS scores. Furthermore, these cortical responses to auditory stimulation had a good ability to discriminate between good and poor prognoses of patients with DoC. CONCLUSIONS Auditory stimulation-induced changes in the PSD were highly predictive of DoC outcomes. SIGNIFICANCE Our findings showed that cortical responses to auditory stimulation may be an important electrophysiological indicator of prognosis in patients with DoC.
Collapse
Affiliation(s)
- Yi Ling
- Department of Neurology, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou 310000, China
| | - Chuan Xu
- Department of Neurology, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Hangzhou 310016, China
| | - Xinrui Wen
- Department of Neurology, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou 310000, China
| | - Jingqi Li
- Department of Rehabilitation, Hangzhou Mingzhou Brain Rehabilitation Hospital, Hangzhou 311215, China
| | - Jian Gao
- Department of Rehabilitation, Hangzhou Mingzhou Brain Rehabilitation Hospital, Hangzhou 311215, China
| | - Benyan Luo
- Department of Neurology, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou 310000, China.
| |
Collapse
|
27
|
Ahmed F, Nidiffer AR, Lalor EC. The effect of gaze on EEG measures of multisensory integration in a cocktail party scenario. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.23.554451. [PMID: 37662393 PMCID: PMC10473711 DOI: 10.1101/2023.08.23.554451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/05/2023]
Abstract
Seeing the speaker's face greatly improves our speech comprehension in noisy environments. This is due to the brain's ability to combine the auditory and the visual information around us, a process known as multisensory integration. Selective attention also strongly influences what we comprehend in scenarios with multiple speakers - an effect known as the cocktail-party phenomenon. However, the interaction between attention and multisensory integration is not fully understood, especially when it comes to natural, continuous speech. In a recent electroencephalography (EEG) study, we explored this issue and showed that multisensory integration is enhanced when an audiovisual speaker is attended compared to when that speaker is unattended. Here, we extend that work to investigate how this interaction varies depending on a person's gaze behavior, which affects the quality of the visual information they have access to. To do so, we recorded EEG from 31 healthy adults as they performed selective attention tasks in several paradigms involving two concurrently presented audiovisual speakers. We then modeled how the recorded EEG related to the audio speech (envelope) of the presented speakers. Crucially, we compared two classes of model - one that assumed underlying multisensory integration (AV) versus another that assumed two independent unisensory audio and visual processes (A+V). This comparison revealed evidence of strong attentional effects on multisensory integration when participants were looking directly at the face of an audiovisual speaker. This effect was not apparent when the speaker's face was in the peripheral vision of the participants. Overall, our findings suggest a strong influence of attention on multisensory integration when high fidelity visual (articulatory) speech information is available. More generally, this suggests that the interplay between attention and multisensory integration during natural audiovisual speech is dynamic and is adaptable based on the specific task and environment.
Collapse
Affiliation(s)
- Farhin Ahmed
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, and Center for Visual Science, University of Rochester, Rochester, NY 14627, USA
| | - Aaron R. Nidiffer
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, and Center for Visual Science, University of Rochester, Rochester, NY 14627, USA
| | - Edmund C. Lalor
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, and Center for Visual Science, University of Rochester, Rochester, NY 14627, USA
| |
Collapse
|
28
|
Deoisres S, Lu Y, Vanheusden FJ, Bell SL, Simpson DM. Continuous speech with pauses inserted between words increases cortical tracking of speech envelope. PLoS One 2023; 18:e0289288. [PMID: 37498891 PMCID: PMC10374040 DOI: 10.1371/journal.pone.0289288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2022] [Accepted: 07/17/2023] [Indexed: 07/29/2023] Open
Abstract
The decoding multivariate Temporal Response Function (decoder) or speech envelope reconstruction approach is a well-known tool for assessing the cortical tracking of speech envelope. It is used to analyse the correlation between the speech stimulus and the neural response. It is known that auditory late responses are enhanced with longer gaps between stimuli, but it is not clear if this applies to the decoder, and whether the addition of gaps/pauses in continuous speech could be used to increase the envelope reconstruction accuracy. We investigated this in normal hearing participants who listened to continuous speech with no added pauses (natural speech), and then with short (250 ms) or long (500 ms) silent pauses inserted between each word. The total duration for continuous speech stimulus with no, short, and long pauses were approximately, 10 minutes, 16 minutes, and 21 minutes, respectively. EEG and speech envelope were simultaneously acquired and then filtered into delta (1-4 Hz) and theta (4-8 Hz) frequency bands. In addition to analysing responses to the whole speech envelope, speech envelope was also segmented to focus response analysis on onset and non-onset regions of speech separately. Our results show that continuous speech with additional pauses inserted between words significantly increases the speech envelope reconstruction correlations compared to using natural speech, in both the delta and theta frequency bands. It also appears that these increase in speech envelope reconstruction are dominated by the onset regions in the speech envelope. Introducing pauses in speech stimuli has potential clinical benefit for increasing auditory evoked response detectability, though with the disadvantage of speech sounding less natural. The strong effect of pauses and onsets on the decoder should be considered when comparing results from different speech corpora. Whether the increased cortical response, when longer pauses are introduced, reflect improved intelligibility requires further investigation.
Collapse
Affiliation(s)
- Suwijak Deoisres
- Institute of Sound and Vibration Research, University of Southampton, Southampton, United Kingdom
| | - Yuhan Lu
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
| | - Frederique J Vanheusden
- Department of Engineering, School of Science and Technology, Nottingham Trent University, Nottingham, United Kingdom
| | - Steven L Bell
- Institute of Sound and Vibration Research, University of Southampton, Southampton, United Kingdom
| | - David M Simpson
- Institute of Sound and Vibration Research, University of Southampton, Southampton, United Kingdom
| |
Collapse
|
29
|
Gong XL, Huth AG, Deniz F, Johnson K, Gallant JL, Theunissen FE. Phonemic segmentation of narrative speech in human cerebral cortex. Nat Commun 2023; 14:4309. [PMID: 37463907 PMCID: PMC10354060 DOI: 10.1038/s41467-023-39872-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2022] [Accepted: 06/29/2023] [Indexed: 07/20/2023] Open
Abstract
Speech processing requires extracting meaning from acoustic patterns using a set of intermediate representations based on a dynamic segmentation of the speech stream. Using whole brain mapping obtained in fMRI, we investigate the locus of cortical phonemic processing not only for single phonemes but also for short combinations made of diphones and triphones. We find that phonemic processing areas are much larger than previously described: they include not only the classical areas in the dorsal superior temporal gyrus but also a larger region in the lateral temporal cortex where diphone features are best represented. These identified phonemic regions overlap with the lexical retrieval region, but we show that short word retrieval is not sufficient to explain the observed responses to diphones. Behavioral studies have shown that phonemic processing and lexical retrieval are intertwined. Here, we also have identified candidate regions within the speech cortical network where this joint processing occurs.
Collapse
Affiliation(s)
- Xue L Gong
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, 94720, CA, USA.
| | - Alexander G Huth
- Departments of Neuroscience and Computer Science, University of Texas, Austin, Austin, 78712, TX, USA
| | - Fatma Deniz
- Faculty of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, 10587, Berlin, Germany
| | - Keith Johnson
- Department of Linguistics, University of California, Berkeley, Berkeley, 94720, CA, USA
| | - Jack L Gallant
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, 94720, CA, USA
- Department of Psychology, University of California, Berkeley, Berkeley, 94720, CA, USA
| | - Frédéric E Theunissen
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, 94720, CA, USA.
- Department of Psychology, University of California, Berkeley, Berkeley, 94720, CA, USA.
- Department of Integrative Biology, University of California, Berkeley, Berkeley, 94720, CA, USA.
| |
Collapse
|
30
|
Jeon MJ, Woo J. Effect of speech-stimulus degradation on phoneme-related potential. PLoS One 2023; 18:e0287584. [PMID: 37352220 PMCID: PMC10289326 DOI: 10.1371/journal.pone.0287584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2022] [Accepted: 06/08/2023] [Indexed: 06/25/2023] Open
Abstract
Auditory evoked potential (AEP) has been used to evaluate the degree of hearing and speech cognition. Because AEP generates a very small voltage relative to ambient noise, a repetitive presentation of a stimulus, such as a tone, word, or short sentence, should be employed to generate ensemble averages over trials. However, the stimulation of repetitive short words and sentences may present an unnatural situation to a subject. Phoneme-related potentials (PRPs), which are evoked-responses to typical phonemic stimuli, can be extracted from electroencephalography (EEG) data in response to a continuous storybook. In this study, we investigated the effects of spectrally degraded speech stimuli on PRPs. The EEG data in response to the spectrally degraded and natural storybooks were recorded from normal listeners, and the PRP components for 10 vowels and 12 consonants were extracted. The PRP responses to a vocoded (spectrally-degraded) storybook showed a statistically significant lower peak amplitude and were prolonged compared with those of a natural storybook. The findings in this study suggest that PRPs can be considered a potential tool to evaluate hearing and speech cognition as other AEPs. Moreover, PRPs can provide the details of phonological processing and phonemic awareness to understand poor speech intelligibility. Further investigation with the hearing impaired is required prior to clinical application.
Collapse
Affiliation(s)
- Min-Jae Jeon
- Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan, Republic of Korea
| | - Jihwan Woo
- Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan, Republic of Korea
- Department of Biomedical Engineering, University of Ulsan, Ulsan, Republic of Korea
| |
Collapse
|
31
|
Lindboom E, Nidiffer A, Carney LH, Lalor EC. Incorporating models of subcortical processing improves the ability to predict EEG responses to natural speech. Hear Res 2023; 433:108767. [PMID: 37060895 PMCID: PMC10559335 DOI: 10.1016/j.heares.2023.108767] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/31/2022] [Revised: 03/29/2023] [Accepted: 04/09/2023] [Indexed: 04/17/2023]
Abstract
The goal of describing how the human brain responds to complex acoustic stimuli has driven auditory neuroscience research for decades. Often, a systems-based approach has been taken, in which neurophysiological responses are modeled based on features of the presented stimulus. This includes a wealth of work modeling electroencephalogram (EEG) responses to complex acoustic stimuli such as speech. Examples of the acoustic features used in such modeling include the amplitude envelope and spectrogram of speech. These models implicitly assume a direct mapping from stimulus representation to cortical activity. However, in reality, the representation of sound is transformed as it passes through early stages of the auditory pathway, such that inputs to the cortex are fundamentally different from the raw audio signal that was presented. Thus, it could be valuable to account for the transformations taking place in lower-order auditory areas, such as the auditory nerve, cochlear nucleus, and inferior colliculus (IC) when predicting cortical responses to complex sounds. Specifically, because IC responses are more similar to cortical inputs than acoustic features derived directly from the audio signal, we hypothesized that linear mappings (temporal response functions; TRFs) fit to the outputs of an IC model would better predict EEG responses to speech stimuli. To this end, we modeled responses to the acoustic stimuli as they passed through the auditory nerve, cochlear nucleus, and inferior colliculus before fitting a TRF to the output of the modeled IC responses. Results showed that using model-IC responses in traditional systems analyzes resulted in better predictions of EEG activity than using the envelope or spectrogram of a speech stimulus. Further, it was revealed that model-IC derived TRFs predict different aspects of the EEG than acoustic-feature TRFs, and combining both types of TRF models provides a more accurate prediction of the EEG response.
Collapse
Affiliation(s)
- Elsa Lindboom
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA
| | - Aaron Nidiffer
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA; Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
| | - Laurel H Carney
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA; Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA; Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY, USA.
| | - Edmund C Lalor
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA; Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
| |
Collapse
|
32
|
Turri C, Di Dona G, Santoni A, Zamfira DA, Franchin L, Melcher D, Ronconi L. Periodic and Aperiodic EEG Features as Potential Markers of Developmental Dyslexia. Biomedicines 2023; 11:1607. [PMID: 37371702 DOI: 10.3390/biomedicines11061607] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2023] [Revised: 05/26/2023] [Accepted: 05/26/2023] [Indexed: 06/29/2023] Open
Abstract
Developmental Dyslexia (DD) is a neurobiological condition affecting the ability to read fluently and/or accurately. Analyzing resting-state electroencephalographic (EEG) activity in DD may provide a deeper characterization of the underlying pathophysiology and possible biomarkers. So far, studies investigating resting-state activity in DD provided limited evidence and did not consider the aperiodic component of the power spectrum. In the present study, adults with (n = 26) and without DD (n = 31) underwent a reading skills assessment and resting-state EEG to investigate potential alterations in aperiodic activity, their impact on the periodic counterpart and reading performance. In parieto-occipital channels, DD participants showed a significantly different aperiodic activity as indexed by a flatter and lower power spectrum. These aperiodic measures were significantly related to text reading time, suggesting a link with individual differences in reading difficulties. In the beta band, the DD group showed significantly decreased aperiodic-adjusted power compared to typical readers, which was significantly correlated to word reading accuracy. Overall, here we provide evidence showing alterations of the endogenous aperiodic activity in DD participants consistently with the increased neural noise hypothesis. In addition, we confirm alterations of endogenous beta rhythms, which are discussed in terms of their potential link with magnocellular-dorsal stream deficit.
Collapse
Affiliation(s)
- Chiara Turri
- School of Psychology, Vita-Salute San Raffaele University, 20132 Milan, Italy
- Division of Neuroscience, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy
| | - Giuseppe Di Dona
- School of Psychology, Vita-Salute San Raffaele University, 20132 Milan, Italy
- Division of Neuroscience, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy
| | - Alessia Santoni
- School of Psychology, Vita-Salute San Raffaele University, 20132 Milan, Italy
- Department of Psychology and Cognitive Science, University of Trento, 38068 Rovereto, Italy
| | - Denisa Adina Zamfira
- School of Psychology, Vita-Salute San Raffaele University, 20132 Milan, Italy
- Division of Neuroscience, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy
| | - Laura Franchin
- Department of Psychology and Cognitive Science, University of Trento, 38068 Rovereto, Italy
| | - David Melcher
- Department of Psychology and Cognitive Science, University of Trento, 38068 Rovereto, Italy
- Psychology Program, Division of Science, New York University Abu Dhabi, Abu Dhabi 129188, United Arab Emirates
- Center for Brain and Health, NYUAD Research Institute, New York University Abu Dhabi, Abu Dhabi 129188, United Arab Emirates
| | - Luca Ronconi
- School of Psychology, Vita-Salute San Raffaele University, 20132 Milan, Italy
- Division of Neuroscience, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy
| |
Collapse
|
33
|
Zioga I, Weissbart H, Lewis AG, Haegens S, Martin AE. Naturalistic Spoken Language Comprehension Is Supported by Alpha and Beta Oscillations. J Neurosci 2023; 43:3718-3732. [PMID: 37059462 PMCID: PMC10198453 DOI: 10.1523/jneurosci.1500-22.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2022] [Revised: 03/17/2023] [Accepted: 03/23/2023] [Indexed: 04/16/2023] Open
Abstract
Brain oscillations are prevalent in all species and are involved in numerous perceptual operations. α oscillations are thought to facilitate processing through the inhibition of task-irrelevant networks, while β oscillations are linked to the putative reactivation of content representations. Can the proposed functional role of α and β oscillations be generalized from low-level operations to higher-level cognitive processes? Here we address this question focusing on naturalistic spoken language comprehension. Twenty-two (18 female) Dutch native speakers listened to stories in Dutch and French while MEG was recorded. We used dependency parsing to identify three dependency states at each word: the number of (1) newly opened dependencies, (2) dependencies that remained open, and (3) resolved dependencies. We then constructed forward models to predict α and β power from the dependency features. Results showed that dependency features predict α and β power in language-related regions beyond low-level linguistic features. Left temporal, fundamental language regions are involved in language comprehension in α, while frontal and parietal, higher-order language regions, and motor regions are involved in β. Critically, α- and β-band dynamics seem to subserve language comprehension tapping into syntactic structure building and semantic composition by providing low-level mechanistic operations for inhibition and reactivation processes. Because of the temporal similarity of the α-β responses, their potential functional dissociation remains to be elucidated. Overall, this study sheds light on the role of α and β oscillations during naturalistic spoken language comprehension, providing evidence for the generalizability of these dynamics from perceptual to complex linguistic processes.SIGNIFICANCE STATEMENT It remains unclear whether the proposed functional role of α and β oscillations in perceptual and motor function is generalizable to higher-level cognitive processes, such as spoken language comprehension. We found that syntactic features predict α and β power in language-related regions beyond low-level linguistic features when listening to naturalistic speech in a known language. We offer experimental findings that integrate a neuroscientific framework on the role of brain oscillations as "building blocks" with spoken language comprehension. This supports the view of a domain-general role of oscillations across the hierarchy of cognitive functions, from low-level sensory operations to abstract linguistic processes.
Collapse
Affiliation(s)
- Ioanna Zioga
- Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, 6525 EN, The Netherlands
- Max Planck Institute for Psycholinguistics, Nijmegen, 6525 XD, The Netherlands
| | - Hugo Weissbart
- Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, 6525 EN, The Netherlands
| | - Ashley G Lewis
- Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, 6525 EN, The Netherlands
- Max Planck Institute for Psycholinguistics, Nijmegen, 6525 XD, The Netherlands
| | - Saskia Haegens
- Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, 6525 EN, The Netherlands
- Department of Psychiatry, Columbia University, New York, New York 10032
- Division of Systems Neuroscience, New York State Psychiatric Institute, New York, New York 10032
| | - Andrea E Martin
- Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, 6525 EN, The Netherlands
- Max Planck Institute for Psycholinguistics, Nijmegen, 6525 XD, The Netherlands
| |
Collapse
|
34
|
Lechner S, Northoff G. Prolonged Intrinsic Neural Timescales Dissociate from Phase Coherence in Schizophrenia. Brain Sci 2023; 13:brainsci13040695. [PMID: 37190660 DOI: 10.3390/brainsci13040695] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2023] [Revised: 04/17/2023] [Accepted: 04/17/2023] [Indexed: 05/17/2023] Open
Abstract
Input processing in the brain is mediated by phase synchronization and intrinsic neural timescales, both of which have been implicated in schizophrenia. Their relationship remains unclear, though. Recruiting a schizophrenia EEG sample from the B-SNIP consortium dataset (n = 134, 70 schizophrenia patients, 64 controls), we investigate phase synchronization, as measured by intertrial phase coherence (ITPC), and intrinsic neural timescales, as measured by the autocorrelation window (ACW) during both the rest and oddball-task states. The main goal of our paper was to investigate whether reported shifts from shorter to longer timescales are related to decreased ITPC. Our findings show (i) decreases in both theta and alpha ITPC in response to both standard and deviant tones; and (iii) a negative correlation of ITPC and ACW in healthy subjects while such correlation is no longer present in SCZ participants. Together, we demonstrate evidence of abnormally long intrinsic neural timescales (ACW) in resting-state EEG of schizophrenia as well as their dissociation from phase synchronization (ITPC). Our data suggest that, during input processing, the resting state's abnormally long intrinsic neural timescales tilt the balance of temporal segregation and integration towards the latter. That results in temporal imprecision with decreased phase synchronization in response to inputs. Our findings provide further evidence for a basic temporal disturbance in schizophrenia on the different timescales (longer ACW and shorter ITPC), which, in the future, might be able to explain common symptoms related to the temporal experience in schizophrenia, for example temporal fragmentation.
Collapse
Affiliation(s)
- Stephan Lechner
- The Royal's Institute of Mental Health Research, Brain and Mind Research Institute, University of Ottawa, Ottawa, ON K1Z 7K4, Canada
- Research Group Neuroinformatics, Faculty of Computer Science, University of Vienna, 1010 Vienna, Austria
- Vienna Doctoral School Cognition, Behavior and Neuroscience, University of Vienna, 1030 Vienna, Austria
| | - Georg Northoff
- The Royal's Institute of Mental Health Research, Brain and Mind Research Institute, University of Ottawa, Ottawa, ON K1Z 7K4, Canada
- Centre for Neural Dynamics, Faculty of Medicine, University of Ottawa, Roger Guindon Hall 451 Smyth Road, Ottawa, ON K1H 8M5, Canada
- Mental Health Centre, School of Medicine, Zhejiang University, Hangzhou 310013, China
- Centre for Cognition and Brain Disorders, Hangzhou Normal University, Hangzhou 310013, China
| |
Collapse
|
35
|
Park JJ, Baek SC, Suh MW, Choi J, Kim SJ, Lim Y. The effect of topic familiarity and volatility of auditory scene on selective auditory attention. Hear Res 2023; 433:108770. [PMID: 37104990 DOI: 10.1016/j.heares.2023.108770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/15/2022] [Revised: 04/06/2023] [Accepted: 04/15/2023] [Indexed: 04/29/2023]
Abstract
Selective auditory attention has been shown to modulate the cortical representation of speech. This effect has been well documented in acoustically more challenging environments. However, the influence of top-down factors, in particular topic familiarity, on this process remains unclear, despite evidence that semantic information can promote speech-in-noise perception. Apart from individual features forming a static listening condition, dynamic and irregular changes of auditory scenes-volatile listening environments-have been less studied. To address these gaps, we explored the influence of topic familiarity and volatile listening on the selective auditory attention process during dichotic listening using electroencephalography. When stories with unfamiliar topics were presented, participants' comprehension was severely degraded. However, their cortical activity selectively tracked the speech of the target story well. This implies that topic familiarity hardly influences the speech tracking neural index, possibly when the bottom-up information is sufficient. However, when the listening environment was volatile and the listeners had to re-engage in new speech whenever auditory scenes altered, the neural correlates of the attended speech were degraded. In particular, the cortical response to the attended speech and the spatial asymmetry of the response to the left and right attention were significantly attenuated around 100-200 ms after the speech onset. These findings suggest that volatile listening environments could adversely affect the modulation effect of selective attention, possibly by hampering proper attention due to increased perceptual load.
Collapse
Affiliation(s)
- Jonghwa Jeonglok Park
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Department of Electrical and Computer Engineering, College of Engineering, Seoul National University, Seoul 08826, South Korea
| | - Seung-Cheol Baek
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Research Group Neurocognition of Music and Language, Max Planck Institute for Empirical Aesthetics, Grüneburgweg 14, Frankfurt am Main 60322, Germany
| | - Myung-Whan Suh
- Department of Otorhinolaryngology-Head and Neck Surgery, Seoul National University Hospital, Seoul 03080, South Korea
| | - Jongsuk Choi
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Department of AI Robotics, KIST School, Korea University of Science and Technology, Seoul 02792, South Korea
| | - Sung June Kim
- Department of Electrical and Computer Engineering, College of Engineering, Seoul National University, Seoul 08826, South Korea
| | - Yoonseob Lim
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Department of HY-KIST Bio-convergence, Hanyang University, Seoul 04763, South Korea.
| |
Collapse
|
36
|
Acoustic correlates of the syllabic rhythm of speech: Modulation spectrum or local features of the temporal envelope. Neurosci Biobehav Rev 2023; 147:105111. [PMID: 36822385 DOI: 10.1016/j.neubiorev.2023.105111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2022] [Revised: 12/04/2022] [Accepted: 02/19/2023] [Indexed: 02/25/2023]
Abstract
The syllable is a perceptually salient unit in speech. Since both the syllable and its acoustic correlate, i.e., the speech envelope, have a preferred range of rhythmicity between 4 and 8 Hz, it is hypothesized that theta-band neural oscillations play a major role in extracting syllables based on the envelope. A literature survey, however, reveals inconsistent evidence about the relationship between speech envelope and syllables, and the current study revisits this question by analyzing large speech corpora. It is shown that the center frequency of speech envelope, characterized by the modulation spectrum, reliably correlates with the rate of syllables only when the analysis is pooled over minutes of speech recordings. In contrast, in the time domain, a component of the speech envelope is reliably phase-locked to syllable onsets. Based on a speaker-independent model, the timing of syllable onsets explains about 24% variance of the speech envelope. These results indicate that local features in the speech envelope, instead of the modulation spectrum, are a more reliable acoustic correlate of syllables.
Collapse
|
37
|
Kösem A, Dai B, McQueen JM, Hagoort P. Neural tracking of speech envelope does not unequivocally reflect intelligibility. Neuroimage 2023; 272:120040. [PMID: 36935084 DOI: 10.1016/j.neuroimage.2023.120040] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2022] [Revised: 03/13/2023] [Accepted: 03/15/2023] [Indexed: 03/19/2023] Open
Abstract
During listening, brain activity tracks the rhythmic structures of speech signals. Here, we directly dissociated the contribution of neural envelope tracking in the processing of speech acoustic cues from that related to linguistic processing. We examined the neural changes associated with the comprehension of Noise-Vocoded (NV) speech using magnetoencephalography (MEG). Participants listened to NV sentences in a 3-phase training paradigm: (1) pre-training, where NV stimuli were barely comprehended, (2) training with exposure of the original clear version of speech stimulus, and (3) post-training, where the same stimuli gained intelligibility from the training phase. Using this paradigm, we tested if the neural responses of a speech signal was modulated by its intelligibility without any change in its acoustic structure. To test the influence of spectral degradation on neural envelope tracking independently of training, participants listened to two types of NV sentences (4-band and 2-band NV speech), but were only trained to understand 4-band NV speech. Significant changes in neural tracking were observed in the delta range in relation to the acoustic degradation of speech. However, we failed to find a direct effect of intelligibility on the neural tracking of speech envelope in both theta and delta ranges, in both auditory regions-of-interest and whole-brain sensor-space analyses. This suggests that acoustics greatly influence the neural tracking response to speech envelope, and that caution needs to be taken when choosing the control signals for speech-brain tracking analyses, considering that a slight change in acoustic parameters can have strong effects on the neural tracking response.
Collapse
Affiliation(s)
- Anne Kösem
- Max Planck Institute for Psycholinguistics, 6500 AH Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6500 HB Nijmegen, The Netherlands; Lyon Neuroscience Research Center (CRNL), CoPhy Team, INSERM U1028, 69500 Bron, France.
| | - Bohan Dai
- Max Planck Institute for Psycholinguistics, 6500 AH Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6500 HB Nijmegen, The Netherlands
| | - James M McQueen
- Max Planck Institute for Psycholinguistics, 6500 AH Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6500 HB Nijmegen, The Netherlands
| | - Peter Hagoort
- Max Planck Institute for Psycholinguistics, 6500 AH Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6500 HB Nijmegen, The Netherlands
| |
Collapse
|
38
|
Wohltjen S, Toth B, Boncz A, Wheatley T. Synchrony to a beat predicts synchrony with other minds. Sci Rep 2023; 13:3591. [PMID: 36869056 PMCID: PMC9984464 DOI: 10.1038/s41598-023-29776-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2022] [Accepted: 02/10/2023] [Indexed: 03/05/2023] Open
Abstract
Synchrony has been used to describe simple beat entrainment as well as correlated mental processes between people, leading some to question whether the term conflates distinct phenomena. Here we ask whether simple synchrony (beat entrainment) predicts more complex attentional synchrony, consistent with a common mechanism. While eye-tracked, participants listened to regularly spaced tones and indicated changes in volume. Across multiple sessions, we found a reliable individual difference: some people entrained their attention more than others, as reflected in beat-matched pupil dilations that predicted performance. In a second study, eye-tracked participants completed the beat task and then listened to a storyteller, who had been previously recorded while eye-tracked. An individual's tendency to entrain to a beat predicted how strongly their pupils synchronized with those of the storyteller, a corollary of shared attention. The tendency to synchronize is a stable individual difference that predicts attentional synchrony across contexts and complexity.
Collapse
Affiliation(s)
- Sophie Wohltjen
- Psychological and Brain Sciences Department, Dartmouth College, 6207 Moore Hall, Hanover, NH, 03755, USA.
- Psychology Department, University of Wisconsin, 1202 West Johnson St. Madison, Madison, WI, 53706, USA.
| | - Brigitta Toth
- Psychological and Brain Sciences Department, Dartmouth College, 6207 Moore Hall, Hanover, NH, 03755, USA
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Magyar Tudósok Körútja 2, Budapest, 1117, Hungary
| | - Adam Boncz
- Psychological and Brain Sciences Department, Dartmouth College, 6207 Moore Hall, Hanover, NH, 03755, USA
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Magyar Tudósok Körútja 2, Budapest, 1117, Hungary
| | - Thalia Wheatley
- Psychological and Brain Sciences Department, Dartmouth College, 6207 Moore Hall, Hanover, NH, 03755, USA
- Santa Fe Institute, Santa Fe, NM, USA
| |
Collapse
|
39
|
Weise A, Grimm S, Maria Rimmele J, Schröger E. Auditory representations for long lasting sounds: Insights from event-related brain potentials and neural oscillations. BRAIN AND LANGUAGE 2023; 237:105221. [PMID: 36623340 DOI: 10.1016/j.bandl.2022.105221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/23/2021] [Revised: 12/26/2022] [Accepted: 12/27/2022] [Indexed: 06/17/2023]
Abstract
The basic features of short sounds, such as frequency and intensity including their temporal dynamics, are integrated in a unitary representation. Knowledge on how our brain processes long lasting sounds is scarce. We review research utilizing the Mismatch Negativity event-related potential and neural oscillatory activity for studying representations for long lasting simple versus complex sounds such as sinusoidal tones versus speech. There is evidence for a temporal constraint in the formation of auditory representations: Auditory edges like sound onsets within long lasting sounds open a temporal window of about 350 ms in which the sounds' dynamics are integrated into a representation, while information beyond that window contributes less to that representation. This integration window segments the auditory input into short chunks. We argue that the representations established in adjacent integration windows can be concatenated into an auditory representation of a long sound, thus, overcoming the temporal constraint.
Collapse
Affiliation(s)
- Annekathrin Weise
- Department of Psychology, Ludwig-Maximilians-University Munich, Germany; Wilhelm Wundt Institute for Psychology, Leipzig University, Germany.
| | - Sabine Grimm
- Wilhelm Wundt Institute for Psychology, Leipzig University, Germany.
| | - Johanna Maria Rimmele
- Department of Neuroscience, Max-Planck-Institute for Empirical Aesthetics, Germany; Center for Language, Music and Emotion, New York University, Max Planck Institute, Department of Psychology, 6 Washington Place, New York, NY 10003, United States.
| | - Erich Schröger
- Wilhelm Wundt Institute for Psychology, Leipzig University, Germany.
| |
Collapse
|
40
|
Accou B, Vanthornhout J, Hamme HV, Francart T. Decoding of the speech envelope from EEG using the VLAAI deep neural network. Sci Rep 2023; 13:812. [PMID: 36646740 PMCID: PMC9842721 DOI: 10.1038/s41598-022-27332-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2022] [Accepted: 12/30/2022] [Indexed: 01/18/2023] Open
Abstract
To investigate the processing of speech in the brain, commonly simple linear models are used to establish a relationship between brain signals and speech features. However, these linear models are ill-equipped to model a highly-dynamic, complex non-linear system like the brain, and they often require a substantial amount of subject-specific training data. This work introduces a novel speech decoder architecture: the Very Large Augmented Auditory Inference (VLAAI) network. The VLAAI network outperformed state-of-the-art subject-independent models (median Pearson correlation of 0.19, p < 0.001), yielding an increase over the well-established linear model by 52%. Using ablation techniques, we identified the relative importance of each part of the VLAAI network and found that the non-linear components and output context module influenced model performance the most (10% relative performance increase). Subsequently, the VLAAI network was evaluated on a holdout dataset of 26 subjects and a publicly available unseen dataset to test generalization for unseen subjects and stimuli. No significant difference was found between the default test and the holdout subjects, and between the default test set and the public dataset. The VLAAI network also significantly outperformed all baseline models on the public dataset. We evaluated the effect of training set size by training the VLAAI network on data from 1 up to 80 subjects and evaluated on 26 holdout subjects, revealing a relationship following a hyperbolic tangent function between the number of subjects in the training set and the performance on unseen subjects. Finally, the subject-independent VLAAI network was finetuned for 26 holdout subjects to obtain subject-specific VLAAI models. With 5 minutes of data or more, a significant performance improvement was found, up to 34% (from 0.18 to 0.25 median Pearson correlation) with regards to the subject-independent VLAAI network.
Collapse
Affiliation(s)
- Bernd Accou
- ExpORL, Department of Neurosciences, KU Leuven, Leuven, Belgium. .,PSI, Department of Electrical Engineering, KU Leuven, Leuven, Belgium.
| | | | - Hugo Van Hamme
- PSI, Department of Electrical Engineering, KU Leuven, Leuven, Belgium
| | - Tom Francart
- ExpORL, Department of Neurosciences, KU Leuven, Leuven, Belgium.
| |
Collapse
|
41
|
Chalas N, Daube C, Kluger DS, Abbasi O, Nitsch R, Gross J. Speech onsets and sustained speech contribute differentially to delta and theta speech tracking in auditory cortex. Cereb Cortex 2023; 33:6273-6281. [PMID: 36627246 DOI: 10.1093/cercor/bhac502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2022] [Revised: 11/21/2022] [Accepted: 11/22/2022] [Indexed: 01/12/2023] Open
Abstract
When we attentively listen to an individual's speech, our brain activity dynamically aligns to the incoming acoustic input at multiple timescales. Although this systematic alignment between ongoing brain activity and speech in auditory brain areas is well established, the acoustic events that drive this phase-locking are not fully understood. Here, we use magnetoencephalographic recordings of 24 human participants (12 females) while they were listening to a 1 h story. We show that whereas speech-brain coupling is associated with sustained acoustic fluctuations in the speech envelope in the theta-frequency range (4-7 Hz), speech tracking in the low-frequency delta (below 1 Hz) was strongest around onsets of speech, like the beginning of a sentence. Crucially, delta tracking in bilateral auditory areas was not sustained after onsets, proposing a delta tracking during continuous speech perception that is driven by speech onsets. We conclude that both onsets and sustained components of speech contribute differentially to speech tracking in delta- and theta-frequency bands, orchestrating sampling of continuous speech. Thus, our results suggest a temporal dissociation of acoustically driven oscillatory activity in auditory areas during speech tracking, providing valuable implications for orchestration of speech tracking at multiple time scales.
Collapse
Affiliation(s)
- Nikos Chalas
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Malmedyweg 15, 48149, Münster, Germany.,Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Fliednerstr. 21, 48149 Münster, Germany.,Institute for Translational Neuroscience, University of Münster, Albert-Schweitzer-Campus 1, Geb. A9a, Münster, Germany
| | - Christoph Daube
- Centre for Cognitive Neuroimaging, University of Glasgow, 56-64 Hillhead Street, G12 8QB, Glasgow, United Kingdom
| | - Daniel S Kluger
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Malmedyweg 15, 48149, Münster, Germany.,Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Fliednerstr. 21, 48149 Münster, Germany
| | - Omid Abbasi
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Malmedyweg 15, 48149, Münster, Germany
| | - Robert Nitsch
- Institute for Translational Neuroscience, University of Münster, Albert-Schweitzer-Campus 1, Geb. A9a, Münster, Germany
| | - Joachim Gross
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Malmedyweg 15, 48149, Münster, Germany.,Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Fliednerstr. 21, 48149 Münster, Germany
| |
Collapse
|
42
|
Niesen M, Bourguignon M, Bertels J, Vander Ghinst M, Wens V, Goldman S, De Tiège X. Cortical tracking of lexical speech units in a multi-talker background is immature in school-aged children. Neuroimage 2023; 265:119770. [PMID: 36462732 DOI: 10.1016/j.neuroimage.2022.119770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2022] [Revised: 11/09/2022] [Accepted: 11/23/2022] [Indexed: 12/03/2022] Open
Abstract
Children have more difficulty perceiving speech in noise than adults. Whether this difficulty relates to an immature processing of prosodic or linguistic elements of the attended speech is still unclear. To address the impact of noise on linguistic processing per se, we assessed how babble noise impacts the cortical tracking of intelligible speech devoid of prosody in school-aged children and adults. Twenty adults and twenty children (7-9 years) listened to synthesized French monosyllabic words presented at 2.5 Hz, either randomly or in 4-word hierarchical structures wherein 2 words formed a phrase at 1.25 Hz, and 2 phrases formed a sentence at 0.625 Hz, with or without babble noise. Neuromagnetic responses to words, phrases and sentences were identified and source-localized. Children and adults displayed significant cortical tracking of words in all conditions, and of phrases and sentences only when words formed meaningful sentences. In children compared with adults, the cortical tracking was lower for all linguistic units in conditions without noise. In the presence of noise, the cortical tracking was similarly reduced for sentence units in both groups, but remained stable for phrase units. Critically, when there was noise, adults increased the cortical tracking of monosyllabic words in the inferior frontal gyri and supratemporal auditory cortices but children did not. This study demonstrates that the difficulties of school-aged children in understanding speech in a multi-talker background might be partly due to an immature tracking of lexical but not supra-lexical linguistic units.
Collapse
Affiliation(s)
- Maxime Niesen
- Université libre de Bruxelles (ULB), UNI - ULB Neurosciences Institute, Laboratoire de Neuroanatomie et de Neuroimagerie translationnelles (LN2T), 1070 Brussels, Belgium; Université libre de Bruxelles (ULB), Hôpital Universitaire de Bruxelles (HUB), CUB Hôpital Erasme, Department of Otorhinolaryngology, 1070 Brussels, Belgium.
| | - Mathieu Bourguignon
- Université libre de Bruxelles (ULB), UNI - ULB Neurosciences Institute, Laboratoire de Neuroanatomie et de Neuroimagerie translationnelles (LN2T), 1070 Brussels, Belgium; Université libre de Bruxelles (ULB), UNI-ULB Neuroscience Institute, Laboratory of Neurophysiology and Movement Biomechanics, 1070 Brussels, Belgium.; BCBL, Basque Center on Cognition, Brain and Language, 20009 San Sebastian, Spain
| | - Julie Bertels
- Université libre de Bruxelles (ULB), UNI - ULB Neurosciences Institute, Laboratoire de Neuroanatomie et de Neuroimagerie translationnelles (LN2T), 1070 Brussels, Belgium; Université libre de Bruxelles (ULB), UNI-ULB Neuroscience Institute, Cognition and Computation group, ULBabyLab - Consciousness, Brussels, Belgium
| | - Marc Vander Ghinst
- Université libre de Bruxelles (ULB), UNI - ULB Neurosciences Institute, Laboratoire de Neuroanatomie et de Neuroimagerie translationnelles (LN2T), 1070 Brussels, Belgium; Université libre de Bruxelles (ULB), Hôpital Universitaire de Bruxelles (HUB), CUB Hôpital Erasme, Department of Otorhinolaryngology, 1070 Brussels, Belgium
| | - Vincent Wens
- Université libre de Bruxelles (ULB), UNI - ULB Neurosciences Institute, Laboratoire de Neuroanatomie et de Neuroimagerie translationnelles (LN2T), 1070 Brussels, Belgium; Université libre de Bruxelles (ULB), Hôpital Universitaire de Bruxelles (HUB), CUB Hôpital Erasme, Department of translational Neuroimaging, 1070 Brussels, Belgium
| | - Serge Goldman
- Université libre de Bruxelles (ULB), UNI - ULB Neurosciences Institute, Laboratoire de Neuroanatomie et de Neuroimagerie translationnelles (LN2T), 1070 Brussels, Belgium; Université libre de Bruxelles (ULB), Hôpital Universitaire de Bruxelles (HUB), CUB Hôpital Erasme, Department of Nuclear Medicine, 1070 Brussels, Belgium
| | - Xavier De Tiège
- Université libre de Bruxelles (ULB), UNI - ULB Neurosciences Institute, Laboratoire de Neuroanatomie et de Neuroimagerie translationnelles (LN2T), 1070 Brussels, Belgium; Université libre de Bruxelles (ULB), Hôpital Universitaire de Bruxelles (HUB), CUB Hôpital Erasme, Department of translational Neuroimaging, 1070 Brussels, Belgium
| |
Collapse
|
43
|
Rong P, Heidrick L. Functional Role of Temporal Patterning of Articulation in Speech Production: A Novel Perspective Toward Global Timing-Based Motor Speech Assessment and Rehabilitation. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2022; 65:4577-4607. [PMID: 36399794 DOI: 10.1044/2022_jslhr-22-00089] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
PURPOSE This study aimed to (a) relate temporal patterning of articulation to functional speech outcomes in neurologically healthy and impaired speakers, (b) identify changes in temporal patterning of articulation in neurologically impaired speakers, and (c) evaluate how these changes can be modulated by speaking rate manipulation. METHOD Thirteen individuals with amyotrophic lateral sclerosis (ALS) and 10 neurologically healthy controls read a sentence 3 times, first at their habitual rate and then at a voluntarily slowed rate. Temporal patterning of articulation was assessed by 24 features characterizing the modulation patterns within (intra) and between (inter) four articulators (tongue tip, tongue body, lower lip, and jaw) at three linguistically relevant, hierarchically nested timescales corresponding to stress, syllable, and onset-rime/phoneme. For Aim 1, the features for the habitual rate condition were factorized and correlated with two functional speech outcomes-speech intelligibility and intelligible speaking rate. For Aims 2 and 3, the features were compared between groups and rate conditions, respectiely. RESULTS For Aim 1, the modulation features combined were moderately to strongly correlated with intelligibility (R 2 = .51-.53) and intelligible speaking rate (R 2 = .63-.73). For Aim 2, intra-articulator modulation was impaired in ALS, manifested by moderate-to-large decreases in modulation depth at all timescales and cross-timescale phase synchronization. Interarticulator modulation was relatively unaffected. For Aim 3, voluntary rate reduction improved several intra-articulator modulation features identified as being susceptible to the disease effect in individuals with ALS. CONCLUSIONS Disrupted temporal patterning of articulation, presumably reflecting impaired articulatory entrainment to linguistic rhythms, may contribute to functional speech declines in ALS. These impairments tend to be improved through voluntary rate reduction, possibly by reshaping the temporal template of motor plans to better accommodate the disease-related neuromechanical constraints in the articulatory system. These findings shed light on a novel perspective toward global timing-based motor speech assessment and rehabilitation.
Collapse
Affiliation(s)
- Panying Rong
- Department of Speech-Language-Hearing: Sciences & Disorders, The University of Kansas, Lawrence
| | - Lindsey Heidrick
- Department of Hearing and Speech, The University of Kansas Medical Center, Kansas City
| |
Collapse
|
44
|
Pastore A, Tomassini A, Delis I, Dolfini E, Fadiga L, D'Ausilio A. Speech listening entails neural encoding of invisible articulatory features. Neuroimage 2022; 264:119724. [PMID: 36328272 DOI: 10.1016/j.neuroimage.2022.119724] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2022] [Revised: 09/28/2022] [Accepted: 10/30/2022] [Indexed: 11/06/2022] Open
Abstract
Speech processing entails a complex interplay between bottom-up and top-down computations. The former is reflected in the neural entrainment to the quasi-rhythmic properties of speech acoustics while the latter is supposed to guide the selection of the most relevant input subspace. Top-down signals are believed to originate mainly from motor regions, yet similar activities have been shown to tune attentional cycles also for simpler, non-speech stimuli. Here we examined whether, during speech listening, the brain reconstructs articulatory patterns associated to speech production. We measured electroencephalographic (EEG) data while participants listened to sentences during the production of which articulatory kinematics of lips, jaws and tongue were also recorded (via Electro-Magnetic Articulography, EMA). We captured the patterns of articulatory coordination through Principal Component Analysis (PCA) and used Partial Information Decomposition (PID) to identify whether the speech envelope and each of the kinematic components provided unique, synergistic and/or redundant information regarding the EEG signals. Interestingly, tongue movements contain both unique as well as synergistic information with the envelope that are encoded in the listener's brain activity. This demonstrates that during speech listening the brain retrieves highly specific and unique motor information that is never accessible through vision, thus leveraging audio-motor maps that arise most likely from the acquisition of speech production during development.
Collapse
Affiliation(s)
- A Pastore
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy; Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy.
| | - A Tomassini
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
| | - I Delis
- School of Biomedical Sciences, University of Leeds, Leeds, UK
| | - E Dolfini
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy; Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
| | - L Fadiga
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy; Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
| | - A D'Ausilio
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy; Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy.
| |
Collapse
|
45
|
Keshavarzi M, Mandke K, Macfarlane A, Parvez L, Gabrielczyk F, Wilson A, Flanagan S, Goswami U. Decoding of speech information using EEG in children with dyslexia: Less accurate low-frequency representations of speech, not "Noisy" representations. BRAIN AND LANGUAGE 2022; 235:105198. [PMID: 36343509 DOI: 10.1016/j.bandl.2022.105198] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/12/2022] [Revised: 10/03/2022] [Accepted: 10/24/2022] [Indexed: 06/16/2023]
Abstract
The amplitude envelope of speech carries crucial low-frequency acoustic information that assists linguistic decoding. The sensory-neural Temporal Sampling (TS) theory of developmental dyslexia proposes atypical encoding of speech envelope information < 10 Hz, leading to atypical phonological representations. Here a backward linear TRF model and story listening were employed to estimate the speech information encoded in the electroencephalogram in the canonical delta, theta and alpha bands by 9-year-old children with and without dyslexia. TRF decoding accuracy provided an estimate of how faithfully the children's brains encoded low-frequency envelope information. Between-group analyses showed that the children with dyslexia exhibited impaired reconstruction of speech information in the delta band. However, when the quality of speech encoding for each child was estimated using child-by-child decoding models, then the dyslexic children did not differ from controls. This suggests that children with dyslexia encode neither "noisy" nor "normal" representations of the speech signal, but different representations.
Collapse
Affiliation(s)
- Mahmoud Keshavarzi
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge CB2 3EB, United Kingdom.
| | - Kanad Mandke
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge CB2 3EB, United Kingdom
| | - Annabel Macfarlane
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge CB2 3EB, United Kingdom
| | - Lyla Parvez
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge CB2 3EB, United Kingdom
| | - Fiona Gabrielczyk
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge CB2 3EB, United Kingdom
| | - Angela Wilson
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge CB2 3EB, United Kingdom
| | - Sheila Flanagan
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge CB2 3EB, United Kingdom
| | - Usha Goswami
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge CB2 3EB, United Kingdom
| |
Collapse
|
46
|
Neurodevelopmental oscillatory basis of speech processing in noise. Dev Cogn Neurosci 2022; 59:101181. [PMID: 36549148 PMCID: PMC9792357 DOI: 10.1016/j.dcn.2022.101181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2022] [Revised: 10/31/2022] [Accepted: 11/25/2022] [Indexed: 11/27/2022] Open
Abstract
Humans' extraordinary ability to understand speech in noise relies on multiple processes that develop with age. Using magnetoencephalography (MEG), we characterize the underlying neuromaturational basis by quantifying how cortical oscillations in 144 participants (aged 5-27 years) track phrasal and syllabic structures in connected speech mixed with different types of noise. While the extraction of prosodic cues from clear speech was stable during development, its maintenance in a multi-talker background matured rapidly up to age 9 and was associated with speech comprehension. Furthermore, while the extraction of subtler information provided by syllables matured at age 9, its maintenance in noisy backgrounds progressively matured until adulthood. Altogether, these results highlight distinct behaviorally relevant maturational trajectories for the neuronal signatures of speech perception. In accordance with grain-size proposals, neuromaturational milestones are reached increasingly late for linguistic units of decreasing size, with further delays incurred by noise.
Collapse
|
47
|
Cross-modal attentional effects of rhythmic sensory stimulation. Atten Percept Psychophys 2022; 85:863-878. [PMID: 36385670 PMCID: PMC10066103 DOI: 10.3758/s13414-022-02611-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 10/31/2022] [Indexed: 11/18/2022]
Abstract
AbstractTemporal regularities are ubiquitous in our environment. The theory of entrainment posits that the brain can utilize these regularities by synchronizing neural activity with external events, thereby, aligning moments of high neural excitability with expected upcoming stimuli and facilitating perception. Despite numerous accounts reporting entrainment of behavioural and electrophysiological measures, evidence regarding this phenomenon remains mixed, with several recent studies having failed to provide confirmatory evidence. Notably, it is currently unclear whether and for how long the effects of entrainment can persist beyond their initiating stimulus, and whether they remain restricted to the stimulated sensory modality or can cross over to other modalities. Here, we set out to answer these questions by presenting participants with either visual or auditory rhythmic sensory stimulation, followed by a visual or auditory target at six possible time points, either in-phase or out-of-phase relative to the initial stimulus train. Unexpectedly, but in line with several recent studies, we observed no evidence for cyclic fluctuations in performance, despite our design being highly similar to those used in previous demonstrations of sensory entrainment. However, our data revealed a temporally less specific attentional effect, via cross-modally facilitated performance following auditory compared with visual rhythmic stimulation. In addition to a potentially higher salience of auditory rhythms, this could indicate an effect on oscillatory 3-Hz amplitude, resulting in facilitated cognitive control and attention. In summary, our study further challenges the generality of periodic behavioural modulation associated with sensory entrainment, while demonstrating a modality-independent attention effect following auditory rhythmic stimulation.
Collapse
|
48
|
Broderick MP, Zuk NJ, Anderson AJ, Lalor EC. More than words: Neurophysiological correlates of semantic dissimilarity depend on comprehension of the speech narrative. Eur J Neurosci 2022; 56:5201-5214. [PMID: 35993240 DOI: 10.1111/ejn.15805] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2022] [Revised: 08/15/2022] [Accepted: 08/18/2022] [Indexed: 12/14/2022]
Abstract
Speech comprehension relies on the ability to understand words within a coherent context. Recent studies have attempted to obtain electrophysiological indices of this process by modelling how brain activity is affected by a word's semantic dissimilarity to preceding words. Although the resulting indices appear robust and are strongly modulated by attention, it remains possible that, rather than capturing the contextual understanding of words, they may actually reflect word-to-word changes in semantic content without the need for a narrative-level understanding on the part of the listener. To test this, we recorded electroencephalography from subjects who listened to speech presented in either its original, narrative form, or after scrambling the word order by varying amounts. This manipulation affected the ability of subjects to comprehend the speech narrative but not the ability to recognise individual words. Neural indices of semantic understanding and low-level acoustic processing were derived for each scrambling condition using the temporal response function. Signatures of semantic processing were observed when speech was unscrambled or minimally scrambled and subjects understood the speech. The same markers were absent for higher scrambling levels as speech comprehension dropped. In contrast, word recognition remained high and neural measures related to envelope tracking did not vary significantly across scrambling conditions. This supports the previous claim that electrophysiological indices based on the semantic dissimilarity of words to their context reflect a listener's understanding of those words relative to that context. It also highlights the relative insensitivity of neural measures of low-level speech processing to speech comprehension.
Collapse
Affiliation(s)
- Michael P Broderick
- School of Engineering, Trinity Centre for Biomedical Engineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland
| | - Nathaniel J Zuk
- School of Engineering, Trinity Centre for Biomedical Engineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland
| | - Andrew J Anderson
- Del Monte Institute for Neuroscience, Department of Neuroscience, Department of Biomedical Engineering, University of Rochester, Rochester, New York, USA
| | - Edmund C Lalor
- School of Engineering, Trinity Centre for Biomedical Engineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland.,Del Monte Institute for Neuroscience, Department of Neuroscience, Department of Biomedical Engineering, University of Rochester, Rochester, New York, USA
| |
Collapse
|
49
|
Gillis M, Van Canneyt J, Francart T, Vanthornhout J. Neural tracking as a diagnostic tool to assess the auditory pathway. Hear Res 2022; 426:108607. [PMID: 36137861 DOI: 10.1016/j.heares.2022.108607] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/26/2021] [Revised: 08/11/2022] [Accepted: 09/12/2022] [Indexed: 11/20/2022]
Abstract
When a person listens to sound, the brain time-locks to specific aspects of the sound. This is called neural tracking and it can be investigated by analysing neural responses (e.g., measured by electroencephalography) to continuous natural speech. Measures of neural tracking allow for an objective investigation of a range of auditory and linguistic processes in the brain during natural speech perception. This approach is more ecologically valid than traditional auditory evoked responses and has great potential for research and clinical applications. This article reviews the neural tracking framework and highlights three prominent examples of neural tracking analyses: neural tracking of the fundamental frequency of the voice (f0), the speech envelope and linguistic features. Each of these analyses provides a unique point of view into the human brain's hierarchical stages of speech processing. F0-tracking assesses the encoding of fine temporal information in the early stages of the auditory pathway, i.e., from the auditory periphery up to early processing in the primary auditory cortex. Envelope tracking reflects bottom-up and top-down speech-related processes in the auditory cortex and is likely necessary but not sufficient for speech intelligibility. Linguistic feature tracking (e.g. word or phoneme surprisal) relates to neural processes more directly related to speech intelligibility. Together these analyses form a multi-faceted objective assessment of an individual's auditory and linguistic processing.
Collapse
Affiliation(s)
- Marlies Gillis
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium.
| | - Jana Van Canneyt
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| | - Tom Francart
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| | - Jonas Vanthornhout
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| |
Collapse
|
50
|
Ross LA, Molholm S, Butler JS, Bene VAD, Foxe JJ. Neural correlates of multisensory enhancement in audiovisual narrative speech perception: a fMRI investigation. Neuroimage 2022; 263:119598. [PMID: 36049699 DOI: 10.1016/j.neuroimage.2022.119598] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2022] [Revised: 08/26/2022] [Accepted: 08/28/2022] [Indexed: 11/25/2022] Open
Abstract
This fMRI study investigated the effect of seeing articulatory movements of a speaker while listening to a naturalistic narrative stimulus. It had the goal to identify regions of the language network showing multisensory enhancement under synchronous audiovisual conditions. We expected this enhancement to emerge in regions known to underlie the integration of auditory and visual information such as the posterior superior temporal gyrus as well as parts of the broader language network, including the semantic system. To this end we presented 53 participants with a continuous narration of a story in auditory alone, visual alone, and both synchronous and asynchronous audiovisual speech conditions while recording brain activity using BOLD fMRI. We found multisensory enhancement in an extensive network of regions underlying multisensory integration and parts of the semantic network as well as extralinguistic regions not usually associated with multisensory integration, namely the primary visual cortex and the bilateral amygdala. Analysis also revealed involvement of thalamic brain regions along the visual and auditory pathways more commonly associated with early sensory processing. We conclude that under natural listening conditions, multisensory enhancement not only involves sites of multisensory integration but many regions of the wider semantic network and includes regions associated with extralinguistic sensory, perceptual and cognitive processing.
Collapse
Affiliation(s)
- Lars A Ross
- The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; Department of Imaging Sciences, University of Rochester Medical Center, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA.
| | - Sophie Molholm
- The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA
| | - John S Butler
- The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA; School of Mathematical Sciences, Technological University Dublin, Kevin Street Campus, Dublin, Ireland
| | - Victor A Del Bene
- The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA; University of Alabama at Birmingham, Heersink School of Medicine, Department of Neurology, Birmingham, Alabama, 35233, USA
| | - John J Foxe
- The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA.
| |
Collapse
|