1
Kries J, De Clercq P, Gillis M, Vanthornhout J, Lemmens R, Francart T, Vandermosten M. Exploring neural tracking of acoustic and linguistic speech representations in individuals with post-stroke aphasia. Hum Brain Mapp 2024; 45:e26676. PMID: 38798131; DOI: 10.1002/hbm.26676.
Abstract
Aphasia is a communication disorder that affects processing of language at different levels (e.g., acoustic, phonological, semantic). Recording brain activity via electroencephalography (EEG) while people listen to a continuous story allows analysis of brain responses to acoustic and linguistic properties of speech. When the neural activity aligns with these speech properties, it is referred to as neural tracking. Even though measuring neural tracking of speech may present an interesting approach to studying aphasia in an ecologically valid way, it has not yet been investigated in individuals with stroke-induced aphasia. Here, we explored processing of acoustic and linguistic speech representations in individuals with aphasia in the chronic phase after stroke and in age-matched healthy controls. We found decreased neural tracking of acoustic speech representations (envelope and envelope onsets) in individuals with aphasia. In addition, word surprisal displayed decreased amplitudes in individuals with aphasia around 195 ms over frontal electrodes, although this effect was not corrected for multiple comparisons. These results show that there is potential to capture language processing impairments in individuals with aphasia by measuring neural tracking of continuous speech. However, more research is needed to validate these results. Nonetheless, this exploratory study shows that neural tracking of naturalistic, continuous speech presents a powerful approach to studying aphasia.
Affiliation(s)
- Jill Kries
  - Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
  - Department of Psychology, Stanford University, Stanford, California, USA
- Pieter De Clercq
  - Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Marlies Gillis
  - Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Jonas Vanthornhout
  - Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Robin Lemmens
  - Experimental Neurology, Department of Neurosciences, KU Leuven, Leuven, Belgium
  - Laboratory of Neurobiology, VIB-KU Leuven Center for Brain and Disease Research, Leuven, Belgium
  - Department of Neurology, University Hospitals Leuven, Leuven, Belgium
- Tom Francart
  - Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Maaike Vandermosten
  - Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
2
Gehmacher Q, Schubert J, Schmidt F, Hartmann T, Reisinger P, Rösch S, Schwarz K, Popov T, Chait M, Weisz N. Eye movements track prioritized auditory features in selective attention to natural speech. Nat Commun 2024; 15:3692. PMID: 38693186; PMCID: PMC11063150; DOI: 10.1038/s41467-024-48126-2.
Abstract
Over the last decades, cognitive neuroscience has identified a distributed set of brain regions that are critical for attention. Strong anatomical overlap with brain regions critical for oculomotor processes suggests a joint network for attention and eye movements. However, the role of this shared network in complex, naturalistic environments remains understudied. Here, we investigated eye movements in relation to (un)attended sentences of natural speech. Combining simultaneously recorded eye tracking and magnetoencephalographic data with temporal response functions, we show that gaze tracks attended speech, a phenomenon we termed ocular speech tracking. Ocular speech tracking even differentiates a target from a distractor in a multi-speaker context and is further related to intelligibility. Moreover, we provide evidence for its contribution to neural differences in speech processing, emphasizing the necessity to consider oculomotor activity in future research and in the interpretation of neural differences in auditory cognition.
Affiliation(s)
- Quirin Gehmacher
  - Paris-Lodron-University of Salzburg, Department of Psychology, Centre for Cognitive Neuroscience, Salzburg, Austria
- Juliane Schubert
  - Paris-Lodron-University of Salzburg, Department of Psychology, Centre for Cognitive Neuroscience, Salzburg, Austria
- Fabian Schmidt
  - Paris-Lodron-University of Salzburg, Department of Psychology, Centre for Cognitive Neuroscience, Salzburg, Austria
- Thomas Hartmann
  - Paris-Lodron-University of Salzburg, Department of Psychology, Centre for Cognitive Neuroscience, Salzburg, Austria
- Patrick Reisinger
  - Paris-Lodron-University of Salzburg, Department of Psychology, Centre for Cognitive Neuroscience, Salzburg, Austria
- Sebastian Rösch
  - Department of Otorhinolaryngology, Head and Neck Surgery, Paracelsus Medical University Salzburg, 5020 Salzburg, Austria
- Tzvetan Popov
  - Methods of Plasticity Research, Department of Psychology, University of Zurich, CH-8050 Zurich, Switzerland
  - Department of Psychology, University of Konstanz, DE-78464 Konstanz, Germany
- Maria Chait
  - Ear Institute, University College London, London, UK
- Nathan Weisz
  - Paris-Lodron-University of Salzburg, Department of Psychology, Centre for Cognitive Neuroscience, Salzburg, Austria
  - Neuroscience Institute, Christian Doppler University Hospital, Paracelsus Medical University, Salzburg, Austria
3
Asaadi AH, Amiri SH, Bosaghzadeh A, Ebrahimpour R. Effects and prediction of cognitive load on encoding model of brain response to auditory and linguistic stimuli in educational multimedia. Sci Rep 2024; 14:9133. PMID: 38644370; PMCID: PMC11033259; DOI: 10.1038/s41598-024-59411-x.
Abstract
Multimedia is extensively used for educational purposes. However, certain types of multimedia lack proper design, which can impose a cognitive load on the user. It is therefore essential to predict cognitive load and understand how it impairs brain functioning. Participants watched a version of educational multimedia that applied Mayer's principles, followed by a version that did not, while their electroencephalography (EEG) was recorded. Subsequently, they completed a post-test and a self-reported cognitive load questionnaire. The audio envelope and word frequency were extracted from the multimedia, and temporal response functions (TRFs) were obtained using a linear encoding model. Behavioral data differed between the two conditions, and the TRFs of the two multimedia versions differed as well: we observed changes in the amplitude and latencies of both early and late components. In addition, correlations were found between the behavioral data and the amplitude and latencies of TRF components. Cognitive load decreased participants' attention to the multimedia, and semantic processing of words occurred with a delay and smaller amplitude. Hence, encoding models provide insights into the temporal and spatial mapping of cognitive load activity, which could help detect and reduce cognitive load in environments such as educational multimedia or simulators for different purposes.
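The audio-envelope predictor mentioned in this abstract can be computed in several ways; the paper does not publish its extraction code, so the following is only a rough sketch (the function name, sampling rate, and cutoff are illustrative, not taken from the study): rectify the waveform, then low-pass it with a moving average.

```python
import numpy as np

def amplitude_envelope(audio, fs, cutoff_hz=8.0):
    """Crude broadband amplitude envelope: full-wave rectification followed
    by a moving-average low-pass whose window matches the cutoff frequency."""
    rectified = np.abs(audio)
    win = max(1, int(fs / cutoff_hz))
    kernel = np.ones(win) / win
    return np.convolve(rectified, kernel, mode="same")

# Toy check: an amplitude-modulated tone. The envelope should track the slow
# 2 Hz modulation, not the 100 Hz carrier.
fs = 1000
t = np.arange(fs) / fs
modulator = 1.0 + 0.5 * np.sin(2 * np.pi * 2 * t)
tone = modulator * np.sin(2 * np.pi * 100 * t)
env = amplitude_envelope(tone, fs)
r = np.corrcoef(env[100:-100], modulator[100:-100])[0, 1]  # edges trimmed
```

In published neural-tracking work, envelope extraction more commonly uses an auditory (e.g. gammatone) filterbank and the Hilbert transform rather than this toy recipe.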
Affiliation(s)
- Amir Hosein Asaadi
  - Department of Computer Engineering, Shahid Rajaee Teacher Training University, Tehran, Islamic Republic of Iran
  - Institute for Research in Fundamental Sciences (IPM), School of Cognitive Sciences, Tehran, Iran
- S Hamid Amiri
  - Department of Computer Engineering, Shahid Rajaee Teacher Training University, Tehran, Islamic Republic of Iran
- Alireza Bosaghzadeh
  - Department of Computer Engineering, Shahid Rajaee Teacher Training University, Tehran, Islamic Republic of Iran
- Reza Ebrahimpour
  - Center for Cognitive Science, Institute for Convergence Science and Technology (ICST), Sharif University of Technology, P.O. Box 14588-89694, Tehran, Iran
4
Kulasingham JP, Bachmann FL, Eskelund K, Enqvist M, Innes-Brown H, Alickovic E. Predictors for estimating subcortical EEG responses to continuous speech. PLoS One 2024; 19:e0297826. PMID: 38330068; PMCID: PMC10852227; DOI: 10.1371/journal.pone.0297826.
Abstract
Perception of sounds and speech involves structures in the auditory brainstem that rapidly process ongoing auditory stimuli. The role of these structures in speech processing can be investigated by measuring their electrical activity using scalp-mounted electrodes. However, typical analysis methods involve averaging neural responses to many short repetitive stimuli that bear little relevance to daily listening environments. Recently, subcortical responses to more ecologically relevant continuous speech were detected using linear encoding models. These methods estimate the temporal response function (TRF), which is a regression model that minimises the error between the measured neural signal and a predictor derived from the stimulus. Using predictors that model the highly non-linear peripheral auditory system may improve linear TRF estimation accuracy and peak detection. Here, we compare predictors from both simple and complex peripheral auditory models for estimating brainstem TRFs on electroencephalography (EEG) data from 24 participants listening to continuous speech. We also investigate the data length required for estimating subcortical TRFs, and find that around 12 minutes of data is sufficient for clear wave V peaks (>3 dB SNR) to be seen in nearly all participants. Interestingly, predictors derived from simple filterbank-based models of the peripheral auditory system yield TRF wave V peak SNRs that are not significantly different from those estimated using a complex model of the auditory nerve, provided that the nonlinear effects of adaptation in the auditory system are appropriately modelled. Crucially, computing predictors from these simpler models is more than 50 times faster than with the complex model. This work paves the way for efficient modelling and detection of subcortical processing of continuous speech, which may lead to improved diagnostic metrics for hearing impairment and assistive hearing technology.
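The TRF described in this abstract is, at heart, a regularised time-lagged regression between a stimulus predictor and the neural signal. As a minimal single-channel sketch of that idea (the toy data and ridge parameter are invented for illustration; this is not the authors' pipeline):

```python
import numpy as np

def estimate_trf(stimulus, response, n_lags, alpha=1.0):
    """Estimate a temporal response function by ridge regression.

    Builds a lagged design matrix (column k = stimulus delayed by k samples)
    and solves (X'X + alpha*I) w = X'y for the TRF weights w.
    """
    n = len(stimulus)
    X = np.zeros((n, n_lags))
    for k in range(n_lags):
        X[k:, k] = stimulus[:n - k]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_lags), X.T @ response)

# Toy check: a "neural" signal that is the stimulus delayed by 3 samples
# should yield a TRF peaking at lag 3.
rng = np.random.default_rng(0)
stim = rng.standard_normal(2000)
neural = np.roll(stim, 3)
neural[:3] = 0.0
trf = estimate_trf(stim, neural, n_lags=8, alpha=1e-3)
peak_lag = int(np.argmax(np.abs(trf)))  # peaks at lag 3 for this toy example
```

Real subcortical TRF work additionally high-pass filters the EEG, fits many channels at once, and cross-validates the regularisation strength; the sketch only shows the core regression.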
Affiliation(s)
- Joshua P. Kulasingham
  - Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden
- Martin Enqvist
  - Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden
- Hamish Innes-Brown
  - Eriksholm Research Centre, Snekkersten, Denmark
  - Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Emina Alickovic
  - Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden
  - Eriksholm Research Centre, Snekkersten, Denmark
5
Har-Shai Yahav P, Sharaabi A, Zion Golumbic E. The effect of voice familiarity on attention to speech in a cocktail party scenario. Cereb Cortex 2024; 34:bhad475. PMID: 38142293; DOI: 10.1093/cercor/bhad475.
Abstract
Selective attention to one speaker in multi-talker environments can be affected by the acoustic and semantic properties of speech. One highly ecological feature of speech that has the potential to assist in selective attention is voice familiarity. Here, we tested how voice familiarity interacts with selective attention by measuring the neural speech-tracking response to both target and non-target speech in a dichotic listening "Cocktail Party" paradigm. We measured magnetoencephalography (MEG) from n = 33 participants, presented with concurrent narratives in two different voices, who were instructed to pay attention to one ear ("target") and ignore the other ("non-target"). Participants were familiarized with one of the voices during the week prior to the experiment, rendering this voice familiar to them. Using multivariate speech-tracking analysis, we estimated the neural responses to both stimuli and replicated their well-established modulation by selective attention. Importantly, speech tracking was also affected by voice familiarity, showing an enhanced response for target speech and a reduced response for non-target speech in the contralateral hemisphere when these were in a familiar vs. an unfamiliar voice. These findings offer valuable insight into how voice familiarity, and by extension auditory semantics, interact with goal-driven attention and facilitate perceptual organization and speech processing in noisy environments.
Affiliation(s)
- Paz Har-Shai Yahav
  - The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
- Aviya Sharaabi
  - The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
- Elana Zion Golumbic
  - The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
6
Karunathilake IMD, Kulasingham JP, Simon JZ. Neural tracking measures of speech intelligibility: Manipulating intelligibility while keeping acoustics unchanged. Proc Natl Acad Sci U S A 2023; 120:e2309166120. PMID: 38032934; PMCID: PMC10710032; DOI: 10.1073/pnas.2309166120.
Abstract
Neural speech tracking has advanced our understanding of how our brains rapidly map an acoustic speech signal onto linguistic representations and ultimately meaning. It remains unclear, however, how speech intelligibility is related to the corresponding neural responses. Many studies addressing this question vary the level of intelligibility by manipulating the acoustic waveform, but this makes it difficult to cleanly disentangle the effects of intelligibility from underlying acoustical confounds. Here, using magnetoencephalography recordings, we study neural measures of speech intelligibility by manipulating intelligibility while keeping the acoustics strictly unchanged. Acoustically identical degraded speech stimuli (three-band noise-vocoded, ~20 s duration) are presented twice, but the second presentation is preceded by the original (nondegraded) version of the speech. This intermediate priming, which generates a "pop-out" percept, substantially improves the intelligibility of the second degraded speech passage. We investigate how intelligibility and acoustical structure affect acoustic and linguistic neural representations using multivariate temporal response functions (mTRFs). As expected, behavioral results confirm that perceived speech clarity is improved by priming. mTRF analysis reveals that auditory (speech envelope and envelope onset) neural representations are not affected by priming but only by the acoustics of the stimuli (bottom-up driven). Critically, our findings suggest that segmentation of sounds into words emerges with better speech intelligibility, and most strongly at the later (~400 ms latency) word processing stage, in prefrontal cortex, in line with engagement of top-down mechanisms associated with priming. Taken together, our results show that word representations may provide some objective measures of speech comprehension.
Affiliation(s)
- Jonathan Z. Simon
  - Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742
  - Department of Biology, University of Maryland, College Park, MD 20742
  - Institute for Systems Research, University of Maryland, College Park, MD 20742
7
van der Burght CL, Friederici AD, Maran M, Papitto G, Pyatigorskaya E, Schroën JAM, Trettenbrein PC, Zaccarella E. Cleaning up the Brickyard: How Theory and Methodology Shape Experiments in Cognitive Neuroscience of Language. J Cogn Neurosci 2023; 35:2067-2088. PMID: 37713672; DOI: 10.1162/jocn_a_02058.
Abstract
The capacity for language is a defining property of our species, yet despite decades of research, evidence on its neural basis is still mixed and a generalized consensus is difficult to achieve. We suggest that this is partly caused by researchers defining "language" in different ways, with focus on a wide range of phenomena, properties, and levels of investigation. Accordingly, there is very little agreement among cognitive neuroscientists of language on the operationalization of fundamental concepts to be investigated in neuroscientific experiments. Here, we review chains of derivation in the cognitive neuroscience of language, focusing on how the hypothesis under consideration is defined by a combination of theoretical and methodological assumptions. We first attempt to disentangle the complex relationship between linguistics, psychology, and neuroscience in the field. Next, we focus on how conclusions that can be drawn from any experiment are inherently constrained by auxiliary assumptions, both theoretical and methodological, on which the validity of conclusions drawn rests. These issues are discussed in the context of classical experimental manipulations as well as study designs that employ novel approaches such as naturalistic stimuli and computational modeling. We conclude by proposing that a highly interdisciplinary field such as the cognitive neuroscience of language requires researchers to form explicit statements concerning the theoretical definitions, methodological choices, and other constraining factors involved in their work.
Affiliation(s)
- Angela D Friederici
  - Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Matteo Maran
  - Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
  - International Max Planck Research School on Neuroscience of Communication, Leipzig, Germany
- Giorgio Papitto
  - Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
  - International Max Planck Research School on Neuroscience of Communication, Leipzig, Germany
- Elena Pyatigorskaya
  - Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
  - International Max Planck Research School on Neuroscience of Communication, Leipzig, Germany
- Joëlle A M Schroën
  - Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
  - International Max Planck Research School on Neuroscience of Communication, Leipzig, Germany
- Patrick C Trettenbrein
  - Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
  - International Max Planck Research School on Neuroscience of Communication, Leipzig, Germany
  - University of Göttingen, Göttingen, Germany
- Emiliano Zaccarella
  - Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
8
Brodbeck C, Das P, Gillis M, Kulasingham JP, Bhattasali S, Gaston P, Resnik P, Simon JZ. Eelbrain, a Python toolkit for time-continuous analysis with temporal response functions. eLife 2023; 12:e85012. PMID: 38018501; PMCID: PMC10783870; DOI: 10.7554/elife.85012.
Abstract
Even though human experience unfolds continuously in time, it is not strictly linear; instead, it entails cascading processes building hierarchical cognitive structures. For instance, during speech perception, humans transform a continuously varying acoustic signal into phonemes, words, and meaning, and these levels all have distinct but interdependent temporal structures. Time-lagged regression using temporal response functions (TRFs) has recently emerged as a promising tool for disentangling electrophysiological brain responses related to such complex models of perception. Here, we introduce the Eelbrain Python toolkit, which makes this kind of analysis easy and accessible. We demonstrate its use, using continuous speech as a sample paradigm, with a freely available EEG dataset of audiobook listening. A companion GitHub repository provides the complete source code for the analysis, from raw data to group-level statistics. More generally, we advocate a hypothesis-driven approach in which the experimenter specifies a hierarchy of time-continuous representations that are hypothesized to have contributed to brain responses, and uses those as predictor variables for the electrophysiological signal. This is analogous to a multiple regression problem, but with the addition of a time dimension. TRF analysis decomposes the brain signal into distinct responses associated with the different predictor variables by estimating a multivariate TRF (mTRF), quantifying the influence of each predictor on brain responses as a function of time(-lags). This allows asking two questions about the predictor variables: (1) Is there a significant neural representation corresponding to this predictor variable? And if so, (2) what are the temporal characteristics of the neural response associated with it? Thus, different predictor variables can be systematically combined and evaluated to jointly model neural processing at multiple hierarchical levels. 
We discuss applications of this approach, including the potential for linking algorithmic/representational theories at different cognitive levels to brain responses through computational models with appropriate linking hypotheses.
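Question (1) above, whether a predictor has a significant neural representation, is typically answered by comparing held-out prediction accuracy between models with and without that predictor. Eelbrain implements this kind of model comparison properly; purely to illustrate the logic, here is a sketch in which plain numpy ridge regression stands in for the toolkit's estimator (all names and toy data are invented):

```python
import numpy as np

def lagged(x, n_lags):
    """Design matrix whose column k is x delayed by k samples."""
    X = np.zeros((len(x), n_lags))
    for k in range(n_lags):
        X[k:, k] = x[:len(x) - k]
    return X

def prediction_gain(y, base_predictors, extra_predictor, n_lags=10, alpha=1.0):
    """Held-out correlation gained by adding `extra_predictor` to a base mTRF.

    Trains ridge models on the first half of the data, evaluates on the
    second half, and returns r(full model) - r(base model)."""
    def fit_and_score(X):
        half = len(y) // 2
        w = np.linalg.solve(X[:half].T @ X[:half] + alpha * np.eye(X.shape[1]),
                            X[:half].T @ y[:half])
        return np.corrcoef(X[half:] @ w, y[half:])[0, 1]

    X_base = np.hstack([lagged(p, n_lags) for p in base_predictors])
    X_full = np.hstack([X_base, lagged(extra_predictor, n_lags)])
    return fit_and_score(X_full) - fit_and_score(X_base)

# Toy data: the signal depends on both predictors, so adding the second
# one should improve held-out prediction (positive gain).
rng = np.random.default_rng(1)
envelope = rng.standard_normal(4000)   # stand-in for an acoustic predictor
surprisal = rng.standard_normal(4000)  # stand-in for a linguistic predictor
y = np.roll(envelope, 2) + 0.5 * np.roll(surprisal, 5)
y += 0.1 * rng.standard_normal(4000)
gain = prediction_gain(y, [envelope], surprisal)
```

A positive gain on held-out data is the model-comparison evidence that the extra predictor contributes its own neural representation, which is then tested for significance across participants.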
Affiliation(s)
- Proloy Das
  - Stanford University, Stanford, United States
- Philip Resnik
  - University of Maryland, College Park, United States
9
Miceli G, Caccia A. The Auditory Agnosias: a Short Review of Neurofunctional Evidence. Curr Neurol Neurosci Rep 2023; 23:671-679. PMID: 37747655; PMCID: PMC10673750; DOI: 10.1007/s11910-023-01302-1.
Abstract
PURPOSE OF REVIEW: To investigate the neurofunctional correlates of pure auditory agnosia and its varieties (global, verbal, and nonverbal), based on 116 anatomoclinical reports published between 1893 and 2022, with emphasis on hemispheric lateralization, intrahemispheric lesion site, and underlying cognitive impairments.
RECENT FINDINGS: Pure auditory agnosia is rare, and observations accumulate slowly. Recent patient reports and neuroimaging studies of neurotypical subjects offer insights into the putative mechanisms underlying auditory agnosia, while challenging traditional accounts. Global auditory agnosia frequently results from bilateral temporal damage. Verbal auditory agnosia strictly correlates with lesions of the language-dominant hemisphere. Damage involves the auditory pathways, but the critical lesion site is unclear. Both the auditory cortex and associative areas are reasonable candidates, but cases resulting from brainstem damage are on record. The hemispheric correlates of nonverbal auditory input disorders are less clear. They correlate with unilateral damage to either hemisphere, but evidence is scarce. Based on published cases, pure auditory agnosias are neurologically and functionally heterogeneous. Phenotypes are influenced by co-occurring cognitive impairments. Future studies should start from these facts and integrate patient data with studies in neurotypical individuals.
Affiliation(s)
- Gabriele Miceli
  - Professor of Neurology, Center for Mind/Brain Studies, University of Trento, Trento, Italy
10
Karunathilake ID, Kulasingham JP, Simon JZ. Neural Tracking Measures of Speech Intelligibility: Manipulating Intelligibility while Keeping Acoustics Unchanged. bioRxiv [Preprint] 2023:2023.05.18.541269. PMID: 37292644; PMCID: PMC10245672; DOI: 10.1101/2023.05.18.541269.
Abstract
Neural speech tracking has advanced our understanding of how our brains rapidly map an acoustic speech signal onto linguistic representations and ultimately meaning. It remains unclear, however, how speech intelligibility is related to the corresponding neural responses. Many studies addressing this question vary the level of intelligibility by manipulating the acoustic waveform, but this makes it difficult to cleanly disentangle effects of intelligibility from underlying acoustical confounds. Here, using magnetoencephalography (MEG) recordings, we study neural measures of speech intelligibility by manipulating intelligibility while keeping the acoustics strictly unchanged. Acoustically identical degraded speech stimuli (three-band noise vocoded, ~20 s duration) are presented twice, but the second presentation is preceded by the original (non-degraded) version of the speech. This intermediate priming, which generates a 'pop-out' percept, substantially improves the intelligibility of the second degraded speech passage. We investigate how intelligibility and acoustical structure affect acoustic and linguistic neural representations using multivariate Temporal Response Functions (mTRFs). As expected, behavioral results confirm that perceived speech clarity is improved by priming. TRF analysis reveals that auditory (speech envelope and envelope onset) neural representations are not affected by priming, but only by the acoustics of the stimuli (bottom-up driven). Critically, our findings suggest that segmentation of sounds into words emerges with better speech intelligibility, and most strongly at the later (~400 ms latency) word processing stage, in prefrontal cortex (PFC), in line with engagement of top-down mechanisms associated with priming. Taken together, our results show that word representations may provide some objective measures of speech comprehension.
Affiliation(s)
- Jonathan Z. Simon
  - Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742, USA
  - Department of Biology, University of Maryland, College Park, MD 20742, USA
  - Institute for Systems Research, University of Maryland, College Park, MD 20742, USA
11
Slaats S, Weissbart H, Schoffelen JM, Meyer AS, Martin AE. Delta-Band Neural Responses to Individual Words Are Modulated by Sentence Processing. J Neurosci 2023; 43:4867-4883. PMID: 37221093; PMCID: PMC10312058; DOI: 10.1523/jneurosci.0964-22.2023.
Abstract
To understand language, we need to recognize words and combine them into phrases and sentences. During this process, responses to the words themselves are changed. In a step toward understanding how the brain builds sentence structure, the present study concerns the neural readout of this adaptation. We ask whether low-frequency neural readouts associated with words change as a function of being in a sentence. To this end, we analyzed an MEG dataset by Schoffelen et al. (2019) of 102 human participants (51 women) listening to sentences and word lists, the latter lacking any syntactic structure and combinatorial meaning. Using temporal response functions and a cumulative model-fitting approach, we disentangled delta- and theta-band responses to lexical information (word frequency) from responses to sensory and distributional variables. The results suggest that delta-band responses to words are affected by sentence context in time and space, over and above entropy and surprisal. In both conditions, the word frequency response spanned left temporal and posterior frontal areas; however, the response appeared later in word lists than in sentences. In addition, sentence context determined whether inferior frontal areas were responsive to lexical information. In the theta band, the amplitude was larger in the word list condition at ∼100 milliseconds in right frontal areas. We conclude that low-frequency responses to words are changed by sentential context. The results of this study show how the neural representation of words is affected by structural context and as such provide insight into how the brain instantiates compositionality in language.
SIGNIFICANCE STATEMENT Human language is unprecedented in its combinatorial capacity: we are capable of producing and understanding sentences we have never heard before.
Although the mechanisms underlying this capacity have been described in formal linguistics and cognitive science, how they are implemented in the brain remains to a large extent unknown. A large body of earlier work from the cognitive neuroscientific literature implies a role for delta-band neural activity in the representation of linguistic structure and meaning. In this work, we combine these insights and techniques with findings from psycholinguistics to show that meaning is more than the sum of its parts; the delta-band MEG signal differentially reflects lexical information inside and outside sentence structures.
Affiliation(s)
- Sophie Slaats: Max Planck Institute for Psycholinguistics, 6525 XD Nijmegen, The Netherlands; The International Max Planck Research School for Language Sciences, 6525 XD Nijmegen, The Netherlands
- Hugo Weissbart: Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6525 EN Nijmegen, The Netherlands
- Jan-Mathijs Schoffelen: Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6525 EN Nijmegen, The Netherlands
- Antje S Meyer: Max Planck Institute for Psycholinguistics, 6525 XD Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6525 EN Nijmegen, The Netherlands
- Andrea E Martin: Max Planck Institute for Psycholinguistics, 6525 XD Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6525 EN Nijmegen, The Netherlands
12
Zioga I, Weissbart H, Lewis AG, Haegens S, Martin AE. Naturalistic Spoken Language Comprehension Is Supported by Alpha and Beta Oscillations. J Neurosci 2023; 43:3718-3732. [PMID: 37059462 PMCID: PMC10198453 DOI: 10.1523/jneurosci.1500-22.2023]
Abstract
Brain oscillations are prevalent in all species and are involved in numerous perceptual operations. α oscillations are thought to facilitate processing through the inhibition of task-irrelevant networks, while β oscillations are linked to the putative reactivation of content representations. Can the proposed functional role of α and β oscillations be generalized from low-level operations to higher-level cognitive processes? Here we address this question focusing on naturalistic spoken language comprehension. Twenty-two (18 female) Dutch native speakers listened to stories in Dutch and French while MEG was recorded. We used dependency parsing to identify three dependency states at each word: the number of (1) newly opened dependencies, (2) dependencies that remained open, and (3) resolved dependencies. We then constructed forward models to predict α and β power from the dependency features. Results showed that dependency features predict α and β power in language-related regions beyond low-level linguistic features. In the α band, left temporal regions fundamental to language are involved in comprehension, while in the β band, higher-order frontal and parietal language regions as well as motor regions are involved. Critically, α- and β-band dynamics seem to subserve language comprehension, tapping into syntactic structure building and semantic composition by providing low-level mechanistic operations for inhibition and reactivation processes. Because of the temporal similarity of the α-β responses, their potential functional dissociation remains to be elucidated. Overall, this study sheds light on the role of α and β oscillations during naturalistic spoken language comprehension, providing evidence for the generalizability of these dynamics from perceptual to complex linguistic processes.

SIGNIFICANCE STATEMENT It remains unclear whether the proposed functional role of α and β oscillations in perceptual and motor function is generalizable to higher-level cognitive processes, such as spoken language comprehension. We found that syntactic features predict α and β power in language-related regions beyond low-level linguistic features when listening to naturalistic speech in a known language. We offer experimental findings that integrate a neuroscientific framework on the role of brain oscillations as "building blocks" with spoken language comprehension. This supports the view of a domain-general role of oscillations across the hierarchy of cognitive functions, from low-level sensory operations to abstract linguistic processes.
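The study above predicts band-limited power from word-level features. As a minimal, illustrative sketch (not the authors' pipeline), time-resolved α power can be extracted from a single channel by bandpass filtering and taking the squared Hilbert envelope; the sampling rate, band edges, and toy signal below are all assumptions for illustration.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def band_power(signal, fs, low, high):
    """Time-resolved power in a frequency band via bandpass + Hilbert envelope."""
    b, a = butter(4, [low / (fs / 2), high / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, signal)          # zero-phase bandpass
    return np.abs(hilbert(filtered)) ** 2       # squared analytic amplitude

fs = 200.0                                      # assumed sampling rate (Hz)
n = 4000                                        # 20 s of data
t = np.arange(n) / fs
rng = np.random.default_rng(4)

# Toy single-channel trace: a 10 Hz (alpha) component that is stronger in the
# second half of the recording, plus broadband noise
alpha_amp = np.where(t < 10, 0.5, 2.0)
trace = alpha_amp * np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(n)

alpha_power = band_power(trace, fs, 8, 12)
first_half = alpha_power[: n // 2].mean()
second_half = alpha_power[n // 2 :].mean()      # should be markedly larger
```

In a real forward-model analysis, such power time courses (per channel or source) would be regressed onto lagged word-level features rather than simply compared between halves.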
Affiliation(s)
- Ioanna Zioga: Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, 6525 EN, The Netherlands; Max Planck Institute for Psycholinguistics, Nijmegen, 6525 XD, The Netherlands
- Hugo Weissbart: Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, 6525 EN, The Netherlands
- Ashley G Lewis: Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, 6525 EN, The Netherlands; Max Planck Institute for Psycholinguistics, Nijmegen, 6525 XD, The Netherlands
- Saskia Haegens: Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, 6525 EN, The Netherlands; Department of Psychiatry, Columbia University, New York, New York 10032; Division of Systems Neuroscience, New York State Psychiatric Institute, New York, New York 10032
- Andrea E Martin: Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, 6525 EN, The Netherlands; Max Planck Institute for Psycholinguistics, Nijmegen, 6525 XD, The Netherlands
13
Kaufman M, Zion Golumbic E. Listening to two speakers: Capacity and tradeoffs in neural speech tracking during Selective and Distributed Attention. Neuroimage 2023; 270:119984. [PMID: 36854352 DOI: 10.1016/j.neuroimage.2023.119984]
Abstract
Speech comprehension is severely compromised when several people talk at once, due to limited perceptual and cognitive resources. In such circumstances, top-down attention mechanisms can actively prioritize processing of task-relevant speech. However, behavioral and neural evidence suggests that this selection is not exclusive, and the system may have sufficient capacity to process additional speech input as well. Here we used a data-driven approach to contrast two opposing hypotheses regarding the system's capacity to co-represent competing speech: can the brain represent two speakers equally, or is the system fundamentally limited, resulting in tradeoffs between them? Neural activity was measured using magnetoencephalography (MEG) as human participants heard concurrent speech narratives and engaged in two tasks: Selective Attention, where only one speaker was task-relevant, and Distributed Attention, where both speakers were equally relevant. Analysis of neural speech tracking revealed that both tasks engaged a similar network of brain regions involved in auditory processing, attentional control, and speech processing. Interestingly, during both Selective and Distributed Attention, the neural representation of competing speech showed a bias towards one speaker. This is in line with proposed 'bottlenecks' for co-representation of concurrent speech and suggests that good performance on distributed attention tasks may be achieved by toggling attention between speakers over time.
Affiliation(s)
- Maya Kaufman: The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan, Israel
- Elana Zion Golumbic: The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan, Israel
14
Acoustic correlates of the syllabic rhythm of speech: Modulation spectrum or local features of the temporal envelope. Neurosci Biobehav Rev 2023; 147:105111. [PMID: 36822385 DOI: 10.1016/j.neubiorev.2023.105111]
Abstract
The syllable is a perceptually salient unit in speech. Since both the syllable and its acoustic correlate, i.e., the speech envelope, have a preferred range of rhythmicity between 4 and 8 Hz, it is hypothesized that theta-band neural oscillations play a major role in extracting syllables based on the envelope. A literature survey, however, reveals inconsistent evidence about the relationship between the speech envelope and syllables, and the current study revisits this question by analyzing large speech corpora. It is shown that the center frequency of the speech envelope, characterized by the modulation spectrum, reliably correlates with the rate of syllables only when the analysis is pooled over minutes of speech recordings. In contrast, in the time domain, a component of the speech envelope is reliably phase-locked to syllable onsets. Based on a speaker-independent model, the timing of syllable onsets explains about 24% of the variance of the speech envelope. These results indicate that local features in the speech envelope, rather than the modulation spectrum, are a more reliable acoustic correlate of syllables.
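The modulation spectrum discussed above is simply the power spectrum of the temporal envelope, with its peak ("center") frequency taken as an estimate of the syllabic rate. A minimal sketch, using a toy envelope rather than real speech (the sampling rate, duration, and 5 Hz rhythm are assumptions for illustration):

```python
import numpy as np

def modulation_spectrum(envelope, fs):
    """Power spectrum of the mean-removed temporal envelope."""
    env = envelope - envelope.mean()            # drop the DC component
    spectrum = np.abs(np.fft.rfft(env)) ** 2
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    return freqs, spectrum

fs = 100.0                                      # envelope sampling rate (Hz)
n = 6000                                        # 60 s of "speech" envelope
t = np.arange(n) / fs

# Toy envelope: a quasi-periodic rhythm at 5 "syllables" per second
env = 1 + np.cos(2 * np.pi * 5 * t)
freqs, spec = modulation_spectrum(env, fs)
peak_freq = freqs[np.argmax(spec)]              # estimated center frequency
```

For real speech, the paper's point is that this spectral peak matches the syllable rate only when pooled over minutes of recordings, whereas time-domain phase-locking to syllable onsets is more reliable locally.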
15
Chen YP, Schmidt F, Keitel A, Rösch S, Hauswald A, Weisz N. Speech intelligibility changes the temporal evolution of neural speech tracking. Neuroimage 2023; 268:119894. [PMID: 36693596 DOI: 10.1016/j.neuroimage.2023.119894]
Abstract
Listening to speech with poor signal quality is challenging. Neural speech tracking of degraded speech has been used to advance our understanding of how brain processes and speech intelligibility are interrelated. However, the temporal dynamics of neural speech tracking and their relation to speech intelligibility are not clear. In the present MEG study, we exploited temporal response functions (TRFs), which have been used to describe the time course of speech tracking, on a gradient from intelligible to unintelligible degraded speech. In addition, we used inter-related facets of neural speech tracking (e.g., speech envelope reconstruction, speech-brain coherence, and components of broadband coherence spectra) to corroborate our TRF findings. Our TRF analysis yielded marked temporally differential effects of vocoding: ∼50-110 ms (M50TRF), ∼175-230 ms (M200TRF), and ∼315-380 ms (M350TRF). Reduced intelligibility was accompanied by large increases in the early peak response M50TRF, but strongly reduced responses in M200TRF. In the late response M350TRF, the maximum response occurred for degraded speech that was still comprehensible and then declined with reduced intelligibility. Furthermore, we related the TRF components to our other neural "tracking" measures and found that M50TRF and M200TRF play differential roles in the shifting center frequency of the broadband coherence spectra. Overall, our study highlights the importance of time-resolved computation of neural speech tracking and decomposition of coherence spectra, and provides a better understanding of degraded speech processing.
Affiliation(s)
- Ya-Ping Chen: Centre for Cognitive Neuroscience, University of Salzburg, 5020 Salzburg, Austria; Department of Psychology, University of Salzburg, 5020 Salzburg, Austria
- Fabian Schmidt: Centre for Cognitive Neuroscience, University of Salzburg, 5020 Salzburg, Austria; Department of Psychology, University of Salzburg, 5020 Salzburg, Austria
- Anne Keitel: Psychology, School of Social Sciences, University of Dundee, DD1 4HN Dundee, UK
- Sebastian Rösch: Department of Otorhinolaryngology, Paracelsus Medical University, 5020 Salzburg, Austria
- Anne Hauswald: Centre for Cognitive Neuroscience, University of Salzburg, 5020 Salzburg, Austria; Department of Psychology, University of Salzburg, 5020 Salzburg, Austria
- Nathan Weisz: Centre for Cognitive Neuroscience, University of Salzburg, 5020 Salzburg, Austria; Department of Psychology, University of Salzburg, 5020 Salzburg, Austria; Neuroscience Institute, Christian Doppler University Hospital, Paracelsus Medical University, 5020 Salzburg, Austria
16
Kulasingham JP, Simon JZ. Algorithms for Estimating Time-Locked Neural Response Components in Cortical Processing of Continuous Speech. IEEE Trans Biomed Eng 2023; 70:88-96. [PMID: 35727788 PMCID: PMC9946293 DOI: 10.1109/tbme.2022.3185005]
Abstract
OBJECTIVE: The Temporal Response Function (TRF) is a linear model of neural activity time-locked to continuous stimuli, including continuous speech. TRFs based on speech envelopes typically have distinct components that have provided remarkable insights into the cortical processing of speech. However, current methods may lead to less than reliable estimates of single-subject TRF components. Here, we compare two established methods of TRF component estimation, and also propose novel algorithms that utilize prior knowledge of these components, bypassing the full TRF estimation.
METHODS: We compared two established algorithms, ridge and boosting, and two novel algorithms based on Subspace Pursuit (SP) and Expectation Maximization (EM), which directly estimate TRF components given plausible assumptions regarding component characteristics. Single-channel, multi-channel, and source-localized TRFs were fit on simulations and on real magnetoencephalographic data. Performance metrics included model fit and component estimation accuracy.
RESULTS: Boosting and ridge have comparable performance in component estimation. The novel algorithms outperformed the others in simulations, but not on real data, possibly because the plausible assumptions were not actually met. Ridge had slightly better model fits on real data than boosting, but also more spurious TRF activity.
CONCLUSION: Results indicate that both smooth (ridge) and sparse (boosting) algorithms perform comparably at TRF component estimation. The SP and EM algorithms may be accurate, but rely on assumptions about component characteristics.
SIGNIFICANCE: This systematic comparison establishes the suitability of widely used and novel algorithms for estimating robust TRF components, which is essential for improved subject-specific investigations into the cortical processing of speech.
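Of the established algorithms compared above, ridge regression is the most compact to write down: the TRF is the set of regression weights mapping time-lagged copies of the stimulus to the neural response. A minimal numpy sketch on simulated data (lag count, regularization strength, and the toy kernel are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def lagged_design(stimulus, n_lags):
    """Design matrix whose columns are time-lagged copies of a 1-D stimulus."""
    n = len(stimulus)
    X = np.zeros((n, n_lags))
    for lag in range(n_lags):
        X[lag:, lag] = stimulus[: n - lag]
    return X

def ridge_trf(stimulus, response, n_lags, alpha=1.0):
    """Estimate a TRF by ridge regression: w = (X'X + alpha*I)^-1 X'y."""
    X = lagged_design(stimulus, n_lags)
    XtX = X.T @ X + alpha * np.eye(n_lags)
    return np.linalg.solve(XtX, X.T @ response)

# Simulate: the response is the stimulus convolved with a known kernel + noise,
# so the estimated TRF should recover that kernel
rng = np.random.default_rng(0)
stim = rng.standard_normal(5000)
kernel = np.array([0.0, 0.5, 1.0, 0.5, 0.0, -0.3, 0.0, 0.0])
resp = np.convolve(stim, kernel)[:5000] + 0.1 * rng.standard_normal(5000)

trf = ridge_trf(stim, resp, n_lags=8, alpha=1.0)
```

Boosting would instead build the TRF greedily and sparsely; the contrast between the two estimators' smooth versus sparse solutions is exactly what the comparison above evaluates.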
17
Luo C, Gao Y, Fan J, Liu Y, Yu Y, Zhang X. Compromised word-level neural tracking in the high-gamma band for children with attention deficit hyperactivity disorder. Front Hum Neurosci 2023; 17:1174720. [PMID: 37213926 PMCID: PMC10196181 DOI: 10.3389/fnhum.2023.1174720]
Abstract
Children with attention deficit hyperactivity disorder (ADHD) exhibit pervasive difficulties in speech perception. Given that speech processing involves both acoustic and linguistic stages, it remains unclear which stage of speech processing is impaired in children with ADHD. To investigate this issue, we measured neural tracking of speech at the syllable and word levels using electroencephalography (EEG), and evaluated the relationship between neural responses and ADHD symptoms in 6- to 8-year-old children. Twenty-three children participated in the current study, and their ADHD symptoms were assessed with SNAP-IV questionnaires. In the experiment, the children listened to hierarchical speech sequences in which syllables and words were repeated at 2.5 and 1.25 Hz, respectively. Using frequency-domain analyses, reliable neural tracking of syllables and words was observed in both the low-frequency band (<4 Hz) and the high-gamma band (70-160 Hz). However, neural tracking of words in the high-gamma band was anti-correlated with the children's ADHD symptom scores. These results indicate that ADHD prominently impairs cortical encoding of linguistic information (e.g., words) in speech perception.
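The frequency-tagging logic used above looks for spectral peaks in the neural signal at the syllable (2.5 Hz) and word (1.25 Hz) presentation rates. A minimal sketch on a simulated trace (sampling rate, durations, and amplitudes are assumptions for illustration; the real analysis averages over trials and channels):

```python
import numpy as np

fs = 250.0                      # assumed sampling rate (Hz)
n = 10000                       # 40 s of data
t = np.arange(n) / fs

# Toy "neural" signal: components at the syllable (2.5 Hz) and word (1.25 Hz)
# rates, plus broadband noise
rng = np.random.default_rng(1)
signal = (np.sin(2 * np.pi * 2.5 * t)
          + 0.5 * np.sin(2 * np.pi * 1.25 * t)
          + 0.5 * rng.standard_normal(n))

spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(n, d=1 / fs)

def peak_snr(freq, half_width=0.5):
    """Amplitude at `freq` relative to the mean of neighbouring bins."""
    target = int(np.argmin(np.abs(freqs - freq)))
    neighbours = (np.abs(freqs - freq) < half_width) & (np.arange(len(freqs)) != target)
    return spectrum[target] / spectrum[neighbours].mean()

syllable_snr = peak_snr(2.5)    # tracking of the syllable rhythm
word_snr = peak_snr(1.25)       # tracking of the word rhythm
```

In the study, it is the word-rate peak (reflecting linguistic chunking, not acoustics) whose high-gamma counterpart scaled inversely with ADHD symptom scores.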
Affiliation(s)
- Cheng Luo: Research Center for Applied Mathematics and Machine Intelligence, Research Institute of Basic Theories, Zhejiang Lab, Hangzhou, China
- Yayue Gao (corresponding author): Department of Psychology, School of Humanities and Social Sciences, Beihang University, Beijing, China
- Jianing Fan: Department of Psychology, School of Humanities and Social Sciences, Beihang University, Beijing, China
- Yang Liu: Department of Psychology, School of Humanities and Social Sciences, Beihang University, Beijing, China
- Yonglin Yu: Department of Rehabilitation, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Hangzhou, China
- Xin Zhang: Department of Neurology, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Hangzhou, China
18
Garibyan A, Schilling A, Boehm C, Zankl A, Krauss P. Neural correlates of linguistic collocations during continuous speech perception. Front Psychol 2022; 13:1076339. [PMID: 36619132 PMCID: PMC9822706 DOI: 10.3389/fpsyg.2022.1076339]
Abstract
Language is fundamentally predictable, both at a higher schematic level and at the level of individual lexical items. Regarding predictability at the lexical level, collocations are frequent co-occurrences of words that are often characterized by a high strength of association. So far, psycho- and neurolinguistic studies have mostly employed highly artificial experimental paradigms in the investigation of collocations, focusing on the processing of single words or isolated sentences. In contrast, here we analyze EEG brain responses recorded during stimulation with continuous speech, i.e., audio books. We find that the N400 response to collocations is significantly different from that to non-collocations, with the effect varying by cortical region (anterior/posterior) and laterality (left/right). Our results are in line with studies using continuous speech, and they mostly contradict those using artificial paradigms and stimuli. To the best of our knowledge, this is the first neurolinguistic study on collocations using continuous speech stimulation.
Affiliation(s)
- Armine Garibyan: Chair of English Philology and Linguistics, University Erlangen-Nuremberg, Erlangen, Germany; Linguistics Lab, University Erlangen-Nuremberg, Erlangen, Germany
- Achim Schilling: Neuroscience Lab, University Hospital Erlangen, Erlangen, Germany; Cognitive Computational Neuroscience Group, University Erlangen-Nuremberg, Erlangen, Germany
- Claudia Boehm: Linguistics Lab, University Erlangen-Nuremberg, Erlangen, Germany; Neuroscience Lab, University Hospital Erlangen, Erlangen, Germany; Cognitive Computational Neuroscience Group, University Erlangen-Nuremberg, Erlangen, Germany
- Alexandra Zankl: Linguistics Lab, University Erlangen-Nuremberg, Erlangen, Germany; Neuroscience Lab, University Hospital Erlangen, Erlangen, Germany; Cognitive Computational Neuroscience Group, University Erlangen-Nuremberg, Erlangen, Germany
- Patrick Krauss (corresponding author): Linguistics Lab, University Erlangen-Nuremberg, Erlangen, Germany; Neuroscience Lab, University Hospital Erlangen, Erlangen, Germany; Cognitive Computational Neuroscience Group, University Erlangen-Nuremberg, Erlangen, Germany; Pattern Recognition Lab, University Erlangen-Nuremberg, Erlangen, Germany
19
Youssofzadeh V, Conant L, Stout J, Ustine C, Humphries C, Gross WL, Shah-Basak P, Mathis J, Awe E, Allen L, DeYoe EA, Carlson C, Anderson CT, Maganti R, Hermann B, Nair VA, Prabhakaran V, Meyerand B, Binder JR, Raghavan M. Late dominance of the right hemisphere during narrative comprehension. Neuroimage 2022; 264:119749. [PMID: 36379420 PMCID: PMC9772156 DOI: 10.1016/j.neuroimage.2022.119749]
Abstract
PET and fMRI studies suggest that auditory narrative comprehension is supported by a bilateral multilobar cortical network. The superior temporal resolution of magnetoencephalography (MEG) makes it an attractive tool to investigate the dynamics of how different neuroanatomic substrates engage during narrative comprehension. Using beta-band power changes as a marker of cortical engagement, we studied MEG responses during an auditory story comprehension task in 31 healthy adults. The protocol consisted of two runs, each interleaving 7 blocks of the story comprehension task with 15 blocks of an auditorily presented math task as a control for phonological processing, working memory, and attention processes. Sources at the cortical surface were estimated with a frequency-resolved beamformer. Beta-band power was estimated in the frequency range of 16-24 Hz over 1-sec epochs starting from 400 msec after stimulus onset until the end of a story or math problem presentation. These power estimates were compared to 1-second epochs of data before the stimulus block onset. The task-related cortical engagement was inferred from beta-band power decrements. Group-level source activations were statistically compared using non-parametric permutation testing. A story-math contrast of beta-band power changes showed greater bilateral cortical engagement within the fusiform gyrus, inferior and middle temporal gyri, parahippocampal gyrus, and left inferior frontal gyrus (IFG) during story comprehension. A math-story contrast of beta power decrements showed greater bilateral but left-lateralized engagement of the middle frontal gyrus and superior parietal lobule. 
The evolution of cortical engagement across five temporal windows during the presentation of stories showed, in the first interval of the narrative, significant involvement of bilateral opercular and insular regions as well as the ventral and lateral temporal cortex, extending more posteriorly on the left and medially on the right. Over time, right anterior ventral temporal engagement was sustained, with increasing involvement of the right anterior parahippocampal gyrus, STG, MTG, posterior superior temporal sulcus, inferior parietal lobule, frontal operculum, and insula, while left hemisphere engagement decreased. Our findings are consistent with prior imaging studies of narrative comprehension, but in addition, they demonstrate increasing right-lateralized engagement over the course of narratives, suggesting an important role for these right-hemispheric regions in semantic integration as well as social and pragmatic inference processing.
Affiliation(s)
- Vahab Youssofzadeh (corresponding author): Neurology, Medical College of Wisconsin, Milwaukee, WI, USA
- Lisa Conant: Neurology, Medical College of Wisconsin, Milwaukee, WI, USA
- Jeffrey Stout: Neurology, Medical College of Wisconsin, Milwaukee, WI, USA
- Candida Ustine: Neurology, Medical College of Wisconsin, Milwaukee, WI, USA
- William L. Gross: Neurology, Medical College of Wisconsin, Milwaukee, WI, USA; Anesthesiology, Medical College of Wisconsin, Milwaukee, WI, USA
- Jed Mathis: Neurology, Medical College of Wisconsin, Milwaukee, WI, USA; Radiology, Medical College of Wisconsin, Milwaukee, WI, USA
- Elizabeth Awe: Neurology, Medical College of Wisconsin, Milwaukee, WI, USA
- Linda Allen: Neurology, Medical College of Wisconsin, Milwaukee, WI, USA
- Edgar A. DeYoe: Radiology, Medical College of Wisconsin, Milwaukee, WI, USA
- Chad Carlson: Neurology, Medical College of Wisconsin, Milwaukee, WI, USA
- Rama Maganti: Neurology, University of Wisconsin-Madison, Madison, WI, USA
- Bruce Hermann: Neurology, University of Wisconsin-Madison, Madison, WI, USA
- Veena A. Nair: Radiology, University of Wisconsin-Madison, Madison, WI, USA
- Vivek Prabhakaran: Radiology, University of Wisconsin-Madison, Madison, WI, USA; Medical Physics, University of Wisconsin-Madison, Madison, WI, USA; Psychiatry, University of Wisconsin-Madison, Madison, WI, USA
- Beth Meyerand: Radiology, University of Wisconsin-Madison, Madison, WI, USA; Medical Physics, University of Wisconsin-Madison, Madison, WI, USA; Biomedical Engineering, University of Wisconsin-Madison, Madison, WI, USA
- Manoj Raghavan: Neurology, Medical College of Wisconsin, Milwaukee, WI, USA
20
McMurray B, Sarrett ME, Chiu S, Black AK, Wang A, Canale R, Aslin RN. Decoding the temporal dynamics of spoken word and nonword processing from EEG. Neuroimage 2022; 260:119457. [PMID: 35842096 PMCID: PMC10875705 DOI: 10.1016/j.neuroimage.2022.119457]
Abstract
The efficiency of spoken word recognition is essential for real-time communication. There is consensus that this efficiency relies on an implicit process of activating multiple word candidates that compete for recognition as the acoustic signal unfolds in real time. However, few methods capture the neural basis of this dynamic competition on a msec-by-msec basis. This is crucial for understanding the neuroscience of language, and for understanding hearing, language, and cognitive disorders in people for whom current behavioral methods are not suitable. We applied machine-learning techniques to standard EEG signals to decode which word was heard on each trial and analyzed the patterns of confusion over time. Results mirrored psycholinguistic findings: early on, the decoder was equally likely to report the target (e.g., baggage) or a similar-sounding competitor (badger), but by around 500 msec, competitors were suppressed. Follow-up analyses show that this is robust across EEG systems (gel and saline), with fewer channels, and with fewer trials. Results are robust within individuals and show high reliability. This suggests a powerful and simple paradigm that can assess the neural dynamics of speech decoding, with potential applications for understanding lexical development in a variety of clinical disorders.
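The core idea of time-resolved decoding can be sketched with a toy nearest-centroid classifier: two "words" evoke responses that only diverge after some latency, so decoding accuracy rises from chance to ceiling over time. Everything below (signal shape, noise level, the deliberately simple classifier trained and tested on the same trials) is an illustrative assumption, not the authors' machine-learning pipeline:

```python
import numpy as np

rng = np.random.default_rng(3)
n_trials, n_times, n_words = 60, 50, 2

# Toy single-channel EEG: each word evokes a template + noise; the templates
# only diverge after sample 20, mimicking gradual acoustic disambiguation
templates = np.zeros((n_words, n_times))
templates[0, 20:] = 1.0
templates[1, 20:] = -1.0
X = np.stack([templates[w] + rng.standard_normal((n_trials, n_times))
              for w in range(n_words)])            # shape: (word, trial, time)

def decode_at(t, half=5):
    """Nearest-centroid decoding accuracy in a window around sample t.

    Note: centroids are computed from all trials (no cross-validation),
    which is fine for this toy demonstration but not for real analyses.
    """
    window = X[:, :, max(0, t - half):t + half].mean(axis=2)   # (word, trial)
    centroids = window.mean(axis=1)                            # (word,)
    correct = 0
    for w in range(n_words):
        pred = np.argmin(np.abs(window[w][:, None] - centroids[None, :]), axis=1)
        correct += int((pred == w).sum())
    return correct / (n_words * n_trials)

early_acc = decode_at(10)   # before the templates diverge: near chance
late_acc = decode_at(45)    # after divergence: well above chance
```

Plotting the full confusion matrix at each time point, as the study does, additionally reveals *which* words are confused early on (phonological competitors) rather than just overall accuracy.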
Affiliation(s)
- Bob McMurray: Dept. of Psychological and Brain Sciences, Dept. of Communication Sciences and Disorders, Dept. of Linguistics, and Dept. of Otolaryngology, University of Iowa
- McCall E Sarrett: Interdisciplinary Graduate Program in Neuroscience, University of Iowa
- Samantha Chiu: Dept. of Psychological and Brain Sciences, University of Iowa
- Alexis K Black: School of Audiology and Speech Sciences, University of British Columbia; Haskins Laboratories
- Alice Wang: Dept. of Psychology, University of Oregon; Haskins Laboratories
- Rebecca Canale: Dept. of Psychological Sciences, University of Connecticut; Haskins Laboratories
- Richard N Aslin: Haskins Laboratories; Department of Psychology and Child Study Center, Yale University; Department of Psychology, University of Connecticut
21
Patel S, Sharma D, Uniyal A, Gadepalli A, Tiwari V. Recent advancements in biomarker research in schizophrenia: mapping the road from bench to bedside. Metab Brain Dis 2022; 37:2197-2211. [PMID: 35239143 DOI: 10.1007/s11011-022-00926-5]
Abstract
Schizophrenia (SZ) is a severe, progressive neurodegenerative and disruptive behavioral disorder affecting innumerable people throughout the world. The discovery of potential biomarkers in the clinical scenario would lead to the development of effective methods of diagnosis and would provide an understanding of the prognosis of the disease. Moreover, breakthrough inventions for the treatment and prevention of this mysterious disease could evolve as a result of a thorough understanding of the clinical biomarkers. In this review, we discuss specific biomarkers of SZ, with an emphasis on delineating (1) diagnostic biomarkers, such as neuroimmune biomarkers, metabolic biomarkers, oligodendrocyte biomarkers, and biomarkers of negative and cognitive symptoms; (2) therapeutic biomarkers, such as various neurotransmitter systems; and (3) prognostic biomarkers. All biomarkers were evaluated in drug-naïve (for at least 4 weeks) patients in order to achieve a clear comparison between schizophrenic patients and healthy controls. An attempt has also been made to elucidate potential genes that serve as predictors and tools for the determination of biomarkers and would ultimately help in the prevention and treatment of this deadly illness.
Affiliation(s)
- Shivangi Patel: Department of Pharmacology, Bombay College of Pharmacy, 400098, Mumbai, India
- Dilip Sharma: Rutgers New Jersey Medical School, 07103, Newark, NJ, United States
- Ankit Uniyal: Department of Pharmaceutical Engineering, Indian Institute of Technology (Banaras Hindu University), 221005, Varanasi, U.P., India
- Anagha Gadepalli: Department of Pharmaceutical Engineering, Indian Institute of Technology (Banaras Hindu University), 221005, Varanasi, U.P., India
- Vinod Tiwari: Department of Pharmaceutical Engineering, Indian Institute of Technology (Banaras Hindu University), 221005, Varanasi, U.P., India
22
Kim SG. On the encoding of natural music in computational models and human brains. Front Neurosci 2022; 16:928841. [PMID: 36203808 PMCID: PMC9531138 DOI: 10.3389/fnins.2022.928841]
Abstract
This article discusses recent developments and advances in the neuroscience of music to understand the nature of musical emotion. In particular, it highlights how system identification techniques and computational models of music have advanced our understanding of how the human brain processes the textures and structures of music and how the processed information evokes emotions. Musical models relate physical properties of stimuli to internal representations called features, and predictive models relate features to neural or behavioral responses and test their predictions against independent unseen data. The new frameworks do not require orthogonalized stimuli in controlled experiments to establish reproducible knowledge, which has opened up a new wave of naturalistic neuroscience. The current review focuses on how this trend has transformed the domain of the neuroscience of music.
23
Gillis M, Van Canneyt J, Francart T, Vanthornhout J. Neural tracking as a diagnostic tool to assess the auditory pathway. Hear Res 2022; 426:108607. [PMID: 36137861 DOI: 10.1016/j.heares.2022.108607]
Abstract
When a person listens to sound, the brain time-locks to specific aspects of the sound. This is called neural tracking, and it can be investigated by analysing neural responses (e.g., measured by electroencephalography) to continuous natural speech. Measures of neural tracking allow for an objective investigation of a range of auditory and linguistic processes in the brain during natural speech perception. This approach is more ecologically valid than traditional auditory evoked responses and has great potential for research and clinical applications. This article reviews the neural tracking framework and highlights three prominent examples of neural tracking analyses: neural tracking of the fundamental frequency of the voice (f0), the speech envelope and linguistic features. Each of these analyses provides a unique point of view into the human brain's hierarchical stages of speech processing. F0-tracking assesses the encoding of fine temporal information in the early stages of the auditory pathway, i.e., from the auditory periphery up to early processing in the primary auditory cortex. Envelope tracking reflects bottom-up and top-down speech-related processes in the auditory cortex and is likely necessary but not sufficient for speech intelligibility. Linguistic feature tracking (e.g. word or phoneme surprisal) reflects neural processes more directly related to speech intelligibility. Together, these analyses form a multi-faceted objective assessment of an individual's auditory and linguistic processing.
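At its core, the envelope-tracking analysis this review describes is a regularized linear regression from time-lagged stimulus features to EEG (a forward model, or temporal response function). A minimal single-channel sketch on simulated data — the kernel, signal sizes, and ridge value below are illustrative, not taken from the article:

```python
import numpy as np

def lag_matrix(stimulus, n_lags):
    """Design matrix whose columns are progressively delayed copies of the stimulus."""
    n = len(stimulus)
    X = np.zeros((n, n_lags))
    for lag in range(n_lags):
        X[lag:, lag] = stimulus[:n - lag]
    return X

def estimate_trf(stimulus, eeg, n_lags, ridge=1.0):
    """Ridge-regularized least squares: TRF weights mapping envelope -> EEG."""
    X = lag_matrix(stimulus, n_lags)
    return np.linalg.solve(X.T @ X + ridge * np.eye(n_lags), X.T @ eeg)

# Simulated data: "EEG" is the envelope convolved with a known kernel, plus noise.
rng = np.random.default_rng(0)
envelope = rng.standard_normal(5000)
true_trf = np.array([0.0, 0.5, 1.0, 0.5, 0.0])
eeg = np.convolve(envelope, true_trf)[:len(envelope)] + 0.1 * rng.standard_normal(5000)

trf = estimate_trf(envelope, eeg, n_lags=5)  # close to true_trf
```

In practice, toolboxes fit such models per channel with cross-validated regularization; the sketch only shows why the recovered weights can be read as a response function over time lags.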
Affiliation(s)
- Marlies Gillis
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium.
- Jana Van Canneyt
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
- Tom Francart
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
- Jonas Vanthornhout
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
24
Kegler M, Weissbart H, Reichenbach T. The neural response at the fundamental frequency of speech is modulated by word-level acoustic and linguistic information. Front Neurosci 2022; 16:915744. [PMID: 35942153; PMCID: PMC9355803; DOI: 10.3389/fnins.2022.915744]
Abstract
Spoken language comprehension requires rapid and continuous integration of information, from lower-level acoustic to higher-level linguistic features. Much of this processing occurs in the cerebral cortex. Its neural activity exhibits, for instance, correlates of predictive processing, emerging at delays of a few hundred milliseconds. However, the auditory pathways are also characterized by extensive feedback loops from higher-level cortical areas to lower-level ones as well as to subcortical structures. Early neural activity can therefore be influenced by higher-level cognitive processes, but it remains unclear whether such feedback contributes to linguistic processing. Here, we investigated early speech-evoked neural activity that emerges at the fundamental frequency. We analyzed EEG recordings obtained when subjects listened to a story read by a single speaker. We identified a response tracking the speaker's fundamental frequency that occurred at a delay of 11 ms, while another response elicited by the high-frequency modulation of the envelope of higher harmonics exhibited a larger magnitude and longer latency of about 18 ms with an additional significant component at around 40 ms. Notably, while the earlier components of the response likely originate from the subcortical structures, the latter presumably involves contributions from cortical regions. Subsequently, we determined the magnitude of these early neural responses for each individual word in the story. We then quantified the context-independent frequency of each word and used a language model to compute context-dependent word surprisal and precision. The word surprisal represented how predictable a word is, given the previous context, and the word precision reflected the confidence about predicting the next word from the past context. We found that the word-level neural responses at the fundamental frequency were predominantly influenced by the acoustic features: the average fundamental frequency and its variability. Amongst the linguistic features, only context-independent word frequency showed a weak but significant modulation of the neural response to the high-frequency envelope modulation. Our results show that the early neural response at the fundamental frequency is already influenced by acoustic as well as linguistic information, suggesting top-down modulation of this neural response.
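The word surprisal in such studies is the negative log probability of a word given its context, computed from a language model. As a toy illustration only — the authors' model is far richer than this hypothetical bigram model over a ten-word corpus:

```python
import numpy as np
from collections import Counter

# Toy corpus; in real studies, surprisal comes from a large language model.
corpus = "the cat sat on the mat and the cat slept".split()
bigrams = Counter(zip(corpus, corpus[1:]))
context_counts = Counter(corpus[:-1])

def surprisal(prev_word, word):
    """-log2 P(word | prev_word) under the toy bigram model, in bits."""
    p = bigrams[(prev_word, word)] / context_counts[prev_word]
    return -np.log2(p)

# "cat" follows "the" in 2 of the 3 contexts where "the" occurs, so it is
# less surprising after "the" than the once-seen continuation "mat".
s_cat = surprisal("the", "cat")
s_mat = surprisal("the", "mat")
```

Rarer continuations yield higher surprisal; regressing word-level neural responses on these values is what lets such studies test for context-dependent prediction.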
Affiliation(s)
- Mikolaj Kegler
- Department of Bioengineering, Centre for Neurotechnology, Imperial College London, London, United Kingdom
- Hugo Weissbart
- Donders Centre for Cognitive Neuroimaging, Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Tobias Reichenbach
- Department of Bioengineering, Centre for Neurotechnology, Imperial College London, London, United Kingdom
- Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nuremberg, Erlangen, Germany
25
Zhou D, Zhang G, Dang J, Unoki M, Liu X. Detection of Brain Network Communities During Natural Speech Comprehension From Functionally Aligned EEG Sources. Front Comput Neurosci 2022; 16:919215. [PMID: 35874316; PMCID: PMC9301328; DOI: 10.3389/fncom.2022.919215]
Abstract
In recent years, electroencephalography (EEG) studies on speech comprehension have been extended from a controlled paradigm to a natural paradigm. Under the hypothesis that the brain can be approximated as a linear time-invariant system, the neural response to natural speech has been investigated extensively using temporal response functions (TRFs). However, most studies have modeled TRFs in the electrode space, which is a mixture of brain sources and thus cannot fully reveal the functional mechanism underlying speech comprehension. In this paper, we propose methods for investigating the brain networks of natural speech comprehension using TRFs on the basis of EEG source reconstruction. We first propose a functional hyper-alignment method with an additive average method to reduce EEG noise. Then, we reconstruct neural sources within the brain based on the EEG signals to estimate TRFs from speech stimuli to source areas, and then investigate the brain networks in the neural source space on the basis of the community detection method. To evaluate TRF-based brain networks, EEG data were recorded in story listening tasks with normal speech and time-reversed speech. To obtain reliable structures of brain networks, we detected TRF-based communities from multiple scales. As a result, the proposed functional hyper-alignment method could effectively reduce the noise caused by individual settings in an EEG experiment and thus improve the accuracy of source reconstruction. The detected brain networks for normal speech comprehension were clearly distinct from those for non-semantically driven (time-reversed speech) audio processing. Our results indicate that the proposed source TRFs can reflect the cognitive processing of spoken language and that the multi-scale community detection method is powerful for investigating brain networks.
Affiliation(s)
- Di Zhou
- School of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan
- Gaoyan Zhang
- College of Intelligence and Computing, Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin, China
- Jianwu Dang
- School of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan
- College of Intelligence and Computing, Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin, China
- Masashi Unoki
- School of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan
- Xin Liu
- School of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan
26
Coopmans CW, de Hoop H, Hagoort P, Martin AE. Effects of Structure and Meaning on Cortical Tracking of Linguistic Units in Naturalistic Speech. Neurobiology of Language 2022; 3:386-412. [PMID: 37216060; PMCID: PMC10158633; DOI: 10.1162/nol_a_00070]
Abstract
Recent research has established that cortical activity "tracks" the presentation rate of syntactic phrases in continuous speech, even though phrases are abstract units that do not have direct correlates in the acoustic signal. We investigated whether cortical tracking of phrase structures is modulated by the extent to which these structures compositionally determine meaning. To this end, we recorded electroencephalography (EEG) of 38 native speakers who listened to naturally spoken Dutch stimuli in different conditions, which parametrically modulated the degree to which syntactic structure and lexical semantics determine sentence meaning. Tracking was quantified through mutual information between the EEG data and either the speech envelopes or abstract annotations of syntax, all of which were filtered in the frequency band corresponding to the presentation rate of phrases (1.1-2.1 Hz). Overall, these mutual information analyses showed stronger tracking of phrases in regular sentences than in stimuli whose lexical-syntactic content is reduced, but no consistent differences in tracking between sentences and stimuli that contain a combination of syntactic structure and lexical content. While there were no effects of compositional meaning on the degree of phrase-structure tracking, analyses of event-related potentials elicited by sentence-final words did reveal meaning-induced differences between conditions. Our findings suggest that cortical tracking of structure in sentences indexes the internal generation of this structure, a process that is modulated by the properties of its input, but not by the compositional interpretation of its output.
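The tracking measure described above — mutual information between EEG and a stimulus signal, both filtered to the phrasal rate — can be sketched in two steps: band-pass both signals, then estimate MI. The brick-wall FFT filter and plug-in histogram estimator below are crude stand-ins for the authors' methods, run on simulated data; the 1.1-2.1 Hz band follows the abstract:

```python
import numpy as np

def bandpass(x, fs, lo, hi):
    """Crude brick-wall band-pass: zero FFT bins outside [lo, hi] Hz."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def mutual_information(x, y, bins=8):
    """Plug-in MI estimate (in bits) from a joint histogram."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

fs, seconds = 64, 60
rng = np.random.default_rng(1)
envelope = rng.standard_normal(fs * seconds)
eeg = envelope + 0.5 * rng.standard_normal(fs * seconds)  # "tracks" the envelope

env_band = bandpass(envelope, fs, 1.1, 2.1)
eeg_band = bandpass(eeg, fs, 1.1, 2.1)
mi_tracked = mutual_information(env_band, eeg_band)
mi_shifted = mutual_information(env_band, np.roll(eeg_band, fs * 10))  # control
```

Comparing the MI of aligned signals against a time-shifted control, as in the last two lines, is a common way to separate genuine tracking from estimator bias.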
Affiliation(s)
- Cas W. Coopmans
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Centre for Language Studies, Radboud University, Nijmegen, The Netherlands
- Helen de Hoop
- Centre for Language Studies, Radboud University, Nijmegen, The Netherlands
- Peter Hagoort
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Andrea E. Martin
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
27
Taylor C, Hall S, Manivannan S, Mundil N, Border S. The neuroanatomical consequences and pathological implications of bilingualism. J Anat 2022; 240:410-427. [PMID: 34486112; PMCID: PMC8742975; DOI: 10.1111/joa.13542]
Abstract
In recent years, there has been a rise in the number of people who are able to speak two or more languages. This has been paralleled by an increase in research related to bilingualism. Despite this, many of the neuroanatomical consequences and pathological implications of bilingualism are still subject to discussion. This review aims to evaluate the neuroanatomical structures related to language and to the acquisition of a second language, as well as to explore how learning a second language can alter one's susceptibility to and the progression of certain cerebral pathologies. A literature search was conducted on the Medline, Embase, and Web of Science databases. A total of 137 articles regarding the neuroanatomical or pathological implications of bilingualism were included for review. Following analysis of the included papers, this review finds that bilingualism induces significant gray and white matter cerebral changes, particularly in the frontal lobes, anterior cingulate cortex, left inferior parietal lobule and subcortical areas, and that native language and acquired language largely recruit the same neuroanatomical structures with, however, subtle functional and anatomical differences dependent on proficiency and age of language acquisition. There is adequate evidence to suggest that bilingualism offsets the symptoms and diagnosis of dementia, and that it is protective against both pathological and age-related cognitive decline. While many of the neuroanatomical changes are known, more remains to be elucidated and the relationship between bilingualism and other neurological pathologies remains unclear.
Affiliation(s)
- Charles Taylor
- Centre for Learning Anatomical Sciences, Faculty of Medicine, University of Southampton, Southampton, UK
- Samuel Hall
- Centre for Learning Anatomical Sciences, Faculty of Medicine, University of Southampton, Southampton, UK
- Department of Neurosurgery, University Hospitals Southampton NHS Foundation Trust, Southampton, UK
- Susruta Manivannan
- Department of Neurosurgery, University Hospitals Southampton NHS Foundation Trust, Southampton, UK
- Nilesh Mundil
- Department of Neurosurgery, University Hospitals Southampton NHS Foundation Trust, Southampton, UK
- Scott Border
- Centre for Learning Anatomical Sciences, Faculty of Medicine, University of Southampton, Southampton, UK
28
Brodbeck C, Bhattasali S, Cruz Heredia AAL, Resnik P, Simon JZ, Lau E. Parallel processing in speech perception with local and global representations of linguistic context. eLife 2022; 11:72056. [PMID: 35060904; PMCID: PMC8830882; DOI: 10.7554/elife.72056]
Abstract
Speech processing is highly incremental. It is widely accepted that human listeners continuously use the linguistic context to anticipate upcoming concepts, words, and phonemes. However, previous evidence supports two seemingly contradictory models of how a predictive context is integrated with the bottom-up sensory input: Classic psycholinguistic paradigms suggest a two-stage process, in which acoustic input initially leads to local, context-independent representations, which are then quickly integrated with contextual constraints. This contrasts with the view that the brain constructs a single coherent, unified interpretation of the input, which fully integrates available information across representational hierarchies, and thus uses contextual constraints to modulate even the earliest sensory representations. To distinguish these hypotheses, we tested magnetoencephalography responses to continuous narrative speech for signatures of local and unified predictive models. Results provide evidence that listeners employ both types of models in parallel. Two local context models uniquely predict some part of early neural responses, one based on sublexical phoneme sequences, and one based on the phonemes in the current word alone; at the same time, even early responses to phonemes also reflect a unified model that incorporates sentence-level constraints to predict upcoming phonemes. Neural source localization places the anatomical origins of the different predictive models in nonidentical parts of the superior temporal lobes bilaterally, with the right hemisphere showing a relative preference for more local models. These results suggest that speech processing recruits both local and unified predictive models in parallel, reconciling previous disparate findings. Parallel models might make the perceptual system more robust, facilitate processing of unexpected inputs, and serve a function in language acquisition.
Affiliation(s)
- Ellen Lau
- Department of Linguistics, University of Maryland
29
Miceli G, Caccia A. Cortical disorders of speech processing: Pure word deafness and auditory agnosia. Handbook of Clinical Neurology 2022; 187:69-87. [PMID: 35964993; DOI: 10.1016/b978-0-12-823493-8.00005-5]
Abstract
Selective disorders of auditory speech processing due to brain lesions are reviewed. Over 120 years after the first anatomic report (Dejerine and Sérieux, 1898), fewer than 80 cumulative cases of generalized auditory agnosia and pure word deafness with documented brain lesions are on record. Most patients (approximately 70%) had vascular lesions. Damage is very frequently bilateral in generalized auditory agnosia, and more frequently unilateral in pure word deafness. In unilateral cases, anatomical disconnection is not a prerequisite, and disorders may be due to functional disconnection. Regardless of whether lesions are unilateral or bilateral, speech processing difficulties emerge in the presence of damage to the superior temporal regions of the language-dominant hemisphere, suggesting that speech input is processed asymmetrically already at early stages. Extant evidence does not allow us to establish whether processing asymmetry originates in the primary auditory cortex or in higher associative cortices, nor whether auditory processing in the brainstem is entirely symmetric. Results are consistent with the view that the difficulty in processing auditory input characterized by quick spectral and/or temporal changes is one of the critical dimensions of the disorder. Forthcoming studies should focus on detailed audiologic, neurolinguistic, and neuroanatomic descriptions of each case.
Affiliation(s)
- Gabriele Miceli
- Center for Mind/Brain Sciences - CIMeC, University of Trento, Rovereto, Italy; Centro Interdisciplinare Linceo 'Beniamino Segre'-Accademia dei Lincei, Rome, Italy.
- Antea Caccia
- Center for Mind/Brain Sciences - CIMeC, University of Trento, Rovereto, Italy; Department of Psychology, University of Milano-Bicocca, Milan, Italy
30
Lenc T, Merchant H, Keller PE, Honing H, Varlet M, Nozaradan S. Mapping between sound, brain and behaviour: four-level framework for understanding rhythm processing in humans and non-human primates. Philos Trans R Soc Lond B Biol Sci 2021; 376:20200325. [PMID: 34420381; PMCID: PMC8380981; DOI: 10.1098/rstb.2020.0325]
Abstract
Humans perceive and spontaneously move to one or several levels of periodic pulses (a meter, for short) when listening to musical rhythm, even when the sensory input does not provide prominent periodic cues to their temporal location. Here, we review a multi-levelled framework for understanding how external rhythmic inputs are mapped onto internally represented metric pulses. This mapping is studied using an approach to quantify and directly compare representations of metric pulses in signals corresponding to sensory inputs, neural activity and behaviour (typically body movement). Based on this approach, recent empirical evidence can be drawn together into a conceptual framework that unpacks the phenomenon of meter into four levels. Each level highlights specific functional processes that critically enable and shape the mapping from sensory input to internal meter. We discuss the nature, constraints and neural substrates of these processes, starting with fundamental mechanisms investigated in macaque monkeys that enable basic forms of mapping between simple rhythmic stimuli and internally represented metric pulse. We propose that human evolution has gradually built a robust and flexible system upon these fundamental processes, allowing more complex levels of mapping to emerge in musical behaviours. This approach opens promising avenues to understand the many facets of rhythmic behaviours across individuals and species. This article is part of the theme issue 'Synchrony and rhythm interaction: from the brain to behavioural ecology'.
Affiliation(s)
- Tomas Lenc
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, New South Wales 2751, Australia
- Institute of Neuroscience (IONS), Université Catholique de Louvain (UCL), Brussels 1200, Belgium
- Hugo Merchant
- Instituto de Neurobiologia, UNAM, Campus Juriquilla, Querétaro 76230, Mexico
- Peter E. Keller
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, New South Wales 2751, Australia
- Henkjan Honing
- Amsterdam Brain and Cognition (ABC), Institute for Logic, Language and Computation (ILLC), University of Amsterdam, Amsterdam 1090 GE, The Netherlands
- Manuel Varlet
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, New South Wales 2751, Australia
- School of Psychology, Western Sydney University, Penrith, New South Wales 2751, Australia
- Sylvie Nozaradan
- Institute of Neuroscience (IONS), Université Catholique de Louvain (UCL), Brussels 1200, Belgium
31
Kiremitçi I, Yilmaz Ö, Çelik E, Shahdloo M, Huth AG, Çukur T. Attentional Modulation of Hierarchical Speech Representations in a Multitalker Environment. Cereb Cortex 2021; 31:4986-5005. [PMID: 34115102; PMCID: PMC8491717; DOI: 10.1093/cercor/bhab136]
Abstract
Humans are remarkably adept at listening to a desired speaker in a crowded environment, while filtering out nontarget speakers in the background. Attention is key to solving this difficult cocktail-party task, yet a detailed characterization of attentional effects on speech representations is lacking. It remains unclear at which levels of speech features, and to what extent, attentional modulation occurs in each brain area during the cocktail-party task. To address these questions, we recorded whole-brain blood-oxygen-level-dependent (BOLD) responses while subjects either passively listened to single-speaker stories, or selectively attended to a male or a female speaker in temporally overlaid stories in separate experiments. Spectral, articulatory, and semantic models of the natural stories were constructed. Intrinsic selectivity profiles were identified via voxelwise models fit to passive listening responses. Attentional modulations were then quantified based on model predictions for attended and unattended stories in the cocktail-party task. We find that attention causes broad modulations at multiple levels of speech representations while growing stronger toward later stages of processing, and that unattended speech is represented up to the semantic level in parabelt auditory cortex. These results provide insight into attentional mechanisms that underlie the ability to selectively listen to a desired speaker in noisy multispeaker environments.
Affiliation(s)
- Ibrahim Kiremitçi
- Neuroscience Program, Sabuncu Brain Research Center, Bilkent University, Ankara TR-06800, Turkey
- National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara TR-06800, Turkey
- Özgür Yilmaz
- National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara TR-06800, Turkey
- Department of Electrical and Electronics Engineering, Bilkent University, Ankara TR-06800, Turkey
- Emin Çelik
- Neuroscience Program, Sabuncu Brain Research Center, Bilkent University, Ankara TR-06800, Turkey
- National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara TR-06800, Turkey
- Mo Shahdloo
- National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara TR-06800, Turkey
- Department of Experimental Psychology, Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford OX3 9DU, UK
- Alexander G Huth
- Department of Neuroscience, The University of Texas at Austin, Austin, TX 78712, USA
- Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94702, USA
- Tolga Çukur
- Neuroscience Program, Sabuncu Brain Research Center, Bilkent University, Ankara TR-06800, Turkey
- National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara TR-06800, Turkey
- Department of Electrical and Electronics Engineering, Bilkent University, Ankara TR-06800, Turkey
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94702, USA
32
Hamilton LS, Oganian Y, Hall J, Chang EF. Parallel and distributed encoding of speech across human auditory cortex. Cell 2021; 184:4626-4639.e13. [PMID: 34411517; DOI: 10.1016/j.cell.2021.07.019]
Abstract
Speech perception is thought to rely on a cortical feedforward serial transformation of acoustic into linguistic representations. Using intracranial recordings across the entire human auditory cortex, electrocortical stimulation, and surgical ablation, we show that cortical processing across areas is not consistent with a serial hierarchical organization. Instead, response latency and receptive field analyses demonstrate parallel and distinct information processing in the primary and nonprimary auditory cortices. This functional dissociation was also observed during electrocortical stimulation: stimulating the primary auditory cortex evoked auditory hallucinations but did not distort or interfere with speech perception. Opposite effects were observed during stimulation of nonprimary cortex in superior temporal gyrus. Ablation of the primary auditory cortex did not affect speech perception. These results establish a distributed functional organization of parallel information processing throughout the human auditory cortex and demonstrate an essential independent role for nonprimary auditory cortex in speech processing.
Affiliation(s)
- Liberty S Hamilton
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
- Yulia Oganian
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
- Jeffery Hall
- Department of Neurology and Neurosurgery, McGill University Montreal Neurological Institute, Montreal, QC, H3A 2B4, Canada
- Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA.
33
Viswanathan V, Bharadwaj HM, Shinn-Cunningham BG, Heinz MG. Modulation masking and fine structure shape neural envelope coding to predict speech intelligibility across diverse listening conditions. J Acoust Soc Am 2021; 150:2230. [PMID: 34598642; PMCID: PMC8483789; DOI: 10.1121/10.0006385]
Abstract
A fundamental question in the neuroscience of everyday communication is how scene acoustics shape the neural processing of attended speech sounds and in turn impact speech intelligibility. While it is well known that the temporal envelopes in target speech are important for intelligibility, how the neural encoding of target-speech envelopes is influenced by background sounds or other acoustic features of the scene is unknown. Here, we combine human electroencephalography with simultaneous intelligibility measurements to address this key gap. We find that the neural envelope-domain signal-to-noise ratio in target-speech encoding, which is shaped by masker modulations, predicts intelligibility over a range of strategically chosen realistic listening conditions unseen by the predictive model. This provides neurophysiological evidence for modulation masking. Moreover, using high-resolution vocoding to carefully control peripheral envelopes, we show that target-envelope coding fidelity in the brain depends not only on envelopes conveyed by the cochlea, but also on the temporal fine structure (TFS), which supports scene segregation. Our results are consistent with the notion that temporal coherence of sound elements across envelopes and/or TFS influences scene analysis and attentive selection of a target sound. Our findings also inform speech-intelligibility models and technologies attempting to improve real-world speech communication.
Affiliation(s)
- Vibha Viswanathan
- Weldon School of Biomedical Engineering, Purdue University, West Lafayette, Indiana 47907, USA
- Hari M Bharadwaj
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana 47907, USA
- Michael G Heinz
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana 47907, USA
34
Verschueren E, Vanthornhout J, Francart T. The Effect of Stimulus Choice on an EEG-Based Objective Measure of Speech Intelligibility. Ear Hear 2021; 41:1586-1597. [PMID: 33136634; DOI: 10.1097/aud.0000000000000875]
Abstract
OBJECTIVES Recently, an objective measure of speech intelligibility (SI), based on brain responses derived from the electroencephalogram (EEG), has been developed using isolated Matrix sentences as a stimulus. We investigated whether this objective measure of SI can also be used with natural speech as a stimulus, as this would be beneficial for clinical applications. DESIGN We recorded the EEG in 19 normal-hearing participants while they listened to two types of stimuli: Matrix sentences and a natural story. Each stimulus was presented at different levels of SI by adding speech-weighted noise. SI was assessed in two ways for both stimuli: (1) behaviorally and (2) objectively by reconstructing the speech envelope from the EEG using a linear decoder and correlating it with the acoustic envelope. We also calculated temporal response functions (TRFs) to investigate the temporal characteristics of the brain responses in the EEG channels covering different brain areas. RESULTS For both stimulus types, the correlation between the speech envelope and the reconstructed envelope increased with increasing SI. In addition, correlations were higher for the natural story than for the Matrix sentences. Similar to the linear decoder analysis, TRF amplitudes increased with increasing SI for both stimuli. Remarkably, although SI remained unchanged under the no-noise and +2.5 dB SNR conditions, neural speech processing was affected by the addition of this small amount of noise: TRF amplitudes across the entire scalp decreased between 0 and 150 ms, while amplitudes between 150 and 200 ms increased in the presence of noise. TRF latency changes as a function of SI appeared to be stimulus specific: the latency of the prominent negative peak in the early responses (50 to 300 ms) increased with increasing SI for the Matrix sentences, but remained unchanged for the natural story.
CONCLUSIONS These results show (1) the feasibility of natural speech as a stimulus for the objective measure of SI; (2) that neural tracking of speech is enhanced using a natural story compared to Matrix sentences; and (3) that noise and the stimulus type can change the temporal characteristics of the brain responses. These results might reflect the integration of incoming acoustic features and top-down information, suggesting that the choice of the stimulus has to be considered based on the intended purpose of the measurement.
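The core of the objective measure this entry describes, a backward model that reconstructs the speech envelope from time-lagged EEG via linear (ridge) regression and correlates the reconstruction with the actual envelope, can be sketched in plain NumPy. The lag range, ridge penalty, and the synthetic two-channel "EEG" below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def lag_matrix(eeg, max_lag):
    """Stack 0..max_lag sample-delayed copies of every EEG channel."""
    n, c = eeg.shape
    X = np.zeros((n, c * (max_lag + 1)))
    for lag in range(max_lag + 1):
        X[lag:, lag * c:(lag + 1) * c] = eeg[:n - lag]
    return X

def train_decoder(eeg, envelope, max_lag, ridge=1.0):
    """Ridge-regression decoder mapping lagged EEG to the speech envelope."""
    X = lag_matrix(eeg, max_lag)
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ envelope)

def reconstruction_score(eeg, envelope, weights, max_lag):
    """Pearson correlation between actual and reconstructed envelopes."""
    return np.corrcoef(lag_matrix(eeg, max_lag) @ weights, envelope)[0, 1]

# Synthetic data: two "EEG" channels from which the envelope is
# recoverable at lags 3 and 5, plus additive noise.
rng = np.random.default_rng(0)
env = rng.standard_normal(2000)
eeg = np.stack([np.roll(env, -3), np.roll(env, -5)], axis=1)
eeg += 0.1 * rng.standard_normal(eeg.shape)
w = train_decoder(eeg, env, max_lag=8)
r = reconstruction_score(eeg, env, w, max_lag=8)
```

In a real analysis the decoder would be trained and evaluated on separate data segments; here training and scoring share data purely to keep the sketch short.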
Affiliation(s)
- Eline Verschueren
- Research Group Experimental Oto-rhino-laryngology (ExpORL), Department of Neurosciences, KU Leuven-University of Leuven, Leuven, Belgium
35
Leahy J, Kim SG, Wan J, Overath T. An Analytical Framework of Tonal and Rhythmic Hierarchy in Natural Music Using the Multivariate Temporal Response Function. Front Neurosci 2021; 15:665767. [PMID: 34335154 PMCID: PMC8322238 DOI: 10.3389/fnins.2021.665767]
Abstract
Even without formal training, humans experience a wide range of emotions in response to changes in musical features, such as tonality and rhythm, during music listening. While many studies have investigated how isolated elements of tonal and rhythmic properties are processed in the human brain, it remains unclear whether these findings with such controlled stimuli are generalizable to complex stimuli in the real world. In the current study, we present an analytical framework of a linearized encoding analysis based on a set of music information retrieval features to investigate the rapid cortical encoding of tonal and rhythmic hierarchies in natural music. We applied this framework to a public domain EEG dataset (OpenMIIR) to deconvolve overlapping EEG responses to various musical features in continuous music. In particular, the proposed framework investigated the EEG encoding of the following features: tonal stability, key clarity, beat, and meter. This analysis revealed a differential spatiotemporal neural encoding of beat and meter, but not of tonal stability and key clarity. The results demonstrate that this framework can uncover associations of ongoing brain activity with relevant musical features, which could be further extended to other relevant measures such as time-resolved emotional responses in future studies.
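A linearized encoding (multivariate TRF) analysis of the kind described above amounts to ridge regression from time-lagged stimulus features onto an EEG channel. A minimal sketch with two hypothetical regressors (standing in for, e.g., beat and meter) and a known simulated response; all names and parameter values are illustrative, not the paper's:

```python
import numpy as np

def trf_encode(features, eeg_chan, max_lag, ridge=1.0):
    """Multivariate TRF: ridge regression from lagged stimulus features
    to one EEG channel. Returns one TRF column per feature."""
    n, f = features.shape
    X = np.zeros((n, f * (max_lag + 1)))
    for lag in range(max_lag + 1):
        X[lag:, lag * f:(lag + 1) * f] = features[:n - lag]
    w = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ eeg_chan)
    return w.reshape(max_lag + 1, f)

# Simulate an EEG channel as a known lagged mixture of two feature
# time series plus noise, then recover the kernel.
rng = np.random.default_rng(5)
feats = rng.standard_normal((3000, 2))
kernel = np.array([[0.0, 0.0], [1.0, -0.5], [0.3, 0.0]])  # true responses at lags 0..2
eeg = sum(np.roll(feats[:, j], lag) * kernel[lag, j]
          for j in range(2) for lag in range(3))
eeg = eeg + 0.05 * rng.standard_normal(3000)
trf = trf_encode(feats, eeg, max_lag=4)
```

The estimated `trf` recovers the simulated kernel at lags 0-2 and is near zero at the unused lags, which is the deconvolution property that makes TRFs useful for overlapping responses in continuous music.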
Affiliation(s)
- Jasmine Leahy
- Department of Psychology and Neuroscience, Duke University, Durham, NC, United States
- Seung-Goo Kim
- Department of Psychology and Neuroscience, Duke University, Durham, NC, United States
- Jie Wan
- Department of Psychology and Neuroscience, Duke University, Durham, NC, United States; Department of Cognitive Sciences, University of California, Irvine, Irvine, CA, United States
- Tobias Overath
- Department of Psychology and Neuroscience, Duke University, Durham, NC, United States; Duke Institute for Brain Sciences, Duke University, Durham, NC, United States; Center for Cognitive Neuroscience, Duke University, Durham, NC, United States
36
Wang L, Wu EX, Chen F. EEG-based auditory attention decoding using speech-level-based segmented computational models. J Neural Eng 2021; 18. [PMID: 33957606 DOI: 10.1088/1741-2552/abfeba]
Abstract
Objective. Auditory attention in complex scenarios can be decoded by electroencephalography (EEG)-based cortical speech-envelope tracking. The relative root-mean-square (RMS) intensity is a valuable cue for the decomposition of speech into distinct characteristic segments. To improve auditory attention decoding (AAD) performance, this work proposed a novel segmented AAD approach to decode target speech envelopes from different RMS-level-based speech segments. Approach. Speech was decomposed into higher- and lower-RMS-level speech segments with a threshold of -10 dB relative RMS level. A support vector machine classifier was designed to identify higher- and lower-RMS-level speech segments, using clean target and mixed speech as reference signals, based on EEG signals recorded while subjects listened to target auditory streams in competing two-speaker auditory scenes. Segmented computational models were developed from the classification results of higher- and lower-RMS-level speech segments. Speech envelopes were reconstructed with segmented decoding models for either higher- or lower-RMS-level speech segments. AAD accuracies were calculated from the correlations between actual and reconstructed speech envelopes. The performance of the proposed segmented AAD computational model was compared to that of traditional AAD methods with unified decoding functions. Main results. Higher- and lower-RMS-level speech segments in continuous sentences could be identified robustly, with classification accuracies that approximated or exceeded 80% based on the corresponding EEG signals at 6 dB, 3 dB, 0 dB, -3 dB and -6 dB signal-to-mask ratios (SMRs). Compared with unified AAD decoding methods, the proposed segmented AAD approach achieved more accurate results in the reconstruction of target speech envelopes and in the detection of attentional direction. Moreover, the proposed segmented decoding method had higher information transfer rates (ITRs) and shorter minimum expected switch times than the unified decoder. Significance. This study revealed that EEG signals may be used to classify higher- and lower-RMS-level-based speech segments across a wide range of SMR conditions (from 6 dB to -6 dB). A novel finding was that the specific information in different RMS-level-based speech segments facilitated EEG-based decoding of auditory attention. The significantly improved AAD accuracies and ITRs of the segmented decoding method suggest that the proposed computational model may be an effective approach for neuro-controlled brain-computer interfaces in complex auditory scenes.
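The RMS-level decomposition underlying this approach, splitting speech into higher- and lower-level segments with a -10 dB relative-RMS threshold, is straightforward to implement frame by frame. A sketch in which the frame length and the synthetic signal are illustrative assumptions:

```python
import numpy as np

def rms_level_segments(signal, frame_len, threshold_db=-10.0):
    """Label each frame as higher- (True) or lower-RMS-level (False),
    relative to the whole-signal RMS, using a dB threshold."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    frame_rms = np.sqrt((frames ** 2).mean(axis=1))
    overall_rms = np.sqrt((signal ** 2).mean())
    rel_db = 20 * np.log10(frame_rms / overall_rms + 1e-12)
    return rel_db >= threshold_db

# Toy signal: 4 "active" frames followed by 4 near-silent frames.
rng = np.random.default_rng(1)
loud = rng.standard_normal(800)
quiet = 0.01 * rng.standard_normal(800)
labels = rms_level_segments(np.concatenate([loud, quiet]), frame_len=200)
```

The boolean labels can then route each frame to the matching segment-specific decoder, which is the essence of the segmented modeling described above.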
Affiliation(s)
- Lei Wang
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, People's Republic of China; Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, People's Republic of China
- Ed X Wu
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, People's Republic of China
- Fei Chen
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, People's Republic of China
37
Har-shai Yahav P, Zion Golumbic E. Linguistic processing of task-irrelevant speech at a cocktail party. eLife 2021; 10:e65096. [PMID: 33942722 PMCID: PMC8163500 DOI: 10.7554/elife.65096]
Abstract
Paying attention to one speaker in a noisy place can be extremely difficult, because to-be-attended and task-irrelevant speech compete for processing resources. We tested whether this competition is restricted to acoustic-phonetic interference or if it extends to competition for linguistic processing as well. Neural activity was recorded using Magnetoencephalography as human participants were instructed to attend to natural speech presented to one ear, and task-irrelevant stimuli were presented to the other. Task-irrelevant stimuli consisted either of random sequences of syllables, or syllables structured to form coherent sentences, using hierarchical frequency-tagging. We find that the phrasal structure of structured task-irrelevant stimuli was represented in the neural response in left inferior frontal and posterior parietal regions, indicating that selective attention does not fully eliminate linguistic processing of task-irrelevant speech. Additionally, neural tracking of to-be-attended speech in left inferior frontal regions was enhanced when competing with structured task-irrelevant stimuli, suggesting inherent competition between them for linguistic processing.
Affiliation(s)
- Paz Har-shai Yahav
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan, Israel
- Elana Zion Golumbic
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan, Israel
38
Van Canneyt J, Wouters J, Francart T. Neural tracking of the fundamental frequency of the voice: The effect of voice characteristics. Eur J Neurosci 2021; 53:3640-3653. [DOI: 10.1111/ejn.15229]
Affiliation(s)
- Jan Wouters
- ExpORL, Department of Neurosciences, KU Leuven, Leuven, Belgium
- Tom Francart
- ExpORL, Department of Neurosciences, KU Leuven, Leuven, Belgium
39
Kuruvila I, Can Demir K, Fischer E, Hoppe U. Inference of the Selective Auditory Attention Using Sequential LMMSE Estimation. IEEE Trans Biomed Eng 2021; 68:3501-3512. [PMID: 33891545 DOI: 10.1109/tbme.2021.3075337]
Abstract
Attentive listening in a multispeaker environment such as a cocktail party requires suppression of the interfering speakers and the noise around. People with normal hearing perform remarkably well in such situations. Analysis of the cortical signals using electroencephalography (EEG) has revealed that the EEG signals track the envelope of the attended speech more strongly than that of the interfering speech. This has enabled the development of algorithms that can decode the selective attention of a listener in controlled experimental settings. However, these algorithms often require longer trial durations and computationally expensive calibration to obtain a reliable inference of attention. In this paper, we present a novel framework to decode the attention of a listener within trial durations on the order of two seconds. It comprises three modules: 1) dynamic estimation of the temporal response functions (TRFs) in every trial using a sequential linear minimum mean squared error (LMMSE) estimator, 2) extraction of the N1-P2 peak of the estimated TRF, which serves as a marker related to the attentional state, and 3) computation of a probabilistic measure of the attentional state using a support vector machine followed by logistic regression. The efficacy of the proposed decoding framework was evaluated using EEG data collected from 27 subjects. The total number of electrodes required to infer attention was four: one for the signal estimation, one for the noise estimation, and the other two being the reference and the ground electrodes. Our results make further progress towards the realization of neuro-steered hearing aids.
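One way to realize the sequential linear estimation this abstract describes is a recursive least-squares update, which refines the filter weights sample by sample instead of solving a batch regression per trial. This is a sketch under that assumption; the forgetting factor, initialization, and three-tap toy system are illustrative, not the paper's:

```python
import numpy as np

def rls_step(w, P, x, y, lam=0.999):
    """One recursive least-squares update of weights w, with inverse
    input-covariance estimate P and forgetting factor lam."""
    Px = P @ x
    k = Px / (lam + x @ Px)            # gain vector
    e = y - w @ x                      # a-priori prediction error
    w_new = w + k * e
    P_new = (P - np.outer(k, Px)) / lam
    return w_new, P_new

# Recover a known 3-tap response from streaming samples.
rng = np.random.default_rng(2)
true_w = np.array([0.5, -0.3, 0.2])
w = np.zeros(3)
P = 100.0 * np.eye(3)                  # large P = uninformative prior
for _ in range(3000):
    x = rng.standard_normal(3)         # lagged stimulus samples
    y = true_w @ x + 0.01 * rng.standard_normal()
    w, P = rls_step(w, P, x, y)
```

After a few thousand samples the running estimate `w` converges to the true taps, which is the behavior a per-trial sequential estimator exploits to avoid long calibration.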
40
MEG Intersubject Phase Locking of Stimulus-Driven Activity during Naturalistic Speech Listening Correlates with Musical Training. J Neurosci 2021; 41:2713-2722. [PMID: 33536196 DOI: 10.1523/jneurosci.0932-20.2020]
Abstract
Musical training is associated with increased structural and functional connectivity between auditory sensory areas and higher-order brain networks involved in speech and motor processing. Whether such changed connectivity patterns facilitate the cortical propagation of speech information in musicians remains poorly understood. We here used magnetoencephalography (MEG) source imaging and a novel seed-based intersubject phase-locking approach to investigate the effects of musical training on the interregional synchronization of stimulus-driven neural responses during listening to naturalistic continuous speech presented in silence. MEG data were obtained from 20 young human subjects (both sexes) with different degrees of musical training. Our data show robust bilateral patterns of stimulus-driven interregional phase synchronization between auditory cortex and frontotemporal brain regions previously associated with speech processing. Stimulus-driven phase locking was maximal in the delta band, but was also observed in the theta and alpha bands. The individual duration of musical training was positively associated with the magnitude of stimulus-driven alpha-band phase locking between auditory cortex and parts of the dorsal and ventral auditory processing streams. These findings provide evidence for a positive relationship between musical training and the propagation of speech-related information between auditory sensory areas and higher-order processing networks, even when speech is presented in silence. We suggest that the increased synchronization of higher-order cortical regions to auditory cortex may contribute to the previously described musician advantage in processing speech in background noise. SIGNIFICANCE STATEMENT Musical training has been associated with widespread structural and functional brain plasticity.
It has been suggested that these changes benefit the production and perception of music but can also translate to other domains of auditory processing, such as speech. We developed a new magnetoencephalography intersubject analysis approach to study the cortical synchronization of stimulus-driven neural responses during the perception of continuous natural speech and its relationship to individual musical training. Our results provide evidence that musical training is associated with higher synchronization of stimulus-driven activity between brain regions involved in early auditory sensory and higher-order processing. We suggest that the increased synchronized propagation of speech information may contribute to the previously described musician advantage in processing speech in background noise.
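At its core, a phase-locking analysis reduces to the phase-locking value: the magnitude of the mean resultant vector of the relative phase between two signals. The sketch below computes it across FFT bins of a single band of one long segment, a simplification of the trial- and subject-wise estimators used in MEG work; the band, sampling rate, and synthetic "stimulus-driven" signals are all illustrative:

```python
import numpy as np

def phase_locking_value(x, y, fs, band):
    """Magnitude of the mean resultant vector of the relative FFT phase
    between two signals, over the bins inside a frequency band."""
    X, Y = np.fft.rfft(x), np.fft.rfft(y)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    sel = (freqs >= band[0]) & (freqs <= band[1])
    rel_phase = np.angle(X[sel]) - np.angle(Y[sel])
    return float(np.abs(np.mean(np.exp(1j * rel_phase))))

# Two "subjects" sharing a stimulus-driven component vs. an unrelated signal.
rng = np.random.default_rng(6)
fs = 100.0
shared = rng.standard_normal(4000)
subj_a = shared + 0.1 * rng.standard_normal(4000)
subj_b = shared + 0.1 * rng.standard_normal(4000)
unrelated = rng.standard_normal(4000)
plv_driven = phase_locking_value(subj_a, subj_b, fs, (1.0, 4.0))    # delta band
plv_chance = phase_locking_value(subj_a, unrelated, fs, (1.0, 4.0))
```

The shared stimulus-driven component yields a PLV near 1, while unrelated signals stay near the chance floor, which is the contrast the intersubject approach exploits.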
41
Broderick MP, Di Liberto GM, Anderson AJ, Rofes A, Lalor EC. Dissociable electrophysiological measures of natural language processing reveal differences in speech comprehension strategy in healthy ageing. Sci Rep 2021; 11:4963. [PMID: 33654202 PMCID: PMC7925601 DOI: 10.1038/s41598-021-84597-9]
Abstract
Healthy ageing leads to changes in the brain that impact upon sensory and cognitive processing. It is not fully clear how these changes affect the processing of everyday spoken language. Prediction is thought to play an important role in language comprehension, where information about upcoming words is pre-activated across multiple representational levels. However, evidence from electrophysiology suggests differences in how older and younger adults use context-based predictions, particularly at the level of semantic representation. We investigate these differences during natural speech comprehension by presenting older and younger subjects with continuous, narrative speech while recording their electroencephalogram. We use time-lagged linear regression to test how distinct computational measures of (1) semantic dissimilarity and (2) lexical surprisal are processed in the brains of both groups. Our results reveal dissociable neural correlates of these two measures that suggest differences in how younger and older adults successfully comprehend speech. Specifically, our results suggest that, while younger and older subjects both employ context-based lexical predictions, older subjects are significantly less likely to pre-activate the semantic features relating to upcoming words. Furthermore, across our group of older adults, we show that the weaker the neural signature of this semantic pre-activation mechanism, the lower a subject’s semantic verbal fluency score. We interpret these findings as prediction playing a generally reduced role at a semantic level in the brains of older listeners during speech comprehension and that these changes may be part of an overall strategy to successfully comprehend speech with reduced cognitive resources.
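Lexical surprisal, one of the two computational measures regressed against the EEG here, is simply -log2 P(word | context) under some language model. A toy illustration with an add-one-smoothed bigram model; the tiny corpus and the smoothing scheme are stand-ins for the large corpora and models used in such studies:

```python
import math
from collections import Counter

def bigram_surprisal(corpus, context, word, alpha=1.0):
    """Surprisal -log2 P(word | context) under an add-alpha-smoothed
    bigram model estimated from a tokenized corpus."""
    bigrams = Counter(zip(corpus, corpus[1:]))
    unigrams = Counter(corpus[:-1])
    vocab = len(set(corpus))
    p = (bigrams[(context, word)] + alpha) / (unigrams[context] + alpha * vocab)
    return -math.log2(p)

corpus = "the cat sat on the mat the cat ran".split()
s_cat = bigram_surprisal(corpus, "the", "cat")   # frequent continuation
s_mat = bigram_surprisal(corpus, "the", "mat")   # rarer continuation
```

The more predictable continuation gets lower surprisal; in the regression analyses described above, this per-word value becomes the amplitude of an impulse regressor aligned to word onsets.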
Affiliation(s)
- Michael P Broderick
- School of Engineering, Trinity Centre for Bioengineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Giovanni M Di Liberto
- Laboratoire des Systèmes Perceptifs, Département d'études Cognitives, École Normale Supérieure, PSL University, CNRS, 75005, Paris, France
- Andrew J Anderson
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, 14627, USA; Department of Neuroscience, and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, 14627, USA
- Adrià Rofes
- Department of Neurolinguistics and Language Development, University of Groningen, Oude Kijk in Het Jatstraat 26, 9712 EK, Groningen, The Netherlands
- Edmund C Lalor
- School of Engineering, Trinity Centre for Bioengineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland; Department of Biomedical Engineering, University of Rochester, Rochester, NY, 14627, USA; Department of Neuroscience, and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, 14627, USA
42
Farahani ED, Wouters J, van Wieringen A. Brain mapping of auditory steady-state responses: A broad view of cortical and subcortical sources. Hum Brain Mapp 2021; 42:780-796. [PMID: 33166050 PMCID: PMC7814770 DOI: 10.1002/hbm.25262]
Abstract
Auditory steady-state responses (ASSRs) are evoked brain responses to modulated or repetitive acoustic stimuli. Investigating the underlying neural generators of ASSRs is important to gain in-depth insight into the mechanisms of auditory temporal processing. The aim of this study is to reconstruct an extensive range of neural generators, that is, cortical and subcortical, as well as primary and non-primary ones. This extensive overview of neural generators provides an appropriate basis for studying functional connectivity. To this end, a minimum-norm imaging (MNI) technique is employed. We also present a novel extension to MNI which facilitates source analysis by quantifying the ASSR for each dipole. Results demonstrate that the proposed MNI approach is successful in reconstructing sources located both within (primary) and outside (non-primary) of the auditory cortex (AC). Primary sources are detected in different stimulation conditions (four modulation frequencies and two sides of stimulation), thereby demonstrating the robustness of the approach. This study is one of the first investigations to identify non-primary sources. Moreover, we show that the MNI approach is also capable of reconstructing the subcortical activities of ASSRs. Finally, the results obtained using the MNI approach outperform the group-independent component analysis method on the same data, in terms of detection of sources in the AC, reconstructing the subcortical activities and reducing computational load.
Affiliation(s)
- Ehsan Darestani Farahani
- Research Group Experimental ORL, Department of Neurosciences, Katholieke Universiteit Leuven, Leuven, Belgium
- Jan Wouters
- Research Group Experimental ORL, Department of Neurosciences, Katholieke Universiteit Leuven, Leuven, Belgium
- Astrid van Wieringen
- Research Group Experimental ORL, Department of Neurosciences, Katholieke Universiteit Leuven, Leuven, Belgium
43
Effect of number and placement of EEG electrodes on measurement of neural tracking of speech. PLoS One 2021; 16:e0246769. [PMID: 33571299 PMCID: PMC7877609 DOI: 10.1371/journal.pone.0246769]
Abstract
Measurement of neural tracking of natural running speech from the electroencephalogram (EEG) is an increasingly popular method in auditory neuroscience and has applications in audiology. The method involves decoding the envelope of the speech signal from the EEG signal, and calculating the correlation with the envelope of the audio stream that was presented to the subject. Typically, EEG systems with 64 or more electrodes are used. However, in practical applications, set-ups with fewer electrodes are required. Here, we determine the optimal number of electrodes, and the best positions to place a limited number of electrodes on the scalp. We propose a channel selection strategy based on a utility metric, which allows a quick quantitative assessment of the influence of a channel (or a group of channels) on the reconstruction error. We consider two use cases: a subject-specific case, where the optimal number and position of the electrodes is determined for each subject individually, and a subject-independent case, where the electrodes are placed at the same positions (in the 10-20 system) for all subjects. We evaluated our approach using 64-channel EEG data from 90 subjects. In the subject-specific case we found that the correlation between actual and reconstructed envelope first increased with decreasing number of electrodes, with an optimum at around 20 electrodes, yielding 29% higher correlations with the optimal number of electrodes than with all electrodes. This means that our strategy of removing electrodes can be used to improve the correlation metric in high-density EEG recordings. In the subject-independent case, we obtained stable decoding performance when decreasing from 64 to 22 channels. When the number of channels was decreased further, the correlation decreased. For a maximal decrease in correlation of 10%, 32 well-placed electrodes were sufficient in 91% of the subjects.
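The utility metric quantifies how much each channel contributes to the reconstruction error; a simpler greedy backward elimination conveys the same idea. The sketch below uses a zero-lag decoder and synthetic data, and the elimination rule stands in for the paper's actual utility computation:

```python
import numpy as np

def decoder_corr(eeg, env, ridge=1.0):
    """Train a zero-lag ridge decoder and return the envelope correlation."""
    w = np.linalg.solve(eeg.T @ eeg + ridge * np.eye(eeg.shape[1]), eeg.T @ env)
    return np.corrcoef(eeg @ w, env)[0, 1]

def greedy_channel_removal(eeg, env, keep):
    """Backward elimination: repeatedly drop the channel whose removal
    costs the least reconstruction correlation."""
    chans = list(range(eeg.shape[1]))
    while len(chans) > keep:
        scores = [decoder_corr(eeg[:, [c for c in chans if c != d]], env)
                  for d in chans]
        chans.pop(int(np.argmax(scores)))
    return chans

# Channel 0 carries the envelope; channels 1-4 are pure noise.
rng = np.random.default_rng(3)
env = rng.standard_normal(1500)
eeg = np.column_stack([env + 0.1 * rng.standard_normal(1500)] +
                      [rng.standard_normal(1500) for _ in range(4)])
kept = greedy_channel_removal(eeg, env, keep=1)
```

The informative channel survives every elimination round. The actual utility metric reaches the same ranking far more cheaply, without retraining a decoder for each candidate removal.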
44
Abstract
Speech processing in the human brain is grounded in non-specific auditory processing in the general mammalian brain, but relies on human-specific adaptations for processing speech and language. For this reason, many recent neurophysiological investigations of speech processing have turned to the human brain, with an emphasis on continuous speech. Substantial progress has been made using the phenomenon of "neural speech tracking", in which neurophysiological responses time-lock to the rhythm of auditory (and other) features in continuous speech. One broad category of investigations concerns the extent to which speech tracking measures are related to speech intelligibility, which has clinical applications in addition to its scientific importance. Recent investigations have also focused on disentangling different neural processes that contribute to speech tracking. The two lines of research are closely related, since processing stages throughout auditory cortex contribute to speech comprehension, in addition to subcortical processing and higher order and attentional processes.
Affiliation(s)
- Christian Brodbeck
- Institute for Systems Research, University of Maryland, College Park, Maryland 20742, U.S.A
- Jonathan Z. Simon
- Institute for Systems Research, University of Maryland, College Park, Maryland 20742, U.S.A
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland 20742, U.S.A
- Department of Biology, University of Maryland, College Park, Maryland 20742, U.S.A
45
Kulasingham JP, Brodbeck C, Presacco A, Kuchinsky SE, Anderson S, Simon JZ. High gamma cortical processing of continuous speech in younger and older listeners. Neuroimage 2020; 222:117291. [PMID: 32835821 PMCID: PMC7736126 DOI: 10.1016/j.neuroimage.2020.117291]
Abstract
Neural processing along the ascending auditory pathway is often associated with a progressive reduction in characteristic processing rates. For instance, the well-known frequency-following response (FFR) of the auditory midbrain, as measured with electroencephalography (EEG), is dominated by frequencies from ∼100 Hz to several hundred Hz, phase-locking to the acoustic stimulus at those frequencies. In contrast, cortical responses, whether measured by EEG or magnetoencephalography (MEG), are typically characterized by frequencies of a few Hz to a few tens of Hz, time-locking to acoustic envelope features. In this study we investigated a crossover case, cortically generated responses time-locked to continuous speech features at FFR-like rates. Using MEG, we analyzed responses in the high gamma range of 70-200 Hz to continuous speech using neural source-localized reverse correlation and the corresponding temporal response functions (TRFs). Continuous speech stimuli were presented to 40 subjects (17 younger, 23 older adults) with clinically normal hearing and their MEG responses were analyzed in the 70-200 Hz band. Consistent with the relative insensitivity of MEG to many subcortical structures, the spatiotemporal profile of these response components indicated a cortical origin with ∼40 ms peak latency and a right hemisphere bias. TRF analysis was performed using two separate aspects of the speech stimuli: a) the 70-200 Hz carrier of the speech, and b) the 70-200 Hz temporal modulations in the spectral envelope of the speech stimulus. The response was dominantly driven by the envelope modulation, with a much weaker contribution from the carrier. Age-related differences were also analyzed to investigate a reversal previously seen along the ascending auditory pathway, whereby older listeners show weaker midbrain FFR responses than younger listeners, but, paradoxically, have stronger cortical low frequency responses. 
In contrast to both these earlier results, this study did not find clear age-related differences in high gamma cortical responses to continuous speech. Cortical responses at FFR-like frequencies shared some properties with midbrain responses at the same frequencies and with cortical responses at much lower frequencies.
Affiliation(s)
- Joshua P Kulasingham
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, United States
- Christian Brodbeck
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States
- Alessandro Presacco
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States
- Stefanie E Kuchinsky
- Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Bethesda, Maryland, United States
- Samira Anderson
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland, United States
- Jonathan Z Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, United States; Institute for Systems Research, University of Maryland, College Park, Maryland, United States; Department of Biology, University of Maryland, College Park, Maryland, United States
46
Anderson S, Karawani H. Objective evidence of temporal processing deficits in older adults. Hear Res 2020; 397:108053. [PMID: 32863099 PMCID: PMC7669636 DOI: 10.1016/j.heares.2020.108053]
Abstract
The older listener's ability to understand speech in challenging environments may be affected by impaired temporal processing. This review summarizes objective evidence of degraded temporal processing from studies that have used the auditory brainstem response, auditory steady-state response, the envelope- or frequency-following response, cortical auditory-evoked potentials, and neural tracking of continuous speech. Studies have revealed delayed latencies and reduced amplitudes/phase locking in subcortical responses in older vs. younger listeners, in contrast to enhanced amplitudes of cortical responses in older listeners. Reconstruction accuracy of responses to continuous speech (e.g., cortical envelope tracking) shows over-representation in older listeners. Hearing loss is a factor in many of these studies, even though the listeners would be considered to have clinically normal hearing thresholds. Overall, the ability to draw definitive conclusions regarding these studies is limited by the use of multiple stimulus conditions, small sample sizes, and lack of replication. Nevertheless, these objective measures suggest a need to incorporate new clinical measures to provide a more comprehensive assessment of the listener's speech understanding ability, but more work is needed to determine the most efficacious measure for clinical use.
Affiliation(s)
- Samira Anderson
- Department of Hearing and Speech Sciences, University of Maryland, College Park, MD 20742, United States.
- Hanin Karawani
- Department of Communication Sciences and Disorders, University of Haifa, Haifa, Israel.
47
Linguistic Structure and Meaning Organize Neural Oscillations into a Content-Specific Hierarchy. J Neurosci 2020; 40:9467-9475. [PMID: 33097640 DOI: 10.1523/jneurosci.0302-20.2020]
Abstract
Neural oscillations track linguistic information during speech comprehension (Ding et al., 2016; Keitel et al., 2018), and are known to be modulated by acoustic landmarks and speech intelligibility (Doelling et al., 2014; Zoefel and VanRullen, 2015). However, studies investigating linguistic tracking have either relied on non-naturalistic isochronous stimuli or failed to fully control for prosody. Therefore, it is still unclear whether low-frequency activity tracks linguistic structure during natural speech, where linguistic structure does not follow such a palpable temporal pattern. Here, we measured electroencephalography (EEG) and manipulated the presence of semantic and syntactic information apart from the timescale of their occurrence, while carefully controlling for the acoustic-prosodic and lexical-semantic information in the signal. EEG was recorded while 29 adult native speakers (22 women, 7 men) listened to naturally spoken Dutch sentences, jabberwocky controls with morphemes and sentential prosody, word lists with lexical content but no phrase structure, and backward acoustically matched controls. Mutual information (MI) analysis revealed sensitivity to linguistic content: MI was highest for sentences at the phrasal (0.8-1.1 Hz) and lexical (1.9-2.8 Hz) timescales, suggesting that the delta-band is modulated by lexically driven combinatorial processing beyond prosody, and that linguistic content (i.e., structure and meaning) organizes neural oscillations beyond the timescale and rhythmicity of the stimulus. 
This pattern is consistent with neurophysiologically inspired models of language comprehension (Martin, 2016, 2020; Martin and Doumas, 2017) where oscillations encode endogenously generated linguistic content over and above exogenous or stimulus-driven timing and rhythm information.
SIGNIFICANCE STATEMENT Biological systems like the brain encode their environment not only by reacting in a series of stimulus-driven responses, but by combining stimulus-driven information with endogenous, internally generated, inferential knowledge and meaning. Understanding language from speech is the human benchmark for this. Much research focuses on the purely stimulus-driven response, but here, we focus on the goal of language behavior: conveying structure and meaning. To that end, we use naturalistic stimuli that contrast acoustic-prosodic and lexical-semantic information to show that, during spoken language comprehension, oscillatory modulations reflect computations related to inferring structure and meaning from the acoustic signal. Our experiment provides the first evidence to date that compositional structure and meaning organize the oscillatory response, above and beyond prosodic and lexical controls.
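The mutual-information analysis this abstract relies on can be illustrated with a minimal NumPy sketch. The histogram ("plug-in") estimator and the toy signals below are simplifications introduced here for illustration; the study itself compares MI between speech and EEG signals band-filtered at specific linguistic timescales with a more robust estimator.

```python
import numpy as np

def binned_mutual_information(x, y, bins=16):
    """Plug-in estimate of mutual information (in bits) between two 1-D signals."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    nz = pxy > 0                          # avoid log(0) on empty bins
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

# Toy check: an "EEG" signal that tracks the speech envelope carries more
# information about it than an unrelated signal of the same length.
rng = np.random.default_rng(0)
envelope = rng.standard_normal(20000)
tracking_eeg = envelope + 0.5 * rng.standard_normal(20000)
unrelated_eeg = rng.standard_normal(20000)
mi_tracking = binned_mutual_information(envelope, tracking_eeg)
mi_unrelated = binned_mutual_information(envelope, unrelated_eeg)
```

In the study, this kind of comparison is carried out on signals filtered at the phrasal (0.8-1.1 Hz) and lexical (1.9-2.8 Hz) timescales across the four stimulus conditions (sentences, jabberwocky, word lists, backward speech).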
|
48
|
Brodbeck C, Jiao A, Hong LE, Simon JZ. Neural speech restoration at the cocktail party: Auditory cortex recovers masked speech of both attended and ignored speakers. PLoS Biol 2020; 18:e3000883. [PMID: 33091003 PMCID: PMC7644085 DOI: 10.1371/journal.pbio.3000883] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2020] [Revised: 11/05/2020] [Accepted: 09/14/2020] [Indexed: 01/09/2023] Open
Abstract
Humans are remarkably skilled at listening to one speaker out of an acoustic mixture of several speech sources. Two speakers are easily segregated, even without binaural cues, but the neural mechanisms underlying this ability are not well understood. One possibility is that early cortical processing performs a spectrotemporal decomposition of the acoustic mixture, allowing the attended speech to be reconstructed via optimally weighted recombinations that discount spectrotemporal regions where sources heavily overlap. Using human magnetoencephalography (MEG) responses to a 2-talker mixture, we show evidence for an alternative possibility, in which early, active segregation occurs even for strongly spectrotemporally overlapping regions. Early (approximately 70-millisecond) responses to nonoverlapping spectrotemporal features are seen for both talkers. When competing talkers’ spectrotemporal features mask each other, the individual representations persist, but they occur with an approximately 20-millisecond delay. This suggests that the auditory cortex recovers acoustic features that are masked in the mixture, even if they occurred in the ignored speech. The existence of such noise-robust cortical representations, of features present in attended as well as ignored speech, suggests an active cortical stream segregation process, which could explain a range of behavioral effects of ignored background speech. How do humans focus on one speaker when several are talking? MEG responses to a continuous two-talker mixture suggest that, even though listeners attend only to one of the talkers, their auditory cortex tracks acoustic features from both speakers. This occurs even when those features are locally masked by the other speaker.
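The "responses to spectrotemporal features" in this abstract are temporal response functions (TRFs): linear kernels mapping stimulus features to the neural signal, whose peak latencies give the approximately 70-millisecond and 20-millisecond effects reported above. A minimal sketch using ridge-regularized least squares on simulated data (the study itself estimates TRFs with a different fitting procedure, and the toy kernel here is invented):

```python
import numpy as np

def estimate_trf(stimulus, response, n_lags, ridge=1.0):
    """Estimate a TRF by regressing the response on lagged copies of the stimulus."""
    n = len(stimulus)
    X = np.zeros((n, n_lags))
    for lag in range(n_lags):
        X[lag:, lag] = stimulus[: n - lag]   # column = stimulus delayed by `lag`
    # Ridge-regularized normal equations.
    return np.linalg.solve(X.T @ X + ridge * np.eye(n_lags), X.T @ response)

# Toy check: recover a known kernel from a simulated response.
rng = np.random.default_rng(1)
true_trf = np.array([0.0, 0.5, 1.0, 0.5, 0.0])
stim = rng.standard_normal(5000)
resp = np.convolve(stim, true_trf)[:5000] + 0.1 * rng.standard_normal(5000)
est = estimate_trf(stim, resp, n_lags=5)
```

The delayed responses to masked features described in the abstract would show up in this framework as a second, later peak along the lag axis of the estimated kernel.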
Affiliation(s)
- Christian Brodbeck
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States of America
- Alex Jiao
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States of America
- L. Elliot Hong
- Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- Jonathan Z. Simon
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States of America
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States of America
- Department of Biology, University of Maryland, College Park, Maryland, United States of America
|
49
|
Dynamic Time-Locking Mechanism in the Cortical Representation of Spoken Words. eNeuro 2020; 7:ENEURO.0475-19.2020. [PMID: 32513662 PMCID: PMC7470935 DOI: 10.1523/eneuro.0475-19.2020] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2019] [Revised: 05/15/2020] [Accepted: 06/01/2020] [Indexed: 11/21/2022] Open
Abstract
Human speech has a unique capacity to carry and communicate rich meanings. However, it is not known how the highly dynamic and variable perceptual signal is mapped to existing linguistic and semantic representations. In this novel approach, we used the natural acoustic variability of sounds and mapped them to magnetoencephalography (MEG) data using physiologically-inspired machine-learning models. We aimed at determining how well the models, differing in their representation of temporal information, serve to decode and reconstruct spoken words from MEG recordings in 16 healthy volunteers. We discovered that dynamic time-locking of the cortical activation to the unfolding speech input is crucial for the encoding of the acoustic-phonetic features of speech. In contrast, time-locking was not highlighted in cortical processing of non-speech environmental sounds that conveyed the same meanings as the spoken words, including human-made sounds with temporal modulation content similar to speech. The amplitude envelope of the spoken words was particularly well reconstructed based on cortical evoked responses. Our results indicate that speech is encoded cortically with especially high temporal fidelity. This speech tracking by evoked responses may partly reflect the same underlying neural mechanism as the frequently reported entrainment of the cortical oscillations to the amplitude envelope of speech. Furthermore, the phoneme content was reflected in cortical evoked responses simultaneously with the spectrotemporal features, pointing to an instantaneous transformation of the unfolding acoustic features into linguistic representations during speech processing.
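The envelope reconstruction reported in this abstract can be illustrated with a linear backward (decoding) model: a regularized regression from lagged multichannel recordings back onto the stimulus envelope. The two-sensor toy data and ridge solver below are inventions for illustration; the study's own models are physiologically inspired machine-learning models fit to real MEG.

```python
import numpy as np

def fit_decoder(response, stimulus, n_lags, ridge=1.0):
    """Fit a backward model reconstructing a stimulus from (channels x time) data."""
    n_ch, n = response.shape
    X = np.zeros((n, n_ch * n_lags))
    for lag in range(n_lags):
        # Row t holds all channels at times t, t+1, ..., t+n_lags-1:
        # the neural response lags the stimulus, so decoding looks forward.
        X[: n - lag, lag * n_ch:(lag + 1) * n_ch] = response[:, lag:].T
    w = np.linalg.solve(X.T @ X + ridge * np.eye(n_ch * n_lags), X.T @ stimulus)
    return X, w

# Toy data: two "sensors" respond to the envelope with 1- and 2-sample delays.
rng = np.random.default_rng(2)
env = rng.standard_normal(4000)
sensors = np.stack([np.roll(env, 1), np.roll(env, 2)])
sensors = sensors + 0.2 * rng.standard_normal(sensors.shape)
X, w = fit_decoder(sensors, env, n_lags=4)
r = float(np.corrcoef(X @ w, env)[0, 1])
```

Reconstruction accuracy is typically summarized, as here, by the correlation between the reconstructed and actual envelope; the abstract's claim is that this correlation is especially high for time-locked cortical responses to speech.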
|
50
|
Miran S, Presacco A, Simon JZ, Fu MC, Marcus SI, Babadi B. Dynamic estimation of auditory temporal response functions via state-space models with Gaussian mixture process noise. PLoS Comput Biol 2020; 16:e1008172. [PMID: 32813712 PMCID: PMC7485982 DOI: 10.1371/journal.pcbi.1008172] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2020] [Revised: 09/11/2020] [Accepted: 07/21/2020] [Indexed: 11/19/2022] Open
Abstract
Estimating the latent dynamics underlying biological processes is a central problem in computational biology. State-space models with Gaussian statistics are widely used for estimation of such latent dynamics and have been successfully utilized in the analysis of biological data. Gaussian statistics, however, fail to capture several key features of the dynamics of biological processes (e.g., brain dynamics) such as abrupt state changes and exogenous processes that affect the states in a structured fashion. Although Gaussian mixture process noise models have been considered as an alternative to capture such effects, data-driven inference of their parameters is not well-established in the literature. The objective of this paper is to develop efficient algorithms for inferring the parameters of a general class of Gaussian mixture process noise models from noisy and limited observations, and to utilize them in extracting the neural dynamics that underlie auditory processing from magnetoencephalography (MEG) data in a cocktail party setting. We develop an algorithm based on Expectation-Maximization to estimate the process noise parameters from state-space observations. We apply our algorithm to simulated and experimentally-recorded MEG data from auditory experiments in the cocktail party paradigm to estimate the underlying dynamic Temporal Response Functions (TRFs). Our simulation results show that the richer representation of the process noise as a Gaussian mixture significantly improves state estimation and capturing the heterogeneity of the TRF dynamics. Application to MEG data reveals improvements over existing TRF estimation techniques, and provides a reliable alternative to current approaches for probing neural dynamics in a cocktail party scenario, as well as attention decoding in emerging applications such as smart hearing aids. 
Our proposed methodology provides a framework for efficient inference of Gaussian mixture process noise models, with application to a wide range of biological data with underlying heterogeneous and latent dynamics.
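The state-space TRF model in this abstract can be sketched in its simplest Gaussian special case: TRF weights evolving as a random walk, tracked online by a Kalman filter. Everything below (dimensions, noise levels, the simulated attention switch) is invented for illustration; the paper's contribution is replacing the single Gaussian process noise with a Gaussian mixture whose parameters are learned by Expectation-Maximization.

```python
import numpy as np

def kalman_trf(stim_lags, response, q=1e-3, r=1e-1):
    """Track a time-varying TRF w_t under w_t = w_{t-1} + v_t (random walk)."""
    n, k = stim_lags.shape
    w, P = np.zeros(k), np.eye(k)
    trajectory = np.zeros((n, k))
    for t in range(n):
        P = P + q * np.eye(k)                  # predict: random-walk drift
        x = stim_lags[t]
        s = x @ P @ x + r                      # innovation variance
        gain = P @ x / s                       # Kalman gain
        w = w + gain * (response[t] - x @ w)   # update with the new sample
        P = P - np.outer(gain, x) @ P
        trajectory[t] = w
    return trajectory

# Toy data: the TRF switches abruptly halfway through, mimicking an
# attention switch between talkers in the cocktail party paradigm.
rng = np.random.default_rng(3)
n, k = 4000, 3
stim = rng.standard_normal((n, k))
true_w = np.where(np.arange(n)[:, None] < n // 2, [1.0, 0.5, 0.0], [0.0, 0.5, 1.0])
resp = np.sum(stim * true_w, axis=1) + 0.1 * rng.standard_normal(n)
traj = kalman_trf(stim, resp)
```

A single Gaussian process-noise variance q must trade smooth tracking against responsiveness to abrupt jumps; the paper's mixture noise (conceptually, a low-variance drift component plus a rare high-variance jump component) removes that trade-off, with the mixture parameters estimated from data via EM.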
Affiliation(s)
- Sina Miran
- Starkey Hearing Technologies, Eden Prairie, Minnesota, United States of America
- Alessandro Presacco
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States of America
- Jonathan Z. Simon
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States of America
- Department of Electrical & Computer Engineering, University of Maryland, College Park, Maryland, United States of America
- Department of Biology, University of Maryland, College Park, Maryland, United States of America
- Michael C. Fu
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States of America
- Robert H. Smith School of Business, University of Maryland, College Park, Maryland, United States of America
- Steven I. Marcus
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States of America
- Department of Electrical & Computer Engineering, University of Maryland, College Park, Maryland, United States of America
- Behtash Babadi
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States of America
- Department of Electrical & Computer Engineering, University of Maryland, College Park, Maryland, United States of America
|