1. Wang L, Chen F. EEG responses to onset-edge and steady-state segments of continuous speech under selective auditory attention modulation. Hear Res 2025;463:109298. PMID: 40344751; DOI: 10.1016/j.heares.2025.109298.
Abstract
Electroencephalography (EEG) signals provide valuable insights into the neural mechanisms of speech perception. However, it remains unclear how neural responses dynamically align with continuous speech under selective attention modulation in complex auditory environments. This study examined the evoked and induced EEG responses, their correlations with speech features, and cortical distributions for the target and ignored speech in two-talker competing scenarios. Results showed that selective attention increased the evoked EEG power for the target speech compared to the ignored speech. In contrast, the induced power showed no significant differences. Low-frequency EEG power and phase responses reliably indexed target speech identification amid competing streams. Cortical distribution analyses revealed that the evoked power differences between the target and ignored speech were concentrated in the central and parietal cortices. Significant induced power differences between the target and ignored speech were present only at the onset-edge segments in the left temporal cortex. Comparisons between onset-edge and steady-state segments showed the evoked power differences in the right central and temporal cortices and the induced power differences in the frontal cortex for the ignored speech. No significant differences in cortical distribution were observed between the onset-edge and steady-state segments for the target speech. These findings underscore the distinct contributions of the evoked and induced neural activities and their cortical distributions to selective auditory attention and segment-specific speech perception.
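To make the evoked/induced distinction concrete, here is a minimal sketch of one common way to separate the two quantities; the array names, shapes, and spectrogram parameters are illustrative assumptions, not the authors' pipeline.

```python
# Minimal sketch (not the study's code): separating evoked from induced power
# across trials. Assumes `epochs` has shape (n_trials, n_channels, n_times)
# sampled at `sfreq` Hz.
import numpy as np
from scipy.signal import spectrogram

def evoked_and_induced_power(epochs, sfreq, nperseg=256):
    """Evoked power = power of the trial average (phase-locked part).
    Induced power = mean single-trial power after removing that average."""
    average = epochs.mean(axis=0)                                # phase-locked component
    _, _, s_evoked = spectrogram(average, fs=sfreq, nperseg=nperseg, axis=-1)
    residual = epochs - average                                  # non-phase-locked part
    _, _, s_trials = spectrogram(residual, fs=sfreq, nperseg=nperseg, axis=-1)
    s_induced = s_trials.mean(axis=0)                            # average power over trials
    return s_evoked, s_induced
```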
Affiliation(s)
- Lei Wang
  - School of Electronics and Communication Engineering, Guangzhou University, Guangzhou, China
- Fei Chen
  - Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China

2. Herrmann B. Enhanced neural speech tracking through noise indicates stochastic resonance in humans. eLife 2025;13:RP100830. PMID: 40100253; PMCID: PMC11919254; DOI: 10.7554/elife.100830.
Abstract
Neural activity in auditory cortex tracks the amplitude-onset envelope of continuous speech, but recent work counterintuitively suggests that neural tracking increases when speech is masked by background noise, despite reduced speech intelligibility. Noise-related amplification could indicate that stochastic resonance - the response facilitation through noise - supports neural speech tracking, but a comprehensive account is lacking. In five human electroencephalography experiments, the current study demonstrates a generalized enhancement of neural speech tracking due to minimal background noise. Results show that (1) neural speech tracking is enhanced for speech masked by background noise at very high signal-to-noise ratios (~30 dB SNR) where speech is highly intelligible; (2) this enhancement is independent of attention; (3) it generalizes across different stationary background maskers, but is strongest for 12-talker babble; and (4) it is present for headphone and free-field listening, suggesting that the neural-tracking enhancement generalizes to real-life listening. The work paints a clear picture that minimal background noise enhances the neural representation of the speech onset-envelope, suggesting that stochastic resonance contributes to neural speech tracking. The work further highlights non-linearities of neural tracking induced by background noise that make its use as a biological marker for speech processing challenging.
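As a small illustration of the masking manipulation described above, the sketch below mixes a speech waveform with a stationary masker at a target SNR (for example ~30 dB); the signals and function name are placeholders, not the study's stimulus code.

```python
# Illustrative sketch: mixing speech with a masker at a target SNR.
import numpy as np

def mix_at_snr(speech, masker, snr_db):
    """Scale `masker` so the speech-to-masker power ratio equals snr_db."""
    masker = masker[:len(speech)]
    p_speech = np.mean(speech ** 2)
    p_masker = np.mean(masker ** 2)
    gain = np.sqrt(p_speech / (p_masker * 10 ** (snr_db / 10)))
    return speech + gain * masker

rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)   # stand-in for a speech waveform
babble = rng.standard_normal(16000)   # stand-in for a 12-talker babble masker
mixture = mix_at_snr(speech, babble, snr_db=30.0)
```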
Affiliation(s)
- Björn Herrmann
  - Rotman Research Institute, Baycrest Academy for Research and Education, Toronto, Canada
  - Department of Psychology, University of Toronto, Toronto, Canada

3. Liang M, Gerwien J, Gutschalk A. A listening advantage for native speech is reflected by attention-related activity in auditory cortex. Commun Biol 2025;8:180. PMID: 39910341; PMCID: PMC11799217; DOI: 10.1038/s42003-025-07601-2.
Abstract
The listening advantage for native speech is well known, but the neural basis of the effect remains unknown. Here we test the hypothesis that attentional enhancement in auditory cortex is stronger for native speech, using magnetoencephalography. Chinese and German speech stimuli were recorded by a bilingual speaker and combined into a two-stream, cocktail-party scene, with consistent and inconsistent language combinations. A group of native speakers of Chinese and a group of native speakers of German performed a detection task in the cued target stream. Results show that attention enhances negative-going activity in the temporal response function deconvolved from the speech envelope. This activity is stronger when the target stream is in the native compared to the non-native language, and for inconsistent compared to consistent language stimuli. We interpret these findings as showing that the stronger activity for native speech could be related to better top-down prediction of native speech streams.
Affiliation(s)
- Meng Liang
  - Department of Neurology, University of Heidelberg, Im Neuenheimer Feld 400, 69120 Heidelberg, Germany
- Johannes Gerwien
  - Institute of German as a Foreign Language Philology, University of Heidelberg, Plöck 55, 69117 Heidelberg, Germany
- Alexander Gutschalk
  - Department of Neurology, University of Heidelberg, Im Neuenheimer Feld 400, 69120 Heidelberg, Germany

4. Reisinger P, Gillis M, Suess N, Vanthornhout J, Haider CL, Hartmann T, Hauswald A, Schwarz K, Francart T, Weisz N. Neural Speech Tracking Contribution of Lip Movements Predicts Behavioral Deterioration When the Speaker's Mouth Is Occluded. eNeuro 2025;12:ENEURO.0368-24.2024. PMID: 39819839; PMCID: PMC11801124; DOI: 10.1523/eneuro.0368-24.2024.
Abstract
Observing lip movements of a speaker facilitates speech understanding, especially in challenging listening situations. Converging evidence from neuroscientific studies shows stronger neural responses to audiovisual stimuli compared with audio-only stimuli. However, the interindividual variability of this contribution of lip movement information and its consequences on behavior are unknown. We analyzed source-localized magnetoencephalographic responses from 29 normal-hearing participants (12 females) listening to audiovisual speech, both with and without the speaker wearing a surgical face mask, and in the presence or absence of a distractor speaker. Using temporal response functions to quantify neural speech tracking, we show that neural responses to lip movements are, in general, enhanced when speech is challenging. After controlling for speech acoustics, we show that lip movements contribute to enhanced neural speech tracking, particularly when a distractor speaker is present. However, the extent of this visual contribution to neural speech tracking varied greatly among participants. Probing the behavioral relevance, we demonstrate that individuals who show a higher contribution of lip movements in terms of neural speech tracking show a stronger drop in comprehension and an increase in perceived difficulty when the mouth is occluded by a surgical face mask. In contrast, no effect was found when the mouth was not occluded. We provide novel insights on how the contribution of lip movements in terms of neural speech tracking varies among individuals and its behavioral relevance, revealing negative consequences when visual speech is absent. Our results also offer potential implications for objective assessments of audiovisual speech perception.
Affiliation(s)
- Patrick Reisinger
  - Department of Psychology, Centre for Cognitive Neuroscience, Paris-Lodron-University of Salzburg, Salzburg 5020, Austria
- Marlies Gillis
  - Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven 3000, Belgium
- Nina Suess
  - Department of Psychology, Centre for Cognitive Neuroscience, Paris-Lodron-University of Salzburg, Salzburg 5020, Austria
- Jonas Vanthornhout
  - Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven 3000, Belgium
- Chandra Leon Haider
  - Department of Psychology, Centre for Cognitive Neuroscience, Paris-Lodron-University of Salzburg, Salzburg 5020, Austria
- Thomas Hartmann
  - Department of Psychology, Centre for Cognitive Neuroscience, Paris-Lodron-University of Salzburg, Salzburg 5020, Austria
- Anne Hauswald
  - Department of Psychology, Centre for Cognitive Neuroscience, Paris-Lodron-University of Salzburg, Salzburg 5020, Austria
- Tom Francart
  - Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven 3000, Belgium
- Nathan Weisz
  - Department of Psychology, Centre for Cognitive Neuroscience, Paris-Lodron-University of Salzburg, Salzburg 5020, Austria
  - Neuroscience Institute, Christian Doppler University Hospital, Paracelsus Medical University Salzburg, Salzburg 5020, Austria

5. O’Reilly D, Shaw W, Hilt P, de Castro Aguiar R, Astill SL, Delis I. Quantifying the diverse contributions of hierarchical muscle interactions to motor function. iScience 2025;28:111613. PMID: 39834869; PMCID: PMC11742840; DOI: 10.1016/j.isci.2024.111613.
Abstract
The muscle synergy concept suggests that the human motor system is organized into functional modules composed of muscles "working together" toward common task goals. This study offers a nuanced computational perspective to muscle synergies, where muscles interacting across multiple scales have functionally similar, complementary, and independent roles. Making this viewpoint implicit to a methodological approach applying Partial Information Decomposition to large-scale muscle activations, we unveiled nested networks of functionally diverse inter- and intramuscular interactions with distinct functional consequences on task performance. The effectiveness of this approach is demonstrated using simulations and by extracting generalizable muscle networks from benchmark datasets of muscle activity. Specific network components are shown to correlate with (1) balance performance and (2) differences in motor variability between young and older adults. By aligning muscle synergy analysis with leading theoretical insights on movement modularity, the mechanistic insights presented here suggest the proposed methodology offers enhanced research opportunities toward health and engineering applications.
Affiliation(s)
- David O’Reilly
  - School of Biomedical Sciences, University of Leeds, Leeds, UK
- William Shaw
  - School of Biomedical Sciences, University of Leeds, Leeds, UK
- Pauline Hilt
  - INSERM UMR1093-CAPS, Université Bourgogne Franche-Comté, UFR des Sciences Du Sport, F-21000 Dijon, France
- Sarah L. Astill
  - School of Biomedical Sciences, University of Leeds, Leeds, UK
- Ioannis Delis
  - School of Biomedical Sciences, University of Leeds, Leeds, UK

6. Haupt T, Rosenkranz M, Bleichner MG. Exploring Relevant Features for EEG-Based Investigation of Sound Perception in Naturalistic Soundscapes. eNeuro 2025;12:ENEURO.0287-24.2024. PMID: 39753371; PMCID: PMC11747973; DOI: 10.1523/eneuro.0287-24.2024.
Abstract
A comprehensive analysis of everyday sound perception can be achieved using electroencephalography (EEG) with the concurrent acquisition of information about the environment. While extensive research has been dedicated to speech perception, the complexities of auditory perception within everyday environments, specifically the types of information and the key features to extract, remain less explored. Our study aims to systematically investigate the relevance of different feature categories: discrete sound-identity markers, general cognitive state information, and acoustic representations, including discrete sound onset, the envelope, and the mel-spectrogram. Using continuous data analysis, we contrast different features in terms of their predictive power for unseen data and thus their distinct contributions to explaining neural data. For this, we analyze data from a complex audio-visual motor task using a naturalistic soundscape. The results demonstrated that the feature sets explaining the most neural variability combined highly detailed acoustic features with a comprehensive description of specific sound onsets. They also showed that established features can be applied to complex soundscapes. Crucially, for the discrete features, the outcome hinged on excluding periods devoid of sound onsets from the analysis. Our study highlights the importance of comprehensively describing the soundscape, using acoustic and non-acoustic aspects, to fully understand the dynamics of sound perception in complex situations. This approach can serve as a foundation for future studies aiming to investigate sound perception in natural settings.
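The comparison of feature sets "in terms of their predictive power for unseen data" can be illustrated with a cross-validated, time-lagged linear encoding model; this is a hedged sketch with assumed array shapes and hyperparameters, not the authors' analysis code.

```python
# Sketch: compare feature sets by how well a lagged linear (TRF-style) model
# predicts held-out EEG. `eeg` is (n_times, n_channels); each feature is (n_times,).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def lag_matrix(feature, n_lags):
    """Stack time-lagged copies of a 1-D feature into a design matrix."""
    X = np.zeros((len(feature), n_lags))
    for lag in range(n_lags):
        X[lag:, lag] = feature[:len(feature) - lag]
    return X

def cv_prediction_score(features, eeg, n_lags=32, n_splits=5):
    """Mean correlation between predicted and actual EEG on unseen folds."""
    X = np.hstack([lag_matrix(f, n_lags) for f in features])
    scores = []
    for train, test in KFold(n_splits=n_splits).split(X):
        model = Ridge(alpha=1.0).fit(X[train], eeg[train])
        pred = model.predict(X[test])
        r = [np.corrcoef(pred[:, c], eeg[test, c])[0, 1] for c in range(eeg.shape[1])]
        scores.append(np.mean(r))
    return float(np.mean(scores))
```

Feature sets whose cross-validated score is higher explain more of the held-out neural variability, which is the logic of the comparison described above.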
Affiliation(s)
- Thorge Haupt
  - Neurophysiology of Everyday Life Group, Department of Psychology, Carl von Ossietzky Universität Oldenburg, Oldenburg 26129, Germany
- Marc Rosenkranz
  - Neurophysiology of Everyday Life Group, Department of Psychology, Carl von Ossietzky Universität Oldenburg, Oldenburg 26129, Germany
- Martin G Bleichner
  - Neurophysiology of Everyday Life Group, Department of Psychology, Carl von Ossietzky Universität Oldenburg, Oldenburg 26129, Germany
  - Research Center for Neurosensory Science, Carl von Ossietzky Universität Oldenburg, Oldenburg 26129, Germany

7. Lenc T, Lenoir C, Keller PE, Polak R, Mulders D, Nozaradan S. Measuring self-similarity in empirical signals to understand musical beat perception. Eur J Neurosci 2025;61:e16637. PMID: 39853878; PMCID: PMC11760665; DOI: 10.1111/ejn.16637.
Abstract
Experiencing music often entails the perception of a periodic beat. Despite being a widespread phenomenon across cultures, the nature and neural underpinnings of beat perception remain largely unknown. In the last decade, there has been a growing interest in developing methods to probe these processes, particularly to measure the extent to which beat-related information is contained in behavioral and neural responses. Here, we propose a theoretical framework and practical implementation of an analytic approach to capture beat-related periodicity in empirical signals using frequency-tagging. We highlight its sensitivity in measuring the extent to which the periodicity of a perceived beat is represented in a range of continuous time-varying signals with minimal assumptions. We also discuss a limitation of this approach with respect to its specificity when restricted to measuring beat-related periodicity only from the magnitude spectrum of a signal and introduce a novel extension of the approach based on autocorrelation to overcome this issue. We test the new autocorrelation-based method using simulated signals and by re-analyzing previously published data and show how it can be used to process measurements of brain activity as captured with surface EEG in adults and infants in response to rhythmic inputs. Taken together, the theoretical framework and related methodological advances confirm and elaborate the frequency-tagging approach as a promising window into the processes underlying beat perception and, more generally, temporally coordinated behaviors.
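A toy version of the two measures contrasted in the abstract, frequency-tagging on the magnitude spectrum and an autocorrelation-based score, is sketched below; frequency lists, tolerances, and function names are assumptions for illustration, not the published implementation.

```python
# Toy sketch of beat-related periodicity measures (assumed parameters).
import numpy as np

def freq_tag_score(signal, sfreq, beat_freqs, tol=0.05):
    """Mean spectral magnitude at beat-related frequencies minus the mean elsewhere."""
    mags = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sfreq)
    beat_mask = np.any(np.abs(freqs[:, None] - np.asarray(beat_freqs)) < tol, axis=1)
    return mags[beat_mask].mean() - mags[~beat_mask].mean()

def autocorr_beat_score(signal, sfreq, beat_period_s):
    """Normalized autocorrelation at the lag corresponding to the beat period."""
    x = signal - signal.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    ac = ac / ac[0]                          # lag 0 normalized to 1
    return ac[int(round(beat_period_s * sfreq))]
```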
Affiliation(s)
- Tomas Lenc
  - Institute of Neuroscience (IONS), UCLouvain, Brussels, Belgium
  - Basque Center on Cognition, Brain and Language (BCBL), Donostia-San Sebastian, Spain
- Cédric Lenoir
  - Institute of Neuroscience (IONS), UCLouvain, Brussels, Belgium
- Peter E. Keller
  - MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, Australia
  - Center for Music in the Brain & Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
- Rainer Polak
  - RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway
  - Department of Musicology, University of Oslo, Oslo, Norway
- Dounia Mulders
  - Institute of Neuroscience (IONS), UCLouvain, Brussels, Belgium
  - Computational and Biological Learning Unit, Department of Engineering, University of Cambridge, Cambridge, UK
  - Institute for Information and Communication Technologies, Electronics and Applied Mathematics, UCLouvain, Louvain-la-Neuve, Belgium
  - Department of Brain and Cognitive Sciences and McGovern Institute, Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, USA
- Sylvie Nozaradan
  - Institute of Neuroscience (IONS), UCLouvain, Brussels, Belgium
  - International Laboratory for Brain, Music and Sound Research (BRAMS), Montreal, Canada

8. Coopmans CW, de Hoop H, Tezcan F, Hagoort P, Martin AE. Language-specific neural dynamics extend syntax into the time domain. PLoS Biol 2025;23:e3002968. PMID: 39836653; PMCID: PMC11750093; DOI: 10.1371/journal.pbio.3002968.
Abstract
Studies of perception have long shown that the brain adds information to its sensory analysis of the physical environment. A touchstone example for humans is language use: to comprehend a physical signal like speech, the brain must add linguistic knowledge, including syntax. Yet, syntactic rules and representations are widely assumed to be atemporal (i.e., abstract and not bound by time), so they must be translated into time-varying signals for speech comprehension and production. Here, we test 3 different models of the temporal spell-out of syntactic structure against brain activity of people listening to Dutch stories: an integratory bottom-up parser, a predictive top-down parser, and a mildly predictive left-corner parser. These models build exactly the same structure but differ in when syntactic information is added by the brain-this difference is captured in the (temporal distribution of the) complexity metric "incremental node count." Using temporal response function models with both acoustic and information-theoretic control predictors, node counts were regressed against source-reconstructed delta-band activity acquired with magnetoencephalography. Neural dynamics in left frontal and temporal regions most strongly reflect node counts derived by the top-down method, which postulates syntax early in time, suggesting that predictive structure building is an important component of Dutch sentence comprehension. The absence of strong effects of the left-corner model further suggests that its mildly predictive strategy does not represent Dutch language comprehension well, in contrast to what has been found for English. Understanding when the brain projects its knowledge of syntax onto speech, and whether this is done in language-specific ways, will inform and constrain the development of mechanistic models of syntactic structure building in the brain.
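The complexity metric "incremental node count" can be illustrated with a deliberately simplified toy: for each word, count how many phrase-structure nodes a top-down parser would postulate at that word versus how many a bottom-up parser would complete there. The tree format and counting rules below are assumptions for illustration, not the parsers used in the paper (which also include a left-corner variant).

```python
# Toy illustration of per-word node counts from a bracketed parse tree.
from collections import defaultdict

def node_counts(tree):
    """`tree` is a nested tuple ('LABEL', child, ...); leaves are word strings.
    Returns (top_down, bottom_up): word-indexed counts of nodes opened/closed."""
    top_down, bottom_up = defaultdict(int), defaultdict(int)
    position = [0]  # running word index

    def walk(node):
        if isinstance(node, str):            # a word (leaf)
            idx = position[0]
            position[0] += 1
            return idx, idx
        first, last = None, None
        for child in node[1:]:
            lo, hi = walk(child)
            first = lo if first is None else first
            last = hi
        top_down[first] += 1                 # node postulated at its first word
        bottom_up[last] += 1                 # node completed at its last word
        return first, last

    walk(tree)
    return dict(top_down), dict(bottom_up)

# Example: (S (NP the dog) (VP barked))
tree = ("S", ("NP", "the", "dog"), ("VP", "barked"))
print(node_counts(tree))   # top-down: {0: 2, 2: 1}, bottom-up: {1: 1, 2: 2}
```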
Affiliation(s)
- Cas W. Coopmans
  - Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
  - Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, the Netherlands
  - Centre for Language Studies, Radboud University, Nijmegen, the Netherlands
- Helen de Hoop
  - Centre for Language Studies, Radboud University, Nijmegen, the Netherlands
- Filiz Tezcan
  - Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
  - Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, the Netherlands
- Peter Hagoort
  - Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
  - Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, the Netherlands
- Andrea E. Martin
  - Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
  - Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, the Netherlands

9. Giroud J, Trébuchon A, Mercier M, Davis MH, Morillon B. The human auditory cortex concurrently tracks syllabic and phonemic timescales via acoustic spectral flux. Sci Adv 2024;10:eado8915. PMID: 39705351; DOI: 10.1126/sciadv.ado8915.
Abstract
Dynamical theories of speech processing propose that the auditory cortex parses acoustic information in parallel at the syllabic and phonemic timescales. We developed a paradigm to independently manipulate both linguistic timescales and acquired intracranial recordings from 11 patients with epilepsy while they listened to French sentences. Our results indicate that (i) syllabic and phonemic timescales are both reflected in the acoustic spectral flux; (ii) during comprehension, the auditory cortex tracks the syllabic timescale in the theta range, while neural activity in the alpha-beta range phase locks to the phonemic timescale; (iii) these neural dynamics occur simultaneously and share a joint spatial location; (iv) the spectral flux embeds two timescales, in the theta and low-beta ranges, across 17 natural languages. These findings help us understand how the human brain extracts acoustic information from the continuous speech signal at multiple timescales simultaneously, a prerequisite for subsequent linguistic processing.
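For reference, the acoustic spectral flux highlighted in the abstract can be computed from a short-time spectrogram as the half-wave-rectified frame-to-frame change in spectral magnitude; the window length and function name below are assumptions, not the authors' parameters.

```python
# Minimal sketch: spectral flux of a waveform.
import numpy as np
from scipy.signal import stft

def spectral_flux(waveform, sfreq, win_s=0.02):
    nperseg = int(win_s * sfreq)
    _, _, Z = stft(waveform, fs=sfreq, nperseg=nperseg)
    mag = np.abs(Z)                                   # (n_freqs, n_frames)
    diff = np.diff(mag, axis=1)                       # frame-to-frame change
    return np.sum(np.maximum(diff, 0.0), axis=0)      # half-wave rectified flux
```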
Affiliation(s)
- Jérémy Giroud
  - MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Agnès Trébuchon
  - Aix Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France
  - APHM, Clinical Neurophysiology, Timone Hospital, Marseille, France
- Manuel Mercier
  - Aix Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France
- Matthew H Davis
  - MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Benjamin Morillon
  - Aix Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France

10. Carta S, Aličković E, Zaar J, Valdés AL, Di Liberto GM. Cortical encoding of phonetic onsets of both attended and ignored speech in hearing impaired individuals. PLoS One 2024;19:e0308554. PMID: 39576775; PMCID: PMC11584098; DOI: 10.1371/journal.pone.0308554.
Abstract
Hearing impairment alters the sound input received by the human auditory system, reducing speech comprehension in noisy multi-talker auditory scenes. Despite such difficulties, neural signals were shown to encode the attended speech envelope more reliably than the envelope of ignored sounds, reflecting the intention of listeners with hearing impairment (HI). This result raises an important question: What speech-processing stage could reflect the difficulty in attentional selection, if not envelope tracking? Here, we use scalp electroencephalography (EEG) to test the hypothesis that the neural encoding of phonological information (i.e., phonetic boundaries and phonological categories) is affected by HI. In a cocktail-party scenario, such phonological difficulty might be reflected in an overrepresentation of phonological information for both attended and ignored speech sounds, with detrimental effects on the ability to effectively focus on the speaker of interest. To investigate this question, we carried out a re-analysis of an existing dataset where EEG signals were recorded as participants with HI, fitted with hearing aids, attended to one speaker (target) while ignoring a competing speaker (masker) and spatialised multi-talker background noise. Multivariate temporal response function (TRF) analyses indicated a stronger phonological information encoding for target than masker speech streams. Follow-up analyses aimed at disentangling the encoding of phonological categories and phonetic boundaries (phoneme onsets) revealed that neural signals encoded the phoneme onsets for both target and masker streams, in contrast with previously published findings with normal hearing (NH) participants and in line with our hypothesis that speech comprehension difficulties emerge due to a robust phonological encoding of both target and masker. Finally, the neural encoding of phoneme-onsets was stronger for the masker speech, pointing to a possible neural basis for the higher distractibility experienced by individuals with HI.
Affiliation(s)
- Sara Carta
  - School of Computer Science and Statistics, ADAPT Centre, Trinity College Dublin, Dublin, Ireland
  - Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland
- Emina Aličković
  - Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
  - Department of Electrical Engineering, Linköping University, Linköping, Sweden
- Johannes Zaar
  - Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
  - Department of Health Technology, Hearing Systems Section, Technical University of Denmark, Kgs. Lyngby, Denmark
- Alejandro López Valdés
  - Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland
  - Global Brain Health Institute, School of Engineering, Trinity Centre for Biomedical Engineering, Trinity College Dublin, Dublin, Ireland
- Giovanni M. Di Liberto
  - School of Computer Science and Statistics, ADAPT Centre, Trinity College Dublin, Dublin, Ireland
  - Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland

11. Anderson AJ, Davis C, Lalor EC. Deep-learning models reveal how context and listener attention shape electrophysiological correlates of speech-to-language transformation. PLoS Comput Biol 2024;20:e1012537. PMID: 39527649; PMCID: PMC11581396; DOI: 10.1371/journal.pcbi.1012537.
Abstract
To transform continuous speech into words, the human brain must resolve variability across utterances in intonation, speech rate, volume, accents and so on. A promising approach to explaining this process has been to model electroencephalogram (EEG) recordings of brain responses to speech. Contemporary models typically invoke context-invariant speech categories (e.g. phonemes) as an intermediary representational stage between sounds and words. However, such models may not capture the complete picture because they do not model the brain mechanism that categorizes sounds and consequently may overlook associated neural representations. By providing end-to-end accounts of speech-to-text transformation, new deep-learning systems could enable more complete brain models. We model EEG recordings of audiobook comprehension with the deep-learning speech recognition system Whisper. We find that (1) Whisper provides a self-contained EEG model of an intermediary representational stage that reflects elements of prelexical and lexical representation and prediction; (2) EEG modeling is more accurate when informed by 5-10 s of speech context, which traditional context-invariant categorical models do not encode; (3) deep Whisper layers encoding linguistic structure were more accurate EEG models of selectively attended speech in two-speaker "cocktail party" listening conditions than early layers encoding acoustics. No such layer depth advantage was observed for unattended speech, consistent with a more superficial level of linguistic processing in the brain.
Affiliation(s)
- Andrew J. Anderson
  - Department of Neurology, Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
  - Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
  - Department of Neurosurgery, Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
  - Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, New York, United States of America
- Chris Davis
  - Western Sydney University, The MARCS Institute for Brain, Behaviour and Development, Westmead Innovation Quarter, Westmead, New South Wales, Australia
- Edmund C. Lalor
  - Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, New York, United States of America
  - Department of Biomedical Engineering, University of Rochester, Rochester, New York, United States of America
  - Center for Visual Science, University of Rochester, Rochester, New York, United States of America

12. Aldag N, Nogueira W. Phoneme-related potentials recorded from normal hearing listeners and cochlear implant users in a selective attention paradigm to continuous speech. Hear Res 2024;454:109136. PMID: 39532054; DOI: 10.1016/j.heares.2024.109136.
Abstract
Cochlear implants can restore the ability to understand speech in patients with profound sensorineural hearing loss. At present, it is not fully understood how cochlear implant users perceive speech and how electric hearing provided by a cochlear implant differs from acoustic hearing. Phoneme-related potentials characterize neural responses to individual instances of phonemes extracted from continuous speech. This retrospective study investigated phoneme-related potentials in cochlear implant users in a selective attention paradigm. Responses were compared between normal hearing listeners and cochlear implant users, and between attended and unattended conditions. Differences between phoneme categories were compared and a classifier was trained to predict the phoneme category from the neural representation. The phoneme-related potentials of cochlear implant users showed similar responses to the ones obtained in normal hearing listeners for early responses (< 100 ms) but not for later responses (> 100 ms) where peaks were smaller or absent. Attention led to an enhancement of the response, whereas latency was mostly not affected by attention. The temporal morphology of the response was influenced by the phonetic features of the stimulus, allowing a classification of the phoneme category based on the phoneme-related potentials. There is a clinical need for methods that can rapidly and objectively assess the speech understanding performance of cochlear implant users. Phoneme-related potentials may provide such a link between the acoustic and the neural representations of phonemes. They may also reveal the challenges of individual subjects and thus provide indications for patient-specific auditory training, rehabilitation programs or the fitting of cochlear implant parameters.
Affiliation(s)
- Nina Aldag
  - Department of Otolaryngology, Hannover Medical School and Cluster of Excellence "Hearing4all", Hanover, Germany
- Waldo Nogueira
  - Department of Otolaryngology, Hannover Medical School and Cluster of Excellence "Hearing4all", Hanover, Germany

13. Guo ZC, McHaney JR, Parthasarathy A, Chandrasekaran B. Reduced neural distinctiveness of speech representations in the middle-aged brain. bioRxiv [Preprint] 2024:2024.08.28.609778. PMID: 39253477; PMCID: PMC11383304; DOI: 10.1101/2024.08.28.609778.
Abstract
Speech perception declines independently of hearing thresholds in middle age, and the neurobiological reasons are unclear. In line with the age-related neural dedifferentiation hypothesis, we predicted that middle-aged adults show less distinct cortical representations of phonemes and acoustic-phonetic features relative to younger adults. In addition to an extensive audiological, auditory electrophysiological, and speech perceptual test battery, we measured electroencephalographic responses time-locked to phoneme instances (phoneme-related potential; PRP) in naturalistic, continuous speech and trained neural network classifiers to predict phonemes from these responses. Consistent with age-related neural dedifferentiation, phoneme predictions were less accurate, more uncertain, and involved a broader network for middle-aged adults compared with younger adults. Representational similarity analysis revealed that the featural relationship between phonemes was less robust in middle age. Electrophysiological and behavioral measures revealed signatures of cochlear neural degeneration (CND) and speech perceptual deficits in middle-aged adults relative to younger adults. Consistent with prior work in animal models, signatures of CND were associated with greater cortical dedifferentiation, explaining nearly a third of the variance in PRP prediction accuracy together with measures of acoustic neural processing. Notably, even after controlling for CND signatures and acoustic processing abilities, age-group differences in PRP prediction accuracy remained. Overall, our results reveal "fuzzier" phonemic representations, suggesting that age-related cortical neural dedifferentiation can occur even in middle age and may underlie speech perceptual challenges, despite a normal audiogram.
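The phoneme-related-potential pipeline described above, epoching EEG around phoneme onsets and decoding phoneme identity, can be sketched roughly as follows; the study used neural-network classifiers, whereas this illustration substitutes a simple logistic regression, and all names and window settings are assumptions.

```python
# Rough sketch: phoneme-related potentials (PRPs) and phoneme decoding.
# `eeg` is (n_channels, n_times) at `sfreq` Hz; `onsets_s` are phoneme onsets.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def phoneme_epochs(eeg, sfreq, onsets_s, tmin=-0.1, tmax=0.4):
    """Cut EEG segments time-locked to phoneme onsets (in seconds)."""
    lo, hi = int(tmin * sfreq), int(tmax * sfreq)
    length = hi - lo
    segs = []
    for t in onsets_s:
        start = int(t * sfreq) + lo
        if start >= 0 and start + length <= eeg.shape[1]:
            segs.append(eeg[:, start:start + length])
    return np.stack(segs)                              # (n_phonemes, n_channels, n_samples)

def phoneme_decoding_accuracy(epochs, labels):
    """Cross-validated accuracy of predicting phoneme class from the PRP."""
    X = epochs.reshape(len(epochs), -1)                # flatten channels x time
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, labels, cv=5).mean()
```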
Affiliation(s)
- Zhe-chen Guo
  - Roxelyn and Richard Pepper Department of Communication Sciences and Disorders, Northwestern University, Evanston, IL, USA
- Jacie R. McHaney
  - Roxelyn and Richard Pepper Department of Communication Sciences and Disorders, Northwestern University, Evanston, IL, USA
- Bharath Chandrasekaran
  - Roxelyn and Richard Pepper Department of Communication Sciences and Disorders, Northwestern University, Evanston, IL, USA

14. Ünsal E, Duygun R, Yemeniciler İ, Bingöl E, Ceran Ö, Güntekin B. From Infancy to Childhood: A Comprehensive Review of Event- and Task-Related Brain Oscillations. Brain Sci 2024;14:837. PMID: 39199528; PMCID: PMC11352659; DOI: 10.3390/brainsci14080837.
Abstract
Brain development from infancy through childhood involves complex structural and functional changes influenced by both internal and external factors. This review provides a comprehensive analysis of event and task-related brain oscillations, focusing on developmental changes across different frequency bands, including delta, theta, alpha, beta, and gamma. Electroencephalography (EEG) studies highlight that these oscillations serve as functional building blocks for sensory and cognitive processes, with significant variations observed across different developmental stages. Delta oscillations, primarily associated with deep sleep and early cognitive demands, gradually diminish as children age. Theta rhythms, crucial for attention and memory, display a distinct pattern in early childhood, evolving with cognitive maturation. Alpha oscillations, reflecting thalamocortical interactions and cognitive performance, increase in complexity with age. Beta rhythms, linked to active thinking and problem-solving, show developmental differences in motor and cognitive tasks. Gamma oscillations, associated with higher cognitive functions, exhibit notable changes in response to sensory stimuli and cognitive tasks. This review underscores the importance of understanding oscillatory dynamics to elucidate brain development and its implications for sensory and cognitive processing in childhood. The findings provide a foundation for future research on developmental neuroscience and potential clinical applications.
Affiliation(s)
- Esra Ünsal
  - Department of Neuroscience, Graduate School of Health Sciences, Istanbul Medipol University, 34810 Istanbul, Turkey
  - Neuroscience Research Center, Research Institute for Health Sciences and Technologies (SABITA), Istanbul Medipol University, 34810 Istanbul, Turkey
- Rümeysa Duygun
  - Department of Neuroscience, Graduate School of Health Sciences, Istanbul Medipol University, 34810 Istanbul, Turkey
  - Neuroscience Research Center, Research Institute for Health Sciences and Technologies (SABITA), Istanbul Medipol University, 34810 Istanbul, Turkey
- İrem Yemeniciler
  - Department of Neuroscience, Graduate School of Health Sciences, Istanbul Medipol University, 34810 Istanbul, Turkey
  - Neuroscience Research Center, Research Institute for Health Sciences and Technologies (SABITA), Istanbul Medipol University, 34810 Istanbul, Turkey
  - Department of Biophysics, School of Medicine, Istanbul Medipol University, 34810 Istanbul, Turkey
- Elifnur Bingöl
  - Department of Neuroscience, Graduate School of Health Sciences, Istanbul Medipol University, 34810 Istanbul, Turkey
  - Neuroscience Research Center, Research Institute for Health Sciences and Technologies (SABITA), Istanbul Medipol University, 34810 Istanbul, Turkey
  - Department of Biophysics, School of Medicine, Istanbul Medipol University, 34810 Istanbul, Turkey
- Ömer Ceran
  - Department of Pediatrics, School of Medicine, Istanbul Medipol University, 34810 Istanbul, Turkey
- Bahar Güntekin
  - Neuroscience Research Center, Research Institute for Health Sciences and Technologies (SABITA), Istanbul Medipol University, 34810 Istanbul, Turkey
  - Department of Biophysics, School of Medicine, Istanbul Medipol University, 34810 Istanbul, Turkey

15. Puffay C, Vanthornhout J, Gillis M, Clercq PD, Accou B, Hamme HV, Francart T. Classifying coherent versus nonsense speech perception from EEG using linguistic speech features. Sci Rep 2024;14:18922. PMID: 39143297; PMCID: PMC11324895; DOI: 10.1038/s41598-024-69568-0.
Abstract
When a person listens to natural speech, the relation between features of the speech signal and the corresponding evoked electroencephalogram (EEG) is indicative of neural processing of the speech signal. Using linguistic representations of speech, we investigate the differences in neural processing between speech in a native and foreign language that is not understood. We conducted experiments using three stimuli: a comprehensible language, an incomprehensible language, and randomly shuffled words from a comprehensible language, while recording the EEG signal of native Dutch-speaking participants. We modeled the neural tracking of linguistic features of the speech signals using a deep-learning model in a match-mismatch task that relates EEG signals to speech, while accounting for lexical segmentation features reflecting acoustic processing. The deep learning model effectively classifies coherent versus nonsense languages. We also observed significant differences in tracking patterns between comprehensible and incomprehensible speech stimuli within the same language. It demonstrates the potential of deep learning frameworks in measuring speech understanding objectively.
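A minimal sketch of the match-mismatch idea, deciding which of two candidate speech segments corresponds to a given EEG segment, is shown below; the architecture, dimensions, and similarity readout are illustrative assumptions rather than the published model.

```python
# Hedged sketch of a match-mismatch classifier (assumed architecture).
import torch
import torch.nn as nn

class MatchMismatch(nn.Module):
    def __init__(self, eeg_dim=64, speech_dim=1, hidden=32):
        super().__init__()
        self.eeg_rnn = nn.LSTM(eeg_dim, hidden, batch_first=True)
        self.speech_rnn = nn.LSTM(speech_dim, hidden, batch_first=True)

    def embed(self, x, rnn):
        _, (h, _) = rnn(x)                    # final hidden state as the embedding
        return h[-1]

    def forward(self, eeg, speech_a, speech_b):
        e = self.embed(eeg, self.eeg_rnn)
        a = self.embed(speech_a, self.speech_rnn)
        b = self.embed(speech_b, self.speech_rnn)
        sim_a = torch.cosine_similarity(e, a, dim=-1)
        sim_b = torch.cosine_similarity(e, b, dim=-1)
        return torch.sigmoid(sim_a - sim_b)   # P(segment A is the match)

model = MatchMismatch()
eeg = torch.randn(8, 320, 64)                 # batch x time x channels
match, mismatch = torch.randn(8, 320, 1), torch.randn(8, 320, 1)
p_match = model(eeg, match, mismatch)         # train with binary cross-entropy
```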
Affiliation(s)
- Corentin Puffay
  - Department of Neurosciences, KU Leuven, ExpORL, Leuven, Belgium
  - Department of Electrical Engineering (ESAT), KU Leuven, PSI, Leuven, Belgium
- Marlies Gillis
  - Department of Neurosciences, KU Leuven, ExpORL, Leuven, Belgium
- Bernd Accou
  - Department of Neurosciences, KU Leuven, ExpORL, Leuven, Belgium
  - Department of Electrical Engineering (ESAT), KU Leuven, PSI, Leuven, Belgium
- Hugo Van Hamme
  - Department of Electrical Engineering (ESAT), KU Leuven, PSI, Leuven, Belgium
- Tom Francart
  - Department of Neurosciences, KU Leuven, ExpORL, Leuven, Belgium

16. Chalas N, Meyer L, Lo CW, Park H, Kluger DS, Abbasi O, Kayser C, Nitsch R, Gross J. Dissociating prosodic from syntactic delta activity during natural speech comprehension. Curr Biol 2024;34:3537-3549.e5. PMID: 39047734; DOI: 10.1016/j.cub.2024.06.072.
Abstract
Decoding human speech requires the brain to segment the incoming acoustic signal into meaningful linguistic units, ranging from syllables and words to phrases. Integrating these linguistic constituents into a coherent percept sets the root of compositional meaning and hence understanding. Important cues for segmentation in natural speech are prosodic cues, such as pauses, but their interplay with higher-level linguistic processing is still unknown. Here, we dissociate the neural tracking of prosodic pauses from the segmentation of multi-word chunks using magnetoencephalography (MEG). We find that manipulating the regularity of pauses disrupts slow speech-brain tracking bilaterally in auditory areas (below 2 Hz) and in turn increases left-lateralized coherence of higher-frequency auditory activity at speech onsets (around 25-45 Hz). Critically, we also find that multi-word chunks, defined as short, coherent bundles of inter-word dependencies, are processed through the rhythmic fluctuations of low-frequency activity (below 2 Hz) bilaterally and independently of prosodic cues. Importantly, low-frequency alignment at chunk onsets increases the accuracy of an encoding model in bilateral auditory and frontal areas while controlling for the effect of acoustics. Our findings provide novel insights into the neural basis of speech perception, demonstrating that both acoustic features (prosodic cues) and abstract linguistic processing at the multi-word timescale are underpinned independently by low-frequency electrophysiological brain activity in the delta frequency range.
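Low-frequency speech-brain tracking of the kind reported here is often quantified with spectral coherence between the speech envelope and sensor signals; the sketch below (assumed sampling rate, window length, and function names, not the authors' MEG pipeline) shows that computation restricted to frequencies below 2 Hz.

```python
# Minimal sketch: delta-band coherence between a speech envelope and one sensor.
# Assumes both signals share the same sampling rate `sfreq`.
import numpy as np
from scipy.signal import coherence, hilbert

def envelope(waveform):
    """Amplitude envelope via the analytic signal."""
    return np.abs(hilbert(waveform))

def delta_band_coherence(speech_env, sensor, sfreq, fmax=2.0):
    f, c = coherence(speech_env, sensor, fs=sfreq, nperseg=int(4 * sfreq))
    keep = f <= fmax
    return f[keep], c[keep]
```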
Affiliation(s)
- Nikos Chalas
  - Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
  - Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
  - Institute for Translational Neuroscience, University of Münster, Münster, Germany
- Lars Meyer
  - Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Chia-Wen Lo
  - Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Hyojin Park
  - Centre for Human Brain Health (CHBH), School of Psychology, University of Birmingham, Birmingham, UK
- Daniel S Kluger
  - Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
  - Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
- Omid Abbasi
  - Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
- Christoph Kayser
  - Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, 33615 Bielefeld, Germany
- Robert Nitsch
  - Institute for Translational Neuroscience, University of Münster, Münster, Germany
- Joachim Gross
  - Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
  - Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany

17. Kay JW. A Partial Information Decomposition for Multivariate Gaussian Systems Based on Information Geometry. Entropy (Basel) 2024;26:542. PMID: 39056905; PMCID: PMC11276306; DOI: 10.3390/e26070542.
Abstract
There is much interest in the topic of partial information decomposition, both in developing new algorithms and in developing applications. An algorithm, based on standard results from information geometry, was recently proposed by Niu and Quinn (2019). They considered the case of three scalar random variables from an exponential family, including both discrete distributions and a trivariate Gaussian distribution. The purpose of this article is to extend their work to the general case of multivariate Gaussian systems having vector inputs and a vector output. By making use of standard results from information geometry, explicit expressions are derived for the components of the partial information decomposition for this system. These expressions depend on a real-valued parameter which is determined by performing a simple constrained convex optimisation. Furthermore, it is proved that the theoretical properties of non-negativity, self-redundancy, symmetry and monotonicity, which were proposed by Williams and Beer (2010), are valid for the decomposition Iig derived herein. Application of these results to real and simulated data shows that the Iig algorithm does produce the results expected when clear expectations are available, although in some scenarios, it can overestimate the level of the synergy and shared information components of the decomposition, and correspondingly underestimate the levels of unique information. Comparisons of the Iig and Idep (Kay and Ince, 2018) methods show that they can both produce very similar results, but interesting differences are also revealed. The same may be said about comparisons between the Iig and Immi (Barrett, 2015) methods.
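For orientation, the sketch below works through one of the simpler Gaussian decompositions mentioned in the abstract, the minimum-mutual-information (Immi) PID of Barrett (2015), using the closed-form Gaussian mutual information; it is not the information-geometric Iig algorithm derived in the paper.

```python
# Worked sketch: Immi-style PID for jointly Gaussian sources X1, X2 and target Y.
import numpy as np

def gaussian_mi(cov, ix, iy):
    """I(X;Y) for a joint Gaussian with covariance `cov`; ix, iy index the blocks."""
    cxx = cov[np.ix_(ix, ix)]
    cyy = cov[np.ix_(iy, iy)]
    cjoint = cov[np.ix_(ix + iy, ix + iy)]
    return 0.5 * np.log(np.linalg.det(cxx) * np.linalg.det(cyy) / np.linalg.det(cjoint))

def pid_mmi(cov, i1, i2, iy):
    """Redundant, unique and synergistic information under the Immi definition."""
    mi_1 = gaussian_mi(cov, i1, iy)
    mi_2 = gaussian_mi(cov, i2, iy)
    mi_12 = gaussian_mi(cov, i1 + i2, iy)
    red = min(mi_1, mi_2)                   # shared (redundant) information
    unq_1, unq_2 = mi_1 - red, mi_2 - red   # unique contributions
    syn = mi_12 - mi_1 - mi_2 + red         # synergy (remainder of the joint MI)
    return red, unq_1, unq_2, syn

# Example with scalar X1, X2, Y at indices 0, 1, 2 of a 3x3 covariance matrix.
cov = np.array([[1.0, 0.3, 0.5],
                [0.3, 1.0, 0.4],
                [0.5, 0.4, 1.0]])
print(pid_mmi(cov, [0], [1], [2]))
```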
Affiliation(s)
- Jim W Kay
  - School of Mathematics and Statistics, University of Glasgow, Glasgow G12 8QQ, UK

18. Kim SG, De Martino F, Overath T. Linguistic modulation of the neural encoding of phonemes. Cereb Cortex 2024;34:bhae155. PMID: 38687241; PMCID: PMC11059272; DOI: 10.1093/cercor/bhae155.
Abstract
Speech comprehension entails the neural mapping of the acoustic speech signal onto learned linguistic units. This acousto-linguistic transformation is bi-directional, whereby higher-level linguistic processes (e.g. semantics) modulate the acoustic analysis of individual linguistic units. Here, we investigated the cortical topography and linguistic modulation of the most fundamental linguistic unit, the phoneme. We presented natural speech and "phoneme quilts" (pseudo-randomly shuffled phonemes) in either a familiar (English) or unfamiliar (Korean) language to native English speakers while recording functional magnetic resonance imaging. This allowed us to dissociate the contribution of acoustic vs. linguistic processes toward phoneme analysis. We show that (i) the acoustic analysis of phonemes is modulated by linguistic analysis and (ii) that for this modulation, both acoustic and phonetic information need to be incorporated. These results suggest that the linguistic modulation of cortical sensitivity to phoneme classes minimizes prediction error during natural speech perception, thereby aiding speech comprehension in challenging listening situations.
Affiliation(s)
- Seung-Goo Kim
  - Department of Psychology and Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States
  - Research Group Neurocognition of Music and Language, Max Planck Institute for Empirical Aesthetics, Grüneburgweg 14, Frankfurt am Main 60322, Germany
- Federico De Martino
  - Faculty of Psychology and Neuroscience, University of Maastricht, Universiteitssingel 40, 6229 ER Maastricht, Netherlands
- Tobias Overath
  - Department of Psychology and Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States
  - Duke Institute for Brain Sciences, Duke University, 308 Research Dr, Durham, NC 27708, United States
  - Center for Cognitive Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States

19. Corsini A, Tomassini A, Pastore A, Delis I, Fadiga L, D'Ausilio A. Speech perception difficulty modulates theta-band encoding of articulatory synergies. J Neurophysiol 2024;131:480-491. PMID: 38323331; DOI: 10.1152/jn.00388.2023.
Abstract
The human brain tracks available speech acoustics and extrapolates missing information such as the speaker's articulatory patterns. However, the extent to which articulatory reconstruction supports speech perception remains unclear. This study explores the relationship between articulatory reconstruction and task difficulty. Participants listened to sentences and performed a speech-rhyming task. Real kinematic data of the speaker's vocal tract were recorded via electromagnetic articulography (EMA) and aligned to corresponding acoustic outputs. We extracted articulatory synergies from the EMA data with principal component analysis (PCA) and employed partial information decomposition (PID) to separate the electroencephalographic (EEG) encoding of acoustic and articulatory features into unique, redundant, and synergistic atoms of information. We median-split sentences into easy (ES) and hard (HS) based on participants' performance and found that greater task difficulty involved greater encoding of unique articulatory information in the theta band. We conclude that fine-grained articulatory reconstruction plays a complementary role in the encoding of speech acoustics, lending further support to the claim that motor processes support speech perception.NEW & NOTEWORTHY Top-down processes originating from the motor system contribute to speech perception through the reconstruction of the speaker's articulatory movement. This study investigates the role of such articulatory simulation under variable task difficulty. We show that more challenging listening tasks lead to increased encoding of articulatory kinematics in the theta band and suggest that, in such situations, fine-grained articulatory reconstruction complements acoustic encoding.
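Extracting articulatory synergies from EMA kinematics with PCA, the first step described above, can be sketched as follows; array shapes and the number of components are assumptions, and the subsequent partial-information-decomposition step is omitted.

```python
# Simplified sketch: articulatory synergies from EMA kinematics via PCA.
# `ema` is (n_samples, n_articulator_channels).
import numpy as np
from sklearn.decomposition import PCA

def articulatory_synergies(ema, n_synergies=5):
    """Return synergy weights, time-varying activations, and explained variance."""
    pca = PCA(n_components=n_synergies)
    activations = pca.fit_transform(ema - ema.mean(axis=0))   # (n_samples, n_synergies)
    weights = pca.components_                                  # (n_synergies, n_channels)
    return weights, activations, pca.explained_variance_ratio_
```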
Affiliation(s)
- Alessandro Corsini
  - Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
  - Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
- Alice Tomassini
  - Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
  - Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
- Aldo Pastore
  - Laboratorio NEST, Scuola Normale Superiore, Pisa, Italy
- Ioannis Delis
  - School of Biomedical Sciences, University of Leeds, Leeds, United Kingdom
- Luciano Fadiga
  - Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
  - Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
- Alessandro D'Ausilio
  - Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
  - Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy

20. Mai A, Riès S, Ben-Haim S, Shih JJ, Gentner TQ. Acoustic and language-specific sources for phonemic abstraction from speech. Nat Commun 2024;15:677. PMID: 38263364; PMCID: PMC10805762; DOI: 10.1038/s41467-024-44844-9.
Abstract
Spoken language comprehension requires abstraction of linguistic information from speech, but the interaction between auditory and linguistic processing of speech remains poorly understood. Here, we investigate the nature of this abstraction using neural responses recorded intracranially while participants listened to conversational English speech. Capitalizing on multiple, language-specific patterns where phonological and acoustic information diverge, we demonstrate the causal efficacy of the phoneme as a unit of analysis and dissociate the unique contributions of phonemic and spectrographic information to neural responses. Quantitative higher-order response models also reveal that unique contributions of phonological information are carried in the covariance structure of the stimulus-response relationship. This suggests that linguistic abstraction is shaped by neurobiological mechanisms that involve integration across multiple spectro-temporal features and prior phonological information. These results link speech acoustics to phonology and morphosyntax, substantiating predictions about abstractness in linguistic theory and providing evidence for the acoustic features that support that abstraction.
Affiliation(s)
- Anna Mai
  - University of California, San Diego, Linguistics, 9500 Gilman Dr., La Jolla, CA, 92093, USA
- Stephanie Riès
  - San Diego State University, School of Speech, Language, and Hearing Sciences, 5500 Campanile Drive, San Diego, CA, 92182, USA
  - San Diego State University, Center for Clinical and Cognitive Sciences, 5500 Campanile Drive, San Diego, CA, 92182, USA
- Sharona Ben-Haim
  - University of California, San Diego, Neurological Surgery, 9500 Gilman Dr., La Jolla, CA, 92093, USA
- Jerry J Shih
  - University of California, San Diego, Neurosciences, 9500 Gilman Dr., La Jolla, CA, 92093, USA
- Timothy Q Gentner
  - University of California, San Diego, Psychology, 9500 Gilman Dr., La Jolla, CA, 92093, USA
  - University of California, San Diego, Neurobiology, 9500 Gilman Dr., La Jolla, CA, 92093, USA
  - University of California, San Diego, Kavli Institute for Brain and Mind, 9500 Gilman Dr., La Jolla, CA, 92093, USA

21
|
Jalilpour Monesi M, Vanthornhout J, Francart T, Van Hamme H. The role of vowel and consonant onsets in neural tracking of natural speech. J Neural Eng 2024; 21:016002. [PMID: 38205849 DOI: 10.1088/1741-2552/ad1784] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2023] [Accepted: 12/20/2023] [Indexed: 01/12/2024]
Abstract
Objective. To investigate how the auditory system processes natural speech, models have been created to relate the electroencephalography (EEG) signal of a person listening to speech to various representations of the speech. Mainly the speech envelope has been used, but also phonetic representations. We investigated at which degree of granularity phonetic representations can be related to the EEG signal. Approach. We used recorded EEG signals from 105 subjects while they listened to fairy tale stories. We utilized speech representations, including onset of any phone, vowel-consonant onsets, broad phonetic class (BPC) onsets, and narrow phonetic class onsets, and related them to EEG using forward modeling and match-mismatch tasks. In forward modeling, we used a linear model to predict EEG from speech representations. In the match-mismatch task, we trained a long short-term memory (LSTM)-based model to determine which of two candidate speech segments matches a given EEG segment. Main results. Our results show that vowel-consonant onsets outperform onsets of any phone in both tasks, which suggests that neural tracking of the vowel vs. consonant distinction exists in the EEG to some degree. We also observed that vowel (syllable nucleus) onsets exhibit a more consistent representation in EEG compared to syllable onsets. Significance. Finally, our findings suggest that neural tracking previously thought to be associated with BPCs might actually originate from vowel-consonant onsets rather than the differentiation between different phonetic classes.
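The forward-modeling step described above can be sketched with a time-lagged linear (ridge) model that predicts an EEG channel from onset impulse trains. Sampling rate, lag range, regularization strength, and the data below are placeholders, not the study's parameters.

```python
# Minimal forward-model sketch: ridge regression from onset trains to EEG.
import numpy as np

rng = np.random.default_rng(1)
fs = 64                         # assumed sampling rate after downsampling (Hz)
n = fs * 600                    # 10 minutes of data

# Hypothetical speech representation: vowel and consonant onset impulse trains.
onsets = (rng.random((n, 2)) > 0.97).astype(float)
eeg = rng.standard_normal(n)    # one EEG channel (placeholder data)

# Lagged design matrix covering 0-400 ms, so the model can express a temporal
# response function for each onset feature.
lags = np.arange(0, int(0.4 * fs))
X = np.hstack([np.roll(onsets, lag, axis=0) for lag in lags])
X[: lags.max()] = 0             # discard wrap-around introduced by np.roll

# Ridge regression: w = (X'X + aI)^-1 X'y
alpha = 1.0
w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ eeg)

pred = X @ w
r = np.corrcoef(pred, eeg)[0, 1]
print(f"prediction accuracy r = {r:.3f}")
# In a real analysis, r would be computed on held-out data (cross-validation).
```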
Collapse
Affiliation(s)
- Mohammad Jalilpour Monesi
- Department of Electrical Engineering (ESAT), PSI, KU Leuven, Leuven, Belgium
- Department Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
| | | | - Tom Francart
- Department Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
| | - Hugo Van Hamme
- Department of Electrical Engineering (ESAT), PSI, KU Leuven, Leuven, Belgium
| |
Collapse
|
22
|
Brodbeck C, Kandylaki KD, Scharenborg O. Neural Representations of Non-native Speech Reflect Proficiency and Interference from Native Language Knowledge. J Neurosci 2024; 44:e0666232023. [PMID: 37963763 PMCID: PMC10851685 DOI: 10.1523/jneurosci.0666-23.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2023] [Revised: 06/23/2023] [Accepted: 08/01/2023] [Indexed: 11/16/2023] Open
Abstract
Learning to process speech in a foreign language involves learning new representations for mapping the auditory signal to linguistic structure. Behavioral experiments suggest that even listeners who are highly proficient in a non-native language experience interference from representations of their native language. However, much of the evidence for such interference comes from tasks that may inadvertently increase the salience of native language competitors. Here we tested for neural evidence of proficiency and native language interference in a naturalistic story listening task. We studied electroencephalography responses of 39 native speakers of Dutch (14 male) to an English short story, spoken by a native speaker of either American English or Dutch. We modeled brain responses with multivariate temporal response functions, using acoustic and language models. We found evidence for activation of Dutch language statistics when listening to English, but only when it was spoken with a Dutch accent. This suggests that a naturalistic, monolingual setting decreases the interference from native language representations, whereas an accent in the listener's own native language may increase native language interference by increasing the salience of the native language and activating native language phonetic and lexical representations. Brain responses suggest that such interference stems from words from the native language competing with the foreign language in a single word recognition system, rather than being activated in a parallel lexicon. We further found that secondary acoustic representations of speech (after 200 ms latency) decreased with increasing proficiency. This may reflect improved acoustic-phonetic models in more proficient listeners. Significance Statement: Behavioral experiments suggest that native language knowledge interferes with foreign language listening, but such effects may be sensitive to task manipulations, as tasks that increase metalinguistic awareness may also increase native language interference. This highlights the need for studying non-native speech processing using naturalistic tasks. We measured neural responses unobtrusively while participants listened for comprehension and characterized the influence of proficiency at multiple levels of representation. We found that salience of the native language, as manipulated through speaker accent, affected activation of native language representations: significant evidence for activation of native language (Dutch) categories was only obtained when the speaker had a Dutch accent, whereas no significant interference was found for a speaker with a native (American) accent.
Collapse
Affiliation(s)
- Christian Brodbeck
- Department of Psychological Sciences, University of Connecticut, Storrs, Connecticut 06269
| | - Katerina Danae Kandylaki
- Department of Neuropsychology and Psychopharmacology, Maastricht University, 6200 MD, Maastricht, The Netherlands
| | - Odette Scharenborg
- Multimedia Computing Group, Delft University of Technology, 2628 XE, Delft, The Netherlands
| |
Collapse
|
23
|
Keshavarzi M, Di Liberto GM, Gabrielczyk F, Wilson A, Macfarlane A, Goswami U. Atypical speech production of multisyllabic words and phrases by children with developmental dyslexia. Dev Sci 2024; 27:e13428. [PMID: 37381667 DOI: 10.1111/desc.13428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2022] [Revised: 05/20/2023] [Accepted: 06/12/2023] [Indexed: 06/30/2023]
Abstract
The prevalent "core phonological deficit" model of dyslexia proposes that the reading and spelling difficulties characterizing affected children stem from prior developmental difficulties in processing speech sound structure, for example, perceiving and identifying syllable stress patterns, syllables, rhymes and phonemes. Yet spoken word production appears normal. This suggests an unexpected disconnect between speech input and speech output processes. Here we investigated the output side of this disconnect from a speech rhythm perspective by measuring the speech amplitude envelope (AE) of multisyllabic spoken phrases. The speech AE contains crucial information regarding stress patterns, speech rate, tonal contrasts and intonational information. We created a novel computerized speech copying task in which participants copied aloud familiar spoken targets like "Aladdin." Seventy-five children with and without dyslexia were tested, some of whom were also receiving an oral intervention designed to enhance multi-syllabic processing. Similarity of the child's productions to the target AE was computed using correlation and mutual information metrics. Similarity of pitch contour, another acoustic cue to speech rhythm, was used for control analyses. Children with dyslexia were significantly worse at producing the multi-syllabic targets as indexed by both similarity metrics for computing the AE. However, children with dyslexia were not different from control children in producing pitch contours. Accordingly, the spoken production of multisyllabic phrases by children with dyslexia is atypical regarding the AE. Children with dyslexia may not appear to listeners to exhibit speech production difficulties because their pitch contours are intact. RESEARCH HIGHLIGHTS: Speech production of syllable stress patterns is atypical in children with dyslexia. Children with dyslexia are significantly worse at producing the amplitude envelope of multi-syllabic targets compared to both age-matched and reading-level-matched control children. No group differences were found for pitch contour production between children with dyslexia and age-matched control children. It may be difficult to detect speech output problems in dyslexia as pitch contours are relatively accurate.
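The envelope-similarity measure described above can be sketched as follows: extract the amplitude envelope of the target recording and of the child's production via the Hilbert transform and correlate them (the study additionally used a mutual-information metric). Signals, sampling rate, and smoothing choices below are placeholders, not the authors' exact pipeline.

```python
# Minimal sketch: amplitude-envelope similarity between a target and a production.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def amplitude_envelope(x, fs, lp_cutoff=10.0):
    """Broadband amplitude envelope: |Hilbert(x)|, low-pass filtered."""
    env = np.abs(hilbert(x))
    b, a = butter(4, lp_cutoff / (fs / 2), btype="low")
    return filtfilt(b, a, env)

fs = 16000
t = np.arange(0, 2.0, 1 / fs)
# Placeholder audio: in practice these would be the spoken target (e.g., "Aladdin")
# and the child's imitation, time-aligned to the same duration.
target = np.sin(2 * np.pi * 200 * t) * (1 + np.sin(2 * np.pi * 3 * t))
production = np.sin(2 * np.pi * 210 * t) * (1 + np.sin(2 * np.pi * 3 * t + 0.3))

env_t = amplitude_envelope(target, fs)
env_p = amplitude_envelope(production, fs)

similarity = np.corrcoef(env_t, env_p)[0, 1]
print(f"envelope similarity (correlation): {similarity:.3f}")
```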
Collapse
Affiliation(s)
- Mahmoud Keshavarzi
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, UK
| | - Giovanni M Di Liberto
- School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland
- Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland
| | - Fiona Gabrielczyk
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, UK
| | - Angela Wilson
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, UK
| | - Annabel Macfarlane
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, UK
| | - Usha Goswami
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, UK
| |
Collapse
|
24
|
Ha J, Baek SC, Lim Y, Chung JH. Validation of cost-efficient EEG experimental setup for neural tracking in an auditory attention task. Sci Rep 2023; 13:22682. [PMID: 38114579 PMCID: PMC10730561 DOI: 10.1038/s41598-023-49990-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2023] [Accepted: 12/14/2023] [Indexed: 12/21/2023] Open
Abstract
When individuals listen to speech, their neural activity phase-locks to the slow temporal rhythm, which is commonly referred to as "neural tracking". The neural tracking mechanism allows for the detection of an attended sound source in a multi-talker situation by decoding neural signals obtained by electroencephalography (EEG), known as auditory attention decoding (AAD). Neural tracking with AAD can be utilized as an objective measurement tool for diverse clinical contexts, and it has the potential to be applied to neuro-steered hearing devices. To effectively utilize this technology, it is essential to enhance the accessibility of the EEG experimental setup and analysis. The aim of the study was to develop a cost-efficient neural tracking system and validate the feasibility of neural tracking measurement by conducting an AAD task using an offline and real-time decoder model outside the soundproof environment. We devised a neural tracking system capable of conducting AAD experiments using an OpenBCI and Arduino board. Nine participants were recruited to assess the performance of the AAD using the developed system, which involved presenting competing speech signals in an experimental setting without soundproofing. As a result, the offline decoder model demonstrated an average performance of 90%, and the real-time decoder model exhibited a performance of 78%. The present study demonstrates the feasibility of implementing neural tracking and AAD using cost-effective devices in a practical environment.
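A schematic sketch of the decoding logic behind AAD is given below: a linear backward model reconstructs the attended envelope from multichannel EEG, the reconstruction is correlated with each talker's envelope, and the larger correlation determines the decoded talker. Everything here (synthetic data, lag range, regularization, train/test split) is illustrative and not the OpenBCI-based pipeline of the study.

```python
# Minimal stimulus-reconstruction AAD sketch with placeholder data.
import numpy as np

rng = np.random.default_rng(2)
fs, dur = 64, 300                     # assumed sampling rate (Hz) and duration (s)
n, n_ch = fs * dur, 8                 # 8-channel EEG, as in low-cost setups

env_a = np.abs(rng.standard_normal(n))   # envelope of talker A (placeholder)
env_b = np.abs(rng.standard_normal(n))   # envelope of talker B (placeholder)
# Synthetic EEG that weakly follows talker A (the "attended" talker here).
eeg = 0.1 * env_a[:, None] + rng.standard_normal((n, n_ch))

def lagged(x, max_lag):
    """Stack EEG channels at lags 0..max_lag-1 (backward model design matrix)."""
    X = np.hstack([np.roll(x, -lag, axis=0) for lag in range(max_lag)])
    X[-max_lag:] = 0
    return X

X = lagged(eeg, int(0.25 * fs))       # EEG lags spanning 0-250 ms

# Train the decoder on the first half (attending talker A), test on the second half.
half = n // 2
w = np.linalg.solve(X[:half].T @ X[:half] + 1e2 * np.eye(X.shape[1]),
                    X[:half].T @ env_a[:half])

recon = X[half:] @ w
r_a = np.corrcoef(recon, env_a[half:])[0, 1]
r_b = np.corrcoef(recon, env_b[half:])[0, 1]
print("decoded attended talker:", "A" if r_a > r_b else "B")
```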
Collapse
Affiliation(s)
- Jiyeon Ha
- Department of HY-KIST Bio-Convergence, Hanyang University, Seoul, 04763, Korea
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul, 02792, Korea
| | - Seung-Cheol Baek
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul, 02792, Korea
- Research Group Neurocognition of Music and Language, Max Planck Institute for Empirical Aesthetics, 60322 Frankfurt am Main, Germany
| | - Yoonseob Lim
- Department of HY-KIST Bio-Convergence, Hanyang University, Seoul, 04763, Korea.
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul, 02792, Korea.
| | - Jae Ho Chung
- Department of HY-KIST Bio-Convergence, Hanyang University, Seoul, 04763, Korea.
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul, 02792, Korea.
- Department of Otolaryngology-Head and Neck Surgery, College of Medicine, Hanyang University, Seoul, 04763, Korea.
- Department of Otolaryngology-Head and Neck Surgery, School of Medicine, Hanyang University, 222-Wangshimni-ro, Seongdong-gu, Seoul, 133-792, Korea.
| |
Collapse
|
25
|
Di Liberto GM, Attaheri A, Cantisani G, Reilly RB, Ní Choisdealbha Á, Rocha S, Brusini P, Goswami U. Emergence of the cortical encoding of phonetic features in the first year of life. Nat Commun 2023; 14:7789. [PMID: 38040720 PMCID: PMC10692113 DOI: 10.1038/s41467-023-43490-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2022] [Accepted: 11/10/2023] [Indexed: 12/03/2023] Open
Abstract
Even prior to producing their first words, infants are developing a sophisticated speech processing system, with robust word recognition present by 4-6 months of age. These emergent linguistic skills, observed with behavioural investigations, are likely to rely on increasingly sophisticated neural underpinnings. The infant brain is known to robustly track the speech envelope; however, previous cortical tracking studies were unable to demonstrate the presence of phonetic feature encoding. Here we utilise temporal response functions computed from electrophysiological responses to nursery rhymes to investigate the cortical encoding of phonetic features in a longitudinal cohort of infants when aged 4, 7 and 11 months, as well as adults. The analyses reveal an increasingly detailed and acoustically invariant phonetic encoding emerging over the first year of life, providing neurophysiological evidence that the pre-verbal human cortex learns phonetic categories. By contrast, we found no credible evidence for age-related increases in cortical tracking of the acoustic spectrogram.
Collapse
Affiliation(s)
- Giovanni M Di Liberto
- ADAPT Centre, School of Computer Science and Statistics, Trinity College, The University of Dublin, Dublin, Ireland.
- Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Dublin, Ireland.
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, United Kingdom.
| | - Adam Attaheri
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
| | - Giorgia Cantisani
- ADAPT Centre, School of Computer Science and Statistics, Trinity College, The University of Dublin, Dublin, Ireland
- Laboratoire des Systèmes Perceptifs, Département d'études Cognitives, École normale supérieure, PSL University, CNRS, 75005, Paris, France
| | - Richard B Reilly
- Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Dublin, Ireland
- School of Engineering, Trinity Centre for Biomedical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
- School of Medicine, Trinity College, The University of Dublin, Dublin, Ireland
| | - Áine Ní Choisdealbha
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
| | - Sinead Rocha
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
| | - Perrine Brusini
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
| | - Usha Goswami
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
| |
Collapse
|
26
|
Mai G, Wang WSY. Distinct roles of delta- and theta-band neural tracking for sharpening and predictive coding of multi-level speech features during spoken language processing. Hum Brain Mapp 2023; 44:6149-6172. [PMID: 37818940 PMCID: PMC10619373 DOI: 10.1002/hbm.26503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2023] [Revised: 08/17/2023] [Accepted: 09/13/2023] [Indexed: 10/13/2023] Open
Abstract
The brain tracks and encodes multi-level speech features during spoken language processing. It is evident that this speech tracking is dominant at low frequencies (<8 Hz) including delta and theta bands. Recent research has demonstrated distinctions between delta- and theta-band tracking but has not elucidated how they differentially encode speech across linguistic levels. Here, we hypothesised that delta-band tracking encodes prediction errors (enhanced processing of unexpected features) while theta-band tracking encodes neural sharpening (enhanced processing of expected features) when people perceive speech with different linguistic contents. EEG responses were recorded when normal-hearing participants attended to continuous auditory stimuli that contained different phonological/morphological and semantic contents: (1) real-words, (2) pseudo-words and (3) time-reversed speech. We employed multivariate temporal response functions to measure EEG reconstruction accuracies in response to acoustic (spectrogram), phonetic and phonemic features with the partialling procedure that singles out unique contributions of individual features. We found higher delta-band accuracies for pseudo-words than real-words and time-reversed speech, especially during encoding of phonetic features. Notably, individual time-lag analyses showed that significantly higher accuracies for pseudo-words than real-words started at early processing stages for phonetic encoding (<100 ms post-feature) and later stages for acoustic and phonemic encoding (>200 and 400 ms post-feature, respectively). Theta-band accuracies, on the other hand, were higher when stimuli had richer linguistic content (real-words > pseudo-words > time-reversed speech). Such effects also started at early stages (<100 ms post-feature) during encoding of all individual features or when all features were combined. We argue that these results indicate that delta-band tracking may play a role in predictive coding leading to greater tracking of pseudo-words due to the presence of unexpected/unpredicted semantic information, while theta-band tracking encodes sharpened signals caused by more expected phonological/morphological and semantic contents. Early presence of these effects reflects rapid computations of sharpening and prediction errors. Moreover, by measuring changes in EEG alpha power, we did not find evidence that the observed effects can be solely explained by attentional demands or listening effort. Finally, we used directed information analyses to illustrate feedforward and feedback information transfers between prediction errors and sharpening across linguistic levels, showcasing how our results fit with the hierarchical Predictive Coding framework. Together, we suggest the distinct roles of delta and theta neural tracking for sharpening and predictive coding of multi-level speech features during spoken language processing.
Collapse
Affiliation(s)
- Guangting Mai
- Hearing Theme, National Institute for Health Research Nottingham Biomedical Research Centre, Nottingham, UK
- Academic Unit of Mental Health and Clinical Neurosciences, School of Medicine, The University of Nottingham, Nottingham, UK
- Division of Psychology and Language Sciences, Faculty of Brain Sciences, University College London, London, UK
| | - William S-Y Wang
- Department of Chinese and Bilingual Studies, Hong Kong Polytechnic University, Hung Hom, Hong Kong
- Language Engineering Laboratory, The Chinese University of Hong Kong, Hong Kong, China
| |
Collapse
|
27
|
Brodbeck C, Das P, Gillis M, Kulasingham JP, Bhattasali S, Gaston P, Resnik P, Simon JZ. Eelbrain, a Python toolkit for time-continuous analysis with temporal response functions. eLife 2023; 12:e85012. [PMID: 38018501 PMCID: PMC10783870 DOI: 10.7554/elife.85012] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2022] [Accepted: 11/24/2023] [Indexed: 11/30/2023] Open
Abstract
Even though human experience unfolds continuously in time, it is not strictly linear; instead, it entails cascading processes building hierarchical cognitive structures. For instance, during speech perception, humans transform a continuously varying acoustic signal into phonemes, words, and meaning, and these levels all have distinct but interdependent temporal structures. Time-lagged regression using temporal response functions (TRFs) has recently emerged as a promising tool for disentangling electrophysiological brain responses related to such complex models of perception. Here, we introduce the Eelbrain Python toolkit, which makes this kind of analysis easy and accessible. We demonstrate its use, using continuous speech as a sample paradigm, with a freely available EEG dataset of audiobook listening. A companion GitHub repository provides the complete source code for the analysis, from raw data to group-level statistics. More generally, we advocate a hypothesis-driven approach in which the experimenter specifies a hierarchy of time-continuous representations that are hypothesized to have contributed to brain responses, and uses those as predictor variables for the electrophysiological signal. This is analogous to a multiple regression problem, but with the addition of a time dimension. TRF analysis decomposes the brain signal into distinct responses associated with the different predictor variables by estimating a multivariate TRF (mTRF), quantifying the influence of each predictor on brain responses as a function of time(-lags). This allows asking two questions about the predictor variables: (1) Is there a significant neural representation corresponding to this predictor variable? And if so, (2) what are the temporal characteristics of the neural response associated with it? Thus, different predictor variables can be systematically combined and evaluated to jointly model neural processing at multiple hierarchical levels. We discuss applications of this approach, including the potential for linking algorithmic/representational theories at different cognitive levels to brain responses through computational models with appropriate linking hypotheses.
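The toolkit described above implements multivariate time-lagged (mTRF) regression. As a language-agnostic illustration of the underlying logic only, not the Eelbrain API, the sketch below builds a lagged design matrix for two placeholder predictors and asks whether adding a second predictor improves held-out prediction accuracy, mirroring question (1) in the abstract. Data, lags, and the regularization value are assumptions.

```python
# Minimal sketch of the predictor-comparison logic behind mTRF analysis.
import numpy as np

rng = np.random.default_rng(3)
fs, n = 64, 64 * 600

# Two hypothetical time-continuous predictors: acoustic envelope and a sparse
# word-onset train (stand-ins for the hierarchy of representations discussed above).
envelope = np.abs(rng.standard_normal(n))
word_onsets = (rng.random(n) > 0.99).astype(float)
eeg = 0.05 * envelope + 0.05 * np.roll(word_onsets, 10) + rng.standard_normal(n)

def lagged_design(features, max_lag):
    cols = [np.roll(f, lag) for f in features for lag in range(max_lag)]
    X = np.stack(cols, axis=1)
    X[:max_lag] = 0
    return X

def crossval_r(features, y, max_lag=32, alpha=10.0):
    X = lagged_design(features, max_lag)
    half = len(y) // 2
    w = np.linalg.solve(X[:half].T @ X[:half] + alpha * np.eye(X.shape[1]),
                        X[:half].T @ y[:half])
    pred = X[half:] @ w
    return np.corrcoef(pred, y[half:])[0, 1]

r_base = crossval_r([envelope], eeg)
r_full = crossval_r([envelope, word_onsets], eeg)
# If r_full reliably exceeds r_base across subjects, the word-onset predictor has
# a neural representation over and above the acoustic envelope.
print(f"envelope only: r={r_base:.3f}   envelope + word onsets: r={r_full:.3f}")
```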
Collapse
Affiliation(s)
| | - Proloy Das
- Stanford University, Stanford, United States
| | | | | | | | | | - Philip Resnik
- University of Maryland, College Park, College Park, United States
| | | |
Collapse
|
28
|
Nidiffer AR, Cao CZ, O'Sullivan A, Lalor EC. A representation of abstract linguistic categories in the visual system underlies successful lipreading. Neuroimage 2023; 282:120391. [PMID: 37757989 DOI: 10.1016/j.neuroimage.2023.120391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2022] [Revised: 09/22/2023] [Accepted: 09/24/2023] [Indexed: 09/29/2023] Open
Abstract
There is considerable debate over how visual speech is processed in the absence of sound and whether neural activity supporting lipreading occurs in visual brain areas. Much of the ambiguity stems from a lack of behavioral grounding and neurophysiological analyses that cannot disentangle high-level linguistic and phonetic/energetic contributions from visual speech. To address this, we recorded EEG from human observers as they watched silent videos, half of which were novel and half of which were previously rehearsed with the accompanying audio. We modeled how the EEG responses to novel and rehearsed silent speech reflected the processing of low-level visual features (motion, lip movements) and a higher-level categorical representation of linguistic units, known as visemes. The ability of these visemes to account for the EEG - beyond the motion and lip movements - was significantly enhanced for rehearsed videos in a way that correlated with participants' trial-by-trial ability to lipread that speech. Source localization of viseme processing showed clear contributions from visual cortex, with no strong evidence for the involvement of auditory areas. We interpret this as support for the idea that the visual system produces its own specialized representation of speech that is (1) well-described by categorical linguistic features, (2) dissociable from lip movements, and (3) predictive of lipreading ability. We also suggest a reinterpretation of previous findings of auditory cortical activation during silent speech that is consistent with hierarchical accounts of visual and audiovisual speech perception.
Collapse
Affiliation(s)
- Aaron R Nidiffer
- Department of Biomedical Engineering, Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
| | - Cody Zhewei Cao
- Department of Psychology, University of Michigan, Ann Arbor, MI, USA
| | - Aisling O'Sullivan
- School of Engineering, Trinity College Institute of Neuroscience, Trinity Centre for Biomedical Engineering, Trinity College, Dublin, Ireland
| | - Edmund C Lalor
- Department of Biomedical Engineering, Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA; School of Engineering, Trinity College Institute of Neuroscience, Trinity Centre for Biomedical Engineering, Trinity College, Dublin, Ireland.
| |
Collapse
|
29
|
Puffay C, Vanthornhout J, Gillis M, Accou B, Van Hamme H, Francart T. Robust neural tracking of linguistic speech representations using a convolutional neural network. J Neural Eng 2023; 20:046040. [PMID: 37595606 DOI: 10.1088/1741-2552/acf1ce] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2023] [Accepted: 08/18/2023] [Indexed: 08/20/2023]
Abstract
Objective. When listening to continuous speech, populations of neurons in the brain track different features of the signal. Neural tracking can be measured by relating the electroencephalography (EEG) and the speech signal. Recent studies have shown a significant contribution of linguistic features over acoustic neural tracking using linear models. However, linear models cannot model the nonlinear dynamics of the brain. To overcome this, we use a convolutional neural network (CNN) that relates EEG to linguistic features using phoneme or word onsets as a control and has the capacity to model non-linear relations. Approach. We integrate phoneme- and word-based linguistic features (phoneme surprisal, cohort entropy (CE), word surprisal (WS) and word frequency (WF)) in our nonlinear CNN model and investigate if they carry additional information on top of lexical features (phoneme and word onsets). We then compare the performance of our nonlinear CNN with that of a linear encoder and a linearized CNN. Main results. For the non-linear CNN, we found a significant contribution of CE over phoneme onsets and of WS and WF over word onsets. Moreover, the non-linear CNN outperformed the linear baselines. Significance. Measuring coding of linguistic features in the brain is important for auditory neuroscience research and applications that involve objectively measuring speech understanding. With linear models, this is measurable, but the effects are very small. The proposed non-linear CNN model yields larger differences between linguistic and lexical models and, therefore, could show effects that would otherwise be unmeasurable and may, in the future, lead to improved within-subject measures and shorter recordings.
Collapse
Affiliation(s)
- Corentin Puffay
- Department Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
- Department of Electrical engineering (ESAT), PSI, KU Leuven, Leuven, Belgium
| | | | - Marlies Gillis
- Department Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
| | - Bernd Accou
- Department Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
- Department of Electrical engineering (ESAT), PSI, KU Leuven, Leuven, Belgium
| | - Hugo Van Hamme
- Department of Electrical engineering (ESAT), PSI, KU Leuven, Leuven, Belgium
| | - Tom Francart
- Department Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
| |
Collapse
|
30
|
Kelty-Stephen DG, Lane E, Bloomfield L, Mangalam M. Multifractal test for nonlinearity of interactions across scales in time series. Behav Res Methods 2023; 55:2249-2282. [PMID: 35854196 DOI: 10.3758/s13428-022-01866-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 04/27/2022] [Indexed: 01/21/2023]
Abstract
The creativity and emergence of biological and psychological behavior tend to be nonlinear, and correspondingly, biological and psychological measures contain degrees of irregularity. The linear model might fail to reduce these measurements to a sum of independent random factors (yielding a stable mean for the measurement), implying nonlinear changes over time. The present work reviews some of the concepts implicated in nonlinear changes over time and details the mathematical steps involved in their identification. It introduces multifractality as a mathematical framework helpful in determining whether and to what degree the measured series exhibits nonlinear changes over time. These mathematical steps include multifractal analysis and surrogate data production for resolving when multifractality entails nonlinear changes over time. Ultimately, when measurements fail to fit the structures of the traditional linear model, multifractal modeling allows for making those nonlinear excursions explicit, that is, to come up with a quantitative estimate of how strongly events may interact across timescales. This estimate may serve some interests as merely a potentially statistically significant indicator of independence failing to hold, but we suspect that this estimate might serve more generally as a predictor of perceptuomotor or cognitive performance.
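The surrogate-testing step discussed above can be illustrated with a deliberately simplified sketch: phase-randomized surrogates preserve the linear (spectral) structure of a series while destroying interactions across timescales, so an observed statistic can be compared against its surrogate distribution. The statistic and surrogate type below are placeholders (the paper itself works with multifractal-width estimates and IAAFT-style surrogates).

```python
# Minimal surrogate-data sketch: compare an observed statistic to phase-randomized surrogates.
import numpy as np

rng = np.random.default_rng(4)

def phase_randomized_surrogate(x, rng):
    """Surrogate with the same power spectrum as x but randomized Fourier phases."""
    spec = np.fft.rfft(x - x.mean())
    phases = rng.uniform(0, 2 * np.pi, len(spec))
    phases[0] = 0                      # keep the DC component real
    surr = np.fft.irfft(np.abs(spec) * np.exp(1j * phases), n=len(x))
    return surr + x.mean()

# Placeholder series; in practice a behavioral or physiological measurement.
x = np.cumsum(rng.standard_normal(4096))

# Placeholder nonlinearity-sensitive statistic (the paper uses multifractal width).
def statistic(y):
    return np.mean(np.abs(np.diff(y)) ** 3)

observed = statistic(x)
surrogates = np.array([statistic(phase_randomized_surrogate(x, rng))
                       for _ in range(200)])

# One-sided surrogate test: is the observed value extreme relative to the surrogates?
p = (np.sum(surrogates >= observed) + 1) / (len(surrogates) + 1)
print(f"observed={observed:.3f}, surrogate mean={surrogates.mean():.3f}, p={p:.3f}")
```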
Collapse
Affiliation(s)
| | - Elizabeth Lane
- Department of Psychiatry, University of California-San Diego, San Diego, CA, USA
| | | | - Madhur Mangalam
- Department of Physical Therapy, Movement and Rehabilitation Sciences, Northeastern University, Boston, MA, USA.
| |
Collapse
|
31
|
Tezcan F, Weissbart H, Martin AE. A tradeoff between acoustic and linguistic feature encoding in spoken language comprehension. eLife 2023; 12:e82386. [PMID: 37417736 PMCID: PMC10328533 DOI: 10.7554/elife.82386] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2022] [Accepted: 06/18/2023] [Indexed: 07/08/2023] Open
Abstract
When we comprehend language from speech, the phase of the neural response aligns with particular features of the speech input, resulting in a phenomenon referred to as neural tracking. In recent years, a large body of work has demonstrated the tracking of the acoustic envelope and abstract linguistic units at the phoneme and word levels, and beyond. However, the degree to which speech tracking is driven by acoustic edges of the signal, or by internally-generated linguistic units, or by the interplay of both, remains contentious. In this study, we used naturalistic story-listening to investigate (1) whether phoneme-level features are tracked over and above acoustic edges, (2) whether word entropy, which can reflect sentence- and discourse-level constraints, impacted the encoding of acoustic and phoneme-level features, and (3) whether the tracking of acoustic edges was enhanced or suppressed during comprehension of a first language (Dutch) compared to a statistically familiar but uncomprehended language (French). We first show that encoding models with phoneme-level linguistic features, in addition to acoustic features, uncovered an increased neural tracking response; this signal was further amplified in a comprehended language, putatively reflecting the transformation of acoustic features into internally generated phoneme-level representations. Phonemes were tracked more strongly in a comprehended language, suggesting that language comprehension functions as a neural filter over acoustic edges of the speech signal as it transforms sensory signals into abstract linguistic units. We then show that word entropy enhances neural tracking of both acoustic and phonemic features when sentence- and discourse-context are less constraining. When language was not comprehended, acoustic features, but not phonemic ones, were more strongly modulated; in contrast, when a native language was comprehended, phonemic features were more strongly modulated. Taken together, our findings highlight the flexible modulation of acoustic and phonemic features by sentence- and discourse-level constraint in language comprehension, and document the neural transformation from speech perception to language comprehension, consistent with an account of language processing as a neural filter from sensory to abstract representations.
Collapse
Affiliation(s)
- Filiz Tezcan
- Language and Computation in Neural Systems Group, Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands
| | - Hugo Weissbart
- Donders Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, Netherlands
| | - Andrea E Martin
- Language and Computation in Neural Systems Group, Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands
- Donders Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, Netherlands
| |
Collapse
|
32
|
Oganian Y, Bhaya-Grossman I, Johnson K, Chang EF. Vowel and formant representation in the human auditory speech cortex. Neuron 2023; 111:2105-2118.e4. [PMID: 37105171 PMCID: PMC10330593 DOI: 10.1016/j.neuron.2023.04.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2022] [Revised: 02/08/2023] [Accepted: 04/04/2023] [Indexed: 04/29/2023]
Abstract
Vowels, a fundamental component of human speech across all languages, are cued acoustically by formants, resonance frequencies of the vocal tract shape during speaking. An outstanding question in neurolinguistics is how formants are processed neurally during speech perception. To address this, we collected high-density intracranial recordings from the human speech cortex on the superior temporal gyrus (STG) while participants listened to continuous speech. We found that two-dimensional receptive fields based on the first two formants provided the best characterization of vowel sound representation. Neural activity at single sites was highly selective for zones in this formant space. Furthermore, formant tuning is adjusted dynamically for speaker-specific spectral context. However, the entire population of formant-encoding sites was required to accurately decode single vowels. Overall, our results reveal that complex acoustic tuning in the two-dimensional formant space underlies local vowel representations in STG. As a population code, this gives rise to phonological vowel perception.
Collapse
Affiliation(s)
- Yulia Oganian
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
| | - Ilina Bhaya-Grossman
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA; University of California, Berkeley-University of California, San Francisco Graduate Program in Bioengineering, Berkeley, CA 94720, USA
| | - Keith Johnson
- Department of Linguistics, University of California, Berkeley, Berkeley, CA, USA
| | - Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA.
| |
Collapse
|
33
|
Raghavan VS, O’Sullivan J, Bickel S, Mehta AD, Mesgarani N. Distinct neural encoding of glimpsed and masked speech in multitalker situations. PLoS Biol 2023; 21:e3002128. [PMID: 37279203 PMCID: PMC10243639 DOI: 10.1371/journal.pbio.3002128] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2022] [Accepted: 04/19/2023] [Indexed: 06/08/2023] Open
Abstract
Humans can easily tune in to one talker in a multitalker environment while still picking up bits of background speech; however, it remains unclear how we perceive speech that is masked and to what degree non-target speech is processed. Some models suggest that perception can be achieved through glimpses, which are spectrotemporal regions where a talker has more energy than the background. Other models, however, require the recovery of the masked regions. To clarify this issue, we directly recorded from primary and non-primary auditory cortex (AC) in neurosurgical patients as they attended to one talker in multitalker speech and trained temporal response function models to predict high-gamma neural activity from glimpsed and masked stimulus features. We found that glimpsed speech is encoded at the level of phonetic features for target and non-target talkers, with enhanced encoding of target speech in non-primary AC. In contrast, encoding of masked phonetic features was found only for the target, with a greater response latency and distinct anatomical organization compared to glimpsed phonetic features. These findings suggest separate mechanisms for encoding glimpsed and masked speech and provide neural evidence for the glimpsing model of speech perception.
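A small sketch of the "glimpsing" idea the study builds on is given below: compute spectrograms of the target talker and the background and mark spectrotemporal bins where the target exceeds the background (relative to a criterion) as glimpsed, with the remainder counting as masked. The signals and the 0 dB criterion are illustrative assumptions, not the study's stimuli or parameters.

```python
# Minimal glimpse-mask sketch from target and background spectrograms.
import numpy as np
from scipy.signal import stft

rng = np.random.default_rng(5)
fs = 16000
t = np.arange(0, 2.0, 1 / fs)

# Placeholder audio: a "target" talker and a noise-like background.
target = np.sin(2 * np.pi * 220 * t) * (1 + np.sin(2 * np.pi * 4 * t))
background = 0.8 * rng.standard_normal(len(t))

f, tt, S_target = stft(target, fs=fs, nperseg=512)
_, _, S_back = stft(background, fs=fs, nperseg=512)

# Local target-to-masker ratio in dB, per time-frequency bin.
eps = 1e-12
tmr_db = 10 * np.log10((np.abs(S_target) ** 2 + eps) / (np.abs(S_back) ** 2 + eps))

criterion_db = 0.0                     # glimpse criterion (assumed)
glimpse_mask = tmr_db > criterion_db   # True where the target "glimpses" through

print(f"fraction of glimpsed time-frequency bins: {glimpse_mask.mean():.2f}")
# Stimulus features restricted to glimpse_mask vs. its complement could then serve
# as separate predictors in a temporal response function model.
```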
Collapse
Affiliation(s)
- Vinay S Raghavan
- Department of Electrical Engineering, Columbia University, New York, New York, United States of America
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, United States of America
| | - James O’Sullivan
- Department of Electrical Engineering, Columbia University, New York, New York, United States of America
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, United States of America
| | - Stephan Bickel
- The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, New York, United States of America
- Department of Neurosurgery, Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, United States of America
- Department of Neurology, Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, United States of America
| | - Ashesh D. Mehta
- The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, New York, United States of America
- Department of Neurosurgery, Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, United States of America
| | - Nima Mesgarani
- Department of Electrical Engineering, Columbia University, New York, New York, United States of America
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, United States of America
| |
Collapse
|
34
|
Krasnoff E, Chevalier G. Case report: binaural beats music assessment experiment. Front Hum Neurosci 2023; 17:1138650. [PMID: 37213931 PMCID: PMC10196448 DOI: 10.3389/fnhum.2023.1138650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2023] [Accepted: 03/31/2023] [Indexed: 05/23/2023] Open
Abstract
We recruited subjects, focusing on people who were stressed and needed a break to experience relaxation. The study used inaudible binaural beats (BB) to measure the ability of BB to induce a relaxed state. By measuring brain-wave activity, we found that BB do appear to objectively induce a state of relaxation. This was evident across several scores calculated from EEG readings, the F3/F4 Alpha Assessment and the CZ Theta/Beta score, which indicated an increase in positive outlook and a relaxing brain, respectively, as well as in scalp topography maps. Most subjects also showed an improvement in Menlascan measurements of microcirculation or cardiovascular score, although the Menlascan scores and Big Five character assessment results were less conclusive. BB seem to have profound effects on the physiology of subjects, and since the beats were not audible, these effects cannot be attributed to the placebo effect. These results are encouraging in terms of developing musical products that incorporate BB to affect human neural rhythms and corollary states of consciousness, and they warrant further research with more subjects, different BB frequencies, and different music tracks.
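For readers unfamiliar with the stimulus itself, the sketch below shows how a binaural beat is typically constructed: two pure tones with slightly different frequencies are routed separately to the left and right ears, and their frequency difference sets the beat rate. The carrier and beat frequencies and the output file name are arbitrary examples, not the stimuli used in this case report (which embedded inaudible BB in music).

```python
# Minimal binaural-beat stimulus sketch: 200 Hz (left) vs. 210 Hz (right) -> 10 Hz beat.
import numpy as np
from scipy.io import wavfile

fs = 44100                     # audio sampling rate (Hz)
dur = 10.0                     # duration in seconds
carrier = 200.0                # left-ear carrier frequency (Hz), arbitrary
beat = 10.0                    # desired beat frequency (Hz), arbitrary

t = np.arange(int(fs * dur)) / fs
left = np.sin(2 * np.pi * carrier * t)
right = np.sin(2 * np.pi * (carrier + beat) * t)   # right ear offset by the beat rate

# Stereo array: the 10 Hz "beat" exists only perceptually, arising when the brain
# combines the two monaural inputs.
stereo = np.stack([left, right], axis=1)
stereo_int16 = np.int16(stereo / np.max(np.abs(stereo)) * 0.9 * 32767)
wavfile.write("binaural_beat_10hz.wav", fs, stereo_int16)
```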
Collapse
Affiliation(s)
| | - Gaétan Chevalier
- Department of Family Medicine and Public Health, University of California, San Diego, La Jolla, CA, United States
| |
Collapse
|
35
|
Keshishian M, Akkol S, Herrero J, Bickel S, Mehta AD, Mesgarani N. Joint, distributed and hierarchically organized encoding of linguistic features in the human auditory cortex. Nat Hum Behav 2023; 7:740-753. [PMID: 36864134 PMCID: PMC10417567 DOI: 10.1038/s41562-023-01520-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2021] [Accepted: 01/05/2023] [Indexed: 03/04/2023]
Abstract
The precise role of the human auditory cortex in representing speech sounds and transforming them to meaning is not yet fully understood. Here we used intracranial recordings from the auditory cortex of neurosurgical patients as they listened to natural speech. We found an explicit, temporally ordered and anatomically distributed neural encoding of multiple linguistic features, including phonetic, prelexical phonotactics, word frequency, and lexical-phonological and lexical-semantic information. Grouping neural sites on the basis of their encoded linguistic features revealed a hierarchical pattern, with distinct representations of prelexical and postlexical features distributed across various auditory areas. While sites with longer response latencies and greater distance from the primary auditory cortex encoded higher-level linguistic features, the encoding of lower-level features was preserved and not discarded. Our study reveals a cumulative mapping of sound to meaning and provides empirical evidence for validating neurolinguistic and psycholinguistic models of spoken word recognition that preserve the acoustic variations in speech.
Collapse
Affiliation(s)
- Menoua Keshishian
- Department of Electrical Engineering, Columbia University, New York, NY, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
| | - Serdar Akkol
- Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
| | - Jose Herrero
- Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
- Department of Neurosurgery, Hofstra-Northwell School of Medicine, Manhasset, NY, USA
| | - Stephan Bickel
- Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
- Department of Neurosurgery, Hofstra-Northwell School of Medicine, Manhasset, NY, USA
| | - Ashesh D Mehta
- Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
- Department of Neurosurgery, Hofstra-Northwell School of Medicine, Manhasset, NY, USA
| | - Nima Mesgarani
- Department of Electrical Engineering, Columbia University, New York, NY, USA.
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA.
| |
Collapse
|
36
|
Acoustic correlates of the syllabic rhythm of speech: Modulation spectrum or local features of the temporal envelope. Neurosci Biobehav Rev 2023; 147:105111. [PMID: 36822385 DOI: 10.1016/j.neubiorev.2023.105111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2022] [Revised: 12/04/2022] [Accepted: 02/19/2023] [Indexed: 02/25/2023]
Abstract
The syllable is a perceptually salient unit in speech. Since both the syllable and its acoustic correlate, i.e., the speech envelope, have a preferred range of rhythmicity between 4 and 8 Hz, it is hypothesized that theta-band neural oscillations play a major role in extracting syllables based on the envelope. A literature survey, however, reveals inconsistent evidence about the relationship between the speech envelope and syllables, and the current study revisits this question by analyzing large speech corpora. It is shown that the center frequency of the speech envelope, characterized by the modulation spectrum, reliably correlates with the rate of syllables only when the analysis is pooled over minutes of speech recordings. In contrast, in the time domain, a component of the speech envelope is reliably phase-locked to syllable onsets. Based on a speaker-independent model, the timing of syllable onsets explains about 24% of the variance of the speech envelope. These results indicate that local features in the speech envelope, instead of the modulation spectrum, are a more reliable acoustic correlate of syllables.
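The two acoustic measures contrasted above can be sketched as follows: the modulation spectrum of the speech envelope (its frequency content, pooled over time) and the envelope epoched around syllable onsets. The audio, onset times, and filter settings below are placeholders, not the corpus analysis of the paper.

```python
# Minimal sketch: envelope modulation spectrum vs. onset-locked envelope epochs.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

fs = 16000
t = np.arange(0, 10.0, 1 / fs)
rng = np.random.default_rng(6)
# Placeholder "speech": amplitude-modulated noise with a ~5 Hz syllabic rhythm.
audio = rng.standard_normal(len(t)) * (1 + np.sin(2 * np.pi * 5 * t))

# Broadband envelope, low-pass filtered below 32 Hz and downsampled to 100 Hz.
env = np.abs(hilbert(audio))
b, a = butter(4, 32 / (fs / 2), btype="low")
env = filtfilt(b, a, env)[:: fs // 100]
fs_env = 100

# (1) Modulation spectrum: power spectrum of the (demeaned) envelope.
spec = np.abs(np.fft.rfft(env - env.mean())) ** 2
freqs = np.fft.rfftfreq(len(env), 1 / fs_env)
mask = (freqs > 1) & (freqs < 12)
peak = freqs[mask][np.argmax(spec[mask])]

# (2) Envelope aligned to hypothetical syllable onsets (here every 200 ms).
onsets = np.arange(0.1, 9.5, 0.2)
win = int(0.3 * fs_env)
aligned = np.array([env[int(o * fs_env): int(o * fs_env) + win] for o in onsets])

print(f"modulation-spectrum peak ~{peak:.1f} Hz; "
      f"onset-locked average over {len(aligned)} epochs")
```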
Collapse
|
37
|
De Clercq P, Vanthornhout J, Vandermosten M, Francart T. Beyond linear neural envelope tracking: a mutual information approach. J Neural Eng 2023; 20. [PMID: 36812597 DOI: 10.1088/1741-2552/acbe1d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2022] [Accepted: 02/22/2023] [Indexed: 02/24/2023]
Abstract
Objective. The human brain tracks the temporal envelope of speech, which contains essential cues for speech understanding. Linear models are the most common tool to study neural envelope tracking. However, information on how speech is processed can be lost since nonlinear relations are precluded. Analysis based on mutual information (MI), on the other hand, can detect both linear and nonlinear relations and is gradually becoming more popular in the field of neural envelope tracking. Yet, several different approaches to calculating MI are applied with no consensus on which approach to use. Furthermore, the added value of nonlinear techniques remains a subject of debate in the field. The present paper aims to resolve these open questions. Approach. We analyzed electroencephalography (EEG) data of participants listening to continuous speech and applied MI analyses and linear models. Main results. Comparing the different MI approaches, we conclude that results are most reliable and robust using the Gaussian copula approach, which first transforms the data to standard Gaussians. With this approach, the MI analysis is a valid technique for studying neural envelope tracking. Like linear models, it allows spatial and temporal interpretations of speech processing, peak latency analyses, and applications to multiple EEG channels combined. In a final analysis, we tested whether nonlinear components were present in the neural response to the envelope by first removing all linear components in the data. We robustly detected nonlinear components on the single-subject level using the MI analysis. Significance. We demonstrate that the human brain processes speech in a nonlinear way. Unlike linear models, the MI analysis detects such nonlinear relations, proving its added value to neural envelope tracking. In addition, the MI analysis retains spatial and temporal characteristics of speech processing, an advantage lost when using more complex (nonlinear) deep neural networks.
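The Gaussian-copula MI estimate recommended above can be sketched compactly: each variable is rank-transformed to standard-normal scores (the copula transform), after which MI follows in closed form from the correlation of the transformed variables. This is a simplified univariate version with placeholder data, not the authors' full multichannel pipeline.

```python
# Minimal Gaussian-copula mutual information sketch (univariate case).
import numpy as np
from scipy.stats import norm, rankdata

def copnorm(x):
    """Rank-transform x to standard-normal scores (Gaussian copula transform)."""
    return norm.ppf(rankdata(x) / (len(x) + 1))

def gaussian_copula_mi(x, y):
    """MI (in bits) between two 1-D variables under the Gaussian-copula assumption."""
    cx, cy = copnorm(x), copnorm(y)
    r = np.corrcoef(cx, cy)[0, 1]
    return -0.5 * np.log2(1 - r ** 2)

rng = np.random.default_rng(7)
envelope = np.abs(rng.standard_normal(5000))
# Placeholder EEG with a nonlinear (squared) dependence on the envelope.
eeg = 0.3 * envelope ** 2 + rng.standard_normal(5000)

print(f"linear correlation: {np.corrcoef(envelope, eeg)[0, 1]:.3f}")
print(f"Gaussian-copula MI: {gaussian_copula_mi(envelope, eeg):.3f} bits")
```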
Collapse
Affiliation(s)
- Pieter De Clercq
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| | - Jonas Vanthornhout
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| | - Maaike Vandermosten
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| | - Tom Francart
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| |
Collapse
|
38
|
Rimmele JM, Sun Y, Michalareas G, Ghitza O, Poeppel D. Dynamics of Functional Networks for Syllable and Word-Level Processing. NEUROBIOLOGY OF LANGUAGE (CAMBRIDGE, MASS.) 2023; 4:120-144. [PMID: 37229144 PMCID: PMC10205074 DOI: 10.1162/nol_a_00089] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 04/18/2021] [Accepted: 11/07/2022] [Indexed: 05/27/2023]
Abstract
Speech comprehension requires the ability to temporally segment the acoustic input for higher-level linguistic analysis. Oscillation-based approaches suggest that low-frequency auditory cortex oscillations track syllable-sized acoustic information and therefore emphasize the relevance of syllabic-level acoustic processing for speech segmentation. How syllabic processing interacts with higher levels of speech processing, beyond segmentation, including the anatomical and neurophysiological characteristics of the networks involved, is debated. In two MEG experiments, we investigate lexical and sublexical word-level processing and the interactions with (acoustic) syllable processing using a frequency-tagging paradigm. Participants listened to disyllabic words presented at a rate of 4 syllables/s. Lexical content (native language), sublexical syllable-to-syllable transitions (foreign language), or mere syllabic information (pseudo-words) were presented. Two conjectures were evaluated: (i) syllable-to-syllable transitions contribute to word-level processing; and (ii) processing of words activates brain areas that interact with acoustic syllable processing. We show that syllable-to-syllable transition information, compared to mere syllable information, activated a bilateral superior, middle temporal and inferior frontal network. Lexical content resulted, additionally, in increased neural activity. Evidence for an interaction of word- and acoustic syllable-level processing was inconclusive. Decreases in syllable tracking (cerebroacoustic coherence) in auditory cortex and increases in cross-frequency coupling between right superior and middle temporal and frontal areas were found when lexical content was present compared to all other conditions; however, not when conditions were compared separately. The data provide experimental insight into how subtle and sensitive the contribution of syllable-to-syllable transition information to word-level processing is.
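The syllable-tracking measure referred to above, cerebro-acoustic coherence, can be sketched as magnitude-squared coherence between a brain channel and the speech envelope, inspected at the syllable presentation rate (4 Hz in this paradigm). The signals and spectral settings below are placeholders, not the MEG pipeline of the study.

```python
# Minimal cerebro-acoustic coherence sketch at the syllable rate.
import numpy as np
from scipy.signal import coherence

rng = np.random.default_rng(8)
fs, dur = 200, 300                 # assumed sampling rate (Hz) and duration (s)
t = np.arange(0, dur, 1 / fs)

# Placeholder signals: a 4 Hz-modulated speech envelope and a brain channel
# that weakly tracks it.
envelope = 1 + 0.5 * np.sin(2 * np.pi * 4 * t) + 0.2 * rng.standard_normal(len(t))
brain = 0.3 * envelope + rng.standard_normal(len(t))

f, coh = coherence(brain, envelope, fs=fs, nperseg=fs * 4)   # 0.25 Hz resolution
syllable_rate = 4.0
print(f"cerebro-acoustic coherence at {syllable_rate} Hz: "
      f"{coh[np.argmin(np.abs(f - syllable_rate))]:.3f}")
```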
Collapse
Affiliation(s)
- Johanna M. Rimmele
- Departments of Neuroscience and Cognitive Neuropsychology, Max-Planck-Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- Max Planck NYU Center for Language, Music and Emotion, Frankfurt am Main, Germany; New York, NY, USA
| | - Yue Sun
- Departments of Neuroscience and Cognitive Neuropsychology, Max-Planck-Institute for Empirical Aesthetics, Frankfurt am Main, Germany
| | - Georgios Michalareas
- Departments of Neuroscience and Cognitive Neuropsychology, Max-Planck-Institute for Empirical Aesthetics, Frankfurt am Main, Germany
| | - Oded Ghitza
- Departments of Neuroscience and Cognitive Neuropsychology, Max-Planck-Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- College of Biomedical Engineering & Hearing Research Center, Boston University, Boston, MA, USA
| | - David Poeppel
- Departments of Neuroscience and Cognitive Neuropsychology, Max-Planck-Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- Department of Psychology and Center for Neural Science, New York University, New York, NY, USA
- Max Planck NYU Center for Language, Music and Emotion, Frankfurt am Main, Germany; New York, NY, USA
- Ernst Strüngmann Institute for Neuroscience, Frankfurt am Main, Germany
| |
Collapse
|
39
|
Gillis M, Kries J, Vandermosten M, Francart T. Neural tracking of linguistic and acoustic speech representations decreases with advancing age. Neuroimage 2023; 267:119841. [PMID: 36584758 PMCID: PMC9878439 DOI: 10.1016/j.neuroimage.2022.119841] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2022] [Revised: 12/21/2022] [Accepted: 12/26/2022] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND Older adults process speech differently, but it is not yet clear how aging affects different levels of processing natural, continuous speech, both in terms of bottom-up acoustic analysis and top-down generation of linguistic-based predictions. We studied natural speech processing across the adult lifespan via electroencephalography (EEG) measurements of neural tracking. GOALS Our goals are to analyze the unique contribution of linguistic speech processing across the adult lifespan using natural speech, while controlling for the influence of acoustic processing. Moreover, we also studied acoustic processing across age. In particular, we focus on changes in spatial and temporal activation patterns in response to natural speech across the lifespan. METHODS 52 normal-hearing adults between 17 and 82 years of age listened to a naturally spoken story while the EEG signal was recorded. We investigated the effect of age on acoustic and linguistic processing of speech. Because age correlated with hearing capacity and measures of cognition, we investigated whether the observed age effect is mediated by these factors. Furthermore, we investigated whether there is an effect of age on hemisphere lateralization and on spatiotemporal patterns of the neural responses. RESULTS Our EEG results showed that linguistic speech processing declines with advancing age. Moreover, as age increased, the neural response latency to certain aspects of linguistic speech processing increased. Acoustic neural tracking (NT) also decreased with increasing age, which is at odds with the literature. In contrast to linguistic processing, older subjects showed shorter latencies for early acoustic responses to speech. No evidence was found for hemispheric lateralization in either younger or older adults during linguistic speech processing. Most of the observed aging effects on acoustic and linguistic processing were not explained by age-related decline in hearing capacity or cognition. However, our results suggest that the effect of decreasing linguistic neural tracking with advancing age at the word level is partially due to an age-related decline in cognition rather than a robust effect of age alone. CONCLUSION Spatial and temporal characteristics of the neural responses to continuous speech change across the adult lifespan for both acoustic and linguistic speech processing. These changes may be traces of structural and/or functional change that occurs with advancing age.
Collapse
Affiliation(s)
- Marlies Gillis
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium.
| | - Jill Kries
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium.
| | - Maaike Vandermosten
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| | - Tom Francart
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| |
Collapse
|
40
|
Mesik J, Wojtczak M. The effects of data quantity on performance of temporal response function analyses of natural speech processing. Front Neurosci 2023; 16:963629. [PMID: 36711133 PMCID: PMC9878558 DOI: 10.3389/fnins.2022.963629] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2022] [Accepted: 12/26/2022] [Indexed: 01/15/2023] Open
Abstract
In recent years, temporal response function (TRF) analyses of neural activity recordings evoked by continuous naturalistic stimuli have become increasingly popular for characterizing response properties within the auditory hierarchy. However, despite this rise in TRF usage, relatively few educational resources for these tools exist. Here we use a dual-talker continuous speech paradigm to demonstrate how a key parameter of experimental design, the quantity of acquired data, influences TRF analyses fit to either individual data (subject-specific analyses), or group data (generic analyses). We show that although model prediction accuracy increases monotonically with data quantity, the amount of data required to achieve significant prediction accuracies can vary substantially based on whether the fitted model contains densely (e.g., acoustic envelope) or sparsely (e.g., lexical surprisal) spaced features, especially when the goal of the analyses is to capture the aspect of neural responses uniquely explained by specific features. Moreover, we demonstrate that generic models can exhibit high performance on small amounts of test data (2-8 min), if they are trained on a sufficiently large data set. As such, they may be particularly useful for clinical and multi-task study designs with limited recording time. Finally, we show that the regularization procedure used in fitting TRF models can interact with the quantity of data used to fit the models, with larger training quantities resulting in systematically larger TRF amplitudes. Together, demonstrations in this work should aid new users of TRF analyses, and in combination with other tools, such as piloting and power analyses, may serve as a detailed reference for choosing acquisition duration in future studies.
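As a toy illustration of the data-quantity question, one can fit a time-lagged, ridge-regularized TRF on increasing amounts of training data and track held-out prediction accuracy. Everything below (sampling rate, lag count, regularization, and the simulated response) is a placeholder rather than the paper's settings.

    import numpy as np

    def lagged(x, n_lags):
        # Time-lagged design matrix [time x lags] from a 1-D stimulus feature
        return np.column_stack([np.roll(x, k) for k in range(n_lags)])

    rng = np.random.default_rng(1)
    fs, total_min = 100, 20
    n = fs * 60 * total_min
    stim = rng.standard_normal(n)                       # stand-in for a speech envelope
    true_trf = np.hanning(30)                           # hypothetical 300-ms impulse response
    eeg = np.convolve(stim, true_trf, mode="same") + 3 * rng.standard_normal(n)  # simulated EEG

    X, lam = lagged(stim, 30), 1e2
    test = slice(n - fs * 120, n)                       # hold out the last 2 minutes
    for train_min in (2, 5, 10, 15):
        tr = slice(0, fs * 60 * train_min)
        w = np.linalg.solve(X[tr].T @ X[tr] + lam * np.eye(30), X[tr].T @ eeg[tr])
        r = np.corrcoef(X[test] @ w, eeg[test])[0, 1]
        print(f"{train_min:>2} min of training -> test r = {r:.2f}")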
Collapse
Affiliation(s)
- Juraj Mesik
- Department of Psychology, University of Minnesota, Minneapolis, MN, United States
| | | |
Collapse
|
41
|
Chalas N, Daube C, Kluger DS, Abbasi O, Nitsch R, Gross J. Speech onsets and sustained speech contribute differentially to delta and theta speech tracking in auditory cortex. Cereb Cortex 2023; 33:6273-6281. [PMID: 36627246 DOI: 10.1093/cercor/bhac502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2022] [Revised: 11/21/2022] [Accepted: 11/22/2022] [Indexed: 01/12/2023] Open
Abstract
When we attentively listen to an individual's speech, our brain activity dynamically aligns to the incoming acoustic input at multiple timescales. Although this systematic alignment between ongoing brain activity and speech in auditory brain areas is well established, the acoustic events that drive this phase-locking are not fully understood. Here, we use magnetoencephalographic recordings of 24 human participants (12 females) while they were listening to a 1 h story. We show that whereas speech-brain coupling is associated with sustained acoustic fluctuations in the speech envelope in the theta-frequency range (4-7 Hz), speech tracking in the low-frequency delta band (below 1 Hz) was strongest around speech onsets, such as the beginning of a sentence. Crucially, delta tracking in bilateral auditory areas was not sustained after onsets, suggesting that delta tracking during continuous speech perception is driven by speech onsets. We conclude that onset and sustained components of speech contribute differentially to speech tracking in the delta- and theta-frequency bands, orchestrating the sampling of continuous speech. Thus, our results suggest a temporal dissociation of acoustically driven oscillatory activity in auditory areas during speech tracking, with valuable implications for the orchestration of speech tracking at multiple timescales.
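Band-specific speech-brain coupling of the kind described here is often quantified with a phase-locking value between the band-limited speech envelope and a neural channel. The SciPy-based sketch below uses synthetic signals, and the filter bands and orders are illustrative choices, not those of the study.

    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def band_phase(x, fs, lo, hi, order=3):
        # Band-pass filter, then take the instantaneous phase via the Hilbert transform
        sos = butter(order, [lo, hi], btype="band", fs=fs, output="sos")
        return np.angle(hilbert(sosfiltfilt(sos, x)))

    def plv(x, y, fs, lo, hi):
        # Phase-locking value: consistency of the phase difference over time
        dphi = band_phase(x, fs, lo, hi) - band_phase(y, fs, lo, hi)
        return np.abs(np.mean(np.exp(1j * dphi)))

    fs = 200
    t = np.arange(0, 60, 1 / fs)
    rng = np.random.default_rng(2)
    envelope = np.sin(2 * np.pi * 0.8 * t) + 0.5 * np.sin(2 * np.pi * 5.0 * t)  # synthetic "speech envelope"
    meg = np.roll(envelope, 20) + rng.standard_normal(t.size)                   # delayed, noisy "response"

    print("delta PLV:", plv(envelope, meg, fs, 0.3, 1.0))
    print("theta PLV:", plv(envelope, meg, fs, 4.0, 7.0))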
Collapse
Affiliation(s)
- Nikos Chalas
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Malmedyweg 15, 48149 Münster, Germany; Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Fliednerstr. 21, 48149 Münster, Germany; Institute for Translational Neuroscience, University of Münster, Albert-Schweitzer-Campus 1, Geb. A9a, Münster, Germany
| | - Christoph Daube
- Centre for Cognitive Neuroimaging, University of Glasgow, 56-64 Hillhead Street, G12 8QB, Glasgow, United Kingdom
| | - Daniel S Kluger
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Malmedyweg 15, 48149 Münster, Germany; Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Fliednerstr. 21, 48149 Münster, Germany
| | - Omid Abbasi
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Malmedyweg 15, 48149, Münster, Germany
| | - Robert Nitsch
- Institute for Translational Neuroscience, University of Münster, Albert-Schweitzer-Campus 1, Geb. A9a, Münster, Germany
| | - Joachim Gross
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Malmedyweg 15, 48149 Münster, Germany; Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Fliednerstr. 21, 48149 Münster, Germany
| |
Collapse
|
42
|
Schyns PG, Snoek L, Daube C. Degrees of algorithmic equivalence between the brain and its DNN models. Trends Cogn Sci 2022; 26:1090-1102. [PMID: 36216674 DOI: 10.1016/j.tics.2022.09.003] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2022] [Revised: 09/01/2022] [Accepted: 09/02/2022] [Indexed: 11/11/2022]
Abstract
Deep neural networks (DNNs) have become powerful and increasingly ubiquitous tools to model human cognition, and often produce similar behaviors. For example, with their hierarchical, brain-inspired organization of computations, DNNs apparently categorize real-world images in the same way as humans do. Does this imply that their categorization algorithms are also similar? We have framed the question with three embedded degrees that progressively constrain algorithmic similarity evaluations: equivalence of (i) behavioral/brain responses, which is current practice, (ii) the stimulus features that are processed to produce these outcomes, which is more constraining, and (iii) the algorithms that process these shared features, the ultimate goal. To improve DNNs as models of cognition, we develop for each degree an increasingly constrained benchmark that specifies the epistemological conditions for the considered equivalence.
Collapse
Affiliation(s)
- Philippe G Schyns
- School of Psychology and Neuroscience, University of Glasgow, Glasgow G12 8QB, UK.
| | - Lukas Snoek
- School of Psychology and Neuroscience, University of Glasgow, Glasgow G12 8QB, UK
| | - Christoph Daube
- School of Psychology and Neuroscience, University of Glasgow, Glasgow G12 8QB, UK
| |
Collapse
|
43
|
Pastore A, Tomassini A, Delis I, Dolfini E, Fadiga L, D'Ausilio A. Speech listening entails neural encoding of invisible articulatory features. Neuroimage 2022; 264:119724. [PMID: 36328272 DOI: 10.1016/j.neuroimage.2022.119724] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2022] [Revised: 09/28/2022] [Accepted: 10/30/2022] [Indexed: 11/06/2022] Open
Abstract
Speech processing entails a complex interplay between bottom-up and top-down computations. The former is reflected in the neural entrainment to the quasi-rhythmic properties of speech acoustics, while the latter is thought to guide the selection of the most relevant input subspace. Top-down signals are believed to originate mainly from motor regions, yet similar activities have been shown to tune attentional cycles also for simpler, non-speech stimuli. Here we examined whether, during speech listening, the brain reconstructs articulatory patterns associated with speech production. We measured electroencephalographic (EEG) data while participants listened to sentences for which the articulatory kinematics of the lips, jaw, and tongue had also been recorded during production (via Electro-Magnetic Articulography, EMA). We captured the patterns of articulatory coordination through Principal Component Analysis (PCA) and used Partial Information Decomposition (PID) to identify whether the speech envelope and each of the kinematic components provided unique, synergistic, and/or redundant information about the EEG signals. Interestingly, tongue movements carry both unique and synergistic information with the envelope that is encoded in the listener's brain activity. This demonstrates that during speech listening the brain retrieves highly specific and unique motor information that is never accessible through vision, most likely by leveraging audio-motor maps that arise from the acquisition of speech production during development.
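The articulatory-coordination step described above is a PCA over the recorded kinematic channels; a minimal NumPy version is sketched below. Channel counts and data are invented for illustration, and the PID analysis itself is not reproduced here.

    import numpy as np

    rng = np.random.default_rng(3)
    n_samples, n_channels = 10000, 12                 # e.g., x/y positions of EMA coils on lips, jaw, tongue
    kinematics = rng.standard_normal((n_samples, n_channels))   # synthetic stand-in for coil trajectories

    # PCA via SVD of the mean-centered data matrix
    centered = kinematics - kinematics.mean(axis=0)
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)                   # variance explained per component
    components = centered @ Vt.T                      # time courses of the articulatory components

    print("variance explained by first 3 PCs:", np.round(explained[:3], 3))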
Collapse
Affiliation(s)
- A Pastore
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy; Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy.
| | - A Tomassini
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
| | - I Delis
- School of Biomedical Sciences, University of Leeds, Leeds, UK
| | - E Dolfini
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy; Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
| | - L Fadiga
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy; Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
| | - A D'Ausilio
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy; Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy.
| |
Collapse
|
44
|
Pinto D, Kaufman M, Brown A, Zion Golumbic E. An ecological investigation of the capacity to follow simultaneous speech and preferential detection of ones’ own name. Cereb Cortex 2022; 33:5361-5374. [PMID: 36331339 DOI: 10.1093/cercor/bhac424] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2022] [Revised: 09/11/2022] [Accepted: 09/12/2022] [Indexed: 11/06/2022] Open
Abstract
Many situations require focusing attention on one speaker while monitoring the environment for potentially important information. Some have proposed that dividing attention among two speakers involves behavioral trade-offs, due to limited cognitive resources. However, the severity of these trade-offs, particularly under ecologically valid circumstances, is not well understood. We investigated the capacity to process simultaneous speech using a dual-task paradigm simulating task demands and stimuli encountered in real life. Participants listened to conversational narratives (Narrative Stream) and monitored a stream of announcements (Barista Stream) to detect when their order was called. We measured participants’ performance, neural activity, and skin conductance as they engaged in this dual task. Participants achieved extremely high dual-task accuracy, with no apparent behavioral trade-offs. Moreover, robust neural and physiological responses were observed for target stimuli in the Barista Stream, alongside significant neural speech-tracking of the Narrative Stream. These results suggest that humans have substantial capacity to process simultaneous speech and do not suffer from insufficient processing resources, at least for this highly ecological task combination and level of perceptual load. Results also confirmed the ecological validity of the advantage for detecting one’s own name at the behavioral, neural, and physiological levels, highlighting the contribution of personal relevance when processing simultaneous speech.
Collapse
Affiliation(s)
- Danna Pinto
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Ramat Gan, 5290002, Israel
| | - Maya Kaufman
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Ramat Gan, 5290002, Israel
| | - Adi Brown
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Ramat Gan, 5290002, Israel
| | - Elana Zion Golumbic
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Ramat Gan, 5290002, Israel
| |
Collapse
|
45
|
Fomins A, Sych Y, Helmchen F. Conservative significance testing of tripartite statistical relations in multivariate neural data. Netw Neurosci 2022; 6:1243-1274. [PMID: 38800452 PMCID: PMC11117094 DOI: 10.1162/netn_a_00259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2022] [Accepted: 06/14/2022] [Indexed: 05/29/2024] Open
Abstract
An important goal in systems neuroscience is to understand the structure of neuronal interactions, frequently approached by studying functional relations between recorded neuronal signals. Commonly used pairwise measures (e.g., correlation coefficient) offer limited insight, addressing neither the specificity of estimated neuronal interactions nor potential synergistic coupling between neuronal signals. Tripartite measures, such as partial correlation, variance partitioning, and partial information decomposition, address these questions by disentangling functional relations into interpretable information atoms (unique, redundant, and synergistic). Here, we apply these tripartite measures to simulated neuronal recordings to investigate their sensitivity to noise. We find that the considered measures are mostly accurate and specific for signals with noiseless sources but exhibit significant bias for noisy sources. We show that permutation testing of such measures results in high false positive rates even for small noise fractions and large data sizes. We present a conservative null hypothesis for significance testing of tripartite measures, which significantly decreases the false positive rate at the tolerable expense of an increased false negative rate. We hope our study raises awareness of the potential pitfalls of significance testing and of the interpretation of functional relations, offering both conceptual and practical advice.
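For orientation, a partial correlation with a naive permutation null (shuffling one signal) looks like the sketch below; note that the paper's point is precisely that such a naive null can be misleading for noisy sources, and its proposed conservative null is not reproduced here. All signals are synthetic.

    import numpy as np

    def partial_corr(x, y, z):
        # Correlation of x and y after regressing out z (plus an intercept) from both
        Z = np.column_stack([z, np.ones_like(z)])
        rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
        ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
        return np.corrcoef(rx, ry)[0, 1]

    rng = np.random.default_rng(4)
    n = 2000
    z = rng.standard_normal(n)                 # putative common driver (synthetic)
    x = z + 0.5 * rng.standard_normal(n)       # two "recorded" signals driven by z
    y = z + 0.5 * rng.standard_normal(n)

    observed = partial_corr(x, y, z)
    null = np.array([partial_corr(rng.permutation(x), y, z) for _ in range(1000)])
    p = np.mean(np.abs(null) >= np.abs(observed))
    print(f"partial r = {observed:.3f}, naive permutation p = {p:.3f}")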
Collapse
Affiliation(s)
- Aleksejs Fomins
- Brain Research Institute, University of Zurich, Zurich, Switzerland
- Neuroscience Center Zurich, University of Zurich, Switzerland
| | - Yaroslav Sych
- Brain Research Institute, University of Zurich, Zurich, Switzerland
- Experimental Neurology Center, Department of Neurology, Inselspital University Hospital Bern, Bern, Switzerland
- Present address: Institute of Cellular and Integrative Neurosciences, University of Strasbourg and CNRS, Strasbourg, France
| | - Fritjof Helmchen
- Brain Research Institute, University of Zurich, Zurich, Switzerland
- Neuroscience Center Zurich, University of Zurich, Switzerland
| |
Collapse
|
46
|
Kim SG. On the encoding of natural music in computational models and human brains. Front Neurosci 2022; 16:928841. [PMID: 36203808 PMCID: PMC9531138 DOI: 10.3389/fnins.2022.928841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2022] [Accepted: 08/15/2022] [Indexed: 11/13/2022] Open
Abstract
This article discusses recent developments and advances in the neuroscience of music to understand the nature of musical emotion. In particular, it highlights how system identification techniques and computational models of music have advanced our understanding of how the human brain processes the textures and structures of music and how the processed information evokes emotions. Musical models relate physical properties of stimuli to internal representations called features, and predictive models relate features to neural or behavioral responses and test their predictions against independent unseen data. The new frameworks do not require orthogonalized stimuli in controlled experiments to establish reproducible knowledge, which has opened up a new wave of naturalistic neuroscience. The current review focuses on how this trend has transformed the domain of the neuroscience of music.
Collapse
|
47
|
Gillis M, Van Canneyt J, Francart T, Vanthornhout J. Neural tracking as a diagnostic tool to assess the auditory pathway. Hear Res 2022; 426:108607. [PMID: 36137861 DOI: 10.1016/j.heares.2022.108607] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/26/2021] [Revised: 08/11/2022] [Accepted: 09/12/2022] [Indexed: 11/20/2022]
Abstract
When a person listens to sound, the brain time-locks to specific aspects of the sound. This is called neural tracking and it can be investigated by analysing neural responses (e.g., measured by electroencephalography) to continuous natural speech. Measures of neural tracking allow for an objective investigation of a range of auditory and linguistic processes in the brain during natural speech perception. This approach is more ecologically valid than traditional auditory evoked responses and has great potential for research and clinical applications. This article reviews the neural tracking framework and highlights three prominent examples of neural tracking analyses: neural tracking of the fundamental frequency of the voice (f0), the speech envelope and linguistic features. Each of these analyses provides a unique point of view into the human brain's hierarchical stages of speech processing. F0-tracking assesses the encoding of fine temporal information in the early stages of the auditory pathway, i.e., from the auditory periphery up to early processing in the primary auditory cortex. Envelope tracking reflects bottom-up and top-down speech-related processes in the auditory cortex and is likely necessary but not sufficient for speech intelligibility. Linguistic feature tracking (e.g. word or phoneme surprisal) relates to neural processes more directly related to speech intelligibility. Together these analyses form a multi-faceted objective assessment of an individual's auditory and linguistic processing.
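Of the three analyses reviewed, envelope tracking starts from an estimate of the speech amplitude envelope. One common extraction recipe (Hilbert magnitude, low-pass filtering, downsampling) is sketched below; the cutoff and output rate are typical choices, not necessarily the authors'.

    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert, resample_poly

    def speech_envelope(audio, fs_audio, fs_out=64, cutoff=8.0):
        # Amplitude envelope: Hilbert magnitude, low-pass filtered, then downsampled
        env = np.abs(hilbert(audio))
        sos = butter(3, cutoff, btype="low", fs=fs_audio, output="sos")
        env = sosfiltfilt(sos, env)
        return resample_poly(env, fs_out, fs_audio)

    fs = 16000
    t = np.arange(0, 5, 1 / fs)
    audio = np.sin(2 * np.pi * 200 * t) * (1 + np.sin(2 * np.pi * 4 * t))  # toy modulated tone, not real speech
    env = speech_envelope(audio, fs)
    print(env.shape)   # ~5 s of envelope at 64 Hz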
Collapse
Affiliation(s)
- Marlies Gillis
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium.
| | - Jana Van Canneyt
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| | - Tom Francart
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| | - Jonas Vanthornhout
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| |
Collapse
|
48
|
Weineck K, Wen OX, Henry MJ. Neural synchronization is strongest to the spectral flux of slow music and depends on familiarity and beat salience. eLife 2022; 11:e75515. [PMID: 36094165 PMCID: PMC9467512 DOI: 10.7554/elife.75515] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2021] [Accepted: 07/25/2022] [Indexed: 11/29/2022] Open
Abstract
Neural activity in the auditory system synchronizes to sound rhythms, and brain-environment synchronization is thought to be fundamental to successful auditory perception. Sound rhythms are often operationalized in terms of the sound's amplitude envelope. We hypothesized that - especially for music - the envelope might not best capture the complex spectro-temporal fluctuations that give rise to beat perception and synchronized neural activity. This study investigated (1) neural synchronization to different musical features, (2) tempo-dependence of neural synchronization, and (3) dependence of synchronization on familiarity, enjoyment, and ease of beat perception. In this electroencephalography study, 37 human participants listened to tempo-modulated music (1-4 Hz). Independent of whether the analysis approach was based on temporal response functions (TRFs) or reliable components analysis (RCA), the spectral flux of music - as opposed to the amplitude envelope - evoked the strongest neural synchronization. Moreover, music with slower beat rates, high familiarity, and easy-to-perceive beats elicited the strongest neural response. Our results demonstrate the importance of spectro-temporal fluctuations in music for driving neural synchronization, and highlight the sensitivity of neural synchronization to musical tempo, familiarity, and beat salience.
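Spectral flux, the feature found most effective here, can be computed as the half-wave-rectified frame-to-frame change of the magnitude spectrogram. The SciPy sketch below uses arbitrary window and hop sizes and a toy signal for illustration.

    import numpy as np
    from scipy.signal import stft

    def spectral_flux(audio, fs, nperseg=1024, noverlap=768):
        # Half-wave rectified frame-to-frame difference of the magnitude spectrogram
        _, _, Z = stft(audio, fs=fs, nperseg=nperseg, noverlap=noverlap)
        mag = np.abs(Z)
        diff = np.diff(mag, axis=1)
        return np.sum(np.maximum(diff, 0.0), axis=0)

    fs = 22050
    t = np.arange(0, 3, 1 / fs)
    audio = np.sin(2 * np.pi * 220 * t) * (t % 0.5 < 0.25)   # toy "beat" every 500 ms, not real music
    flux = spectral_flux(audio, fs)
    print(flux.shape, flux.max())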
Collapse
Affiliation(s)
- Kristin Weineck
- Research Group “Neural and Environmental Rhythms”, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- Goethe University Frankfurt, Institute for Cell Biology and Neuroscience, Frankfurt am Main, Germany
| | - Olivia Xin Wen
- Research Group “Neural and Environmental Rhythms”, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
| | - Molly J Henry
- Research Group “Neural and Environmental Rhythms”, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- Department of Psychology, Toronto Metropolitan University, Toronto, Canada
| |
Collapse
|
49
|
Heilbron M, Armeni K, Schoffelen JM, Hagoort P, de Lange FP. A hierarchy of linguistic predictions during natural language comprehension. Proc Natl Acad Sci U S A 2022; 119:e2201968119. [PMID: 35921434 PMCID: PMC9371745 DOI: 10.1073/pnas.2201968119] [Citation(s) in RCA: 110] [Impact Index Per Article: 36.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2022] [Accepted: 06/28/2022] [Indexed: 02/05/2023] Open
Abstract
Understanding spoken language requires transforming ambiguous acoustic streams into a hierarchy of representations, from phonemes to meaning. It has been suggested that the brain uses prediction to guide the interpretation of incoming input. However, the role of prediction in language processing remains disputed, with disagreement about both the ubiquity and representational nature of predictions. Here, we address both issues by analyzing brain recordings of participants listening to audiobooks, and using a deep neural network (GPT-2) to precisely quantify contextual predictions. First, we establish that brain responses to words are modulated by ubiquitous predictions. Next, we disentangle model-based predictions into distinct dimensions, revealing dissociable neural signatures of predictions about syntactic category (parts of speech), phonemes, and semantics. Finally, we show that high-level (word) predictions inform low-level (phoneme) predictions, supporting hierarchical predictive processing. Together, these results underscore the ubiquity of prediction in language processing, showing that the brain spontaneously predicts upcoming language at multiple levels of abstraction.
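The contextual predictions in this study come from GPT-2. A minimal way to obtain token-level surprisal with the Hugging Face transformers package is sketched below (it requires downloading the pretrained "gpt2" weights); the paper additionally decomposes predictions into word, phoneme, and syntactic dimensions, which this sketch does not attempt.

    import math
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    text = "The quick brown fox jumps over the lazy dog"   # example sentence; the study used audiobook transcripts
    ids = tokenizer(text, return_tensors="pt").input_ids   # [1, n_tokens]

    with torch.no_grad():
        logits = model(ids).logits                          # [1, n_tokens, vocab]

    # Surprisal of token t given its left context: -log2 p(token_t | tokens_<t)
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    surprisal = -log_probs[torch.arange(targets.size(0)), targets] / math.log(2)

    for tok, s in zip(tokenizer.convert_ids_to_tokens(targets.tolist()), surprisal):
        print(f"{tok:>10s}  {s.item():5.2f} bits")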
Collapse
Affiliation(s)
- Micha Heilbron
- Donders Institute, Radboud University, 6525 EN Nijmegen, The Netherlands
- Max Planck Institute for Psycholinguistics, 6525 XD Nijmegen, The Netherlands
| | - Kristijan Armeni
- Donders Institute, Radboud University, 6525 EN Nijmegen, The Netherlands
| | | | - Peter Hagoort
- Donders Institute, Radboud University, 6525 EN Nijmegen, The Netherlands
- Max Planck Institute for Psycholinguistics, 6525 XD Nijmegen, The Netherlands
| | - Floris P. de Lange
- Donders Institute, Radboud University, 6525 EN Nijmegen, The Netherlands
| |
Collapse
|