1
Teng X, Larrouy-Maestri P, Poeppel D. Segmenting and Predicting Musical Phrase Structure Exploits Neural Gain Modulation and Phase Precession. J Neurosci 2024; 44:e1331232024. PMID: 38926087; PMCID: PMC11270514; DOI: 10.1523/jneurosci.1331-23.2024.
Abstract
Music, like spoken language, is often characterized by hierarchically organized structure. Previous experiments have shown neural tracking of notes and beats, but little work touches on the more abstract question: how does the brain establish high-level musical structures in real time? We presented Bach chorales to participants (20 females and 9 males) undergoing electroencephalogram (EEG) recording to investigate how the brain tracks musical phrases. We removed the main temporal cues to phrasal structures, so that listeners could only rely on harmonic information to parse a continuous musical stream. Phrasal structures were disrupted by locally or globally reversing the harmonic progression, so that our observations on the original music could be controlled and compared. We first replicated the findings on neural tracking of musical notes and beats, substantiating the positive correlation between musical training and neural tracking. Critically, we discovered a neural signature in the frequency range ∼0.1 Hz (modulations of EEG power) that reliably tracks musical phrasal structure. Next, we developed an approach to quantify the phrasal phase precession of the EEG power, revealing that phrase tracking is indeed an operation of active segmentation involving predictive processes. We demonstrate that the brain establishes complex musical structures online over long timescales (>5 s) and actively segments continuous music streams in a manner comparable to language processing. These two neural signatures, phrase tracking and phrasal phase precession, provide new conceptual and technical tools to study the processes underpinning high-level structure building using noninvasive recording techniques.
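As a concrete illustration of the power-modulation analysis described above, the sketch below (not the authors' pipeline; the sampling rate, carrier band, and synthetic data are assumptions) band-limits one EEG channel, extracts its power envelope, and screens the envelope's spectrum for a peak near the ~0.1 Hz phrase rate:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert, periodogram

fs = 100.0                                 # EEG sampling rate in Hz (assumed)
eeg = np.random.randn(int(fs * 600))       # placeholder channel, 10 minutes

# 1) Band-limit the EEG; the carrier band is an assumption for illustration
sos = butter(4, [8, 12], btype="bandpass", fs=fs, output="sos")
narrow = sosfiltfilt(sos, eeg)

# 2) Power envelope via the Hilbert transform
power = np.abs(hilbert(narrow)) ** 2

# 3) Screen the power fluctuations for a spectral peak near the phrase
#    rate (~0.1 Hz for phrases lasting on the order of 10 s)
f, pxx = periodogram(power - power.mean(), fs=fs)
band = (f > 0.05) & (f < 0.2)
print("peak power-modulation frequency: %.3f Hz" % f[band][np.argmax(pxx[band])])
```

Phrasal phase precession could then be probed by reading out the instantaneous phase of the ~0.1 Hz component at annotated phrase boundaries and testing whether it systematically leads the boundaries.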
Affiliation(s)
- Xiangbin Teng
- Department of Psychology, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
- Pauline Larrouy-Maestri
- Music Department, Max-Planck-Institute for Empirical Aesthetics, Frankfurt 60322, Germany
- Center for Language, Music, and Emotion (CLaME), New York, New York 10003
- David Poeppel
- Center for Language, Music, and Emotion (CLaME), New York, New York 10003
- Department of Psychology, New York University, New York, New York 10003
- Ernst Struengmann Institute for Neuroscience, Frankfurt 60528, Germany
- Music and Audio Research Laboratory (MARL), New York, New York 11201
2
Nora A, Rinkinen O, Renvall H, Service E, Arkkila E, Smolander S, Laasonen M, Salmelin R. Impaired Cortical Tracking of Speech in Children with Developmental Language Disorder. J Neurosci 2024; 44:e2048232024. PMID: 38589232; PMCID: PMC11140678; DOI: 10.1523/jneurosci.2048-23.2024.
Abstract
In developmental language disorder (DLD), learning to comprehend and express oneself with spoken language is impaired, but the reason for this remains unknown. Using millisecond-scale magnetoencephalography recordings combined with machine learning models, we investigated whether the possible neural basis of this disruption lies in poor cortical tracking of speech. The stimuli were common spoken Finnish words (e.g., dog, car, hammer) and sounds with corresponding meanings (e.g., dog bark, car engine, hammering). In both children with DLD (10 boys and 7 girls) and typically developing (TD) control children (14 boys and 3 girls), aged 10-15 years, the cortical activation to spoken words was best modeled as time-locked to the unfolding speech input at ∼100 ms latency between sound and cortical activation. The amplitude envelope (amplitude changes) and spectrogram (detailed time-varying spectral content) of the spoken words, but not of other sounds, were successfully decoded from time-locked brain responses in bilateral temporal areas; based on the cortical responses, the models could tell with ∼75-85% accuracy which of two sounds had been presented to the participant. However, the cortical representation of the amplitude envelope information was poorer in children with DLD than in TD children at longer latencies (∼200-300 ms lag). We interpret this effect as reflecting poorer retention of acoustic-phonetic information in short-term memory. This impaired tracking could potentially affect the processing and learning of words as well as continuous speech. The present results offer an explanation for the problems in language comprehension and acquisition in DLD.
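A standard way to implement this kind of decoding is ridge regression from time-lagged sensor data to the stimulus envelope, followed by a two-alternative choice based on correlation. The sketch below uses toy synthetic data with assumed lags and regularization; it is not the authors' MEG pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
T, C = 5000, 20                        # time samples and sensors (toy sizes)
env = rng.standard_normal(T)           # "true" amplitude envelope
meg = np.outer(env, rng.standard_normal(C)) + rng.standard_normal((T, C))

def lagged(X, lags):
    """Stack time-lagged sensor data, (T, C) -> (T, C*len(lags)); the
    response at t+lag predicts the stimulus at t (wrap-around ignored)."""
    return np.concatenate([np.roll(X, -L, axis=0) for L in lags], axis=1)

X = lagged(meg, range(30))             # ~0-300 ms of lags at 100 Hz (assumed)
lam = 1e2                              # ridge regularization (assumed)
W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ env)
recon = X @ W                          # reconstructed envelope

# 2AFC step: pick whichever candidate envelope correlates better with
# the reconstruction
foil = rng.standard_normal(T)
r = lambda a, b: np.corrcoef(a, b)[0, 1]
print("decoder chose the presented sound:", r(recon, env) > r(recon, foil))
```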
Affiliation(s)
- Anni Nora
- Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo FI-00076, Finland
- Aalto NeuroImaging (ANI), Aalto University, Espoo FI-00076, Finland
- Oona Rinkinen
- Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo FI-00076, Finland
- Aalto NeuroImaging (ANI), Aalto University, Espoo FI-00076, Finland
- Hanna Renvall
- Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo FI-00076, Finland
- Aalto NeuroImaging (ANI), Aalto University, Espoo FI-00076, Finland
- BioMag Laboratory, HUS Diagnostic Center, Helsinki University Hospital, Helsinki FI-00029, Finland
- Elisabet Service
- Department of Linguistics and Languages, Centre for Advanced Research in Experimental and Applied Linguistics (ARiEAL), McMaster University, Hamilton, Ontario L8S 4L8, Canada
- Department of Psychology and Logopedics, University of Helsinki, Helsinki FI-00014, Finland
- Eva Arkkila
- Department of Otorhinolaryngology and Phoniatrics, Head and Neck Center, Helsinki University Hospital and University of Helsinki, Helsinki FI-00014, Finland
- Sini Smolander
- Department of Otorhinolaryngology and Phoniatrics, Head and Neck Center, Helsinki University Hospital and University of Helsinki, Helsinki FI-00014, Finland
- Research Unit of Logopedics, University of Oulu, Oulu FI-90014, Finland
- Department of Logopedics, University of Eastern Finland, Joensuu FI-80101, Finland
- Marja Laasonen
- Department of Otorhinolaryngology and Phoniatrics, Head and Neck Center, Helsinki University Hospital and University of Helsinki, Helsinki FI-00014, Finland
- Department of Logopedics, University of Eastern Finland, Joensuu FI-80101, Finland
- Riitta Salmelin
- Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo FI-00076, Finland
- Aalto NeuroImaging (ANI), Aalto University, Espoo FI-00076, Finland
3
Ni G, Xu Z, Bai Y, Zheng Q, Zhao R, Wu Y, Ming D. EEG-based assessment of temporal fine structure and envelope effect in Mandarin syllable and tone perception. Cereb Cortex 2023; 33:11287-11299. PMID: 37804238; DOI: 10.1093/cercor/bhad366.
Abstract
In recent years, speech perception research has benefited from tracking low-frequency neural entrainment to the speech envelope. However, the respective roles of the speech envelope and the temporal fine structure in speech perception remain controversial, especially for Mandarin. This study examined how Mandarin syllable and tone perception depend on the speech envelope and the temporal fine structure. Using sound chimera synthesis, we recorded the electroencephalogram (EEG) of subjects under three acoustic conditions: (i) the original speech, (ii) chimeras pairing the speech envelope with a sinusoidal fine structure, and (iii) chimeras pairing the speech temporal fine structure with the envelope of a non-speech (white noise) sound. We found that syllable perception depended mainly on the speech envelope, whereas tone perception depended on the temporal fine structure. The delta band was prominent, and the parietal and prefrontal lobes were the main activated brain areas, for both syllable and tone perception. Finally, we decoded the spatiotemporal features of Mandarin perception from the microstate sequence. The spatiotemporal feature sequence of the EEG evoked by each speech material was found to be specific, suggesting a new perspective for future auditory brain-computer interfaces. These results provide a new scheme for the coding strategies of hearing aids for native Mandarin speakers.
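The envelope/fine-structure dissociation rests on auditory chimeras: signals that take the envelope of one sound and the temporal fine structure (TFS) of another, band by band. A minimal sketch with assumed band edges and toy signals (not the study's exact stimulus construction):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def chimera(env_donor, tfs_donor, fs, edges=(80, 500, 1500, 4000)):
    """Combine the envelope of one sound with the temporal fine structure
    of another, band by band (band edges are assumed for illustration)."""
    out = np.zeros(len(env_donor))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        a = hilbert(sosfiltfilt(sos, env_donor))   # analytic signal, band k
        b = hilbert(sosfiltfilt(sos, tfs_donor))
        out += np.abs(a) * np.cos(np.angle(b))     # envelope of a, TFS of b
    return out

fs = 16000
t = np.arange(fs) / fs
speech_like = np.sin(2 * np.pi * 150 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
stim = chimera(speech_like, np.random.randn(fs), fs)  # speech envelope on noise TFS
```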
Affiliation(s)
- Guangjian Ni
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China
- Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China
- Haihe Laboratory of Brain-Computer Interaction and Human-Machine Integration, Tianjin 300392, China
- Zihao Xu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China
- Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China
- Yanru Bai
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China
- Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China
- Qi Zheng
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China
- Ran Zhao
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China
- Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China
- Yubo Wu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China
- Dong Ming
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China
- Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China
- Haihe Laboratory of Brain-Computer Interaction and Human-Machine Integration, Tianjin 300392, China
4
Hoshino H, Shiga T, Mori Y, Nozaki M, Kanno K, Osakabe Y, Ochiai H, Wada T, Hikita M, Itagaki S, Miura I, Yabe H. Effect of the Temporal Window of Integration of Speech Sound on Mismatch Negativity. Clin EEG Neurosci 2023; 54:620-627. PMID: 35410509; DOI: 10.1177/15500594221093607.
Abstract
Speech-sound stimuli have a complex structure, and it is unclear how the brain processes them. An event-related potential (ERP) known as mismatch negativity (MMN) is elicited when an individual's brain detects a rare sound. In this study, MMNs were measured in response to an omitted segment of a complex sound consisting of a Japanese vowel. The results indicated that, during left-ear stimulation, the latency from onset was significantly shorter in the right hemisphere than at the frontal midline and in the left hemisphere. Additionally, the latency from omission was longer for stimuli omitted in the latter part of the temporal window of integration (TWI) than for stimuli omitted in the first part of the TWI. The mean peak amplitude was also higher in the right hemisphere than at the frontal midline and in the left hemisphere in response to left-ear stimulation. These results suggest that it would be incorrect to assume that the stimuli strictly retain the characteristics of speech sounds. The interaction effect on the latencies from omission was not significant, suggesting that the detection time for deviance may not be related to the stimulated ear. The effect of the type of deviant stimulus on latency, however, was significant: detection of deviants was delayed when the deviation occurred in the latter part of the TWI, regardless of which ear was stimulated.
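For readers unfamiliar with the measure, an MMN is obtained as the deviant-minus-standard difference of averaged epochs, with the peak sought in a typical post-onset window. A toy single-channel sketch with placeholder data (window and sampling rate are assumptions):

```python
import numpy as np

fs = 500                                   # sampling rate in Hz (assumed)
t = np.arange(-0.1, 0.5, 1 / fs)           # epoch from -100 to 500 ms
std = 2 * np.random.randn(200, t.size)     # placeholder standard epochs
dev = 2 * np.random.randn(40, t.size)      # placeholder omission epochs

# MMN = deviant-minus-standard difference of the averaged ERPs
mmn = dev.mean(axis=0) - std.mean(axis=0)

# Search for the (negative) peak in a typical 100-250 ms window
win = (t >= 0.10) & (t <= 0.25)
peak_latency = t[win][np.argmin(mmn[win])]
print("MMN peak latency: %.0f ms" % (peak_latency * 1e3))
```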
Affiliation(s)
- Hiroshi Hoshino
- Department of Neuropsychiatry, Fukushima Medical University, Hikarigaoka, Fukushima-city, Fukushima, 960-1295, Japan
- Tetsuya Shiga
- Department of Neuropsychiatry, Fukushima Medical University, Hikarigaoka, Fukushima-city, Fukushima, 960-1295, Japan
- Yuhei Mori
- Department of Neuropsychiatry, Fukushima Medical University, Hikarigaoka, Fukushima-city, Fukushima, 960-1295, Japan
- Michinari Nozaki
- Department of Neuropsychiatry, Fukushima Medical University, Hikarigaoka, Fukushima-city, Fukushima, 960-1295, Japan
- Kazuko Kanno
- Department of Neuropsychiatry, Fukushima Medical University, Hikarigaoka, Fukushima-city, Fukushima, 960-1295, Japan
- Yusuke Osakabe
- Department of Neuropsychiatry, Fukushima Medical University, Hikarigaoka, Fukushima-city, Fukushima, 960-1295, Japan
- Haruka Ochiai
- Department of Neuropsychiatry, Fukushima Medical University, Hikarigaoka, Fukushima-city, Fukushima, 960-1295, Japan
- Tomohiro Wada
- Department of Neuropsychiatry, Fukushima Medical University, Hikarigaoka, Fukushima-city, Fukushima, 960-1295, Japan
- Masayuki Hikita
- Department of Neuropsychiatry, Fukushima Medical University, Hikarigaoka, Fukushima-city, Fukushima, 960-1295, Japan
- Shuntaro Itagaki
- Department of Neuropsychiatry, Fukushima Medical University, Hikarigaoka, Fukushima-city, Fukushima, 960-1295, Japan
- Itaru Miura
- Department of Neuropsychiatry, Fukushima Medical University, Hikarigaoka, Fukushima-city, Fukushima, 960-1295, Japan
- Hirooki Yabe
- Department of Neuropsychiatry, Fukushima Medical University, Hikarigaoka, Fukushima-city, Fukushima, 960-1295, Japan
5
Ueda K, Doan LLD, Takeichi H. Checkerboard and interrupted speech: Intelligibility contrasts related to factor-analysis-based frequency bands. J Acoust Soc Am 2023; 154:2010-2020. PMID: 37782122; DOI: 10.1121/10.0021165.
Abstract
It has been shown that the intelligibility of checkerboard speech stimuli, in which speech signals were periodically interrupted in time and frequency, drastically varied according to the combination of the number of frequency bands (2-20) and segment duration (20-320 ms). However, the effects of the number of frequency bands between 4 and 20 and of the frequency division parameters on intelligibility have been largely unknown. Here, we show that speech intelligibility was lowest for four-band checkerboard speech stimuli, except at the 320-ms segment duration, followed by temporally interrupted speech stimuli and then eight-band checkerboard speech stimuli (N = 19 and 20). At the same time, U-shaped intelligibility curves were observed for four-band and possibly eight-band checkerboard speech stimuli. Furthermore, different frequency division parameters resulted in small but significant intelligibility differences at the 160- and 320-ms segment durations in four-band checkerboard speech stimuli. These results suggest that the factor-analysis-based four frequency bands, representing groups of critical bands whose speech power fluctuations correlate with each other, work as speech cue channels essential for speech perception. Moreover, a probability summation model for perceptual units, consisting of a sub-unit process and a supra-unit process that receives the outputs of the speech cue channels, may account for the U-shaped intelligibility curves.
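Checkerboard speech can be generated by splitting the signal into frequency bands and gating alternate time segments on and off, with the on/off pattern inverted in adjacent bands. A minimal sketch, with band edges only loosely based on the factor-analysis-based bands (values assumed):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def checkerboard(x, fs, edges, seg_ms):
    """Gate alternate time segments on/off, inverting the pattern in
    adjacent frequency bands, so the signal forms a checkerboard."""
    seg = int(fs * seg_ms / 1000)
    n_seg = -(-len(x) // seg)                          # ceil division
    gate = np.repeat(np.arange(n_seg) % 2, seg)[:len(x)].astype(float)
    out = np.zeros(len(x))
    for k, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        out += band * (gate if k % 2 == 0 else 1.0 - gate)
    return out

fs = 16000
x = np.random.randn(2 * fs)                            # stand-in sentence
edges = (100, 540, 1700, 3300, 7000)                   # four bands (assumed)
y = checkerboard(x, fs, edges, seg_ms=160)
```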
Affiliation(s)
- Kazuo Ueda
- Department of Acoustic Design, Faculty of Design/Research Center for Applied Perceptual Science/Research and Development Center for Five-Sense Devices, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
- Linh Le Dieu Doan
- Human Science Course, Graduate School of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
- Hiroshige Takeichi
- Open Systems Information Science Team, Advanced Data Science Project (ADSP), RIKEN Information R&D and Strategy Headquarters (R-IH), RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
6
Windle R, Dillon H, Heinrich A. A review of auditory processing and cognitive change during normal ageing, and the implications for setting hearing aids for older adults. Front Neurol 2023; 14:1122420. PMID: 37409017; PMCID: PMC10318159; DOI: 10.3389/fneur.2023.1122420.
Abstract
Throughout our adult lives there is a decline in peripheral hearing, auditory processing and elements of cognition that support listening ability. Audiometry provides no information about the status of auditory processing and cognition, and older adults often struggle with complex listening situations, such as speech in noise perception, even if their peripheral hearing appears normal. Hearing aids can address some aspects of peripheral hearing impairment and improve signal-to-noise ratios. However, they cannot directly enhance central processes and may introduce distortion to sound that might act to undermine listening ability. This review paper highlights the need to consider the distortion introduced by hearing aids, specifically when considering normally-ageing older adults. We focus on patients with age-related hearing loss because they represent the vast majority of the population attending audiology clinics. We believe that it is important to recognize that the combination of peripheral and central, auditory and cognitive decline make older adults some of the most complex patients seen in audiology services, so they should not be treated as "standard" despite the high prevalence of age-related hearing loss. We argue that a primary concern should be to avoid hearing aid settings that introduce distortion to speech envelope cues, which is not a new concept. The primary cause of distortion is the speed and range of change to hearing aid amplification (i.e., compression). We argue that slow-acting compression should be considered as a default for some users and that other advanced features should be reconsidered as they may also introduce distortion that some users may not be able to tolerate. We discuss how this can be incorporated into a pragmatic approach to hearing aid fitting that does not require increased loading on audiology services.
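The envelope-distortion argument centers on compression speed. The sketch below implements a generic feed-forward compressor (threshold, ratio, and time constants are arbitrary illustrative values, not a hearing aid prescription): with short attack and release times the 4 Hz modulation of the toy input is flattened, while with long time constants it is largely preserved.

```python
import numpy as np

def compress(x, fs, ratio=3.0, thresh_db=-30.0, attack_ms=5.0, release_ms=50.0):
    """Feed-forward compressor with one-pole level tracking."""
    a_att = np.exp(-1.0 / (fs * attack_ms / 1000))
    a_rel = np.exp(-1.0 / (fs * release_ms / 1000))
    level, y = 0.0, np.zeros(len(x))
    for n, s in enumerate(x):
        a = a_att if abs(s) > level else a_rel
        level = a * level + (1 - a) * abs(s)           # smoothed level
        lev_db = 20 * np.log10(max(level, 1e-9))
        gain_db = min(0.0, (thresh_db - lev_db) * (1 - 1 / ratio))
        y[n] = s * 10 ** (gain_db / 20)
    return y

fs = 16000
t = np.arange(fs) / fs
am = np.sin(2 * np.pi * 300 * t) * (1 + 0.9 * np.sin(2 * np.pi * 4 * t))
fast = compress(am, fs, attack_ms=1, release_ms=20)    # flattens 4 Hz envelope
slow = compress(am, fs, attack_ms=50, release_ms=1000) # largely preserves it
```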
Affiliation(s)
- Richard Windle
- Audiology Department, Royal Berkshire NHS Foundation Trust, Reading, United Kingdom
- Harvey Dillon
- NIHR Manchester Biomedical Research Centre, Manchester, United Kingdom
- Department of Linguistics, Macquarie University, North Ryde, NSW, Australia
- Antje Heinrich
- NIHR Manchester Biomedical Research Centre, Manchester, United Kingdom
- Division of Human Communication, Development and Hearing, School of Health Sciences, University of Manchester, Manchester, United Kingdom
7
Adolfi F, Bowers JS, Poeppel D. Successes and critical failures of neural networks in capturing human-like speech recognition. Neural Netw 2023; 162:199-211. PMID: 36913820; DOI: 10.1016/j.neunet.2023.02.032.
Abstract
Natural and artificial audition can in principle acquire different solutions to a given problem. The constraints of the task, however, can nudge the cognitive science and engineering of audition to qualitatively converge, suggesting that a closer mutual examination would potentially enrich artificial hearing systems and process models of the mind and brain. Speech recognition - an area ripe for such exploration - is inherently robust in humans to a number of transformations at various spectrotemporal granularities. To what extent are these robustness profiles accounted for by high-performing neural network systems? We bring together experiments in speech recognition under a single synthesis framework to evaluate state-of-the-art neural networks as stimulus-computable, optimized observers. In a series of experiments, we (1) clarify how influential speech manipulations in the literature relate to each other and to natural speech, (2) show the granularities at which machines exhibit out-of-distribution robustness, reproducing classical perceptual phenomena in humans, (3) identify the specific conditions where model predictions of human performance differ, and (4) demonstrate a crucial failure of all artificial systems to perceptually recover where humans do, suggesting alternative directions for theory and model building. These findings encourage a tighter synergy between the cognitive science and engineering of audition.
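An evaluation of this kind reduces to a loop: manipulate speech at some granularity, transcribe with the model under test, and score against a reference. The sketch below uses a stub transcriber and random audio purely to show the harness shape; `transcribe` stands in for any ASR system and is not an API from the paper:

```python
import numpy as np

def wer(ref, hyp):
    """Word error rate via Levenshtein distance over word tokens."""
    r, h = ref.split(), hyp.split()
    d = np.zeros((len(r) + 1, len(h) + 1), dtype=int)
    d[:, 0], d[0, :] = np.arange(len(r) + 1), np.arange(len(h) + 1)
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1,
                          d[i - 1, j - 1] + (r[i - 1] != h[j - 1]))
    return d[-1, -1] / max(len(r), 1)

def interrupt(x, fs, seg_ms):
    """One manipulation from the stimulus space: silence every other segment."""
    y, seg = x.copy(), int(fs * seg_ms / 1000)
    for i in range(0, len(y), 2 * seg):
        y[i:i + seg] = 0.0
    return y

fs = 16000
speech = np.random.randn(fs)                         # stand-in test utterance
reference = "placeholder transcript"
transcribe = lambda x, fs: "placeholder transcript"  # stub for any ASR model

for seg_ms in (20, 40, 80, 160):                     # sweep the granularity
    print(seg_ms, wer(reference, transcribe(interrupt(speech, fs, seg_ms), fs)))
```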
Affiliation(s)
- Federico Adolfi
- Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Frankfurt, Germany
- University of Bristol, School of Psychological Science, Bristol, United Kingdom
- Jeffrey S Bowers
- University of Bristol, School of Psychological Science, Bristol, United Kingdom
- David Poeppel
- Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Frankfurt, Germany
- Department of Psychology, New York University, NY, United States
- Max Planck NYU Center for Language, Music, and Emotion, Frankfurt, Germany / New York, NY, United States
8
Adolfi F, Wareham T, van Rooij I. A Computational Complexity Perspective on Segmentation as a Cognitive Subcomputation. Top Cogn Sci 2022; 15:255-273. PMID: 36453947; DOI: 10.1111/tops.12629.
Abstract
Computational feasibility is a widespread concern that guides the framing and modeling of natural and artificial intelligence. The specification of cognitive system capacities is often shaped by unexamined intuitive assumptions about the search space and complexity of a subcomputation. However, a mistaken intuition might make such initial conceptualizations misleading for what empirical questions appear relevant later on. We undertake here computational-level modeling and complexity analyses of segmentation - a widely hypothesized subcomputation that plays a requisite role in explanations of capacities across domains, such as speech recognition, music cognition, active sensing, event memory, action parsing, and statistical learning - as a case study to show how crucial it is to formally assess these assumptions. We mathematically prove two sets of results regarding computational hardness and search space size that may run counter to intuition, and position their implications with respect to existing views on the subcapacity.
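One intuition the paper formalizes can be seen directly: the number of ways to cut a length-n sequence into contiguous segments is 2^(n-1), since each of the n-1 internal boundaries is either a cut or not, so unconstrained exhaustive search over segmentations is exponential in input length. The sketch below enumerates segmentations and verifies the count (an illustration of search space size, not the paper's proofs):

```python
from itertools import combinations

def segmentations(seq):
    """All ways to cut a sequence into contiguous segments: every subset
    of the n-1 internal boundary positions is a distinct segmentation."""
    n = len(seq)
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0,) + cuts + (n,)
            yield [seq[a:b] for a, b in zip(bounds, bounds[1:])]

for n in range(1, 8):
    count = sum(1 for _ in segmentations(list(range(n))))
    assert count == 2 ** (n - 1)     # exponential growth of the search space
    print(n, count)
```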
Affiliation(s)
- Federico Adolfi
- Ernst Strüngmann Institute for Neuroscience in Cooperation with Max-Planck Society
- School of Psychological Science, University of Bristol
- Todd Wareham
- Department of Computer Science, Memorial University of Newfoundland
- Iris van Rooij
- Donders Institute for Brain, Cognition, and Behaviour, Radboud University
- School of Artificial Intelligence, Radboud University
- Department of Linguistics, Cognitive Science, and Semiotics & Interacting Minds Centre, Aarhus University
9
Pastore A, Tomassini A, Delis I, Dolfini E, Fadiga L, D'Ausilio A. Speech listening entails neural encoding of invisible articulatory features. Neuroimage 2022; 264:119724. PMID: 36328272; DOI: 10.1016/j.neuroimage.2022.119724.
Abstract
Speech processing entails a complex interplay between bottom-up and top-down computations. The former is reflected in neural entrainment to the quasi-rhythmic properties of speech acoustics, while the latter is supposed to guide the selection of the most relevant input subspace. Top-down signals are believed to originate mainly from motor regions, yet similar activities have been shown to tune attentional cycles also for simpler, non-speech stimuli. Here we examined whether, during speech listening, the brain reconstructs the articulatory patterns associated with speech production. We measured electroencephalographic (EEG) data while participants listened to sentences during the production of which the articulatory kinematics of the lips, jaw, and tongue were also recorded (via Electro-Magnetic Articulography, EMA). We captured the patterns of articulatory coordination through Principal Component Analysis (PCA) and used Partial Information Decomposition (PID) to identify whether the speech envelope and each of the kinematic components provided unique, synergistic, and/or redundant information about the EEG signals. Interestingly, tongue movements carry both unique and synergistic information with the envelope that is encoded in the listener's brain activity. This demonstrates that during speech listening the brain retrieves highly specific and unique motor information that is never accessible through vision, thus leveraging audio-motor maps that most likely arise from the acquisition of speech production during development.
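The dimensionality-reduction step can be illustrated compactly: PCA over multichannel articulometry yields a few kinematic components, which then serve as predictors in the information-theoretic analysis. A sketch with stand-in data (channel count and component number are assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
ema = rng.standard_normal((2000, 12))   # stand-in for 12 articulometry
                                        # channels (lips, jaw, tongue sensors)

pca = PCA(n_components=3)               # component count is an assumption
kin = pca.fit_transform(ema)            # low-dimensional kinematic components
print("variance explained:", pca.explained_variance_ratio_)

# Each column of `kin` would then enter the PID analysis as a predictor,
# to be rated as unique, synergistic, or redundant with the envelope.
```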
Affiliation(s)
- A Pastore
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
- Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
- A Tomassini
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
- I Delis
- School of Biomedical Sciences, University of Leeds, Leeds, UK
- E Dolfini
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
- Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
- L Fadiga
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
- Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
- A D'Ausilio
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
- Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
10
Eguchi H, Ueda K, Remijn GB, Nakajima Y, Takeichi H. The common limitations in auditory temporal processing for Mandarin Chinese and Japanese. Sci Rep 2022; 12:3002. PMID: 35194098; PMCID: PMC8863933; DOI: 10.1038/s41598-022-06925-x.
Abstract
The present investigation focused on how temporal degradation affected intelligibility in two types of languages, i.e., a tonal language (Mandarin Chinese) and a non-tonal language (Japanese). The temporal resolution of common daily-life sentences spoken by native speakers was systematically degraded with mosaicking (mosaicising), in which the power of the original speech was averaged within each regularly spaced time-frequency unit and the temporal fine structure was removed. The results showed very similar patterns of variation in intelligibility for the two languages over a wide range of temporal resolutions, implying that temporal degradation crucially affected speech cues other than tonal cues in degraded speech without temporal fine structure. Specifically, the intelligibility of both languages remained at ceiling up to about a 40-ms segment duration, then gradually declined with increasing segment duration, and reached floor at about a 150-ms segment duration or longer. The same 40-ms limit for ceiling performance appeared for the other method of degradation, i.e., local time-reversal, implying that a common temporal processing mechanism underlies the limitation. The general tendency fit a dual time-window model of speech processing, in which a short (~20-30 ms) and a long (~200 ms) time window run in parallel.
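Mosaicking can be approximated with a short-time Fourier transform: average power within rectangular time-frequency cells and resynthesize with random phase. The sketch below is a simplification (the study used critical-band-based cells) with assumed cell sizes:

```python
import numpy as np
from scipy.signal import stft, istft

def mosaic(x, fs, seg_ms=40, nperseg=256, rows=4):
    """Average spectrogram power within rectangular time-frequency cells
    and resynthesize with random phase (cell sizes are assumed)."""
    f, t, Z = stft(x, fs=fs, nperseg=nperseg)
    P = np.abs(Z) ** 2
    cols = max(1, round((seg_ms / 1000) / (t[1] - t[0])))
    for i in range(0, P.shape[0], rows):
        for j in range(0, P.shape[1], cols):
            P[i:i + rows, j:j + cols] = P[i:i + rows, j:j + cols].mean()
    phase = np.exp(2j * np.pi * np.random.rand(*P.shape))
    _, y = istft(np.sqrt(P) * phase, fs=fs, nperseg=nperseg)
    return y

fs = 16000
y = mosaic(np.random.randn(fs), fs, seg_ms=40)   # stand-in waveform
```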
Affiliation(s)
- Hikaru Eguchi
- Human Science Course, Graduate School of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka, 815-8540, Japan
- Kazuo Ueda
- Department of Human Science, Faculty of Design/Research Center for Applied Perceptual Science/Research and Development Center for Five-Sense Devices, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka, 815-8540, Japan
- Gerard B Remijn
- Department of Human Science, Faculty of Design/Research Center for Applied Perceptual Science, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka, 815-8540, Japan
- Yoshitaka Nakajima
- Department of Human Science, Faculty of Design/Research Center for Applied Perceptual Science/Research and Development Center for Five-Sense Devices, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka, 815-8540, Japan
- Sound Corporation, 4-10-30-103, Tonoharu, Higashiku, Fukuoka, 813-0001, Japan
- Hiroshige Takeichi
- Open Systems Information Science Team, Advanced Data Science Project (ADSP), RIKEN Information R&D and Strategy Headquarters (R-IH), RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
11
Zheng Y, Liu L, Li R, Wu Z, Chen L, Li J, Wu C, Kong L, Zhang C, Lei M, She S, Ning Y, Li L. Impaired interaural correlation processing in people with schizophrenia. Eur J Neurosci 2021; 54:6646-6662. PMID: 34494695; DOI: 10.1111/ejn.15449.
Abstract
Detection of transient changes in interaural correlation is based on the temporal precision of the central representations of acoustic signals. Whether schizophrenia impairs the temporal precision in the interaural correlation process is not clear. In both participants with schizophrenia and matched healthy-control participants, this study examined the detection of a break in interaural correlation (BIC, a change in interaural correlation from 1 to 0 and back to 1), including the longest interaural delay at which a BIC was just audible, representing the temporal extent of the primitive auditory memory (PAM). Moreover, BIC-induced electroencephalograms (EEGs) and the relationships between the early binaural psychoacoustic processing and higher cognitive functions, which were assessed by the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS), were examined. The results showed that compared to healthy controls, participants with schizophrenia exhibited poorer BIC detection, PAM and RBANS score. Both the BIC-detection accuracy and the PAM extent were correlated with the RBANS score. Moreover, participants with schizophrenia showed weaker BIC-induced N1-P2 amplitude which was correlated with both theta-band power and inter-trial phase coherence. These results suggested that schizophrenia impairs the temporal precision of the central representations of acoustic signals, affecting both interaural correlation processing and higher-order cognitions.
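Stimuli with a controlled interaural correlation rho can be built by mixing two independent noises, L = a and R = rho*a + sqrt(1-rho^2)*b; a BIC is then a stretch where rho drops to 0. A minimal sketch with assumed durations and delay:

```python
import numpy as np

def correlated_noise(n, rho, rng):
    """Two noise channels with interaural correlation rho."""
    a, b = rng.standard_normal((2, n))
    return a, rho * a + np.sqrt(1 - rho ** 2) * b

rng = np.random.default_rng(0)
fs, n, k = 16000, 16000, 3200            # 1 s of noise, 200-ms break (assumed)
L, R = correlated_noise(n, 1.0, rng)     # fully correlated noise (rho = 1)

mid = n // 2                             # drop rho to 0, then back to 1
L[mid:mid + k], R[mid:mid + k] = correlated_noise(k, 0.0, rng)

# An interaural delay probes how long primitive auditory memory can
# bridge the two ears (circular shift used for brevity)
R = np.roll(R, int(fs * 0.004))          # 4-ms delay (assumed example)
```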
Affiliation(s)
- Yingjun Zheng
- The Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou, China
- Lei Liu
- School of Psychological and Cognitive Sciences, Key Laboratory on Machine Perception (Ministry of Education), Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Ruikeng Li
- The Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou, China
- Zhemeng Wu
- School of Psychological and Cognitive Sciences, Key Laboratory on Machine Perception (Ministry of Education), Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Liangjie Chen
- School of Psychological and Cognitive Sciences, Key Laboratory on Machine Perception (Ministry of Education), Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Juanhua Li
- The Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou, China
- Chao Wu
- School of Psychological and Cognitive Sciences, Key Laboratory on Machine Perception (Ministry of Education), Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Lingzhi Kong
- School of Psychological and Cognitive Sciences, Key Laboratory on Machine Perception (Ministry of Education), Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Changxin Zhang
- School of Psychological and Cognitive Sciences, Key Laboratory on Machine Perception (Ministry of Education), Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Ming Lei
- School of Psychological and Cognitive Sciences, Key Laboratory on Machine Perception (Ministry of Education), Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Shenglin She
- The Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou, China
- Yuping Ning
- The Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou, China
- Liang Li
- School of Psychological and Cognitive Sciences, Key Laboratory on Machine Perception (Ministry of Education), Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
12
Ueda K, Matsuo I. Intelligibility of chimeric locally time-reversed speech: Relative contribution of four frequency bands. JASA Express Lett 2021; 1:065201. PMID: 36154368; DOI: 10.1121/10.0005439.
Abstract
Intelligibility of four-band speech stimuli was investigated (n = 18), such that only one of the frequency bands was preserved, whereas other bands were locally time-reversed (segment duration: 75-300 ms), or vice versa. Intelligibility was best retained (82% at 75 ms) when the second lowest band (540-1700 Hz) was preserved. When the same band was degraded, the largest drop (10% at 300 ms) occurred. The lowest and second highest bands contributed similarly less strongly to intelligibility. The highest frequency band contributed least. A close connection between the second lowest frequency band and sonority was suggested.
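The stimulus family combines band splitting with local time reversal: all bands but one are reversed segment by segment (or vice versa). A minimal sketch with assumed band edges approximating the four factor-analysis-based bands:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def local_reverse(x, fs, seg_ms):
    """Reverse the waveform within consecutive fixed-length segments."""
    seg = int(fs * seg_ms / 1000)
    return np.concatenate([x[i:i + seg][::-1] for i in range(0, len(x), seg)])

def chimeric_ltr(x, fs, edges, keep, seg_ms):
    """Band-split the signal and locally time-reverse every band except
    the one indexed by `keep`."""
    out = np.zeros(len(x))
    for k, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        out += band if k == keep else local_reverse(band, fs, seg_ms)
    return out

fs = 16000
x = np.random.randn(2 * fs)                         # stand-in sentence
edges = (100, 540, 1700, 3300, 7000)                # approximate band edges
y = chimeric_ltr(x, fs, edges, keep=1, seg_ms=150)  # preserve 540-1700 Hz
```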
Affiliation(s)
- Kazuo Ueda
- Department of Human Science, Faculty of Design/Research Center for Applied Perceptual Science/Research and Development Center for Five-Sense Devices, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
- Ikuo Matsuo
- Department of Information Science, Tohoku Gakuin University, 2-1-1 Tenjinzawa, Izumi-ku, Sendai 981-3193, Japan
13
Bröhl F, Kayser C. Delta/theta band EEG differentially tracks low and high frequency speech-derived envelopes. Neuroimage 2021; 233:117958. PMID: 33744458; PMCID: PMC8204264; DOI: 10.1016/j.neuroimage.2021.117958.
Abstract
The representation of speech in the brain is often examined by measuring the alignment of rhythmic brain activity to the speech envelope. To conveniently quantify this alignment (termed 'speech tracking'), many studies consider the broadband speech envelope, which combines acoustic fluctuations across the spectral range. Using EEG recordings, we show that using this broadband envelope can provide a distorted picture of speech encoding. We systematically investigated the encoding of spectrally limited speech-derived envelopes presented via individual and multiple noise carriers in the human brain. Tracking in the 1 to 6 Hz EEG bands differentially reflected low (0.2 - 0.83 kHz) and high (2.66 - 8 kHz) frequency speech-derived envelopes. This was independent of the specific carrier frequency but sensitive to attentional manipulations, and may reflect the context-dependent emphasis of information from distinct spectral ranges of the speech envelope in low frequency brain activity. As low and high frequency speech envelopes relate to distinct phonemic features, our results suggest that functionally distinct processes contribute to speech tracking in the same EEG bands, and are easily confounded when considering the broadband speech envelope.
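Spectrally limited speech-derived envelopes of this kind can be computed by band-passing the waveform, taking the Hilbert envelope, and smoothing, before re-imposing the result on a noise carrier. A sketch with stand-in audio (filter orders and smoothing cutoff are assumptions):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def band_envelope(x, fs, lo, hi, lp=10):
    """Envelope of one spectral band, smoothed below `lp` Hz (assumed)."""
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    env = np.abs(hilbert(sosfiltfilt(sos, x)))
    sos_lp = butter(4, lp, btype="low", fs=fs, output="sos")
    return np.clip(sosfiltfilt(sos_lp, env), 0, None)

fs = 16000
speech = np.random.randn(2 * fs)                   # stand-in recording
env_lo = band_envelope(speech, fs, 200, 830)       # 0.2-0.83 kHz envelope
env_hi = band_envelope(speech, fs, 2660, 7900)     # high-band envelope

# Re-impose a band-limited envelope on a noise carrier, as in the stimuli
stim = np.random.randn(2 * fs) * env_lo
```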
Affiliation(s)
- Felix Bröhl
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Universitätsstr. 25, 33615 Bielefeld, Germany
- Christoph Kayser
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Universitätsstr. 25, 33615 Bielefeld, Germany
14
Phonemic restoration of interrupted locally time-reversed speech: Effects of segment duration and noise levels. Atten Percept Psychophys 2021; 83:1928-1934. PMID: 33851359; PMCID: PMC8213671; DOI: 10.3758/s13414-021-02292-3.
Abstract
Intelligibility of temporally degraded speech was investigated with locally time-reversed speech (LTR) and its interrupted version (ILTR). Control stimuli comprising interrupted speech (I) were also included. Speech stimuli consisted of 200 Japanese meaningful sentences. In interrupted stimuli, speech segments were alternated with either silent gaps or pink noise bursts. The noise bursts had a level of - 10, 0 or + 10 dB relative to the speech level. Segment duration varied from 20 to 160 ms for ILTR sentences, but was fixed at 160 ms for I sentences. At segment durations between 40 and 80 ms, severe reductions in intelligibility were observed for ILTR sentences, compared with LTR sentences. A substantial improvement in intelligibility (30-33%) was observed when 40-ms silent gaps in ILTR were replaced with 0- and + 10-dB noise. Noise with a level of - 10 dB had no effect on the intelligibility. These findings show that the combined effects of interruptions and temporal reversal of speech segments on intelligibility are greater than the sum of each individual effect. The results also support the idea that illusory continuity induced by high-level noise bursts improves the intelligibility of ILTR and I sentences.
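The stimulus manipulation is easy to state in code: replace every other segment with silence, or with a noise burst at a level fixed relative to the speech RMS. A sketch with stand-in audio (white noise stands in for the pink noise used in the study):

```python
import numpy as np

def interrupt_with_noise(x, fs, seg_ms, fill_db=None):
    """Replace every other segment with a noise burst `fill_db` dB relative
    to the overall speech RMS, or with silence when fill_db is None."""
    seg, rms = int(fs * seg_ms / 1000), np.sqrt(np.mean(x ** 2))
    y = x.astype(float).copy()
    for i in range(0, len(x), 2 * seg):
        stop = min(i + seg, len(x))
        if fill_db is None:
            y[i:stop] = 0.0                       # silent gap
        else:                                     # white noise stands in for
            burst = np.random.randn(stop - i)     # the study's pink noise
            burst *= rms * 10 ** (fill_db / 20) / np.sqrt(np.mean(burst ** 2))
            y[i:stop] = burst
    return y

fs = 16000
speech = np.random.randn(fs)                          # stand-in waveform
gaps = interrupt_with_noise(speech, fs, 40)           # silent interruptions
restored = interrupt_with_noise(speech, fs, 40, +10)  # +10 dB noise bursts
```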
15
Kent L, Wittmann M. Time consciousness: the missing link in theories of consciousness. Neurosci Conscious 2021; 2021:niab011. PMID: 33868714; PMCID: PMC8042366; DOI: 10.1093/nc/niab011.
Abstract
There are plenty of issues to be solved in order for researchers to agree on a neural model of consciousness. Here we emphasize an often under-represented aspect in the debate: time consciousness. Consciousness and the present moment both extend in time. Experience flows through a succession of moments and progresses from future predictions, to present experiences, to past memories. However, a brief review finds that many dominant theories of consciousness only refer to brief, static, and discrete "functional moments" of time. Very few refer to more extended, dynamic, and continuous time, which is associated with conscious experience (cf. the "experienced moment"). This confusion between short and discrete versus long and continuous is, we argue, one of the core issues in theories of consciousness. Given the lack of work dedicated to time consciousness, its study could test novel predictions of rival theories of consciousness. It may be that different theories of consciousness are compatible/complementary if the different aspects of time are taken into account. Or, if it turns out that no existing theory can fully accommodate time consciousness, then perhaps it has something new to add. Regardless of outcome, the crucial step is to make subjective time a central object of study.
Affiliation(s)
- Lachlan Kent
- Centre for Youth Mental Health, The University of Melbourne, 35 Poplar Rd, Parkville, Victoria 3052, Australia
- Orygen, 35 Poplar Rd, Parkville, Victoria 3052, Australia
- Marc Wittmann
- Institute for Frontier Areas of Psychology and Mental Health, Wilhelmstraße 3a, 79098 Freiburg i.Br., Germany
16
Modulation Spectra Capture EEG Responses to Speech Signals and Drive Distinct Temporal Response Functions. eNeuro 2021; 8:ENEURO.0399-20.2020. PMID: 33272971; PMCID: PMC7810259; DOI: 10.1523/eneuro.0399-20.2020.
Abstract
Speech signals have a long-term modulation spectrum with a characteristic shape that is distinct from environmental noise, music, and non-speech vocalizations. Does the human auditory system adapt to the speech long-term modulation spectrum and efficiently extract critical information from speech signals? To answer this question, we tested whether neural responses to speech signals can be captured by specific modulation spectra of non-speech acoustic stimuli. We generated amplitude-modulated (AM) noise with the speech modulation spectrum and with 1/f modulation spectra of different exponents to imitate the temporal dynamics of different natural sounds. We presented these AM stimuli and a 10-min piece of natural speech to 19 human participants undergoing electroencephalography (EEG) recording. We derived temporal response functions (TRFs) to the AM stimuli of different spectrum shapes and found distinct neural dynamics for each type of TRF. We then used the TRFs of the AM stimuli to predict neural responses to the speech signals and found that (1) the TRFs of AM modulation spectra with exponents 1, 1.5, and 2 preferentially captured EEG responses to speech signals in the δ band and (2) the θ band of the speech neural responses could be captured by the AM stimuli with an exponent of 0.75. Our results suggest that the human auditory system shows specificity to the long-term modulation spectrum and is equipped with characteristic neural algorithms tailored to extract critical acoustic information from speech signals.
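AM noise with a 1/f^a modulation spectrum can be synthesized by shaping the spectrum of a random envelope in the Fourier domain and imposing it on a noise carrier. A sketch with an assumed modulation range:

```python
import numpy as np

def am_noise(dur, fs, exponent, rng):
    """Noise carrier modulated by an envelope whose modulation power
    spectrum falls off as 1/f**exponent within an assumed 0.1-32 Hz range."""
    n = int(dur * fs)
    freqs = np.fft.rfftfreq(n, 1 / fs)
    mag = np.zeros(freqs.size)
    band = (freqs > 0.1) & (freqs < 32)
    mag[band] = freqs[band] ** (-exponent / 2)    # amplitude ~ f^(-a/2)
    phase = rng.uniform(0, 2 * np.pi, freqs.size)
    env = np.fft.irfft(mag * np.exp(1j * phase), n)
    env = 1 + env / (np.abs(env).max() + 1e-12)   # keep the modulator positive
    return env * rng.standard_normal(n)

rng = np.random.default_rng(0)
stims = {a: am_noise(10.0, 16000, a, rng) for a in (0.75, 1.0, 1.5, 2.0)}
```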
17
Wang L, Wu EX, Chen F. Robust EEG-Based Decoding of Auditory Attention With High-RMS-Level Speech Segments in Noisy Conditions. Front Hum Neurosci 2020; 14:557534. PMID: 33132874; PMCID: PMC7576187; DOI: 10.3389/fnhum.2020.557534.
Abstract
The attended speech stream can be detected robustly, even in adverse auditory scenarios with auditory attentional modulation, and can be decoded using electroencephalographic (EEG) data. Speech segmentation based on relative root-mean-square (RMS) intensity can be used to estimate segmental contributions to perception in noisy conditions, and high-RMS-level segments contain crucial information for speech perception. Hence, this study aimed to investigate the effect of high-RMS-level speech segments on auditory attention decoding performance under various signal-to-noise ratio (SNR) conditions. Scalp EEG signals were recorded while subjects attended to one speech stream in a mixture narrated concurrently by two Mandarin speakers. The temporal response function was used to identify the attended speech from EEG responses tracking the temporal envelopes of the intact speech and of the high-RMS-level speech segments alone, respectively. Auditory decoding performance was then analyzed under various SNR conditions by comparing EEG correlations to the attended and ignored speech streams. The accuracy of auditory attention decoding based on the temporal envelope of the high-RMS-level speech segments was not inferior to that based on the temporal envelope of the intact speech. Cortical activity correlated more strongly with attended than with ignored speech under different SNR conditions. These results suggest that EEG recordings corresponding to high-RMS-level speech segments carry crucial information for the identification and tracking of attended speech in the presence of background noise. This study also showed that, with the modulation of auditory attention, attended speech can be decoded more robustly from neural activity than from behavioral measures under a wide range of SNRs.
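Relative-RMS segmentation amounts to framing the waveform, computing each frame's RMS relative to the whole signal, and thresholding. A sketch with assumed frame length and threshold (the study's exact segmentation criteria may differ):

```python
import numpy as np

def high_rms_mask(x, fs, frame_ms=16, thresh_db=-10):
    """Flag frames whose RMS lies within `thresh_db` dB of the overall
    RMS; frame length and threshold are assumptions for illustration."""
    frame = int(fs * frame_ms / 1000)
    n_fr = len(x) // frame
    xf = x[:n_fr * frame].reshape(n_fr, frame)
    frame_rms = np.sqrt((xf ** 2).mean(axis=1))
    rel_db = 20 * np.log10(frame_rms / np.sqrt((x ** 2).mean()) + 1e-12)
    return np.repeat(rel_db >= thresh_db, frame)

fs = 16000
speech = np.random.randn(fs)                     # stand-in waveform
mask = high_rms_mask(speech, fs)
env_high = np.abs(speech[:mask.size]) * mask     # envelope of high-RMS parts only
```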
Affiliation(s)
- Lei Wang
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, Hong Kong
- Ed X Wu
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, Hong Kong
- Fei Chen
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
18
Mourão GL, Costa MH, Paul S. Speech Intelligibility for Cochlear Implant Users with the MMSE Noise-Reduction Time-Frequency Mask. Biomed Signal Process Control 2020. DOI: 10.1016/j.bspc.2020.101982.