1.
Cho S, Olm CA, Ash S, Shellikeri S, Agmon G, Cousins KAQ, Irwin DJ, Grossman M, Liberman M, Nevler N. Automatic classification of AD pathology in FTD phenotypes using natural speech. Alzheimers Dement 2024. [PMID: 38572850] [DOI: 10.1002/alz.13748] [Received: 11/27/2023] [Revised: 01/23/2024] [Accepted: 01/24/2024] [Indexed: 04/05/2024]
Abstract
INTRODUCTION: Screening for Alzheimer's disease neuropathologic change (ADNC) in individuals with atypical presentations is challenging but essential for clinical management. We trained automatic speech-based classifiers to distinguish frontotemporal dementia (FTD) patients with ADNC from those with frontotemporal lobar degeneration (FTLD).
METHODS: We trained automatic classifiers with 99 speech features from 1-minute speech samples of 179 participants (ADNC = 36, FTLD = 60, healthy controls [HC] = 89). Patients' pathology was assigned based on autopsy or cerebrospinal fluid analytes. Structural network-based magnetic resonance imaging analyses identified anatomical correlates of distinct speech features.
RESULTS: Our classifier showed 0.88 ± 0.03 area under the curve (AUC) for ADNC versus FTLD and 0.93 ± 0.04 AUC for patients versus HC. Noun frequency and pause rate correlated with gray matter volume loss in the limbic and salience networks, respectively.
DISCUSSION: Brief naturalistic speech samples can be used to screen FTD patients for underlying ADNC in vivo. This work supports the future development of digital assessment tools for FTD.
HIGHLIGHTS:
- We trained machine learning classifiers for frontotemporal dementia patients using natural speech.
- We grouped participants by neuropathological diagnosis (autopsy) or cerebrospinal fluid biomarkers.
- Classifiers distinguished the underlying pathology (Alzheimer's disease vs. frontotemporal lobar degeneration) well in patients.
- We identified important features through an explainable artificial intelligence approach.
- This work lays the groundwork for a speech-based neuropathology screening tool.
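The reported AUC is a ranking statistic that can be computed directly from classifier scores as the probability that a randomly chosen positive case outscores a randomly chosen negative one. A minimal pure-Python sketch; the scores below are hypothetical, not the study's data:

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via the Mann-Whitney pairwise-comparison
    identity: fraction of (positive, negative) pairs where the positive
    case scores higher (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical classifier scores for ADNC vs. FTLD cases
adnc = [0.91, 0.78, 0.66, 0.52]
ftld = [0.45, 0.40, 0.66, 0.20]
print(round(auc(adnc, ftld), 3))  # → 0.906
```

Cross-validated repetitions of this computation would yield the mean ± SD values quoted in the abstract.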
Affiliation(s)
- Sunghye Cho
- Linguistic Data Consortium, Department of Linguistics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Christopher A Olm
- Penn Frontotemporal Degeneration Center, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Sharon Ash
- Penn Frontotemporal Degeneration Center, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Sanjana Shellikeri
- Penn Frontotemporal Degeneration Center, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Galit Agmon
- Penn Frontotemporal Degeneration Center, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Katheryn A Q Cousins
- Penn Frontotemporal Degeneration Center, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- David J Irwin
- Penn Frontotemporal Degeneration Center, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Murray Grossman
- Penn Frontotemporal Degeneration Center, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Mark Liberman
- Linguistic Data Consortium, Department of Linguistics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Naomi Nevler
- Penn Frontotemporal Degeneration Center, University of Pennsylvania, Philadelphia, Pennsylvania, USA
2.
Cao L, Han K, Lin L, Hing J, Ooi V, Huang N, Yu J, Ng TKS, Feng L, Mahendran R, Kua EH, Bao Z. Reversal of the concreteness effect can be detected in the natural speech of older adults with amnestic, but not non-amnestic, mild cognitive impairment. Alzheimers Dement (Amst) 2024; 16:e12588. [PMID: 38638800] [PMCID: PMC11024957] [DOI: 10.1002/dad2.12588] [Received: 03/31/2023] [Revised: 03/12/2024] [Accepted: 03/15/2024] [Indexed: 04/20/2024]
Abstract
INTRODUCTION: Patients with Alzheimer's disease present with difficulty in lexical retrieval and reversal of the concreteness effect in nouns. Little is known about these phenomena before the onset of symptoms. We anticipate early linguistic signs in the speech of people with amnestic mild cognitive impairment (MCI). Here, we report the results of a corpus-linguistic approach to the early detection of cognitive impairment.
METHODS: One hundred forty-eight English-speaking Singaporeans provided natural speech data on topics of their choice; 74 were diagnosed with single-domain MCI (38 amnestic, 36 non-amnestic), and 74 were cognitively healthy. The recordings yielded 267,310 words, which were tagged for parts of speech. We calculated the per-minute word counts and concreteness scores of all tagged words, nouns, and verbs in the dataset.
RESULTS: Compared to controls, subjects with amnestic MCI produced fewer but more abstract nouns. Verbs were not affected.
DISCUSSION: Slower retrieval of nouns and the reversal of the concreteness effect in nouns are manifested in natural speech and can be detected early through corpus-based analysis.
Highlights:
- Reversal of the concreteness effect is manifested in patients with Alzheimer's disease (AD) and semantic dementia.
- The paper reports a corpus-based analysis of natural speech by people with amnestic and non-amnestic mild cognitive impairment (MCI) and cognitively healthy controls.
- People with amnestic MCI produce fewer and more abstract nouns than people with non-amnestic MCI and healthy controls. Verbs appear to be unaffected.
- The imageability problem can be detected in the natural everyday speech of people with amnestic MCI, which carries a higher risk of conversion to AD.
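The per-minute counts and concreteness scores described in the methods reduce to simple aggregation over POS-tagged tokens. A minimal sketch; the concreteness lookup and tag set below are hypothetical stand-ins for the published rating norms and tagger such a study would use:

```python
# Hypothetical concreteness norms (published norms rate thousands of words,
# e.g. on a 1 = highly abstract ... 5 = highly concrete scale)
CONCRETENESS = {"dog": 4.9, "table": 4.9, "idea": 1.9, "freedom": 2.0, "run": 4.0}

def noun_measures(tagged_tokens, minutes):
    """Per-minute noun count and mean noun concreteness, computed from
    (word, pos) pairs as in a corpus-based speech analysis."""
    nouns = [w for w, pos in tagged_tokens if pos == "NOUN"]
    rated = [CONCRETENESS[w] for w in nouns if w in CONCRETENESS]
    rate = len(nouns) / minutes
    mean_concreteness = sum(rated) / len(rated) if rated else None
    return rate, mean_concreteness

sample = [("dog", "NOUN"), ("run", "VERB"), ("idea", "NOUN"), ("freedom", "NOUN")]
print(noun_measures(sample, minutes=2.0))  # nouns per minute, mean concreteness
```

Lower noun rates and lower (more abstract) mean concreteness in a group would correspond to the amnestic-MCI pattern the paper reports.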
Affiliation(s)
- Luwen Cao
- Department of English, Linguistics and Theatre Studies, National University of Singapore, Singapore
- Kunmei Han
- Department of English, Linguistics and Theatre Studies, National University of Singapore, Singapore
- Li Lin
- Department of English, Linguistics and Theatre Studies, National University of Singapore, Singapore
- School of Foreign Studies, East China University of Political Science and Law, Shanghai, China
- Jiawen Hing
- Department of English, Linguistics and Theatre Studies, National University of Singapore, Singapore
- Vincent Ooi
- Department of English, Linguistics and Theatre Studies, National University of Singapore, Singapore
- Nick Huang
- Department of English, Linguistics and Theatre Studies, National University of Singapore, Singapore
- Junhong Yu
- Cognitive and Brain Health Laboratory, School of Social Sciences, Nanyang Technological University, Singapore
- Ted Kheng Siang Ng
- Department of Psychological Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Rush Institute for Healthy Aging, Department of Internal Medicine, Rush University Medical Center, Chicago, Illinois, USA
- Lei Feng
- Department of Psychological Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Healthy Longevity Translational Research Program, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Centre for Healthy Longevity, Clinic L, Alexandra Hospital, Singapore
- Rathi Mahendran
- Department of Psychological Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Ee Heok Kua
- Department of Psychological Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Zhiming Bao
- Department of English, Linguistics and Theatre Studies, National University of Singapore, Singapore
- Institute of Corpus Studies and Applications, Shanghai International Studies University, Shanghai, China
3.
Ma W, Xu L, Zhang H, Zhang S. Can Natural Speech Prosody Distinguish Autism Spectrum Disorders? A Meta-Analysis. Behav Sci (Basel) 2024; 14:90. [PMID: 38392443] [PMCID: PMC10886261] [DOI: 10.3390/bs14020090] [Received: 12/05/2023] [Revised: 01/21/2024] [Accepted: 01/24/2024] [Indexed: 02/24/2024]
Abstract
Natural speech plays a pivotal role in communication and interaction between human beings. The prosody of natural speech, owing to its high ecological validity and sensitivity, has been acoustically analyzed and, more recently, used in machine learning to identify individuals with autism spectrum disorders (ASDs). In this meta-analysis, we evaluated the findings of empirical studies on acoustic analysis and machine learning techniques to provide statistical support for adopting natural speech prosody for ASD detection. Using a random-effects model, we observed moderate-to-large pooled effect sizes for pitch-related parameters in distinguishing individuals with ASD from their typically developing (TD) counterparts. Specifically, the standardized mean difference (SMD) values for pitch mean, pitch range, pitch standard deviation, and pitch variability were 0.3528, 0.6744, 0.5735, and 0.5137, respectively. However, group differences in temporal features may be unreliable, as the SMD values for duration and speech rate were only 0.0738 and -0.0547. Moderator analysis indicated that task type was unlikely to influence the final results, whereas age group played a moderating role in pooling pitch range differences. Furthermore, our analysis of multivariate machine learning studies showed promising accuracy for ASD identification, with average sensitivity and specificity of 75.51% and 80.31%, respectively. In conclusion, these findings shed light on the efficacy of natural prosody in identifying ASD and offer insights for future investigations in this line of research.
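Pooling SMDs under a random-effects model is commonly done with the DerSimonian-Laird estimator of between-study variance; the abstract does not name the estimator, so this is a sketch under that assumption, and the per-study values below are made up for illustration:

```python
def dersimonian_laird(effects, variances):
    """Random-effects pooling of effect sizes (e.g. SMDs).
    Returns (pooled_effect, tau2), where tau2 is the DerSimonian-Laird
    estimate of between-study variance."""
    k = len(effects)
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                    # truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, tau2

# Hypothetical per-study SMDs for one pitch parameter, with their variances
smds = [0.45, 0.80, 0.60]
variances = [0.04, 0.06, 0.05]
pooled, tau2 = dersimonian_laird(smds, variances)
```

When the studies are homogeneous, tau² is truncated to zero and the estimate collapses to the fixed-effect (inverse-variance) pooled value.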
Affiliation(s)
- Wen Ma
- School of Foreign Languages and Literature, Shandong University, Jinan 250100, China
- Lele Xu
- School of Foreign Languages and Literature, Shandong University, Jinan 250100, China
- Hao Zhang
- School of Foreign Languages and Literature, Shandong University, Jinan 250100, China
- Shurui Zhang
- School of Foreign Languages and Literature, Shandong University, Jinan 250100, China
4.
Li J, Hong B, Nolte G, Engel AK, Zhang D. EEG-based speaker-listener neural coupling reflects speech-selective attentional mechanisms beyond the speech stimulus. Cereb Cortex 2023; 33:11080-11091. [PMID: 37814353] [DOI: 10.1093/cercor/bhad347] [Received: 05/04/2023] [Revised: 09/01/2023] [Accepted: 09/04/2023] [Indexed: 10/11/2023]
Abstract
When we pay attention to someone, do we focus only on the sounds they make and the words they use, or do we form a mental space shared with the speaker we want to attend to? Some would argue that human language is nothing more than a simple signal, while others claim that human beings understand each other because they form a shared mental ground between speaker and listener. Our study explored the neural mechanisms of speech-selective attention by investigating electroencephalogram-based neural coupling between speaker and listener in a cocktail party paradigm. The temporal response function method was employed to reveal how the listener was coupled to the speaker at the neural level. The results showed that neural coupling between the listener and the attended speaker peaked 5 s before speech onset in the delta band over the left frontal region and correlated with speech comprehension performance. In contrast, the attentional processing of speech acoustics and semantics occurred primarily at a later stage, after speech onset, and was not significantly correlated with comprehension performance. These findings suggest a predictive mechanism through which speaker-listener neural coupling is achieved for successful speech comprehension.
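The temporal response function (TRF) named above maps a stimulus (or speaker) signal onto the neural response across a range of time lags. Real analyses estimate it with regularized regression; as a rough sketch of the idea, the simplified version below uses plain normalized cross-correlation on toy signals:

```python
def trf_xcorr(stimulus, response, max_lag):
    """Crude TRF estimate: normalized cross-correlation between the
    stimulus and the response at lags 0..max_lag (response lagging
    the stimulus). A simplification of ridge-regression TRF fitting."""
    n = len(stimulus)
    denom = sum(s * s for s in stimulus) or 1.0
    trf = []
    for lag in range(max_lag + 1):
        num = sum(stimulus[t] * response[t + lag] for t in range(n - lag))
        trf.append(num / denom)
    return trf

# Toy data: the "response" is the "stimulus" delayed by 3 samples
stim = [0.0] * 20
stim[5] = 1.0
resp = [0.0] * 20
resp[8] = 1.0
weights = trf_xcorr(stim, resp, max_lag=6)
best_lag = max(range(len(weights)), key=lambda i: weights[i])  # best_lag == 3
```

The lag of the peak weight recovers the stimulus-response delay; the study's speaker-listener coupling analysis works analogously, with the speaker's signal as the "stimulus" and the listener's EEG as the "response", including negative (predictive) lags.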
Affiliation(s)
- Jiawei Li
- Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
- Department of Education and Psychology, Freie Universität Berlin, Habelschwerdter Allee, Berlin 14195, Germany
- Bo Hong
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
- Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China
- Guido Nolte
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Germany
- Andreas K Engel
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Germany
- Dan Zhang
- Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
5.
Verschueren E, Gillis M, Decruy L, Vanthornhout J, Francart T. Speech Understanding Oppositely Affects Acoustic and Linguistic Neural Tracking in a Speech Rate Manipulation Paradigm. J Neurosci 2022; 42:7442-7453. [PMID: 36041851] [PMCID: PMC9525161] [DOI: 10.1523/jneurosci.0259-22.2022] [Received: 02/04/2022] [Revised: 06/29/2022] [Accepted: 07/17/2022] [Indexed: 11/21/2022]
Abstract
When listening to continuous speech, the human brain can track features of the presented speech signal. Neural tracking of acoustic features has been shown to be a prerequisite for speech understanding and to predict speech understanding in controlled circumstances. However, the brain also tracks linguistic features of speech, which may be more directly related to speech understanding. We investigated acoustic and linguistic speech processing as a function of varying speech understanding by manipulating the speech rate. In this paradigm, acoustic and linguistic speech processing are affected simultaneously but in opposite directions: when the speech rate increases, more acoustic information per second is present, whereas tracking linguistic information becomes more challenging as speech grows less intelligible at higher rates. We measured the EEG of 18 participants (4 male) who listened to speech at various rates. As expected and confirmed by the behavioral results, speech understanding decreased with increasing speech rate. Accordingly, linguistic neural tracking decreased with increasing speech rate, but acoustic neural tracking increased. This indicates that neural tracking of linguistic representations can capture the gradual effect of decreasing speech understanding, and that increased acoustic neural tracking does not necessarily imply better speech understanding. Although more challenging to measure because of the low signal-to-noise ratio, linguistic neural tracking may therefore be a more direct predictor of speech understanding.
SIGNIFICANCE STATEMENT: An increasingly popular method to investigate neural speech processing is to measure neural tracking. Although much research has examined how the brain tracks acoustic speech features, linguistic speech features have received less attention. In this study, we disentangled the acoustic and linguistic characteristics of neural speech tracking by manipulating the speech rate. A proper way of objectively measuring auditory and language processing paves the way toward clinical applications: an objective measure of speech understanding would allow behavior-free evaluation, making it possible to assess hearing loss and adjust hearing aids based on brain responses. Such an objective measure would benefit populations from whom behavioral measures are difficult to obtain, such as young children or people with cognitive impairments.
Affiliation(s)
- Eline Verschueren
- Research Group Experimental Oto-rhino-laryngology, Department of Neurosciences, KU Leuven-University of Leuven, Leuven 3000, Belgium
- Marlies Gillis
- Research Group Experimental Oto-rhino-laryngology, Department of Neurosciences, KU Leuven-University of Leuven, Leuven 3000, Belgium
- Lien Decruy
- Institute for Systems Research, University of Maryland, College Park, Maryland 20742
- Jonas Vanthornhout
- Research Group Experimental Oto-rhino-laryngology, Department of Neurosciences, KU Leuven-University of Leuven, Leuven 3000, Belgium
- Tom Francart
- Research Group Experimental Oto-rhino-laryngology, Department of Neurosciences, KU Leuven-University of Leuven, Leuven 3000, Belgium
6.
Kiremitçi I, Yilmaz Ö, Çelik E, Shahdloo M, Huth AG, Çukur T. Attentional Modulation of Hierarchical Speech Representations in a Multitalker Environment. Cereb Cortex 2021; 31:4986-5005. [PMID: 34115102] [PMCID: PMC8491717] [DOI: 10.1093/cercor/bhab136] [Received: 12/03/2020] [Revised: 04/01/2021] [Accepted: 04/21/2021] [Indexed: 11/13/2022]
Abstract
Humans are remarkably adept at listening to a desired speaker in a crowded environment while filtering out nontarget speakers in the background. Attention is key to solving this difficult cocktail-party task, yet a detailed characterization of attentional effects on speech representations is lacking. It remains unclear at which levels of speech features, and to what extent, attentional modulation occurs in each brain area during the cocktail-party task. To address these questions, we recorded whole-brain blood-oxygen-level-dependent (BOLD) responses while subjects either passively listened to single-speaker stories or selectively attended to a male or a female speaker in temporally overlaid stories in separate experiments. Spectral, articulatory, and semantic models of the natural stories were constructed. Intrinsic selectivity profiles were identified via voxelwise models fit to passive-listening responses. Attentional modulations were then quantified based on model predictions for attended and unattended stories in the cocktail-party task. We find that attention causes broad modulations at multiple levels of speech representations, growing stronger toward later stages of processing, and that unattended speech is represented up to the semantic level in parabelt auditory cortex. These results provide insight into the attentional mechanisms that underlie the ability to selectively listen to a desired speaker in noisy multispeaker environments.
Affiliation(s)
- Ibrahim Kiremitçi
- Neuroscience Program, Sabuncu Brain Research Center, Bilkent University, Ankara TR-06800, Turkey
- National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara TR-06800, Turkey
- Özgür Yilmaz
- National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara TR-06800, Turkey
- Department of Electrical and Electronics Engineering, Bilkent University, Ankara TR-06800, Turkey
- Emin Çelik
- Neuroscience Program, Sabuncu Brain Research Center, Bilkent University, Ankara TR-06800, Turkey
- National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara TR-06800, Turkey
- Mo Shahdloo
- National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara TR-06800, Turkey
- Department of Experimental Psychology, Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford OX3 9DU, UK
- Alexander G Huth
- Department of Neuroscience, The University of Texas at Austin, Austin, TX 78712, USA
- Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94702, USA
- Tolga Çukur
- Neuroscience Program, Sabuncu Brain Research Center, Bilkent University, Ankara TR-06800, Turkey
- National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara TR-06800, Turkey
- Department of Electrical and Electronics Engineering, Bilkent University, Ankara TR-06800, Turkey
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94702, USA
7.
Alzrayer NM, Aldabas R, Alhossein A, Alharthi H. Naturalistic teaching approach to develop spontaneous vocalizations and augmented communication in children with autism spectrum disorder. Augment Altern Commun 2021; 37:14-24. [PMID: 33825612] [DOI: 10.1080/07434618.2021.1881825] [Indexed: 10/21/2022]
Abstract
Naturalistic developmental behavioral interventions (NDBI) have been shown to facilitate the development of spontaneous language in individuals with speech and language impairment. Several meta-analyses have reported a small number of studies that combined naturalistic teaching approaches with augmentative and alternative communication (AAC) interventions to develop requesting skills in individuals with autism spectrum disorder (ASD). The main purpose of this study was therefore to determine whether a natural language paradigm (NLP) combined with time delay is effective in expanding vocal and augmented requesting skills in three children with ASD between the ages of 4 and 6 years. A concurrent multiple-baseline design across participants was used to evaluate the effectiveness of the intervention. The results demonstrated that the participants successfully emitted vocal requests when both modalities were available, and that NLP combined with time delay was effective in increasing spontaneous vocal requests in all participants.
Affiliation(s)
- Nouf M Alzrayer
- Department of Special Education, King Saud University, Riyadh, Saudi Arabia
- Rashed Aldabas
- Department of Special Education, King Saud University, Riyadh, Saudi Arabia
- Hanan Alharthi
- Department of Special Education, King Saud University, Riyadh, Saudi Arabia
8.
Broderick MP, Anderson AJ, Lalor EC. Semantic Context Enhances the Early Auditory Encoding of Natural Speech. J Neurosci 2019; 39:7564-7575. [PMID: 31371424] [PMCID: PMC6750931] [DOI: 10.1523/jneurosci.0584-19.2019] [Received: 03/13/2019] [Revised: 07/20/2019] [Accepted: 07/29/2019] [Indexed: 01/22/2023]
Abstract
Speech perception involves the integration of sensory input with expectations based on the context of that speech. Much debate surrounds the question of whether prior knowledge feeds back to affect early auditory encoding at the lower levels of the speech processing hierarchy, or whether perception is best explained as a purely feedforward process. Although there has been compelling evidence on both sides of this debate, experiments addressing these questions with naturalistic speech stimuli have been lacking. Here, we use a recently introduced method for quantifying the semantic context of speech and relate it to a commonly used method for indexing low-level auditory encoding of speech. The relationship between these measures is taken as an indication of how the semantic context leading up to a word influences how its low-level acoustic and phonetic features are processed. We recorded EEG from human participants (both male and female) listening to continuous natural speech and found that the early cortical tracking of a word's speech envelope is enhanced by its semantic similarity to its sentential context. Using a forward modeling approach, we found that prediction accuracy of the EEG signal shows the same effect. Furthermore, this effect shows distinct temporal patterns of correlation depending on the type of speech input representation (acoustic or phonological) used for the model, implicating top-down propagation of information through the processing hierarchy. These results suggest a mechanism that links top-down prior information with the early cortical entrainment of words in natural, continuous speech.
SIGNIFICANCE STATEMENT: During natural speech comprehension, we use semantic context when processing information about new incoming words. However, precisely how the neural processing of bottom-up sensory information is affected by top-down context-based predictions remains controversial. We address this question using a novel approach that indexes a word's similarity to its context and how well the word's acoustic and phonetic features are processed by the brain at the time of its utterance. We relate these two measures and show that lower-level auditory tracking of speech improves for words that are more related to their preceding context. These results suggest a mechanism that links top-down prior information with bottom-up sensory processing during natural, narrative speech listening.
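Semantic-context measures of this kind are typically computed from word embeddings as the cosine similarity between a word's vector and the average vector of the preceding words (with dissimilarity as one minus that value). A minimal sketch under that assumption, using made-up 3-dimensional vectors rather than real embeddings:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def context_similarity(word_vec, context_vecs):
    """Similarity of a word's vector to the mean vector of its
    preceding context; semantic dissimilarity would be 1 minus this."""
    dim = len(word_vec)
    mean = [sum(v[i] for v in context_vecs) / len(context_vecs) for i in range(dim)]
    return cosine(word_vec, mean)

# Made-up embeddings: a context-consistent word scores higher than an odd one
context = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]        # e.g. "dog", "leash"
sim_related = context_similarity([0.85, 0.15, 0.05], context)   # e.g. "bark"
sim_unrelated = context_similarity([0.0, 0.2, 0.9], context)    # e.g. "menu"
```

Each word's similarity score can then be related to its envelope-tracking measure, as in the paper's analysis.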
Affiliation(s)
- Michael P Broderick
- School of Engineering, Trinity Centre for Bioengineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Andrew J Anderson
- Department of Biomedical Engineering, University of Rochester, Rochester, New York 14627
- Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, New York 14627
- Edmund C Lalor
- School of Engineering, Trinity Centre for Bioengineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Department of Biomedical Engineering, University of Rochester, Rochester, New York 14627
- Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, New York 14627
9.
Abstract
Introduction-Purpose: Despite the growing amount of data on synthetic speech perception by people with no disabilities, there has been limited research on the intelligibility and comprehension of synthetic speech by dyslexic listeners. This study investigated the intelligibility and comprehension of synthetic versus natural speech in Greek dyslexic students.
Method: Forty-three dyslexic students were presented with various acoustic stimuli (words, sentences, texts) in both synthetic and natural speech.
Results: The data analysis showed that dyslexic students identified words and sentences better when presented in natural rather than synthetic speech. In text comprehension, there was no significant difference between synthetic and natural speech. The observed difficulties in word and sentence intelligibility apparently did not constrain text comprehension: the context cues provided by each text seem to have helped the dyslexic students comprehend it effectively regardless of the speech condition (natural or synthetic).
Conclusion: Given that the overall purpose of reading is comprehension, text-to-speech systems could serve dyslexic readers as scaffolds for reading comprehension.
Implications for rehabilitation: This research investigates the potential of text-to-speech technology to serve as an educational aid for students with reading deficits, helping them master reading tasks they may not be able to complete on their own. Students with reading disabilities need support to succeed in the reading process and to narrow the achievement gap between them and students without disabilities. This work suggests that:
- Text-to-speech technology needs further development and improvement toward a closer-to-natural speech output in order to be a valuable educational aid for students with dyslexia.
- Although there is a gap, especially in intelligibility, between synthetic and natural speech, this kind of assistive technology could provide a useful educational aid for students for whom reading tasks are a cumbersome process.
- Natural and synthetic speech used in combination and selectively (for instance, natural speech for word tasks and synthetic speech for contextual tasks, i.e., reading texts) could be integral tools in numerous educational settings and rehabilitation services.
Affiliation(s)
- Vicky Giannouli
- Department of Special Education & Lifelong Learning, School of Social Sciences, Humanities and Arts, University of Macedonia, Thessaloniki, Greece
- Marianna Banou
- Department of Special Education & Lifelong Learning, School of Social Sciences, Humanities and Arts, University of Macedonia, Thessaloniki, Greece
10.
Peng F, Innes-Brown H, McKay CM, Fallon JB, Zhou Y, Wang X, Hu N, Hou W. Temporal Coding of Voice Pitch Contours in Mandarin Tones. Front Neural Circuits 2018; 12:55. [PMID: 30087597] [PMCID: PMC6066958] [DOI: 10.3389/fncir.2018.00055] [Received: 10/24/2017] [Accepted: 06/27/2018] [Indexed: 11/13/2022]
Abstract
Accurate perception of time-variant pitch is important for speech recognition, particularly for tonal languages such as Mandarin, in which different lexical tones convey different semantic information. Previous studies reported that the auditory nerve and cochlear nucleus can encode different pitches through phase-locked neural activity. However, little is known about how the inferior colliculus (IC) encodes the time-variant periodicity pitch of natural speech. In this study, the Mandarin syllable /ba/ pronounced with four lexical tones (flat, rising, falling then rising, and falling) was used as the stimulus set. Local field potentials (LFPs) and single-neuron activity were simultaneously recorded from 90 sites within the contralateral IC of six urethane-anesthetized and decerebrate guinea pigs in response to the four stimuli. Analysis of the temporal information in the LFPs showed that 93% of the LFPs exhibited robust encoding of periodicity pitch. Pitch strength of the LFPs, derived from the autocorrelogram, was significantly (p < 0.001) stronger for rising tones than for flat and falling tones, and also increased significantly (p < 0.05) with characteristic frequency (CF). On the other hand, only 47% (42 of 90) of single-neuron activities were significantly synchronized to the fundamental frequency of the stimulus, suggesting that only a subset of IC neurons encode the time-variant periodicity pitch of speech in their temporal spiking patterns. The difference between the numbers of LFPs and single neurons that encode the time-variant F0 voice pitch supports the notion of a transition at the level of the IC from direct temporal coding in the spike trains of individual neurons to other forms of neural representation.
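Autocorrelogram-derived pitch strength can be illustrated as the peak of the normalized autocorrelation within a plausible pitch-lag range; a toy sketch on a synthetic 100 Hz signal, not the study's recordings:

```python
import math

def pitch_strength(signal, min_lag, max_lag):
    """Peak of the normalized autocorrelation over a lag range:
    a simple periodicity-pitch measure. Returns (strength, best_lag),
    where best_lag estimates the pitch period in samples."""
    energy = sum(s * s for s in signal) or 1.0
    best = (0.0, min_lag)
    for lag in range(min_lag, max_lag + 1):
        r = sum(signal[t] * signal[t + lag] for t in range(len(signal) - lag))
        r /= energy
        if r > best[0]:
            best = (r, lag)
    return best

# Toy periodic signal: 100 Hz sinusoid sampled at 8 kHz -> 80-sample period
fs, f0 = 8000, 100
sig = [math.sin(2 * math.pi * f0 * t / fs) for t in range(800)]
strength, lag = pitch_strength(sig, min_lag=40, max_lag=120)  # lag == 80
```

On a time-variant pitch contour, the same computation is applied in short sliding windows, yielding the autocorrelogram from which the paper's pitch-strength values are derived.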
Affiliation(s)
- Fei Peng
- Key Laboratory of Biorheological Science and Technology of Ministry of Education, Bioengineering College of Chongqing University, Chongqing, China
- Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing, China
- Hamish Innes-Brown
- Bionics Institute, East Melbourne, VIC, Australia
- Department of Medical Bionics, University of Melbourne, Melbourne, VIC, Australia
- Colette M. McKay
- Bionics Institute, East Melbourne, VIC, Australia
- Department of Medical Bionics, University of Melbourne, Melbourne, VIC, Australia
- James B. Fallon
- Bionics Institute, East Melbourne, VIC, Australia
- Department of Medical Bionics, University of Melbourne, Melbourne, VIC, Australia
- Department of Otolaryngology, University of Melbourne, Melbourne, VIC, Australia
- Yi Zhou
- Chongqing Key Laboratory of Neurobiology, Department of Neurobiology, Third Military Medical University, Chongqing, China
- Xing Wang
- Key Laboratory of Biorheological Science and Technology of Ministry of Education, Bioengineering College of Chongqing University, Chongqing, China
- Chongqing Medical Electronics Engineering Technology Research Center, Chongqing University, Chongqing, China
- Ning Hu
- Key Laboratory of Biorheological Science and Technology of Ministry of Education, Bioengineering College of Chongqing University, Chongqing, China
- Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing, China
- Wensheng Hou
- Key Laboratory of Biorheological Science and Technology of Ministry of Education, Bioengineering College of Chongqing University, Chongqing, China
- Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing, China
- Chongqing Medical Electronics Engineering Technology Research Center, Chongqing University, Chongqing, China
11
Schmidt J, Janse E, Scharenborg O. Perception of Emotion in Conversational Speech by Younger and Older Listeners. Front Psychol 2016; 7:781. [PMID: 27303340 PMCID: PMC4885861 DOI: 10.3389/fpsyg.2016.00781] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2016] [Accepted: 05/09/2016] [Indexed: 11/18/2022] Open
Abstract
This study investigated whether age and/or differences in hearing sensitivity influence the perception of the emotion dimensions arousal (calm vs. aroused) and valence (positive vs. negative attitude) in conversational speech. To that end, this study specifically focused on the relationship between participants' ratings of short affective utterances and the utterances' acoustic parameters (pitch, intensity, and articulation rate) known to be associated with the emotion dimensions arousal and valence. Stimuli consisted of short utterances taken from a corpus of conversational speech. In two rating tasks, younger and older adults either rated arousal or valence using a 5-point scale. Mean intensity was found to be the main cue participants used in the arousal task (i.e., higher mean intensity cueing higher levels of arousal), while mean F0 was the main cue in the valence task (i.e., higher mean F0 being interpreted as more negative). Even though there were no overall age group differences in arousal or valence ratings, compared to younger adults, older adults responded less strongly to mean intensity differences cueing arousal and responded more strongly to differences in mean F0 cueing valence. Individual hearing sensitivity among the older adults did not modify the use of mean intensity as an arousal cue. However, individual hearing sensitivity generally affected valence ratings and modified the use of mean F0. We conclude that age differences in the interpretation of mean F0 as a cue for valence are likely due to age-related hearing loss, whereas age differences in rating arousal do not seem to be driven by hearing sensitivity differences between age groups (as measured by pure-tone audiometry).
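The cue-to-rating relationship analyzed in this study (e.g., higher mean intensity cueing higher arousal) can be sketched as a simple linear model. The data values below are purely illustrative, not taken from the study, and the single-predictor fit is a simplification of the authors' analysis.

```python
import numpy as np

# Hypothetical per-utterance mean intensity (dB SPL) and mean listener
# arousal ratings on a 5-point scale; illustrative values only.
intensity_db = np.array([55.0, 58.0, 60.0, 63.0, 66.0, 70.0])
arousal = np.array([1.8, 2.1, 2.6, 3.0, 3.5, 4.2])

# Fit arousal ~ intercept + slope * intensity by ordinary least squares.
X = np.column_stack([np.ones_like(intensity_db), intensity_db])
(intercept, slope), *_ = np.linalg.lstsq(X, arousal, rcond=None)
```

A positive `slope` corresponds to the pattern reported above; an age-group comparison would fit such a model per group (or add an age-by-intensity interaction term) and compare the slopes.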
Affiliation(s)
- Juliane Schmidt
- Centre for Language Studies, Radboud University, Nijmegen, Netherlands
- International Max Planck Research School for Language Sciences, Nijmegen, Netherlands
- Esther Janse
- Centre for Language Studies, Radboud University, Nijmegen, Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Nijmegen, Netherlands
- Odette Scharenborg
- Centre for Language Studies, Radboud University, Nijmegen, Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Nijmegen, Netherlands
12
Abstract
It is well-established that listeners will shift their categorization of a target vowel as a function of acoustic characteristics of a preceding carrier phrase (CP). These results have been interpreted as an example of perceptual normalization for variability resulting from differences in talker anatomy. The present study examined whether listeners would normalize for acoustic variability resulting from differences in speaking style within a single talker. Two vowel series were synthesized that varied between central and peripheral vowels (the vowels in "beat"-"bit" and "bod"-"bud"). Each member of the series was appended to one of four CPs that were spoken in either a "clear" or "reduced" speech style. Participants categorized vowels in these eight contexts. A reliable shift in categorization as a function of speaking style was obtained for three of four phrase sets. This demonstrates that phrase context effects can be obtained with a single talker. However, the directions of the obtained shifts are not reliably predicted on the basis of the speaking style of the talker. Instead, it appears that the effect is determined by an interaction of the average spectrum of the phrase with the target vowel.
Affiliation(s)
- A. Davi Vitela
- Speech, Language and Hearing Sciences, University of Arizona, Tucson, AZ, USA
- Natasha Warner
- Department of Linguistics, University of Arizona, Tucson, AZ, USA
- Andrew J. Lotto
- Speech, Language and Hearing Sciences, University of Arizona, Tucson, AZ, USA
13
Bosch L, Figueras M, Teixidó M, Ramon-Casas M. Rapid gains in segmenting fluent speech when words match the rhythmic unit: evidence from infants acquiring syllable-timed languages. Front Psychol 2013; 4:106. [PMID: 23467921 PMCID: PMC3587802 DOI: 10.3389/fpsyg.2013.00106] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2012] [Accepted: 02/14/2013] [Indexed: 11/22/2022] Open
Abstract
The ability to extract word-forms from sentential contexts represents an initial step in infants' progress toward lexical acquisition. By age 6 months the ability is just emerging, and evidence of it is restricted to certain testing conditions. Most research has been conducted with infants acquiring stress-timed languages (English, but also German and Dutch), whose rhythmic unit is not the syllable. Data from infants acquiring syllable-timed languages are still scarce and limited to French (European and Canadian), partially revealing some discrepancies with English regarding the age at which word segmentation ability emerges. The research reported here aims to broaden this cross-linguistic perspective by presenting first data on the early ability to segment monosyllabic word-forms in infants acquiring Spanish and Catalan. Three different language groups (two monolingual and one bilingual) and two different age groups (8- and 6-month-old infants) were tested using natural language and a modified version of the head-turn preference procedure (HPP), with familiarization to passages and testing on words. Results revealed positive evidence of word segmentation in all groups at both ages, but critically, the pattern of preference differed by age. A novelty preference was obtained in the older groups, while the expected familiarity preference was only found at the younger age tested, suggesting more advanced segmentation ability with an increase in age. These results offer first evidence of an early ability for monosyllabic word segmentation in infants acquiring syllable-timed languages such as Spanish or Catalan, not previously described in the literature. The data show no impact of bilingual exposure on the emergence of this ability, and the results suggest rapid gains in early segmentation for words that match the rhythmic unit of the native language.
Affiliation(s)
- Laura Bosch
- Department of Basic Psychology, University of Barcelona, Barcelona, Spain
- Institute for Research in Brain, Behavior and Cognition (IR3C), University of Barcelona, Barcelona, Spain
- Melània Figueras
- Department of Basic Psychology, University of Barcelona, Barcelona, Spain
- Maria Teixidó
- Department of Basic Psychology, University of Barcelona, Barcelona, Spain
- Marta Ramon-Casas
- Department of Basic Psychology, University of Barcelona, Barcelona, Spain