1. Thompson A, Kim Y. Acoustic and Kinematic Predictors of Intelligibility and Articulatory Precision in Parkinson's Disease. J Speech Lang Hear Res 2024; 67:3595-3611. [PMID: 39259883 PMCID: PMC11482579 DOI: 10.1044/2024_jslhr-24-00153] [Received: 03/04/2024] [Revised: 06/07/2024] [Accepted: 07/02/2024] [Indexed: 09/13/2024]
Abstract
PURPOSE This study investigated relationships within and between perceptual, acoustic, and kinematic measures in speakers with and without dysarthria due to Parkinson's disease (PD) across different clarity conditions. Additionally, the study assessed the predictive capabilities of selected acoustic and kinematic measures for intelligibility and articulatory precision ratings. METHOD Forty participants, comprising 22 with PD and 18 controls, read three phrases aloud using conversational, less clear, and more clear speaking conditions. Acoustic measures and their theoretical kinematic parallel measures (i.e., acoustic and kinematic distance and vowel space area [VSA]; second formant frequency [F2] slope and kinematic speed) were obtained from the diphthong /aɪ/ and selected vowels in the sentences. A total of 368 listeners from crowdsourcing provided ratings for intelligibility and articulatory precision. The research questions were examined using correlations and linear mixed-effects models. RESULTS Intelligibility and articulatory precision ratings were highly correlated across all speakers. Acoustic and kinematic distance, as well as F2 slope and kinematic speed, showed moderately positive correlations. In contrast, acoustic and kinematic VSA exhibited no correlation. Among all measures, acoustic VSA and kinematic distance were robust predictors of both intelligibility and articulatory precision ratings, but they were stronger predictors of articulatory precision. CONCLUSIONS The findings highlight the importance of measurement selection when examining cross-domain relationships. Additionally, they support the use of behavioral modifications aimed at eliciting larger articulatory gestures to improve intelligibility in individuals with dysarthria due to PD. OPEN SCIENCE FORM https://doi.org/10.23641/asha.27011281.
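The abstract contrasts acoustic vowel space area (VSA) with its kinematic counterpart. A common way to compute acoustic VSA is the area of the polygon whose vertices are the mean (F2, F1) values of corner vowels, using the shoelace formula. The sketch below assumes that formulation. The study's exact vowel set and formant-extraction settings are not given here, so the vowel labels, their perimeter order, and the formant values in the example are hypothetical.

```python
def polygon_area(points):
    """Shoelace formula: area of a simple polygon from (x, y) vertices listed in perimeter order."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def vowel_space_area(formants, order=("i", "ae", "a", "u")):
    """Acoustic VSA in Hz^2.

    `formants` maps a vowel label to its mean (F2, F1) in Hz. The default
    corner-vowel set and order are hypothetical; they must trace the
    polygon's perimeter.
    """
    return polygon_area([formants[v] for v in order])
```

Vertices must be listed in perimeter order (clockwise or counterclockwise); a crossed order yields a self-intersecting polygon and a misleading area.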
Affiliation(s)
- Austin Thompson
- Department of Communication Sciences and Disorders, University of Houston, TX
- Yunjung Kim
- School of Communication Science and Disorders, Florida State University, Tallahassee, FL
2. McGonigle E, VanDam M, Wilkinson C, Johnson KT. Benchmarking Automatic Speech Recognition Technology for Natural Language Samples of Children With and Without Developmental Delays. Annu Int Conf IEEE Eng Med Biol Soc 2024; 2024:1-5. [PMID: 40039537 DOI: 10.1109/embc53108.2024.10782773] [Indexed: 03/06/2025]
Abstract
Natural language sampling (NLS) offers rich insights into real-world speech and language use across diverse groups, yet human transcription is time-consuming and costly. Automatic speech recognition (ASR) technology has the potential to revolutionize NLS research. However, its performance in clinical research settings with young children and children with developmental delays remains unknown. This study evaluates the OpenAI Whisper ASR model on n=34 NLS sessions of toddlers with and without language delays. Manual comparison of ASR output with human transcriptions of children with Down syndrome (DS; n=19; 2-5 years old) and typically developing children (TD; n=15; 2-3 years old) revealed that ASR accurately captured 50% of words spoken by TD children but only 14% of words spoken by children with DS. About 20% of words were missed in both groups, and 21% (TD) and 6% (DS) of words were replaced. ASR also struggled with developmentally informative sounds, such as non-speech vocalizations, missing almost 50% of them in the DS data and misinterpreting most of the rest. While ASR shows potential for reducing transcription time, its limitations underscore the need for human-in-the-loop clinical machine learning systems, especially for underrepresented groups.
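The captured, missed, and replaced word proportions reported above correspond to hits, deletions, and substitutions when an ASR hypothesis is aligned to a human reference transcript. The study reports a manual comparison; the word-level Levenshtein alignment below is a swapped-in automatic technique shown only to make those categories concrete, and the example sentences are invented.

```python
def align_counts(reference, hypothesis):
    """Count hits, substitutions, deletions, and insertions between a reference
    transcript and an ASR hypothesis using word-level Levenshtein alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Backtrack through the table to classify each aligned position.
    counts = {"hit": 0, "sub": 0, "del": 0, "ins": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            counts["hit" if ref[i - 1] == hyp[j - 1] else "sub"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            counts["del"] += 1  # reference word missed by ASR
            i -= 1
        else:
            counts["ins"] += 1  # extra word produced by ASR
            j -= 1
    return counts
```

Dividing `hit`, `del`, and `sub` by the number of reference words gives proportions analogous to the captured, missed, and replaced percentages. When several alignments have equal cost, this backtracking prefers the diagonal (hit or substitution) first, so counts can differ from a human annotator's choices.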
3. Tabari F, Berger JI, Flouty O, Copeland B, Greenlee JD, Johari K. Speech, voice, and language outcomes following deep brain stimulation: A systematic review. PLoS One 2024; 19:e0302739. [PMID: 38728329 PMCID: PMC11086900 DOI: 10.1371/journal.pone.0302739] [Received: 09/28/2023] [Accepted: 04/09/2024] [Indexed: 05/12/2024]
Abstract
BACKGROUND Deep brain stimulation (DBS) reliably ameliorates cardinal motor symptoms in Parkinson's disease (PD) and essential tremor (ET). However, the effects of DBS on speech, voice, and language have been inconsistent and have not been examined comprehensively in a single study. OBJECTIVE We conducted a systematic analysis of the literature by reviewing studies that examined the effects of DBS on speech, voice, and language in PD and ET. METHODS A total of 675 publications were retrieved from the PubMed, Embase, CINAHL, Web of Science, Cochrane Library, and Scopus databases. Based on our selection criteria, 90 papers were included in our analysis. The selected publications were categorized into four subcategories: Fluency, Word production, Articulation and phonology, and Voice quality. RESULTS The results suggested a long-term decline in verbal fluency, with more studies reporting deficits in phonemic fluency than in semantic fluency following DBS. Additionally, high-frequency stimulation and left-sided and bilateral DBS were associated with worse verbal fluency outcomes. Naming improved in the short term with DBS-ON compared with DBS-OFF, with no long-term differences between the two conditions. Bilateral and low-frequency DBS demonstrated a relative improvement in phonation and articulation. Nonetheless, long-term DBS exacerbated phonation and articulation deficits. The effect of DBS on voice was highly variable, with both improvement and deterioration across different voice measures. CONCLUSION This was the first study that aimed to combine speech, voice, and language outcomes following DBS in a single systematic review. The findings revealed a heterogeneous pattern of results for speech, voice, and language across DBS studies and provided directions for future studies.
Affiliation(s)
- Fatemeh Tabari
- Human Neurophysiology and Neuromodulation Laboratory, Department of Communication Sciences and Disorders, Louisiana State University, Baton Rouge, LA, United States of America
- Joel I. Berger
- Human Brain Research Laboratory, Department of Neurosurgery, University of Iowa Hospitals and Clinics, Iowa City, IA, United States of America
- Oliver Flouty
- Department of Neurosurgery and Brain Repair, University of South Florida, Tampa, FL, United States of America
- Brian Copeland
- Department of Neurology, LSU Health Sciences Center, New Orleans, LA, United States of America
- Jeremy D. Greenlee
- Human Brain Research Laboratory, Department of Neurosurgery, University of Iowa Hospitals and Clinics, Iowa City, IA, United States of America
- Iowa Neuroscience Institute, Iowa City, IA, United States of America
- Karim Johari
- Human Neurophysiology and Neuromodulation Laboratory, Department of Communication Sciences and Disorders, Louisiana State University, Baton Rouge, LA, United States of America
4. Nijhawan R, Kumar M, Arya S, Mendirtta N, Kumar S, Towfek SK, Khafaga DS, Alkahtani HK, Abdelhamid AA. A Novel Artificial-Intelligence-Based Approach for Classification of Parkinson's Disease Using Complex and Large Vocal Features. Biomimetics (Basel) 2023; 8:351. [PMID: 37622956 PMCID: PMC10452203 DOI: 10.3390/biomimetics8040351] [Received: 06/15/2023] [Revised: 07/28/2023] [Accepted: 07/31/2023] [Indexed: 08/26/2023]
Abstract
Parkinson's disease (PD) affects a large proportion of elderly people. Symptoms include tremors, slow movement, rigid muscles, and trouble speaking. With the aging of the developed world's population, the number of people affected is expected to rise. Detecting PD early and avoiding its severe consequences require a precise and efficient system. Our goal is to create an accurate AI model that can identify PD from human voices. We developed a transformer-based method for detecting PD from dysphonia measures retrieved from a subject's voice recording. Neural network (NN)-based solutions are rarely used for tabular vocal features, but they have several advantages over tree-based approaches, including compatibility with continuous learning and the potential to be linked with an image or voice encoder for a more accurate multimodal solution. Shifting the state of the art (SOTA) from tree-based approaches to NNs is therefore crucial for advancing research on multimodal solutions. Our method outperforms the SOTA, namely Gradient-Boosted Decision Trees (GBDTs), by at least 1% AUC, and also improves precision and recall. In addition to the solution network, we propose an XGBoost-based feature-selection method and a fully connected NN layer technique for incorporating continuous dysphonia measures. We also discuss several important findings relating to the proposed solution and to deep learning (DL) applied to dysphonia measures, such as the observation that a transformer-based network is more resilient to increased depth than a simple MLP network. The performance of the proposed approach is compared with that of conventional machine learning techniques such as MLP, SVM, and Random Forest (RF). A detailed performance comparison matrix is included, along with the proposed solution's space and time complexity.
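The comparison above is stated in AUC. One standard way to compute ROC AUC is the Mann-Whitney formulation: the probability that a randomly chosen positive (PD) recording receives a higher score than a randomly chosen negative (control) recording, with ties counted as one half. This quadratic-time sketch is for illustration only and is not the authors' evaluation code; the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score_pos > score_neg), counting ties as 0.5 (Mann-Whitney form).

    labels: 1 for positive (e.g., PD), 0 for negative (e.g., control).
    scores: classifier outputs, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A 1% AUC gain therefore means the model ranks roughly one more positive-negative pair in a hundred correctly; for large datasets a rank-based O(n log n) computation is preferable to this pairwise loop.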
Affiliation(s)
- Rahul Nijhawan
- Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala 147004, India
- Mukul Kumar
- Blackstraw Technologies Pvt Ltd., Chennai 160015, India
- Neha Mendirtta
- Computer Science and Engineering, Chandigarh University, Ajitgarh 140413, India
- Sunil Kumar
- Department of Computer Science and Artificial Intelligence, SR University, Warangal 506371, India
- Department of Computer Science, Graphic Era Hill University, Dehradun 248001, India
- S. K. Towfek
- Computer Science and Intelligent Systems Research Center, Blacksburg, VA 24060, USA
- Department of Communications and Electronics, Delta Higher Institute of Engineering and Technology, Mansoura 35111, Egypt
- Doaa Sami Khafaga
- Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Hend K. Alkahtani
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Abdelaziz A. Abdelhamid
- Department of Computer Science, College of Computing and Information Technology, Shaqra University, Shaqra 11961, Saudi Arabia
- Department of Computer Science, Faculty of Computer and Information Sciences, Ain Shams University, Cairo 11566, Egypt
5. Idrisoglu A, Dallora AL, Anderberg P, Berglund JS. Applied Machine Learning Techniques to Diagnose Voice-Affecting Conditions and Disorders: Systematic Literature Review. J Med Internet Res 2023; 25:e46105. [PMID: 37467031 PMCID: PMC10398366 DOI: 10.2196/46105] [Received: 01/30/2023] [Revised: 04/26/2023] [Accepted: 05/23/2023] [Indexed: 07/20/2023]
Abstract
BACKGROUND Normal voice production depends on the synchronized cooperation of multiple physiological systems, which makes the voice sensitive to changes. Any systemic, neurological, or aerodigestive distortion can affect voice production through reduced cognitive, pulmonary, and muscular function. This sensitivity has inspired the use of voice as a biomarker for examining disorders that affect it. Technological improvements and emerging machine learning (ML) technologies have made it possible to extract digital vocal features from the voice for automated diagnosis and monitoring systems. OBJECTIVE This study aims to provide a comprehensive summary of research that uses ML techniques to diagnose and monitor voice-affecting disorders through voice samples, with a specific focus on systemic conditions, nonlaryngeal aerodigestive disorders, and neurological disorders. METHODS This systematic literature review (SLR) investigated the state of the art of voice-based diagnostic and monitoring systems using ML technologies, targeting voice-affecting disorders without a direct relation to the larynx, from the point of view of applied health technology. Using a comprehensive search string, studies published from 2012 to 2022 in the Scopus, PubMed, and Web of Science databases were retrieved for assessment. To minimize bias, relevant references cited in other studies in the field were also retrieved, and 2 authors assessed the collected studies. Low-quality studies were removed through a quality assessment, and relevant data were extracted into summary tables for analysis. To prevent cumulative redundancy bias, the articles were checked for overlapping author groups during screening, and only 1 article per author group was included.
RESULTS In the analysis of the 145 included studies, support vector machines were the most frequently used ML technique (51/145, 35.2%), and the most studied disease was Parkinson disease (PD; reported in 87/145, 60%, studies). After 2017, 16 additional voice-affecting disorders were examined, compared with the 3 investigated previously. An upsurge in the use of artificial neural network-based architectures was also observed after 2017. Almost half of the included studies were published in the last 2 years (2021 and 2022). Broad interest from many countries was observed. Notably, nearly one-half (n=75) of the studies relied on 10 distinct data sets, and 11/145 (7.6%) used demographic data as an input for ML models. CONCLUSIONS This SLR revealed considerable interest across multiple countries in using ML techniques for diagnosing and monitoring voice-affecting disorders, with PD being the most studied disorder. However, the review identified several gaps, including limited and unbalanced data set usage and a focus on diagnostic testing rather than disorder-specific monitoring. Despite being limited to peer-reviewed publications written in English, the SLR provides valuable insights into the current state of research on ML-based diagnosis and monitoring of voice-affecting disorders and highlights areas to address in future research.
Affiliation(s)
- Alper Idrisoglu
- Department of Health, Blekinge Institute of Technology, Karlskrona, Sweden
- Ana Luiza Dallora
- Department of Health, Blekinge Institute of Technology, Karlskrona, Sweden
- Peter Anderberg
- Department of Health, Blekinge Institute of Technology, Karlskrona, Sweden
- School of Health Sciences, University of Skövde, Skövde, Sweden
6. Wiesman AI, Donhauser PW, Degroot C, Diab S, Kousaie S, Fon EA, Klein D, Baillet S. Aberrant neurophysiological signaling associated with speech impairments in Parkinson's disease. NPJ Parkinsons Dis 2023; 9:61. [PMID: 37059749 PMCID: PMC10104849 DOI: 10.1038/s41531-023-00495-z] [Received: 10/24/2022] [Accepted: 03/16/2023] [Indexed: 04/16/2023]
Abstract
Difficulty producing intelligible speech is a debilitating symptom of Parkinson's disease (PD). Yet, both the robust evaluation of speech impairments and the identification of the affected brain systems are challenging. Using task-free magnetoencephalography, we examine the spectral and spatial definitions of the functional neuropathology underlying reduced speech quality in patients with PD using a new approach to characterize speech impairments and a novel brain-imaging marker. We found that the interactive scoring of speech impairments in PD (N = 59) is reliable across non-expert raters, and better related to the hallmark motor and cognitive impairments of PD than automatically-extracted acoustical features. By relating these speech impairment ratings to neurophysiological deviations from healthy adults (N = 65), we show that articulation impairments in patients with PD are associated with aberrant activity in the left inferior frontal cortex, and that functional connectivity of this region with somatomotor cortices mediates the influence of cognitive decline on speech deficits.
Affiliation(s)
- Alex I Wiesman
- Montreal Neurological Institute, McGill University, 3801 Rue University, Montreal, QC, Canada
- Peter W Donhauser
- Montreal Neurological Institute, McGill University, 3801 Rue University, Montreal, QC, Canada
- Ernst Strüngmann Institute for Neuroscience, Frankfurt, Germany
- Clotilde Degroot
- Montreal Neurological Institute, McGill University, 3801 Rue University, Montreal, QC, Canada
- Sabrina Diab
- Department of Psychology, Université du Québec à Montréal, Montréal, QC, Canada
- Shanna Kousaie
- School of Psychology, University of Ottawa, Ottawa, ON, Canada
- Edward A Fon
- Montreal Neurological Institute, McGill University, 3801 Rue University, Montreal, QC, Canada
- Denise Klein
- Montreal Neurological Institute, McGill University, 3801 Rue University, Montreal, QC, Canada
- Center for Research on Brain, Language and Music, McGill University, Montreal, QC, Canada
- Sylvain Baillet
- Montreal Neurological Institute, McGill University, 3801 Rue University, Montreal, QC, Canada
7. Hecker P, Steckhan N, Eyben F, Schuller BW, Arnrich B. Voice Analysis for Neurological Disorder Recognition–A Systematic Review and Perspective on Emerging Trends. Front Digit Health 2022; 4:842301. [PMID: 35899034 PMCID: PMC9309252 DOI: 10.3389/fdgth.2022.842301] [Received: 12/23/2021] [Accepted: 05/25/2022] [Indexed: 11/25/2022]
Abstract
Quantifying neurological disorders from voice is a rapidly growing field of research and holds promise for unobtrusive, large-scale disorder monitoring. The data recording setup and the data analysis pipeline are both crucial for effectively obtaining relevant information from participants. We therefore performed a systematic review to provide a high-level overview of practices across various neurological disorders and to highlight emerging trends. PRISMA-based literature searches were conducted in PubMed, Web of Science, and IEEE Xplore to identify publications in which original (i.e., newly recorded) datasets were collected. Disorders of interest were psychiatric disorders, such as bipolar disorder, depression, and stress; neurodegenerative disorders, such as amyotrophic lateral sclerosis, Alzheimer's disease, and Parkinson's disease; and speech impairments (aphasia, dysarthria, and dysphonia). Of the 43 retrieved studies, Parkinson's disease is represented most prominently, with 19 discovered datasets. Free speech and read speech tasks are most commonly used across disorders. Besides popular feature extraction toolkits, many studies utilise custom-built feature sets. Correlations of acoustic features with psychiatric and neurodegenerative disorders are presented. In terms of analysis, statistical testing of individual features for significance is common, as are predictive modeling approaches, especially support vector machines and, in a small number of studies, artificial neural networks. An emerging trend and recommendation for future studies is to collect data in everyday life to facilitate longitudinal data collection and to capture participants' behavior more naturally. Another emerging trend is to record modalities in addition to voice, which can potentially increase analytical performance.
Affiliation(s)
- Pascal Hecker
- Digital Health – Connected Healthcare, Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
- audEERING GmbH, Gilching, Germany
- *Correspondence: Pascal Hecker; orcid.org/0000-0001-6604-1671
- Nico Steckhan
- Digital Health – Connected Healthcare, Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
- Björn W. Schuller
- audEERING GmbH, Gilching, Germany
- EIHW – Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany
- GLAM – Group on Language, Audio, & Music, Imperial College London, London, United Kingdom
- Bert Arnrich
- Digital Health – Connected Healthcare, Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
8. Ghane M, Ang MC, Nilashi M, Sorooshian S. Enhanced decision tree induction using evolutionary techniques for Parkinson's disease classification. Biocybern Biomed Eng 2022. [DOI: 10.1016/j.bbe.2022.07.002] [Indexed: 11/27/2022]
9. Karan B, Sahu SS, Orozco-Arroyave JR. An investigation about the relationship between dysarthria level of speech and the neurological state of Parkinson’s patients. Biocybern Biomed Eng 2022. [DOI: 10.1016/j.bbe.2022.04.003] [Indexed: 02/07/2023]
10.
11. A Simple and Effective Approach Based on a Multi-Level Feature Selection for Automated Parkinson's Disease Detection. J Pers Med 2022; 12:jpm12010055. [PMID: 35055370 PMCID: PMC8781034 DOI: 10.3390/jpm12010055] [Received: 12/01/2021] [Revised: 12/27/2021] [Accepted: 12/30/2021] [Indexed: 12/07/2022]
Abstract
Parkinson’s disease (PD), a slowly progressing neurodegenerative disorder, negatively affects people’s daily lives. Early diagnosis is of great importance for minimizing the effects of PD. Among the most important symptoms for early diagnosis of PD are monotony and distortion of speech. Artificial intelligence-based approaches can help specialists and physicians detect these disorders automatically. In this study, a new approach based on multi-level feature selection was proposed to detect PD from features extracted from voice recordings of already-diagnosed cases. At the first level, feature selection was performed with the Chi-square and L1-Norm SVM algorithms (CLS). Then, the features selected by these algorithms were combined to increase the representation power of the samples. At the last level, the most distinctive features in the combined feature set were selected according to feature importance weights computed with the ReliefF algorithm. In the classification stage, popular classifiers such as KNN, SVM, and DT were used, and the best performance was achieved with the KNN classifier. The hyperparameters of the KNN classifier were then selected with the Bayesian optimization algorithm, which further improved the performance of the proposed approach. The proposed approach was evaluated with 10-fold cross-validation on a dataset containing PD and normal classes, and a classification accuracy of 95.4% was achieved.
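The final selection stage uses ReliefF feature-importance weights. The sketch below implements the simpler two-class Relief algorithm (one nearest hit and one nearest miss per sample, Manhattan distance on min-max-scaled features) as a stand-in; ReliefF extends it to k neighbors and multiple classes, and this is not the authors' implementation. It assumes numeric features and at least two samples per class; the toy data are hypothetical.

```python
def relief_weights(X, y):
    """Basic two-class Relief: reward features that differ from the nearest miss
    (other class) and penalize features that differ from the nearest hit (same class).
    Every sample is visited once, so the result is deterministic."""
    n, d = len(X), len(X[0])
    # Min-max scale each feature so per-feature differences lie in [0, 1].
    lo = [min(row[f] for row in X) for f in range(d)]
    hi = [max(row[f] for row in X) for f in range(d)]
    span = [(hi[f] - lo[f]) or 1.0 for f in range(d)]  # avoid dividing by zero
    Z = [[(row[f] - lo[f]) / span[f] for f in range(d)] for row in X]

    def dist(a, b):
        return sum(abs(a[f] - b[f]) for f in range(d))  # Manhattan distance

    w = [0.0] * d
    for i in range(n):
        others = [j for j in range(n) if j != i]
        hit = min((j for j in others if y[j] == y[i]), key=lambda j: dist(Z[i], Z[j]))
        miss = min((j for j in others if y[j] != y[i]), key=lambda j: dist(Z[i], Z[j]))
        for f in range(d):
            w[f] += (abs(Z[i][f] - Z[miss][f]) - abs(Z[i][f] - Z[hit][f])) / n
    return w
```

Features with the largest weights separate neighboring samples of different classes best; a selection step would keep the top-ranked features according to these weights.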
12. Woisard V, Balaguer M, Fredouille C, Farinas J, Ghio A, Lalain M, Puech M, Astesano C, Pinquier J, Lepage B. Construction of an automatic score for the evaluation of speech disorders among patients treated for a cancer of the oral cavity or the oropharynx: The Carcinologic Speech Severity Index. Head Neck 2021; 44:71-88. [PMID: 34729847 DOI: 10.1002/hed.26903] [Received: 10/25/2020] [Revised: 08/15/2021] [Accepted: 10/05/2021] [Indexed: 01/26/2023]
Abstract
BACKGROUND Speech disorders impact the quality of life of patients treated for oral cavity and oropharynx cancers. However, there is a lack of uniform, applicable methods for measuring the impact of treatment on speech production at these tumor locations. OBJECTIVE The objective of this work is (1) to model an automatic speech severity index, applicable in clinical practice, that is equivalent or superior to a severity score obtained from human listeners, using several acoustic parameters extracted (a) directly from the speech signal and (b) from speech processing, and (2) to derive an automatic speech intelligibility classification (i.e., mild, moderate, severe) that predicts speech disability and handicap by combining the listener comprehension score with self-reported speech-related quality of life. METHODS Eighty-seven patients treated for cancer of the oral cavity or the oropharynx and 35 controls performed different speech production tasks and completed questionnaires on speech-related quality of life. The audio recordings were then evaluated by human perception and by automatic speech processing. A score describing the severity of patients' speech disorders was then developed with a classic logistic regression model. RESULTS Of the parameters extracted by automatic processing of the speech signal, six were retained, yielding correlations of 0.87 with the perceptual reference score, 0.77 with the comprehension score, and 0.5 with speech-related quality of life. The parameters that contributed most are based on automatic speech recognition systems, mainly the automatic averaged normalized likelihood score on a text reading task and the score of cumulative rankings on pseudowords. The reduced automatic YC2SI is modeled as follows: $\mathrm{YC2SI}_p = 11.48726 + 1.52926 \times X_{\text{averaged normalized likelihood, reading}} - 1.94 \times 10^{-6} \times X_{\text{score of cumulative ranks, pseudowords}}$.
CONCLUSION Automatic processing of speech makes it possible to arrive at valid, reliable, and reproducible parameters able to serve as references in the framework of follow-up of patients treated for cancer of the oral cavity or the oropharynx.
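The reduced automatic model reported in the RESULTS is a linear combination of two ASR-derived parameters. Transcribing the reported coefficients into a function makes the direction of each term explicit: a higher averaged normalized likelihood on the reading task raises the predicted score, and a higher cumulative-rank score on pseudowords lowers it. The function and argument names are hypothetical, and the inputs in the checks are arbitrary values chosen only to verify the arithmetic, not clinically realistic measurements.

```python
def yc2si_reduced(avg_norm_likelihood_reading, cumulative_rank_score_pseudowords):
    """Predicted reduced C2SI score from the two retained ASR-based parameters.

    Coefficients are copied from the abstract; parameter names are hypothetical.
    """
    intercept = 11.48726
    coef_likelihood = 1.52926   # averaged normalized likelihood, reading task
    coef_ranks = -1.94e-06      # score of cumulative ranks, pseudowords
    return (intercept
            + coef_likelihood * avg_norm_likelihood_reading
            + coef_ranks * cumulative_rank_score_pseudowords)
```

Because the pseudoword coefficient is on the order of 1e-6, that term only matters when the cumulative-rank score is large (hundreds of thousands or more).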
Affiliation(s)
- Virginie Woisard
- ENT Department, University Hospital of Toulouse, Toulouse, France
- Oncorehabilitation Unit, University Institute of Cancer of Toulouse Oncopole, Toulouse, France
- Laboratoire Octogone-Lordat, Jean Jaurès University Toulouse II, Toulouse, France
- Mathieu Balaguer
- ENT Department, University Hospital of Toulouse, Toulouse, France
- Institut de Recherche en Informatique de Toulouse, CNRS, Paul Sabatier University Toulouse III, Toulouse, France
- Corinne Fredouille
- Laboratoire d'Informatique d'Avignon, Avignon University, Avignon, France
- Jérôme Farinas
- Oncorehabilitation Unit, University Institute of Cancer of Toulouse Oncopole, Toulouse, France
- Alain Ghio
- Laboratoire Parole et Langage, Aix-Marseille University, Marseille, France
- Muriel Lalain
- Laboratoire Parole et Langage, Aix-Marseille University, Marseille, France
- Michèle Puech
- ENT Department, University Hospital of Toulouse, Toulouse, France
- Oncorehabilitation Unit, University Institute of Cancer of Toulouse Oncopole, Toulouse, France
- Corine Astesano
- Laboratoire Octogone-Lordat, Jean Jaurès University Toulouse II, Toulouse, France
- Julien Pinquier
- Oncorehabilitation Unit, University Institute of Cancer of Toulouse Oncopole, Toulouse, France
- Benoît Lepage
- ENT Department, University Hospital of Toulouse, Toulouse, France
- USMR, Université Paul Sabatier Toulouse III, Toulouse, France
13. On combining acoustic and modulation spectrograms in an attention LSTM-based system for speech intelligibility level classification. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.05.065] [Indexed: 10/21/2022]
14. Laganas C, Iakovakis D, Hadjidimitriou S, Charisis V, Dias SB, Bostantzopoulou S, Katsarou Z, Klingelhoefer L, Reichmann H, Trivedi D, Chaudhuri KR, Hadjileontiadis LJ. Parkinson's Disease Detection Based on Running Speech Data From Phone Calls. IEEE Trans Biomed Eng 2021; 69:1573-1584. [PMID: 34596531 DOI: 10.1109/tbme.2021.3116935] [Indexed: 11/09/2022]
Abstract
OBJECTIVE Parkinson's Disease (PD) is a progressive neurodegenerative disorder whose subtle early signs often hinder timely diagnosis and treatment. Accessible, technology-based methods for tracking PD symptoms longitudinally in daily life could transform disease assessment and accelerate PD diagnosis. METHODS A privacy-aware method for classifying PD patients and healthy controls (HC) on the basis of the speech impairment present in PD is proposed here. Voice features were extracted from running speech signals passively captured during voice phone calls. The features, along with demographic variables, are fed into language-aware training of multiple- and single-instance learning classifiers, using a multilingual cohort of 498 subjects (392/106 self-reported HC/PD patients) to classify PD. RESULTS Using leave-one-subject-out cross-validation, the best-performing models yielded areas under the Receiver Operating Characteristic curve (AUC) of 0.69/0.68/0.63/0.83 for binary classification of PD patients vs. HC in the English/Greek/German/Portuguese-speaking sub-cohorts, respectively. Out-of-sample testing of the best-performing models was conducted on an additional dataset from 63 clinically assessed subjects (24/39 HC/early PD patients), yielding AUCs of 0.84/0.93/0.83 for the English/Greek/German-speaking sub-cohorts, respectively. Comparative analysis with other approaches to language-aware PD detection supported the efficiency of the proposed approach, considering the ecological validity of the acquired voice data. CONCLUSIONS The present work demonstrates increased robustness in PD detection using voice data captured in the wild. SIGNIFICANCE A high-frequency, privacy-aware, and unobtrusive PD screening tool is introduced for the first time, based on the analysis of voice samples captured during routine phone calls.
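Leave-one-subject-out cross-validation holds out every recording from one subject per fold, so that a speaker's voice never appears in both the training and the test data. A minimal sketch of the split follows, assuming one subject identifier per recording; it covers only fold construction, not the multiple-instance classifiers or the language-aware training described in the abstract. The subject IDs are hypothetical.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject.

    subject_ids[i] is the (hypothetical) subject identifier of recording i.
    All recordings of the held-out subject go to the test fold.
    """
    for subject in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == subject]
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train, test
```

With many phone calls per subject, a plain per-recording split would let the model recognize speakers rather than PD; grouping by subject avoids that leakage. scikit-learn's `LeaveOneGroupOut` provides the same split for array-based pipelines.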
15. An Auditory Saliency Pooling-Based LSTM Model for Speech Intelligibility Classification. Symmetry (Basel) 2021. [DOI: 10.3390/sym13091728] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/30/2022]
Abstract
Speech intelligibility is a crucial element of oral communication and can be affected by multiple factors, such as noise, channel characteristics, or speech disorders. In this paper, we address speech intelligibility classification (SIC) in the last of these circumstances. Taking our previous work, an SIC system based on an attentional long short-term memory (LSTM) network, as a starting point, we address the problem of inadequate learning of the attention weights due to scarce training data. To overcome this issue, the main contribution of this paper is a novel type of weighted pooling (WP) mechanism, called saliency pooling, in which the WP weights are not learned automatically during network training but are obtained from an external source of information, Kalinli’s auditory saliency model. The aim is to exploit the apparent symmetry between the human auditory attention mechanism and the attentional models integrated into deep learning networks. The developed systems are assessed on the UA-Speech dataset, which comprises speech uttered by subjects with several levels of dysarthria. Results show that all systems with saliency pooling significantly outperform a reference support vector machine (SVM)-based system and LSTM-based systems with mean pooling and attention pooling, suggesting that Kalinli’s saliency can be successfully incorporated into the LSTM architecture as an external cue for estimating the speech intelligibility level.
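The difference between the pooling schemes compared here can be illustrated with a small numpy sketch. It assumes a (T, D) matrix of LSTM outputs and one non-negative weight per frame; with saliency pooling those weights would come from an external saliency model (Kalinli's model in this paper, not implemented here) rather than being learned, while mean pooling is the uniform-weight special case. The function names and the normalisation step are illustrative assumptions, not the authors' code.

```python
import numpy as np


def weighted_pool(hidden, weights):
    """Collapse a (T, D) sequence of LSTM outputs into a single D-vector.

    weights: one non-negative value per frame (e.g. an externally computed
    saliency score). They are normalised to sum to 1 before pooling.
    """
    hidden = np.asarray(hidden, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if hidden.ndim != 2 or weights.shape != (hidden.shape[0],):
        raise ValueError("expected hidden of shape (T, D) and one weight per frame")
    if np.any(weights < 0) or weights.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = weights / weights.sum()
    return w @ hidden  # (T,) @ (T, D) -> (D,)


def mean_pool(hidden):
    """Mean pooling: the special case in which every frame gets the same weight."""
    hidden = np.asarray(hidden, dtype=float)
    return weighted_pool(hidden, np.ones(hidden.shape[0]))
```

Attention pooling would differ only in where the weights come from: a trainable scoring layer followed by a softmax, which is the component this paper replaces with fixed external saliency to cope with scarce training data.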
16. Ma J, Zhang Y, Li Y, Zhou L, Qin L, Zeng Y, Wang P, Lei Y. Deep dual-side learning ensemble model for Parkinson speech recognition. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.102849] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/13/2023]
17. An improved framework for Parkinson’s disease prediction using Variational Mode Decomposition-Hilbert spectrum of speech signal. Biocybern Biomed Eng 2021. [DOI: 10.1016/j.bbe.2021.04.014] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/24/2022]
18. Slis A, Lévêque N, Fougeron C, Pernon M, Assal F, Lancia L. Analysing spectral changes over time to identify articulatory impairments in dysarthria. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2021; 149:758. [PMID: 33639779 DOI: 10.1121/10.0003332] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/12/2020] [Accepted: 12/17/2020] [Indexed: 06/12/2023]
Abstract
Identifying characteristics of articulatory impairment in speech motor disorders is complicated by the time-consuming nature of kinematic measures. The goal was to explore whether analysing the acoustic signal in terms of total squared changes of Mel-Frequency Cepstral Coefficients (TSC_MFCC), and their pattern over time, provides sufficient spectral information to distinguish mildly and moderately dysarthric French speakers with Amyotrophic Lateral Sclerosis (ALS) and Parkinson's Disease (PD) from each other and from healthy speakers. Participants produced the vowel-glide sequences /ajajaj/, /ujujuj/, and /wiwiwi/. From the time course of the TSC_MFCCs, event-related and global measures were extracted to capture the degree of acoustic change and its variability. Durational measures were also obtained. For both mildly and moderately impaired PD and ALS speakers, the degree of acoustic change and its variability, averaged over the complete contour, separated PD and ALS speakers from each other and from healthy speakers, especially for the sequences /ujujuj/ and /wiwiwi/. Durational measures separated the moderate ALS speakers from healthy and moderate PD speakers. Applying this approach to repetitive sequences that target the lingual and labial articulators is a promising way to characterize articulatory impairment in speech motor disorders. The findings are discussed in relation to prior reports of articulatory impairment in the populations studied.
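As a rough illustration of the TSC_MFCC measure, the sketch below assumes it is the sum, over cepstral coefficients, of squared differences between consecutive MFCC frames, and summarises the resulting contour by its mean (degree of acoustic change) and standard deviation (variability). Both the exact definition and the choice of summary statistics are assumptions; the authors' implementation may use smoothing, a different frame step, or additional event-related measures, and MFCC extraction itself is not shown.

```python
import numpy as np


def tsc_mfcc(mfcc):
    """Total squared change between consecutive MFCC frames (assumed definition).

    mfcc: array of shape (n_frames, n_coeffs).
    Returns n_frames - 1 values: TSC[t] = sum_k (c[t + 1, k] - c[t, k]) ** 2.
    """
    mfcc = np.asarray(mfcc, dtype=float)
    if mfcc.ndim != 2 or mfcc.shape[0] < 2:
        raise ValueError("need a 2-D array with at least two frames")
    diffs = np.diff(mfcc, axis=0)          # frame-to-frame change per coefficient
    return np.sum(diffs ** 2, axis=1)      # squared change summed over coefficients


def global_measures(tsc):
    """Summarise a TSC contour: mean as degree of change, SD as its variability."""
    tsc = np.asarray(tsc, dtype=float)
    return {"mean": float(tsc.mean()), "sd": float(tsc.std())}
```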
Affiliation(s)
- A Slis: LPP, UMR 7018, CNRS/University Sorbonne-Nouvelle, Paris, France
- N Lévêque: APHP, Department of Neurology, Pitié-Salpêtrière Hospital, ALS Reference Center, France
- C Fougeron: LPP, UMR 7018, CNRS/University Sorbonne-Nouvelle, Paris, France
- M Pernon: Department of Clinical Neurosciences, Geneva University Hospital, Switzerland
- F Assal: Department of Clinical Neurosciences, Geneva University Hospital, Switzerland
- L Lancia: LPP, UMR 7018, CNRS/University Sorbonne-Nouvelle, Paris, France
19. Local discriminant preservation projection embedded ensemble learning based dimensionality reduction of speech data of Parkinson’s disease. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2020.102165] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/22/2022]
20. Jamin A, Abraham P, Humeau-Heurtier A. Machine learning for predictive data analytics in medicine: A review illustrated by cardiovascular and nuclear medicine examples. Clin Physiol Funct Imaging 2020; 41:113-127. [PMID: 33316137 DOI: 10.1111/cpf.12686] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 12/05/2019] [Revised: 11/01/2020] [Accepted: 12/01/2020] [Indexed: 12/13/2022]
Abstract
Evidence-based medicine allows physicians to evaluate the risk-benefit ratio of a treatment based on the clinical setting and data. Physicians can make risk-based choices using different types of information. With the emergence of new technologies, large amounts of data are recorded, offering interesting perspectives for machine learning-based predictive data analytics. Machine learning is a set of methods that process data to model a learning problem. Supervised machine learning algorithms use annotated data to construct the model, and this category of algorithms can be used to solve predictive data analytics problems. In this paper, we detail the use of supervised machine learning algorithms for predictive data analytics problems in medicine. In the medical field, data can be split into two categories: medical images and other data. For brevity, our review covers all kinds of medical data except images. We discuss four supervised machine learning approaches: information-based, similarity-based, probability-based and error-based approaches. Each approach is illustrated with detailed cardiovascular and nuclear medicine examples. Our review shows that model ensemble (ME) and support vector machine (SVM) methods are the most popular. SVM, ME and artificial neural networks often lead to better results than other algorithms. In the coming years, more studies, data, tools and methods will certainly be proposed.
Affiliation(s)
- Antoine Jamin: COTTOS Médical, Avrillé, France; LERIA-Laboratoire d'Etude et de Recherche en Informatique d'Angers, Univ. Angers, Angers, France; LARIS-Laboratoire Angevin de Recherche en Ingénierie des Systèmes, Univ. Angers, Angers, France
- Pierre Abraham: Sports Medicine Department, UMR Mitovasc CNRS 6015 INSERM 1228, Angers University Hospital, Angers, France
- Anne Humeau-Heurtier: LARIS-Laboratoire Angevin de Recherche en Ingénierie des Systèmes, Univ. Angers, Angers, France
21. Saggio G, Costantini G. Worldwide Healthy Adult Voice Baseline Parameters: A Comprehensive Review. J Voice 2020; 36:637-649. [PMID: 33039203 DOI: 10.1016/j.jvoice.2020.08.028] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 06/05/2020] [Revised: 08/20/2020] [Accepted: 08/21/2020] [Indexed: 12/17/2022]
Abstract
The voice produces acoustic signals that were first analyzed and synthesized for telecommunication purposes and, more recently, have been investigated for medical purposes. In particular, voice signal characteristics can reveal individual health conditions, which is useful for screening, diagnosis and remote monitoring. Within this frame, knowledge of the baseline features of the healthy voice is essential for a balanced comparison with their unhealthy counterparts. However, the baseline features of the human voice depend on gender, age range and ethnicity and, as far as we know, no work reports how those features are distributed worldwide. This paper intends to fill this gap. Our database search yielded 179 relevant published studies, retrieved using the digital libraries IEEE Xplore, Scopus, Web of Science, IOPscience, Taylor and Francis Online, and Scitepress. These studies report different features; here we consider the most investigated ones within the most investigated age range. The features are the fundamental frequency, jitter, shimmer, harmonic-to-noise ratio, and cepstral peak prominence; the most investigated age range is 20-40 years; and, with respect to ethnicity, 20 countries are considered.
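Two of the features listed, jitter and shimmer, are cycle-to-cycle perturbation measures. The sketch below implements the common "local" variants (mean absolute difference between consecutive cycles divided by the mean value, returned as a fraction rather than a percentage), assuming pitch periods and cycle peak amplitudes have already been extracted. Tools such as Praat add period-range limits and offer further variants (for example RAP, PPQ5 and APQ), so the reviewed studies may not all use exactly this formula; harmonic-to-noise ratio and cepstral peak prominence are not covered. The helper names and the example data are hypothetical.

```python
def _relative_mean_abs_diff(values):
    """Mean |x[i+1] - x[i]| divided by the mean of x."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


def local_jitter(periods_s):
    """Local jitter from consecutive pitch periods (in seconds)."""
    return _relative_mean_abs_diff(periods_s)


def local_shimmer(peak_amplitudes):
    """Local shimmer from the peak amplitudes of consecutive cycles."""
    return _relative_mean_abs_diff(peak_amplitudes)


# Hypothetical cycle data, not taken from the reviewed studies
periods = [0.0100, 0.0102, 0.0099, 0.0101]
amplitudes = [0.80, 0.78, 0.82, 0.79]
print(f"jitter  = {100 * local_jitter(periods):.2f} %")
print(f"shimmer = {100 * local_shimmer(amplitudes):.2f} %")
```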
Affiliation(s)
- Giovanni Saggio: Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
- Giovanni Costantini: Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
22. Khan T, Lundgren LE, Anderson DG, Nowak I, Dougherty M, Verikas A, Pavel M, Jimison H, Nowaczyk S, Aharonson V. Assessing Parkinson's disease severity using speech analysis in non-native speakers. COMPUT SPEECH LANG 2020. [DOI: 10.1016/j.csl.2019.101047] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/24/2022]
23. Al-Hameed S, Benaissa M, Christensen H, Mirheidari B, Blackburn D, Reuber M. A new diagnostic approach for the identification of patients with neurodegenerative cognitive complaints. PLoS One 2019; 14:e0217388. [PMID: 31125389 PMCID: PMC6534304 DOI: 10.1371/journal.pone.0217388] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 07/12/2018] [Accepted: 05/11/2019] [Indexed: 11/18/2022]
Abstract
Neurodegenerative diseases causing dementia are known to affect a person's speech and language. Part of the expert assessment in memory clinics therefore routinely focuses on detecting such features. Current outpatient procedures examining patients' verbal and interactional abilities mainly focus on verbal recall, word fluency, and comprehension. By capturing neurodegeneration-associated characteristics in a person's voice, novel methods based on the automatic analysis of speech signals may provide more information about a person's ability to interact, which could contribute to the diagnostic process. In this proof-of-principle study, we demonstrate that purely acoustic features, extracted from recordings of patients' answers to a neurologist's questions in a specialist memory clinic, can support the initial distinction between patients presenting with cognitive concerns attributable to progressive neurodegenerative disorders (ND) and those with Functional Memory Disorder (FMD, i.e., subjective memory concerns unassociated with objective cognitive deficits or a risk of progression). The study involved 15 FMD and 15 ND patients, and a total of 51 acoustic features were extracted from their recordings. Feature selection was used to identify the most discriminating features, which were then used to train five different machine learning classifiers to differentiate between the FMD and ND classes, achieving a mean classification accuracy of 96.2%. Given their discriminative power, purely acoustic approaches could be integrated into diagnostic pathways for patients presenting with memory concerns; they are also computationally less demanding than methods focusing on linguistic elements of speech and language, which require automatic speech recognition and understanding.
Affiliation(s)
- Sabah Al-Hameed: Dept of Electronic and Electrical Engineering, University of Sheffield, Sheffield, United Kingdom
- Mohammed Benaissa: Dept of Electronic and Electrical Engineering, University of Sheffield, Sheffield, United Kingdom
- Heidi Christensen: Dept of Computer Science, University of Sheffield, Sheffield, United Kingdom; Centre for Assistive Technology and Connected Healthcare, University of Sheffield, Sheffield, United Kingdom
- Bahman Mirheidari: Dept of Computer Science, University of Sheffield, Sheffield, United Kingdom
- Daniel Blackburn: Sheffield Institute for Translational Neuroscience (SITraN), University of Sheffield, Sheffield, United Kingdom
- Markus Reuber: Academic Neurology Unit, University of Sheffield, Royal Hallamshire Hospital, Sheffield, United Kingdom
24. Empirical Wavelet Transform Based Features for Classification of Parkinson’s Disease Severity. J Med Syst 2017; 42:29. [DOI: 10.1007/s10916-017-0877-2] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 04/21/2017] [Accepted: 12/13/2017] [Indexed: 10/18/2022]
25. Rosdi F, Salim SS, Mustafa MB. An FPN-based classification method for speech intelligibility detection of children with speech impairments. Soft Comput 2017. [DOI: 10.1007/s00500-017-2932-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 11/29/2022]
26. Fletcher AR, Wisler AA, McAuliffe MJ, Lansford KL, Liss JM. Predicting Intelligibility Gains in Dysarthria Through Automated Speech Feature Analysis. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2017; 60:3058-3068. [PMID: 29075755 PMCID: PMC6195072 DOI: 10.1044/2017_jslhr-s-16-0453] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/15/2016] [Revised: 06/27/2017] [Accepted: 06/27/2017] [Indexed: 05/30/2023]
Abstract
PURPOSE Behavioral speech modifications have variable effects on the intelligibility of speakers with dysarthria. In the companion article, a significant relationship was found between measures of speakers' baseline speech and their intelligibility gains following cues to speak louder and reduce rate (Fletcher, McAuliffe, Lansford, Sinex, & Liss, 2017). This study reexamines these features and assesses whether automated acoustic assessments can also be used to predict intelligibility gains. METHOD Fifty speakers (7 older individuals and 43 with dysarthria) read a passage in habitual, loud, and slow speaking modes. Automated measurements of long-term average spectra, envelope modulation spectra, and Mel-frequency cepstral coefficients were extracted from short segments of participants' baseline speech. Intelligibility gains were statistically modeled, and the predictive power of the baseline speech measures was assessed using cross-validation. RESULTS Statistical models could predict the intelligibility gains of speakers they had not been trained on. The automated acoustic features were better able to predict speakers' improvement in the loud condition than the manual measures reported in the companion article. CONCLUSIONS These acoustic analyses present a promising tool for rapidly assessing treatment options. Automated measures of baseline speech patterns may enable more selective inclusion criteria and stronger group outcomes within treatment studies.
Affiliation(s)
- Annalise R Fletcher: Department of Communication Disorders, University of Canterbury, Christchurch, New Zealand
- Alan A Wisler: School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe
- Megan J McAuliffe: Department of Communication Disorders, University of Canterbury, Christchurch, New Zealand
- Kaitlin L Lansford: School of Communication Science & Disorders, Florida State University, Tallahassee
- Julie M Liss: Department of Speech and Hearing Science, Arizona State University, Tempe
27. Analyzing the effectiveness of vocal features in early telediagnosis of Parkinson's disease. PLoS One 2017; 12:e0182428. [PMID: 28792979 PMCID: PMC5549905 DOI: 10.1371/journal.pone.0182428] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 12/05/2016] [Accepted: 07/18/2017] [Indexed: 11/19/2022]
Abstract
Recently proposed Parkinson's Disease (PD) telediagnosis systems based on detecting dysphonia achieve very high classification rates in discriminating healthy subjects from PD patients. However, in these studies the data used to construct the classification model contain speech recordings of both early and late PD patients with varying severities of speech impairment, which leads to unrealistic results. In a more realistic scenario, an early telediagnosis system would be used in suspected cases by healthy subjects or early PD patients with mild speech impairment. Considering the critical importance of early diagnosis for treatment, in this paper we evaluate the ability of vocal features to support early telediagnosis of PD using machine learning techniques in a two-step approach. In the first step, using only patient data, we aim to identify the patient group with relatively more severe speech impairment, using the Unified Parkinson's Disease Rating Scale (UPDRS) score as an index of disease progression. For this purpose, we use three supervised and two unsupervised learning techniques. In the second step, we exclude the samples of this patient group from the dataset, create a new dataset consisting of samples from PD patients with less severe speech impairment and from healthy subjects, and use three classifiers with various settings to address this binary classification problem. For this problem, the highest accuracy of 96.4% and a Matthews correlation coefficient of 0.77 were obtained using support vector machines with a third-degree polynomial kernel, showing that vocal features can be used to build a decision support system for early telediagnosis of PD.
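The abstract reports both accuracy (96.4%) and the Matthews correlation coefficient (0.77). The sketch below computes both from confusion-matrix counts. The counts in the example are hypothetical, not the paper's; they only show how, on an imbalanced dataset, accuracy can remain high while MCC, which takes all four confusion-matrix cells into account, is noticeably lower.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Share of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # a common convention when any row or column total is zero
    return (tp * tn - fp * fn) / denom


# Hypothetical imbalanced test set: 90 positives, 10 negatives
counts = dict(tp=90, tn=5, fp=5, fn=0)
print(f"accuracy = {accuracy(**counts):.3f}")          # 0.950
print(f"MCC      = {matthews_corrcoef(**counts):.3f}")  # 0.688
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which is why it is often reported alongside accuracy when classes are unbalanced.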
28. Rueda A, Krishnan S. Feature analysis of dysphonia speech for monitoring Parkinson's disease. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2017; 2017:2308-2311. [PMID: 29060359 DOI: 10.1109/embc.2017.8037317] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/07/2023]
Abstract
Parkinson's disease (PD) is a progressive neurodegenerative disorder with no known cure and no known prevention. Early detection is crucial for slowing its progression. Over the past 10 years, interest in PD analysis has increased visibly. Speech impairment affects the majority of people with Parkinson's (PWP), and new features and machine learning algorithms have been proposed to help diagnose PD and to measure patients' progress. Using sustained vowel /a/ recordings, we identified a more prominent set of Mel-Frequency Cepstral Coefficients (MFCCs), Intrinsic Mode Functions (IMFs), and other parameters that best represent the characteristics of Parkinson's dysphonia, to assist with the diagnostic process. For higher-quality audio signals, there were visible differences in the higher-order MFCCs, a wider spectral bandwidth in the first four IMFs of PWP, and higher power intensity in healthy subjects. We also found that even when the signals were downsampled to toll quality, the distinguishing MFCC and IMF features were largely maintained. This opens up the possibility of providing telemedicine for PWP.
29. A Machine Learning System for the Diagnosis of Parkinson’s Disease from Speech Signals and Its Application to Multiple Speech Signal Types. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2016. [DOI: 10.1007/s13369-016-2206-3] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Indexed: 11/26/2022]
30. Prashanth R, Dutta Roy S, Mandal PK, Ghosh S. High-Accuracy Detection of Early Parkinson's Disease through Multimodal Features and Machine Learning. Int J Med Inform 2016; 90:13-21. [PMID: 27103193 DOI: 10.1016/j.ijmedinf.2016.03.001] [Citation(s) in RCA: 81] [Impact Index Per Article: 9.0] [Received: 12/03/2015] [Revised: 02/04/2016] [Accepted: 03/01/2016] [Indexed: 11/27/2022]
Abstract
Early (or preclinical) diagnosis of Parkinson's disease (PD) is crucial for its early management because, by the time clinical symptoms manifest, more than 60% of the dopaminergic neurons have already been lost. It is now established that there is a premotor stage, before the onset of the classic motor symptoms, characterized by a constellation of clinical features that are mostly non-motor in nature, such as Rapid Eye Movement (REM) sleep Behaviour Disorder (RBD) and olfactory loss. In this paper, we use the non-motor features of RBD and olfactory loss, along with other significant biomarkers such as Cerebrospinal fluid (CSF) measurements and dopaminergic imaging markers from 183 healthy normal and 401 early PD subjects, as obtained from the Parkinson's Progression Markers Initiative (PPMI) database, to distinguish early PD subjects from normal subjects using Naïve Bayes, Support Vector Machine (SVM), Boosted Trees and Random Forests classifiers. We observe that the SVM classifier gave the best performance (96.40% accuracy, 97.03% sensitivity, 95.01% specificity, and 98.88% area under the ROC curve). We infer from the study that a combination of non-motor, CSF and imaging markers may aid in the preclinical diagnosis of PD.
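The reported accuracy, sensitivity and specificity are linked: accuracy is the class-size-weighted average of sensitivity and specificity. The sketch below makes this explicit. Plugging in the reported rates together with the cohort sizes from the abstract (401 early PD, 183 healthy) gives about 0.964, consistent with the reported 96.40% accuracy; this check assumes the evaluation set had the same class proportions as the full cohort, which the abstract does not state. The function names are illustrative.

```python
def sensitivity(tp, fn):
    """True-positive rate: share of PD subjects classified as PD."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True-negative rate: share of healthy subjects classified as healthy."""
    return tn / (tn + fp)


def accuracy_from_rates(sens, spec, n_pos, n_neg):
    """Accuracy = (sens * n_pos + spec * n_neg) / (n_pos + n_neg)."""
    return (sens * n_pos + spec * n_neg) / (n_pos + n_neg)


# Reported rates, with the assumption that class proportions match the full cohort
acc = accuracy_from_rates(0.9703, 0.9501, n_pos=401, n_neg=183)
print(f"implied accuracy = {acc:.4f}")
```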
Affiliation(s)
- R Prashanth: Department of Electrical Engineering, Indian Institute of Technology, Delhi, India
- Sumantra Dutta Roy: Department of Electrical Engineering, Indian Institute of Technology, Delhi, India
- Pravat K Mandal: Neuroimaging and Neurospectroscopy Laboratory, National Brain Research Centre, India; Department of Radiology, Johns Hopkins Medicine, MD, USA
- Shantanu Ghosh: Martinos Center for Biomedical Imaging, Massachusetts General Hospital and Harvard Medical School, MA, USA
31. Diagnosing Parkinson's Diseases Using Fuzzy Neural System. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2016; 2016:1267919. [PMID: 26881009 PMCID: PMC4736962 DOI: 10.1155/2016/1267919] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 09/30/2015] [Accepted: 12/14/2015] [Indexed: 11/25/2022]
Abstract
This study presents the design of a recognition system that discriminates between healthy people and people with Parkinson's disease. Parkinson's disease is diagnosed using a fusion of fuzzy systems and neural networks. The structure and learning algorithms of the proposed fuzzy neural system (FNS) are presented. The approach described in this paper enhances the capability of the designed system and efficiently distinguishes healthy individuals. This was demonstrated through simulation of the system using data obtained from the UCI Machine Learning Repository. A comparative study was carried out, and the simulation results showed that the proposed fuzzy neural system improves the recognition rate of the designed system.
32. Sandström L, Hägglund P, Johansson L, Blomstedt P, Karlsson F. Speech intelligibility in Parkinson's disease patients with zona incerta deep brain stimulation. Brain Behav 2015; 5:e00394. [PMID: 26516614 PMCID: PMC4614054 DOI: 10.1002/brb3.394] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 06/03/2015] [Revised: 07/06/2015] [Accepted: 08/16/2015] [Indexed: 11/08/2022]
Abstract
OBJECTIVES To investigate the effects of l-dopa (Levodopa) and cZi-DBS (deep brain stimulation in caudal zona incerta) on spontaneous speech intelligibility in patients with PD (Parkinson's disease). MATERIALS AND METHODS Spontaneous utterances were extracted from anechoic recordings from 11 patients with PD preoperatively (off and on l-dopa medication) and 6 and 12 months post bilateral cZi-DBS operation (off and on stimulation, with simultaneous l-dopa medication). Background noise with an amplitude corresponding to a clinical setting was added to the recordings. Intelligibility was assessed through a transcription task performed by 41 listeners in a randomized and blinded procedure. RESULTS A group-level worsening in spontaneous speech intelligibility was observed on cZi stimulation compared to off 6 months postoperatively (8 adverse, 1 positive, 2 no change). Twelve months postoperatively, adverse effects of cZi-DBS were not frequently observed (2 positive, 3 adverse, 6 no change). l-dopa administered preoperatively as part of the evaluation for DBS operation provided the overall best treatment outcome (1 adverse, 4 positive, 6 no change). CONCLUSIONS cZi-DBS was shown to have smaller negative effects when evaluated from spontaneous speech compared to speech effects reported previously. The previously reported reduction in word-level intelligibility 12 months postoperatively was not transferred to spontaneous speech for most patients. Reduced intelligibility due to cZi stimulation was much more prominent 6 months postoperatively than at 12 months.
Affiliation(s)
- Linda Sandström: Division of Speech and Language Pathology, Department of Clinical Sciences, Umeå University, Umeå, Sweden
- Patricia Hägglund: Division of Speech and Language Pathology, Department of Clinical Sciences, Umeå University, Umeå, Sweden
- Louise Johansson: Division of Speech and Language Pathology, Department of Clinical Sciences, Umeå University, Umeå, Sweden
- Patric Blomstedt: Division of Clinical Neuroscience, Department of Pharmacology and Clinical Neuroscience, Umeå University, Umeå, Sweden
- Fredrik Karlsson: Division of Speech and Language Pathology, Department of Clinical Sciences, Umeå University, Umeå, Sweden
33. Laaridh I, Fredouille C, Meunier C. Automatic Detection of Phone-Based Anomalies in Dysarthric Speech. ACM TRANSACTIONS ON ACCESSIBLE COMPUTING 2015. [DOI: 10.1145/2739050] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/12/2022]
Abstract
Perceptual evaluation is still the most common method in clinical practice for diagnosing people with speech disorders and following the progression of their condition. Although a number of studies have addressed the acoustic analysis of speech productions exhibiting impairments, additional descriptive analysis is required to manage interperson variability, whether among speakers with the same condition or across different conditions. In this context, this article investigates automatic speech processing approaches dedicated to detecting and localizing abnormal acoustic phenomena in speech signals produced by people with speech disorders. This automatic process aims to enhance the manual investigation of human experts while reducing the extent of their intervention, by drawing their attention to specific parts of the speech that are atypical from an acoustic point of view.
Two different approaches are proposed in this article. The first approach models only the normal speech, whereas the second models both normal and dysarthric speech. Both approaches are evaluated following two strategies: one consists of a strict phone comparison between a human annotation of abnormal phones and the automatic output, while the other uses a “one-phone delay” for the comparison.
The experimental evaluation of both approaches for the task of detecting acoustic anomalies was conducted on two different corpora composed of French dysarthric speakers and control speakers. These approaches obtain very encouraging results and their potential for clinical uses with different types of dysarthria and neurological diseases is quite promising.
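The two evaluation strategies can be sketched as follows, under the assumption (not confirmed by the abstract) that the "one-phone delay" strategy accepts an automatic detection falling one phone before or after an annotated abnormal phone. Phones are represented by their index in the utterance, and precision and recall are used as illustrative scores; the function name is hypothetical and the article's own metrics may differ.

```python
def phone_level_scores(reference, detected, tolerance=0):
    """Compare human-annotated abnormal phones with automatic detections.

    reference, detected: iterables of phone indices flagged as abnormal.
    tolerance=0 gives a strict phone-by-phone comparison; tolerance=1 also
    accepts a detection one phone before or after an annotated phone.
    Returns (precision, recall); each is 0.0 when its denominator is empty.
    """
    ref = set(reference)
    det = set(detected)
    # Detections that land within the tolerance of some annotated phone
    hits = {d for d in det if any(abs(d - r) <= tolerance for r in ref)}
    # Annotated phones that have at least one detection within the tolerance
    found = {r for r in ref if any(abs(d - r) <= tolerance for d in det)}
    precision = len(hits) / len(det) if det else 0.0
    recall = len(found) / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical annotation and detector output (phone indices)
annotated = [3, 10]
flagged = [4, 10, 20]
print(phone_level_scores(annotated, flagged, tolerance=0))
print(phone_level_scores(annotated, flagged, tolerance=1))
```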
Affiliation(s)
- Imed Laaridh: University of Avignon, CERI/LIA; University of Aix Marseille, France
- Christine Meunier: University of Aix Marseille, CNRS, LPL UMR 7309, 13100, Aix-en-Provence, France