1
Di Cesare MG, Perpetuini D, Cardone D, Merla A. Machine Learning-Assisted Speech Analysis for Early Detection of Parkinson's Disease: A Study on Speaker Diarization and Classification Techniques. Sensors (Basel) 2024; 24:1499. [PMID: 38475034 DOI: 10.3390/s24051499]
Abstract
Parkinson's disease (PD) is a neurodegenerative disorder characterized by a range of motor and non-motor symptoms. One of the notable non-motor symptoms of PD is the presence of vocal disorders, attributed to the underlying pathophysiological changes in the neural control of the laryngeal and vocal tract musculature. From this perspective, the integration of machine learning (ML) techniques into the analysis of speech signals has significantly contributed to the detection and diagnosis of PD. In particular, Mel-Frequency Cepstral Coefficients (MFCCs) and Gammatone Frequency Cepstral Coefficients (GTCCs) are feature extraction techniques commonly used in speech and audio signal processing that exhibit great potential for vocal disorder identification. This study presents a novel approach to the early detection of PD through ML applied to speech analysis, leveraging both MFCCs and GTCCs. The recordings contained in the Mobile Device Voice Recordings at King's College London (MDVR-KCL) dataset were used. These recordings were collected from healthy individuals and PD patients while they read a passage and during a spontaneous conversation on the phone. The speech data from the spontaneous dialogue task were processed through speaker diarization, a technique that partitions an audio stream into homogeneous segments according to speaker identity. ML applied to MFCCs and GTCCs allowed us to classify PD patients with a test accuracy of 92.3%. This research further demonstrates the potential of mobile phones as a non-invasive, cost-effective tool for the early detection of PD, significantly improving patient prognosis and quality of life.
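The entry above leans on MFCC features but does not spell out how they are computed. As a rough illustration of the standard pipeline (framing, Hamming window, power spectrum, mel filterbank, log compression, DCT), here is a minimal NumPy/SciPy sketch; the parameter values (frame length, hop, filter count) are conventional defaults, not those used in the paper, and the input signal is synthetic.

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters with centers evenly spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr, n_mfcc=13, frame_len=400, hop=160, n_fft=512, n_filters=26):
    # Frame the signal, apply a Hamming window, take the power spectrum
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Filterbank energies -> log -> DCT, keeping the first n_mfcc coefficients
    energies = np.maximum(power @ mel_filterbank(n_filters, n_fft, sr).T, 1e-10)
    return dct(np.log(energies), type=2, axis=1, norm="ortho")[:, :n_mfcc]

# One second of a synthetic "voiced" signal at 16 kHz (illustrative input only)
sr = 16000
t = np.arange(sr) / sr
voice = np.sin(2 * np.pi * 150 * t) + 0.3 * np.sin(2 * np.pi * 450 * t)
feats = mfcc(voice, sr)
```

With these settings a one-second signal yields 98 frames of 13 coefficients each; the resulting matrix is what a downstream classifier would consume. GTCCs follow the same scheme with a gammatone filterbank in place of the mel filterbank.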
Affiliation(s)
- David Perpetuini
- Department of Engineering and Geology, University G. D'Annunzio of Chieti-Pescara, 65127 Pescara, Italy
- Daniela Cardone
- Department of Engineering and Geology, University G. D'Annunzio of Chieti-Pescara, 65127 Pescara, Italy
- Arcangelo Merla
- Department of Engineering and Geology, University G. D'Annunzio of Chieti-Pescara, 65127 Pescara, Italy
2
Uloza V, Pribuišis K, Ulozaite-Staniene N, Petrauskas T, Damaševičius R, Maskeliūnas R. Accuracy Analysis of the Multiparametric Acoustic Voice Indices, the VWI, AVQI, ABI, and DSI Measures, in Differentiating between Normal and Dysphonic Voices. J Clin Med 2023; 13:99. [PMID: 38202106 PMCID: PMC10779457 DOI: 10.3390/jcm13010099]
Abstract
The study aimed to investigate and compare the accuracy and robustness of multiparametric acoustic voice indices (MAVIs), namely the Dysphonia Severity Index (DSI), Acoustic Voice Quality Index (AVQI), Acoustic Breathiness Index (ABI), and Voice Wellness Index (VWI), in differentiating normal and dysphonic voices. The study group consisted of 129 adult individuals, including 49 with normal voices and 80 patients with pathological voices. The diagnostic accuracy of the investigated MAVIs in differentiating between normal and pathological voices was assessed using receiver operating characteristic (ROC) analysis. Moderate to strong positive linear correlations were observed between the different MAVIs. The ROC analysis revealed that all of the measures demonstrated a high level of accuracy (area under the curve (AUC) of 0.80 or greater) and an acceptable level of sensitivity and specificity in discriminating between normal and pathological voices. However, with an AUC of 0.99, the VWI demonstrated the highest diagnostic accuracy. The highest Youden index equaled 0.93, revealing that a VWI cut-off of 4.45 corresponds to highly acceptable sensitivity (97.50%) and specificity (95.92%). In conclusion, the VWI was found to be beneficial in describing differences in voice quality status and discriminating between normal and dysphonic voices based on clinical diagnosis, i.e., dysphonia type, implying the VWI's reliable voice screening potential.
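The AUC and Youden-index cutoff reported in this entry can be computed from per-subject index scores in a few lines. A minimal sketch follows, using the Mann-Whitney formulation of the AUC and an exhaustive threshold sweep for Youden's J; the labels and scores below are invented for illustration, not the study's data.

```python
import numpy as np

def auc_mann_whitney(labels, scores):
    # AUC as P(score_pos > score_neg), counting ties as one half
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)

def youden_cutoff(labels, scores):
    # Sweep every observed score as a cutoff; J = sensitivity + specificity - 1
    best_j, best_cut = -1.0, None
    for cut in np.unique(scores):
        pred = scores >= cut
        sens = pred[labels == 1].mean()
        spec = 1.0 - pred[labels == 0].mean()
        j = sens + spec - 1.0
        if j > best_j:
            best_j, best_cut = float(j), float(cut)
    return best_cut, best_j

# Hypothetical index scores: 1 = dysphonic, 0 = normal (illustrative only)
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([1.0, 2.0, 3.0, 5.0, 4.0, 6.0, 7.0, 8.0])
auc = auc_mann_whitney(labels, scores)
cut, j = youden_cutoff(labels, scores)
```

For these toy scores the AUC is 0.9375 and the best cutoff is 4.0 with J = 0.75; the paper's reported VWI cutoff of 4.45 with J = 0.93 comes from the same kind of sweep over its clinical sample.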
Affiliation(s)
- Virgilijus Uloza
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, 50061 Kaunas, Lithuania
- Kipras Pribuišis
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, 50061 Kaunas, Lithuania
- Nora Ulozaite-Staniene
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, 50061 Kaunas, Lithuania
- Tadas Petrauskas
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, 50061 Kaunas, Lithuania
- Rytis Maskeliūnas
- Faculty of Informatics, Kaunas University of Technology, 51368 Kaunas, Lithuania
3
Uloza V, Ulozaitė-Stanienė N, Petrauskas T, Pribuišis K, Ulozienė I, Blažauskas T, Damaševičius R, Maskeliūnas R. Smartphone-Based Voice Wellness Index Application for Dysphonia Screening and Assessment: Development and Reliability. J Voice 2023:S0892-1997(23)00330-2. [PMID: 37980209 DOI: 10.1016/j.jvoice.2023.10.021]
Abstract
OBJECTIVE This study aimed to develop a Voice Wellness Index (VWI) application combining acoustic voice quality index (AVQI) and glottal function index (GFI) data and to evaluate its reliability in quantitative voice assessment and normal versus pathological voice differentiation. STUDY DESIGN Cross-sectional study. METHODS A total of 135 adult participants (86 patients with voice disorders and 49 individuals with normal voices) were included in this study. Five iOS and Android smartphones with the "Voice Wellness Index" app installed were used to estimate the VWI. The VWI data obtained using smartphones were compared with VWI measurements computed from voice recordings collected with a reference studio microphone. The diagnostic efficacy of the VWI in differentiating between normal and disordered voices was assessed using receiver operating characteristic (ROC) analysis. RESULTS With a Cronbach's alpha of 0.972 and an ICC of 0.972 (0.964-0.979), the VWI scores of the individual smartphones demonstrated remarkable inter-smartphone agreement and reliability. The VWI data obtained from different smartphones and a studio microphone showed nearly perfect direct linear correlations (r = 0.993-0.998). Depending on the individual smartphone device used, the VWI cutoff scores for differentiating between normal and pathological voice groups were calculated as 5.6-6.0, with the best balance between sensitivity (94.10-95.15%) and specificity (93.68-95.72%). The diagnostic accuracy was excellent in all cases, with an area under the curve (AUC) of 0.970-0.974. CONCLUSION The "Voice Wellness Index" application is an accurate and reliable tool for voice quality measurement and normal versus pathological voice screening and has considerable potential to be used by healthcare professionals and patients for voice assessment.
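The inter-smartphone agreement this entry reports via Cronbach's alpha has a compact closed form: alpha = k/(k-1) * (1 - sum of per-device variances / variance of the per-subject totals). A minimal NumPy sketch follows; the ratings matrix is invented for illustration (one score per device per subject), not the study's measurements.

```python
import numpy as np

def cronbach_alpha(ratings):
    # ratings: subjects x raters (e.g., one VWI score per smartphone per subject)
    k = ratings.shape[1]
    item_vars = ratings.var(axis=0, ddof=1).sum()   # sum of per-device variances
    total_var = ratings.sum(axis=1).var(ddof=1)     # variance of per-subject totals
    return float(k / (k - 1) * (1.0 - item_vars / total_var))

# Hypothetical VWI scores from 3 devices over 4 subjects (illustrative only)
ratings = np.array([
    [5.0, 5.1, 4.9],
    [6.0, 6.2, 5.8],
    [3.0, 3.1, 2.9],
    [7.0, 6.9, 7.1],
])
alpha = cronbach_alpha(ratings)
```

Because the three hypothetical devices track each other closely, alpha comes out near 1 (about 0.998), mirroring the high agreement (0.972) the study observed across its five smartphones.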
Affiliation(s)
- Virgilijus Uloza
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania
- Nora Ulozaitė-Stanienė
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania
- Tadas Petrauskas
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania
- Kipras Pribuišis
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania
- Ingrida Ulozienė
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania
- Tomas Blažauskas
- Faculty of Informatics, Kaunas University of Technology, Kaunas, Lithuania
- Rytis Maskeliūnas
- Faculty of Informatics, Kaunas University of Technology, Kaunas, Lithuania
4
Sol J, Aaen M, Sadolin C, Ten Bosch L. Towards Automated Vocal Mode Classification in Healthy Singing Voice-An XGBoost Decision Tree-Based Machine Learning Classifier. J Voice 2023:S0892-1997(23)00281-3. [PMID: 37953088 DOI: 10.1016/j.jvoice.2023.09.006]
Abstract
Auditory-perceptual assessment is widely used in clinical and pedagogical practice for speech and singing voice, yet several studies have shown poor intra- and inter-rater reliability in both clinical and singing voice contexts. Recent advances in artificial intelligence and machine learning offer models for automated classification that have demonstrated discriminatory power in both pathological and healthy voice. This study develops and tests an XGBoost decision-tree-based machine learning classifier for automated vocal mode classification in healthy singing voice. Classification models trained on mel-frequency cepstral coefficients (MFCCs), MFCC-Zero-Time Windowing, glottal features, voice quality features, and α-ratios achieved a 92% average F1-score in distinguishing metallic and non-metallic singing for male singers and an 87% average F1-score for female singers. The model distinguished vocal modes with 70% and 69% average F1-scores for male and female samples, respectively. Model performance was compared to human auditory-perceptual assessments of 64 corresponding samples performed by 41 professional singers; the model performed at or slightly below the level of the human assessors on task-matched problems. The XGBoost gains observed across the tested features reveal that the most important attributes for the tested classification problems were MFCCs and α-ratios between high- and low-frequency energy, with models trained on only these features achieving performance not statistically significantly different from the best tested models. The best automated models in this study do not yet match human auditory-perceptual discrimination but improve on previously reported average F1-scores for automated classification in singing voice.
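The average F1-scores this entry reports are macro averages over the classes (e.g., metallic versus non-metallic). As a minimal sketch of that metric, here is a NumPy implementation with invented predictions; the labels below are illustrative, not samples from the study.

```python
import numpy as np

def macro_f1(y_true, y_pred, classes):
    # Per-class F1 from precision and recall, then the unweighted (macro) mean
    f1s = []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return float(np.mean(f1s))

# Hypothetical metallic/non-metallic predictions (invented for illustration)
y_true = np.array(["metallic", "metallic", "metallic",
                   "non-metallic", "non-metallic", "non-metallic"])
y_pred = np.array(["metallic", "metallic", "non-metallic",
                   "non-metallic", "non-metallic", "metallic"])
score = macro_f1(y_true, y_pred, ["metallic", "non-metallic"])
```

With one miss per class in this toy example, each class gets an F1 of 2/3, so the macro average is about 0.667; the paper's 92%/87% figures are this quantity computed over its test samples.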
Affiliation(s)
- Jeroen Sol
- Institute for Computing and Information Sciences, Radboud University, Nijmegen, the Netherlands
- Mathias Aaen
- Research & Development, Complete Vocal Institute, Copenhagen K, Denmark; Nottingham University Hospitals, NHS Trust, Queen's Medical, ENT Department, Nottingham, United Kingdom
- Cathrine Sadolin
- Research & Development, Complete Vocal Institute, Copenhagen K, Denmark
- Louis Ten Bosch
- Department of Language and Communication, Centre for Language Studies, Radboud University, Nijmegen, the Netherlands
5
Uloza V, Ulozaitė-Stanienė N, Petrauskas T, Pribuišis K, Blažauskas T, Damaševičius R, Maskeliūnas R. Reliability of Universal-Platform-Based Voice Screen Application in AVQI Measurements Captured with Different Smartphones. J Clin Med 2023; 12:4119. [PMID: 37373811 DOI: 10.3390/jcm12124119]
Abstract
The aim of the study was to develop a universal-platform-based (UPB) application suitable for different smartphones for estimation of the Acoustic Voice Quality Index (AVQI) and to evaluate its reliability in AVQI measurements and in normal and pathological voice differentiation. The study group consisted of 135 adult individuals, including 49 with normal voices and 86 patients with pathological voices. The developed UPB "Voice Screen" application, installed on five iOS and Android smartphones, was used for AVQI estimation. The AVQI measures calculated from voice recordings obtained from a reference studio microphone were compared with the AVQI results obtained using smartphones. The diagnostic accuracy in differentiating normal and pathological voices was evaluated by applying receiver operating characteristic (ROC) analysis. One-way ANOVA did not detect statistically significant differences between the mean AVQI scores obtained with the studio microphone and with the different smartphones (F = 0.759; p = 0.58). Nearly perfect direct linear correlations (r = 0.987-0.991) were observed between the AVQI results obtained with the studio microphone and with the different smartphones. An acceptable level of precision of the AVQI in discriminating between normal and pathological voices was achieved, with areas under the curve (AUC) of 0.834-0.862. There were no statistically significant differences between the AUCs (p > 0.05) obtained from the studio and smartphone microphones; the largest difference between the AUCs was only 0.028. The UPB "Voice Screen" application represented an accurate and robust tool for voice quality measurement and normal vs. pathological voice screening, demonstrating the potential to be used by patients and clinicians for voice assessment on both iOS and Android smartphones.
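The device-comparison test in this entry (F = 0.759; p = 0.58) is a one-way ANOVA over the per-device AVQI score groups. A minimal sketch of the F-statistic follows, computed directly from the between- and within-group sums of squares; the three score groups below are invented for illustration, not the study's measurements.

```python
import numpy as np

def one_way_anova_f(groups):
    # F = (between-group mean square) / (within-group mean square)
    all_vals = np.concatenate(groups)
    grand = all_vals.mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    return float((ss_between / df_between) / (ss_within / df_within))

# Hypothetical AVQI scores from three recording devices (illustrative only)
device_a = np.array([2.0, 3.0, 4.0])
device_b = np.array([2.0, 3.0, 4.0])
device_c = np.array([3.0, 4.0, 5.0])
f_stat = one_way_anova_f([device_a, device_b, device_c])
```

For these toy groups F = 1.0, i.e., between-device spread no larger than within-device spread, which is the same null result the study found across its five smartphones and studio microphone.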
Affiliation(s)
- Virgilijus Uloza
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, 50061 Kaunas, Lithuania
- Nora Ulozaitė-Stanienė
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, 50061 Kaunas, Lithuania
- Tadas Petrauskas
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, 50061 Kaunas, Lithuania
- Kipras Pribuišis
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, 50061 Kaunas, Lithuania
- Tomas Blažauskas
- Faculty of Informatics, Kaunas University of Technology, 51368 Kaunas, Lithuania
- Rytis Maskeliūnas
- Faculty of Informatics, Kaunas University of Technology, 51368 Kaunas, Lithuania