1
Galal A, Moustafa A, Salama M. Transforming neurodegenerative disorder care with machine learning: Strategies and applications. Neuroscience 2025; 573:272-285. [PMID: 40120712 DOI: 10.1016/j.neuroscience.2025.03.036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2025] [Revised: 03/05/2025] [Accepted: 03/17/2025] [Indexed: 03/25/2025]
Abstract
Neurodegenerative diseases (NDs), characterized by progressive neuronal degeneration and manifesting in diverse forms such as memory loss and movement disorders, pose significant challenges due to their complex molecular mechanisms and heterogeneous patient presentations. Diagnosis often relies heavily on clinical assessments and neuroimaging, with definitive confirmation frequently requiring post-mortem examination. However, the emergence of Artificial Intelligence (AI) and Machine Learning (ML) offers transformative potential. These technologies can enable the development of non-invasive tools for early diagnosis, biomarker identification, personalized treatment strategies, patient subtyping and stratification, and disease risk prediction. This review aims to provide a starting point for researchers, with or without clinical backgrounds, who are interested in applying ML to NDs. We discuss available data resources for key diseases such as Alzheimer's and Parkinson's, explore how ML can revolutionize neurodegenerative care, and emphasize the importance of integrating multiple high-dimensional data sources to gain deeper insights and inform effective therapeutic strategies.
Affiliation(s)
- Aya Galal
- Systems Genomics Laboratory, American University in Cairo, New Cairo, Egypt; Institute of Global Health and Human Ecology, American University in Cairo, New Cairo, Egypt
- Ahmed Moustafa
- Systems Genomics Laboratory, American University in Cairo, New Cairo, Egypt; Institute of Global Health and Human Ecology, American University in Cairo, New Cairo, Egypt; Biology Department, American University in Cairo, New Cairo, Egypt
- Mohamed Salama
- Institute of Global Health and Human Ecology, American University in Cairo, New Cairo, Egypt; Global Brain Health Institute (GBHI), Trinity College Dublin, Dublin 2, Ireland; Faculty of Medicine, Mansoura University, El Mansura, Egypt.

2
Chang D. Vocal performance evaluation of the intelligent note recognition method based on deep learning. Sci Rep 2025; 15:13927. [PMID: 40263420 PMCID: PMC12015219 DOI: 10.1038/s41598-025-99357-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2024] [Accepted: 04/18/2025] [Indexed: 04/24/2025]
Abstract
This study aims to improve note recognition and thereby the accuracy of vocal performance evaluation. First, the basic theory of music is analyzed. Second, a convolutional neural network (CNN), a deep learning (DL) architecture, is combined with gated recurrent units for optimization. An attention mechanism is then added to the optimized model to build an intelligent note recognition model, whose recognition results are compared with those of common models. Finally, based on the audio signal classification results, a vocal performance evaluation model optimized with the attention mechanism is constructed, and its accuracy under different feature inputs is compared. The results show clear differences among the models in F-value, accuracy, precision, and recall. The attention mechanism-gated recurrent convolutional neural network (A-GRCNN) model performs best on all indicators, with accuracy, recall, F-value, and precision of 0.961, 0.958, 0.963, and 0.970, respectively. Combining multiple feature inputs markedly improves the accuracy of vocal performance evaluation, with combinations that include constant Q transform features performing best. This study improves the accuracy and reliability of music information processing, promotes the application of DL technology in music, and contributes to optimizing vocal performance evaluation.
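The F-value reported above is conventionally the harmonic mean of precision and recall. The minimal Python sketch below implements that formula (the function name `f_score` is our own, not from the study). Plugging in the reported precision and recall gives about 0.964; the small gap to the reported 0.963 may come from how the study averages per-class scores, which is an assumption the abstract does not let us check.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: weighted harmonic mean of precision and recall (beta=1 gives F1)."""
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall reported for the A-GRCNN model.
print(round(f_score(0.970, 0.958), 3))
```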
Affiliation(s)
- Dongyun Chang
- School of Music, Qinghai Normal University, Xining, China.

3
Dudek M, Hemmerling D, Kaczmarska M, Stepien J, Daniol M, Wodzinski M, Wojcik-Pedziwiatr M. Analysis of Voice, Speech, and Language Biomarkers of Parkinson's Disease Collected in a Mixed Reality Setting. SENSORS (BASEL, SWITZERLAND) 2025; 25:2405. [PMID: 40285095 PMCID: PMC12031132 DOI: 10.3390/s25082405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2025] [Revised: 04/04/2025] [Accepted: 04/06/2025] [Indexed: 04/29/2025]
Abstract
This study explores an innovative approach to early Parkinson's disease (PD) detection by analyzing speech data collected using a mixed reality (MR) system. A total of 57 Polish participants, including PD patients and healthy controls, performed five speech tasks while using an MR head-mounted display (HMD). Speech data were recorded and analyzed to extract acoustic and linguistic features, which were then evaluated using machine learning models, including logistic regression, support vector machines (SVMs), random forests, AdaBoost, and XGBoost. The XGBoost model achieved the best performance, with an F1-score of 0.90 ± 0.05 in the story-retelling task. Key features such as MFCCs (mel-frequency cepstral coefficients), spectral characteristics, RASTA-filtered auditory spectrum, and local shimmer were identified as significant in detecting PD-related speech alterations. Additionally, state-of-the-art deep learning models (wav2vec2, HuBERT, and WavLM) were fine-tuned for PD detection. HuBERT achieved the highest performance, with an F1-score of 0.94 ± 0.04 in the diadochokinetic task, demonstrating the potential of deep learning to capture complex speech patterns linked to neurodegenerative diseases. This study highlights the effectiveness of combining MR technology for speech data collection with advanced machine learning (ML) and deep learning (DL) techniques, offering a non-invasive and high-precision approach to PD diagnosis. The findings hold promise for broader clinical applications, advancing the diagnostic landscape for neurodegenerative disorders.
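Local shimmer, one of the features singled out above, is commonly defined (for example, in Praat's voice report) as the mean absolute difference between the peak amplitudes of consecutive glottal periods divided by the mean amplitude. The sketch below computes that ratio from a list of per-period amplitudes. Extracting those amplitudes from audio, and the period and amplitude-factor limits Praat applies, are left out, and the amplitude values are made up for illustration.

```python
from statistics import mean


def local_shimmer(amplitudes: list[float]) -> float:
    """Local shimmer: mean absolute difference between consecutive
    per-period peak amplitudes, divided by the mean amplitude (a fraction)."""
    if len(amplitudes) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(b - a) for a, b in zip(amplitudes, amplitudes[1:])]
    return mean(diffs) / mean(amplitudes)


# Hypothetical per-period peak amplitudes (arbitrary units).
steady = [0.80, 0.80, 0.80, 0.80]
wobbly = [0.80, 0.72, 0.84, 0.70]
print(local_shimmer(steady), round(local_shimmer(wobbly), 4))
```

A perfectly steady voice yields zero shimmer; cycle-to-cycle amplitude fluctuation raises it.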
Affiliation(s)
- Milosz Dudek
- Department of Measurement and Electronics, AGH University of Krakow, 30-059 Krakow, Poland
- Daria Hemmerling
- Department of Measurement and Electronics, AGH University of Krakow, 30-059 Krakow, Poland
- Marta Kaczmarska
- Department of Measurement and Electronics, AGH University of Krakow, 30-059 Krakow, Poland
- Joanna Stepien
- Department of Measurement and Electronics, AGH University of Krakow, 30-059 Krakow, Poland
- Mateusz Daniol
- Department of Measurement and Electronics, AGH University of Krakow, 30-059 Krakow, Poland
- Marek Wodzinski
- Department of Measurement and Electronics, AGH University of Krakow, 30-059 Krakow, Poland

4
di Biase L, Pecoraro PM, Pecoraro G, Shah SA, Di Lazzaro V. Machine learning and wearable sensors for automated Parkinson's disease diagnosis aid: a systematic review. J Neurol 2024; 271:6452-6470. [PMID: 39143345 DOI: 10.1007/s00415-024-12611-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2024] [Revised: 07/22/2024] [Accepted: 07/24/2024] [Indexed: 08/16/2024]
Abstract
BACKGROUND The diagnosis of Parkinson's disease is currently based on clinical evaluation. Despite its clinical hallmarks, the diagnostic error rate remains significant. The low in-vivo diagnostic accuracy of clinical evaluation stems mainly from the lack of quantitative biomarkers for objective assessment of motor performance. Non-invasive technologies, such as wearable sensors coupled with machine learning algorithms, can assess motor performance quantitatively and objectively, with potential benefits in both in-clinic and at-home settings. We conducted a systematic review of the literature on machine learning algorithms embedded in smart devices for Parkinson's disease diagnosis. METHODS Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, we searched PubMed for articles published between December 2007 and July 2023, using a search string combining "Parkinson's disease" AND ("healthy" OR "control") AND "diagnosis" within the Groups and Outcome domains. Additional search terms included "Algorithm", "Technology", and "Performance". RESULTS Of 89 identified studies, 47 met the inclusion criteria based on the search string, and four additional studies were included based on the authors' expertise. Gait emerged as the parameter most commonly analysed by machine learning models, and Support Vector Machines were the most prevalent algorithm. The results suggest promising accuracy with complex algorithms such as Random Forest, Support Vector Machines, and K-Nearest Neighbours. DISCUSSION Despite the promise shown by machine learning algorithms, real-world applications may still face limitations. This review suggests that integrating machine learning with wearable sensors has the potential to improve Parkinson's disease diagnosis. These tools could provide clinicians with objective data, potentially aiding earlier detection.
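As a rough illustration of one of the algorithms named above, here is a minimal k-nearest-neighbours classifier in plain Python. The two gait features (stride length and gait speed) and all of their values are hypothetical. A real pipeline would extract features from wearable-sensor signals, standardize them so no feature dominates the distance, and validate on held-out subjects.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    train: list of (feature_vector, label) pairs; distance is Euclidean.
    """
    dists = sorted((math.dist(x, query), label) for x, label in train)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical gait features: (stride length in m, gait speed in m/s).
train = [
    ((1.30, 1.25), "control"),
    ((1.35, 1.30), "control"),
    ((1.28, 1.20), "control"),
    ((0.95, 0.85), "PD"),
    ((1.00, 0.90), "PD"),
    ((0.90, 0.80), "PD"),
]
print(knn_predict(train, (0.98, 0.88)))
```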
Affiliation(s)
- Lazzaro di Biase
- Research Unit of Neurology, Neurophysiology and Neurobiology, Department of Medicine and Surgery, Università Campus Bio-Medico di Roma, Via Alvaro del Portillo 21, 00128, Rome, Italy.
- Operative Research Unit of Neurology, Fondazione Policlinico Universitario Campus Bio-Medico, Via Alvaro del Portillo 200, 00128, Rome, Italy.
- Brain Innovations Lab, Università Campus Bio-Medico di Roma, Via Álvaro del Portillo 21, 00128, Rome, Italy.
- Pasquale Maria Pecoraro
- Research Unit of Neurology, Neurophysiology and Neurobiology, Department of Medicine and Surgery, Università Campus Bio-Medico di Roma, Via Alvaro del Portillo 21, 00128, Rome, Italy
- Operative Research Unit of Neurology, Fondazione Policlinico Universitario Campus Bio-Medico, Via Alvaro del Portillo 200, 00128, Rome, Italy
- Vincenzo Di Lazzaro
- Research Unit of Neurology, Neurophysiology and Neurobiology, Department of Medicine and Surgery, Università Campus Bio-Medico di Roma, Via Alvaro del Portillo 21, 00128, Rome, Italy
- Operative Research Unit of Neurology, Fondazione Policlinico Universitario Campus Bio-Medico, Via Alvaro del Portillo 200, 00128, Rome, Italy

5
Evangelista EG, Bélisle-Pipon JC, Naunheim MR, Powell M, Gallois H, Bensoussan Y. Voice as a Biomarker in Health-Tech: Mapping the Evolving Landscape of Voice Biomarkers in the Start-Up World. Otolaryngol Head Neck Surg 2024; 171:340-352. [PMID: 38822764 DOI: 10.1002/ohn.830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 02/10/2024] [Accepted: 02/24/2024] [Indexed: 06/03/2024]
Abstract
OBJECTIVE The vocal biomarkers market was worth $1.9B in 2021 and is projected to exceed $5.1B by 2028, a compound annual growth rate of 15.15%. This investment growth reflects blossoming interest in voice and artificial intelligence (AI) as they relate to human health. The objective of this study was to map the current landscape of start-ups using voice as a biomarker in health-tech. DATA SOURCES A comprehensive search for start-ups was conducted using Google, LinkedIn, Twitter, and Facebook. The associated research was reviewed using company websites, PubMed, and Google Scholar. REVIEW METHODS A 3-pronged approach was taken to map the landscape thoroughly. First, an internet search was conducted to identify current start-ups developing products that use voice as a biomarker of health. Second, Crunchbase was used to collect financial and organizational information. Third, a literature review was conducted to analyze publications associated with the identified start-ups. RESULTS A total of 27 start-ups focused on using AI to develop health biomarkers from the human voice were identified. Twenty-four of these start-ups had garnered $178,808,039 in investments. Together, the 27 start-ups had produced 194 publications, 128 (66%) of which were peer reviewed. CONCLUSION There is growing enthusiasm surrounding voice as a biomarker in health-tech. Academic efforts may complement commercialization to best achieve progress in this arena. More research is needed to capture the entirety of the field accurately, including larger industry players, academic institutions, and non-English content.
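The growth figures are internally consistent: the compound annual growth rate is (end / start)^(1 / years) - 1, and going from $1.9B in 2021 to $5.1B in 2028 (seven years) gives roughly 15.15% per year. The same arithmetic in Python, together with the peer-reviewed share of publications (the helper name `cagr` is our own):

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


rate = cagr(1.9, 5.1, 2028 - 2021)   # market size in billions of USD
peer_reviewed_share = 128 / 194      # peer-reviewed publications / all publications
print(f"CAGR: {rate:.2%}, peer reviewed: {peer_reviewed_share:.0%}")
```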
Affiliation(s)
- Emily G Evangelista
- University of South Florida Morsani College of Medicine, Tampa, Florida, USA
- Matthew R Naunheim
- Division of Laryngology, Otolaryngology-Head and Neck Surgery, Massachusetts Eye and Ear, Harvard Medical School, Boston, Massachusetts, USA
- Maria Powell
- Department of Otolaryngology-Head and Neck Surgery, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Hortense Gallois
- Department of Bio-ethics, Faculty of Health Sciences, Simon Fraser University, Burnaby, British Columbia, Canada
- Yael Bensoussan
- Division of Laryngology, Department of Otolaryngology-Head and Neck Surgery, University of South Florida Morsani College of Medicine, Tampa, Florida, USA

6
Jeong SM, Kim S, Lee EC, Kim HJ. Exploring Spectrogram-Based Audio Classification for Parkinson's Disease: A Study on Speech Classification and Qualitative Reliability Verification. SENSORS (BASEL, SWITZERLAND) 2024; 24:4625. [PMID: 39066023 PMCID: PMC11280556 DOI: 10.3390/s24144625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2024] [Revised: 07/15/2024] [Accepted: 07/16/2024] [Indexed: 07/28/2024]
Abstract
Patients with Parkinson's disease often experience voice impairment. In this study, we introduce models that classify healthy speakers and Parkinson's patients from their speech. We used two models: AST (audio spectrogram transformer), a transformer-based speech classification model that has recently outperformed CNN-based models in many fields, and the CNN-based PSLA (pretraining, sampling, labeling, and aggregation), a high-performance model in the existing speech classification field. This study compares and analyzes the models from both quantitative and qualitative perspectives. First, quantitatively, PSLA outperformed AST by more than 4% in accuracy, and its AUC was also higher: 97.43% for PSLA versus 94.16% for AST. Furthermore, we qualitatively evaluated the models' ability to capture the acoustic features of Parkinson's speech using various CAM (class activation map)-based XAI (eXplainable AI) methods such as GradCAM and EigenCAM. For PSLA, we found that the model focuses on the muffled frequency band of Parkinson's speech, and heatmap analysis of false positives and false negatives shows that these speech features are also visually represented when the model makes incorrect predictions. The contribution of this paper is that we not only identified a suitable model for diagnosing Parkinson's disease from speech by comparing two different types of models, but also validated the model's predictions in practice.
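An AUC such as the 94.16% and 97.43% above can be read as the probability that a randomly chosen Parkinson's recording receives a higher score than a randomly chosen control recording. A minimal pairwise (Mann-Whitney style) computation in Python, with made-up classifier scores rather than any output of AST or PSLA:

```python
def roc_auc(scores_pos: list[float], scores_neg: list[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); ties count as one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores for PD recordings and healthy controls.
pd_scores = [0.91, 0.85, 0.77, 0.60]
hc_scores = [0.30, 0.45, 0.62, 0.20]
print(roc_auc(pd_scores, hc_scores))
```

Here 15 of the 16 patient/control pairs are ranked correctly; only the 0.60 patient score falls below the 0.62 control score.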
Affiliation(s)
- Seung-Min Jeong
- Department of AI & Informatics, Graduate School, Sangmyung University, Hongjimun 2-gil 20, Jongno-gu, Seoul 03016, Republic of Korea
- Seunghyun Kim
- Department of AI & Informatics, Graduate School, Sangmyung University, Hongjimun 2-gil 20, Jongno-gu, Seoul 03016, Republic of Korea
- Eui Chul Lee
- Department of Human-Centered Artificial Intelligence, Sangmyung University, Hongjimun 2-gil 20, Jongno-gu, Seoul 03016, Republic of Korea
- Han Joon Kim
- Department of Neurology, Seoul National University College of Medicine, Seoul National University Hospital, Daehak-ro 101, Jongno-gu, Seoul 03080, Republic of Korea

7
Abumalloh RA, Nilashi M, Samad S, Ahmadi H, Alghamdi A, Alrizq M, Alyami S. Parkinson's disease diagnosis using deep learning: A bibliometric analysis and literature review. Ageing Res Rev 2024; 96:102285. [PMID: 38554785 DOI: 10.1016/j.arr.2024.102285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 03/20/2024] [Accepted: 03/24/2024] [Indexed: 04/02/2024]
Abstract
Parkinson's Disease (PD) is a progressive neurodegenerative illness triggered by decreased dopamine secretion. Deep Learning (DL) has gained substantial attention in PD diagnosis research, with a growing number of papers published in this field. PD detection using DL has shown more promising results than common machine learning approaches. This article presents a bibliometric analysis and literature review focusing on prominent developments in this area. To this end, we retrieved and analyzed the available research papers in the Scopus database. We then conducted a bibliometric analysis of the structure of keywords, authors, and countries in the surveyed studies, providing visual representations of the bibliometric data using VOSviewer software. The study also provides an in-depth review of the literature focusing on different indicators of PD, deployed approaches, and performance metrics. The outcomes indicate the steady development of PD diagnosis using DL approaches over time and a large diversity of studies worldwide. Additionally, the literature review revealed a research gap in DL approaches related to incremental learning, particularly in relation to big data analysis.
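Keyword maps of the kind produced with VOSviewer start from co-occurrence counts, that is, how often two keywords appear together in the same paper. The sketch below shows only that counting step on hypothetical keyword lists; VOSviewer's own normalization, layout, and clustering are not reproduced, and the function name is our own.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(papers: list[list[str]]) -> Counter:
    """Count how often each unordered keyword pair appears in the same paper."""
    counts: Counter = Counter()
    for keywords in papers:
        unique = sorted({k.lower() for k in keywords})  # dedupe, case-fold, fix pair order
        counts.update(combinations(unique, 2))
    return counts


# Hypothetical author-keyword lists for three papers.
papers = [
    ["Parkinson's disease", "deep learning", "speech"],
    ["Parkinson's disease", "deep learning", "gait"],
    ["Parkinson's disease", "CNN", "deep learning"],
]
for pair, n in cooccurrence(papers).most_common(2):
    print(pair, n)
```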
Affiliation(s)
- Rabab Ali Abumalloh
- Department of Computer Science and Engineering, Qatar University, Doha 2713, Qatar
- Mehrbakhsh Nilashi
- Institute of Research and Development, Duy Tan University, Da Nang, Vietnam; School of Computer Science, Duy Tan University, Da Nang, Vietnam; UCSI Graduate Business School, UCSI University, No. 1 Jalan Menara Gading, UCSI Heights, Cheras, Kuala Lumpur 56000, Malaysia; Centre for Global Sustainability Studies (CGSS), Universiti Sains Malaysia, Penang 11800, Malaysia.
- Sarminah Samad
- Faculty of Business, UNITAR International University, Tierra Crest, Jalan SS6/3, Petaling Jaya, Selangor 47301, Malaysia
- Hossein Ahmadi
- Centre for Health Technology, Faculty of Health, University of Plymouth, Plymouth PL4 8AA, UK
- Abdullah Alghamdi
- Information Systems Dept., College of Computer Science and Information Systems, Najran University, Najran, Saudi Arabia; AI Lab, Scientific and Engineering Research Center (SERC), Najran University, Najran, Saudi Arabia
- Mesfer Alrizq
- Information Systems Dept., College of Computer Science and Information Systems, Najran University, Najran, Saudi Arabia; AI Lab, Scientific and Engineering Research Center (SERC), Najran University, Najran, Saudi Arabia
- Sultan Alyami
- AI Lab, Scientific and Engineering Research Center (SERC), Najran University, Najran, Saudi Arabia; Computer Science Dept., College of Computer Science and Information Systems, Najran University, Najran, Saudi Arabia

8
O' Shea E, Rukundo A, Foley G, Wilkinson T, Timmons S. Experiences of health service access: A qualitative interview study of people living with Parkinson's disease in Ireland. Health Expect 2024; 27:e13901. [PMID: 37926923 PMCID: PMC10726277 DOI: 10.1111/hex.13901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 09/26/2023] [Accepted: 10/17/2023] [Indexed: 11/07/2023]
Abstract
BACKGROUND People with Parkinson's disease (PD) do not always access specialist outpatient services in a timely manner in Ireland. The perspectives of people living with PD, relating to service access, are largely absent in the existing literature. AIM To explore experiences of PD service access for people living with PD, using a qualitative approach. METHODS Purposive maximum variation sampling was used. Semi-structured telephone interviews were conducted with 25 service users, including people with PD (n = 22) and supporting carers (n = 3). Informed consent was obtained from all participants. Interviews ranged in duration from 30 to 90 min. Data were managed in NVivo 12 and interpreted inductively using thematic analysis. The researchers were reflexive throughout the research process. The Consolidated Criteria for Reporting Qualitative Research checklist was employed to maximise transparency. RESULTS The findings highlight several key barriers to and facilitators of equitable and timely service access. Three key themes describing experiences of PD service access were identified: 'geographical inequity', 'discriminatory practices', and 'public and private system deficits'. Together, these themes illustrate how a two-tiered and under-resourced health system lacks capacity, in terms of infrastructure and workforce, to meet PD needs for both public and private patients in Ireland. CONCLUSIONS These findings point to problems for PD care relating to (i) how the health system is structured, (ii) the under-provision and under-resourcing of specialist outpatient PD services, including medical, nursing, and multidisciplinary posts, and (iii) insufficient PD awareness education and training across health settings. The findings also show that telemedicine can provide opportunities for making access to certain aspects of PD care more flexible and equitable, but the feasibility and acceptability of technology-enabled care must be assessed on an individual basis.
Implications for policy, practice and research are discussed. PATIENT OR PUBLIC CONTRIBUTION The design and conduct of this study were supported by an expert advisory group (EAG) of 10 co-researchers living with PD. The EAG reviewed the interview schedule and the protocol for this study and provided detailed feedback from their perspective, to improve the methods, including the interview approach. The group also reviewed the findings of the study and contributed their insights on the meaning of the findings, which fed into this paper.
Affiliation(s)
- Emma O' Shea
- Centre for Gerontology and Rehabilitation, School of Medicine, University College Cork, Cork, Ireland
- Aphie Rukundo
- Centre for Gerontology and Rehabilitation, School of Medicine, University College Cork, Cork, Ireland
- Geraldine Foley
- Discipline of Occupational Therapy, School of Medicine, Trinity College Dublin, Dublin, Ireland
- Tony Wilkinson
- Cork Parkinson's Association, Parkinson's Association of Ireland, Dublin, Ireland
- Suzanne Timmons
- Centre for Gerontology and Rehabilitation, School of Medicine, University College Cork, Cork, Ireland

9
Ceylan ME, Cangi ME, Yılmaz G, Peru BS, Yiğit Ö. Are smartphones and low-cost external microphones comparable for measuring time-domain acoustic parameters? Eur Arch Otorhinolaryngol 2023; 280:5433-5444. [PMID: 37584753 DOI: 10.1007/s00405-023-08179-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Accepted: 08/05/2023] [Indexed: 08/17/2023]
Abstract
PURPOSE This study examined and compared the diagnostic accuracy and correlation levels of acoustic parameters from audio recordings obtained with smartphones running two operating systems and with dynamic and condenser external microphones. METHOD The study included 87 adults: 57 with a voice disorder and 30 with a healthy voice. Each participant was asked to perform a sustained vowel phonation (/a/). Recordings were taken simultaneously in an acoustically insulated cabinet using five microphones: AKG-P220, Shure-SM58, Samson Go Mic, and the Apple iPhone 6 and Samsung Galaxy J7 Pro microphones. Acoustic examinations were performed using Praat version 6.2.09. The data were examined using Pearson correlation and receiver operating characteristic (ROC) analyses. RESULTS In the time-domain analyses, the frequency perturbation parameters had the highest area under the curve (AUC) values across all microphone recordings. Additionally, considering both the correlation coefficients between the simultaneously recorded microphones and the AUC values, jitter-local was the parameter with the highest correlation coefficient and diagnostic accuracy. CONCLUSION Period-to-period perturbation parameters obtained from smartphone recordings show diagnostic accuracy similar to that of external microphones used in clinical conditions.
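Jitter (local), the parameter singled out above, is commonly defined (as in Praat) as the mean absolute difference between consecutive glottal periods divided by the mean period, and agreement between devices was assessed with Pearson correlation. A minimal sketch of both calculations follows. The period lengths and per-speaker jitter values are hypothetical, and Praat's period-floor, period-ceiling, and maximum-period-factor settings are omitted.

```python
import math
from statistics import mean


def local_jitter(periods: list[float]) -> float:
    """Mean absolute difference between consecutive periods / mean period."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return mean(diffs) / mean(periods)


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical glottal period lengths (ms) from one sustained /a/.
periods = [8.00, 8.10, 7.95, 8.05, 8.00]
print(f"jitter (local): {local_jitter(periods):.2%}")

# Hypothetical jitter (%) for five speakers, recorded simultaneously on two devices.
reference_mic = [0.45, 0.80, 1.20, 0.60, 2.10]
smartphone = [0.50, 0.78, 1.35, 0.55, 2.00]
print(f"r = {pearson(reference_mic, smartphone):.3f}")
```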
Affiliation(s)
- M Enes Ceylan
- Üsküdar University, Speech and Language Therapy, Istanbul, Türkiye
- M Emrah Cangi
- University of Health Sciences, Speech and Language Therapy, Selimiye, Tıbbiye Cd No: 38, Istanbul, 34668, Üsküdar, Türkiye.
- Göksu Yılmaz
- Üsküdar University, Speech and Language Therapy, Istanbul, Türkiye
- Beyza Sena Peru
- Üsküdar University, Speech and Language Therapy, Istanbul, Türkiye
- Özgür Yiğit
- Istanbul Şişli Hamidiye Etfal Training and Research Hospital, Istanbul, Türkiye

10
Kaufman JM, Thommandram A, Fossat Y. Acoustic Analysis and Prediction of Type 2 Diabetes Mellitus Using Smartphone-Recorded Voice Segments. MAYO CLINIC PROCEEDINGS. DIGITAL HEALTH 2023; 1:534-544. [PMID: 40206319 PMCID: PMC11975753 DOI: 10.1016/j.mcpdig.2023.08.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 04/11/2025]
Abstract
Objective To investigate the potential of voice analysis as a prescreening or monitoring tool for type 2 diabetes mellitus (T2DM) by examining differences in voice recordings between nondiabetic and T2DM individuals. Patients and Methods A total of 267 participants diagnosed as nondiabetic (79 women and 113 men) or T2DM (18 women and 57 men) on the basis of American Diabetes Association guidelines were recruited in India between August 30, 2021 and June 30, 2022. Using a smartphone application, participants recorded a fixed phrase up to 6 times daily for 2 weeks, resulting in 18,465 recordings. Fourteen acoustic features were extracted from each recording to analyze differences between nondiabetic and T2DM individuals and to create a prediction methodology for T2DM status. Results Significant differences were found between voice recordings of nondiabetic and T2DM men and women, both in the entire dataset and in a sample matched for age and body mass index (BMI [calculated as the weight in kilograms divided by the height in meters squared]). The highest predictive accuracy was achieved by pitch (P<.0001), pitch SD (P<.0001), and relative average perturbation jitter (P=.02) for women, and by intensity (P<.0001) and 11-point amplitude perturbation quotient shimmer (apq11, P<.0001) for men. Incorporating these features with age and BMI, the optimal prediction models achieved accuracies of 0.75±0.22 for women and 0.70±0.10 for men through 5-fold cross-validation in the age-matched and BMI-matched sample. Conclusion Overall, vocal changes occur in individuals with T2DM compared with those without T2DM. Voice analysis shows potential as a prescreening or monitoring tool for T2DM, particularly when combined with other risk factors associated with the condition. Trial Registration clinicaltrials.gov Identifier: CTRI/2021/08/035957.
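Relative average perturbation (RAP) jitter and the 11-point amplitude perturbation quotient (APQ11) are both smoothed perturbation measures: each value is compared with the average of itself and its neighbours (three points for RAP on period lengths, eleven for APQ11 on peak amplitudes), and the mean absolute deviation is divided by the overall mean. The helper below is our own generalization of those definitions, a sketch rather than Praat's exact implementation, run on hypothetical period lengths.

```python
from statistics import mean


def perturbation_quotient(values: list[float], points: int) -> float:
    """Mean absolute deviation of each value from the centred average of
    `points` consecutive values (itself included), divided by the overall mean.

    points=3 on period lengths approximates RAP jitter;
    points=11 on peak amplitudes approximates APQ11 shimmer.
    """
    if points % 2 == 0 or points > len(values):
        raise ValueError("points must be odd and no larger than len(values)")
    half = points // 2
    deviations = [
        abs(values[i] - mean(values[i - half:i + half + 1]))
        for i in range(half, len(values) - half)  # skip edges lacking full windows
    ]
    return mean(deviations) / mean(values)


# Hypothetical period lengths (ms) alternating slightly from cycle to cycle.
periods = [8.0, 8.2, 8.0, 8.2, 8.0]
print(f"RAP jitter: {perturbation_quotient(periods, 3):.2%}")
```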
Affiliation(s)
- Yan Fossat
- Klick Applied Sciences, Klick Inc, Toronto, Canada
- Faculty of Science, Ontario Tech University, Oshawa, Canada

11
Skibińska J, Hosek J. Computerized analysis of hypomimia and hypokinetic dysarthria for improved diagnosis of Parkinson's disease. Heliyon 2023; 9:e21175. [PMID: 37908703 PMCID: PMC10613914 DOI: 10.1016/j.heliyon.2023.e21175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 10/07/2023] [Accepted: 10/17/2023] [Indexed: 11/02/2023]
Abstract
Background and Objective An aging society requires easy-to-use approaches for diagnosing and monitoring neurodegenerative disorders, such as Parkinson's disease (PD), so that clinicians can effectively adjust treatment and improve patients' quality of life. Current methods of PD diagnosis and monitoring usually require patients to come to a hospital, where they undergo several neurological and neuropsychological examinations. These examinations are usually time-consuming, expensive, and performed only a few times per year. Hence, this study explores fusing computerized analysis of hypomimia and hypokinetic dysarthria (two motor symptoms manifested in the majority of PD patients), with the goal of proposing a new PD diagnosis methodology that could be easily integrated into mHealth systems. Methods We enrolled 73 PD patients and 46 age- and gender-matched healthy controls, who performed several speech/voice tasks while being recorded by a microphone and a camera. Acoustic signals were parametrized in terms of phonation, articulation, and prosody. Video recordings of the face were analyzed in terms of facial landmark movement. Both modalities were subsequently modeled by the XGBoost algorithm. Results Acoustic analysis enabled diagnosis of PD with 77% balanced accuracy, while facial analysis achieved 81% balanced accuracy. Fusing both modalities increased the balanced accuracy to 83% (88% sensitivity and 78% specificity). The most informative speech exercise in the multimodal system turned out to be a tongue twister. Additionally, we identified muscle movements characteristic of hypomimia. Conclusions The introduced methodology, which combines a variety of speech exercises with both audio and video modalities, allows PD to be detected with a balanced accuracy of up to 83%. From a clinical point of view, tongue twisters proved to be the most valuable speech exercise.
Additionally, the clinical interpretation of the created models is illustrated. The presented computer-supported methodology could serve as an additional tool for neurologists in PD detection, and the proposed mHealth solution could make life easier for both patients and doctors.
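Balanced accuracy, the metric reported above, is the mean of sensitivity and specificity, which keeps the larger PD group (73 patients versus 46 controls) from dominating the score. The reported 88% sensitivity and 78% specificity give (0.88 + 0.78) / 2 = 0.83, matching the 83% figure. A small Python helper follows; the confusion-matrix counts in the usage example are hypothetical and not taken from the study.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (recall on patients) and specificity (recall on controls)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Reported sensitivity and specificity of the fused audio-video model.
print(round((0.88 + 0.78) / 2, 2))

# Hypothetical counts: 45 of 50 patients and 30 of 50 controls classified correctly.
print(balanced_accuracy(tp=45, fn=5, tn=30, fp=20))
```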
Affiliation(s)
- Justyna Skibińska
- Faculty of Electrical Engineering and Communication, Brno University of Technology, Technicka 12, Brno, 61600, Czechia
- Unit of Electrical Engineering, Tampere University, Kalevantie 4, Tampere, 33100, Finland
- Jiri Hosek
- Faculty of Electrical Engineering and Communication, Brno University of Technology, Technicka 12, Brno, 61600, Czechia
12
Suppa A, Costantini G, Gomez-Vilda P, Saggio G. Editorial: Voice analysis in healthy subjects and patients with neurologic disorders. Front Neurol 2023; 14:1288370. [PMID: 37840929 PMCID: PMC10569294 DOI: 10.3389/fneur.2023.1288370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Accepted: 09/15/2023] [Indexed: 10/17/2023]
Affiliation(s)
- Antonio Suppa
- Department of Human Neurosciences, Sapienza University of Rome, Rome, Italy
- IRCCS Neuromed Institute, Pozzilli, Italy
- Giovanni Costantini
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
- Pedro Gomez-Vilda
- Center for Biomedical Technology, Universidad Politécnica de Madrid, Madrid, Spain
- Giovanni Saggio
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
13
Scimeca S, Amato F, Olmo G, Asci F, Suppa A, Costantini G, Saggio G. Robust and language-independent acoustic features in Parkinson's disease. Front Neurol 2023; 14:1198058. [PMID: 37384279 PMCID: PMC10294689 DOI: 10.3389/fneur.2023.1198058] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/31/2023] [Accepted: 05/26/2023] [Indexed: 06/30/2023]
Abstract
Introduction: The analysis of vocal samples from patients with Parkinson's disease (PDP) can support early diagnosis and disease monitoring. However, speech analysis involves several complexities influenced by speaker characteristics (e.g., gender and language) and recording conditions (e.g., professional microphones or smartphones, supervised or unsupervised data collection). Moreover, the set of vocal tasks performed, such as sustained phonation, reading text, or monologue, strongly affects the speech dimension investigated, the features extracted, and, as a consequence, the performance of the overall algorithm.
Methods: We employed six datasets, including a cohort of 176 healthy control (HC) participants and 178 PDP of different nationalities (i.e., Italian, Spanish, Czech), recorded in variable scenarios through various devices (i.e., professional microphones and smartphones), and performing several speech exercises (i.e., vowel phonation, sentence repetition). To assess the effectiveness of different vocal tasks and the trustworthiness of features independent of external co-factors such as language, gender, and data collection modality, we performed several intra- and inter-corpora statistical analyses. In addition, we compared the performance of different feature selection and classification models to identify the most robust and best-performing pipeline.
Results: According to our results, the combined use of sustained phonation and sentence repetition should be preferred over a single exercise. As for the set of features, the Mel Frequency Cepstral Coefficients proved to be among the most effective parameters in discriminating between HC and PDP, even in the presence of heterogeneous languages and acquisition techniques.
Conclusion: Although preliminary, the results of this work can be used to define a speech protocol that effectively captures vocal alterations while minimizing the effort required of the patient. Moreover, the statistical analysis identified a set of features minimally dependent on gender, language, and recording modality. This supports the feasibility of extensive cross-corpora tests to develop robust and reliable tools for disease monitoring and staging and for PDP follow-up.
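Mel Frequency Cepstral Coefficients, singled out above as discriminative across languages and devices, are conventionally computed by framing the signal, taking a short-time power spectrum, pooling it through a mel-spaced triangular filterbank, taking logarithms, and decorrelating with a DCT. The NumPy sketch below follows that textbook recipe; it is not the feature-extraction toolchain used in the study, and the 25 ms window and 10 ms hop (400 and 160 samples at an assumed 16 kHz), the 0.97 pre-emphasis coefficient, 26 mel filters, and 13 coefficients are common defaults assumed for illustration.

```python
import numpy as np

# Textbook MFCC sketch; default parameters are illustrative assumptions
# (25 ms window / 10 ms hop at 16 kHz), not the settings used in the study.


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank


def dct_ii_ortho(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out terms."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    out = x @ basis.T
    out[..., 0] *= np.sqrt(1.0 / n)
    out[..., 1:] *= np.sqrt(2.0 / n)
    return out


def mfcc(signal, sr, win=400, hop=160, n_fft=512, n_filters=26, n_ceps=13):
    """Return an (n_frames, n_ceps) MFCC matrix; signal must have >= win samples."""
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis flattens the spectral tilt of voiced speech
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice overlapping frames and apply a Hamming window
    n_frames = 1 + (len(emphasized) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(win)
    # Short-time power spectrum, mel pooling, log compression, decorrelation
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    mel_energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    return dct_ii_ortho(np.log(mel_energies + 1e-10), n_ceps)


sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220.0 * t)  # one-second synthetic test tone
features = mfcc(tone, sr)
print(features.shape)  # (98, 13)
```

On this one-second tone the sketch yields 98 frames of 13 coefficients. For classification, per-recording summary statistics such as the mean and standard deviation of each coefficient are a common way to turn the frame matrix into a fixed-length feature vector.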
Affiliation(s)
- Sabrina Scimeca
- Department of Control and Computer Engineering, Polytechnic University of Turin, Turin, Italy
- Federica Amato
- Department of Control and Computer Engineering, Polytechnic University of Turin, Turin, Italy
- Gabriella Olmo
- Department of Control and Computer Engineering, Polytechnic University of Turin, Turin, Italy
- Francesco Asci
- Department of Human Neuroscience, Sapienza University of Rome, Rome, Italy
- Antonio Suppa
- Department of Human Neuroscience, Sapienza University of Rome, Rome, Italy
- IRCCS Neuromed Institute, Pozzilli, Italy
- Giovanni Costantini
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
- Giovanni Saggio
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
14
Costantini G, Cesarini V, Brenna E. High-Level CNN and Machine Learning Methods for Speaker Recognition. Sensors (Basel) 2023; 23:3461. [PMID: 37050521 PMCID: PMC10098737 DOI: 10.3390/s23073461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 03/20/2023] [Accepted: 03/22/2023] [Indexed: 06/19/2023]
Abstract
Speaker Recognition (SR) is a common task in AI-based sound analysis, involving structurally different methodologies such as Deep Learning or "traditional" Machine Learning (ML). In this paper, we compared and explored the two methodologies on the DEMoS dataset, consisting of 8869 audio files of 58 speakers in different emotional states. A custom CNN is compared with several pre-trained nets using image inputs of spectrograms and cepstral-temporal (MFCC) graphs. An ML approach based on acoustic feature extraction, selection, and multi-class classification by means of a Naïve Bayes model is also considered. Results show that a custom, shallower CNN obtains the most accurate results: 90.15% on grayscale spectrograms and 83.17% on colored MFCC graphs. AlexNet provides comparable results, reaching 89.28% on spectrograms and 83.43% on MFCC graphs. The Naïve Bayes classifier provides 87.09% accuracy and a 0.985 average AUC while being faster to train and more interpretable. Feature selection shows that F0, MFCC, and voicing-related features are the most characterizing for this SR task. The large number of training samples and the emotional content of the DEMoS dataset better reflect a real-case scenario for speaker recognition and support the generalization power of the models.
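The Naïve Bayes baseline above treats each acoustic feature as conditionally independent given the speaker, which is what makes it fast to train and easy to inspect. The abstract does not state which Naïve Bayes variant was used; the sketch below assumes a Gaussian likelihood per feature, and the class name `GaussianNaiveBayes`, the `var_smoothing` variance floor, and the synthetic three-speaker data are illustrative assumptions rather than the paper's setup.

```python
import numpy as np


class GaussianNaiveBayes:
    """Multi-class Naive Bayes with a Gaussian likelihood per feature (assumed variant)."""

    def __init__(self, var_smoothing=1e-9):
        # Hypothetical variance floor, relative to the largest feature variance
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # Per-class feature means and variances, plus log class priors
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        floor = self.var_smoothing * max(float(X.var(axis=0).max()), 1e-12)
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + floor
        self.log_priors_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        return self

    def _joint_log_likelihood(self, X):
        # log p(c) + sum_j log N(x_j | mu_cj, var_cj), shape (n_samples, n_classes)
        diff = X[:, None, :] - self.means_[None, :, :]
        log_lik = -0.5 * (np.log(2.0 * np.pi * self.vars_)[None] + diff**2 / self.vars_[None])
        return log_lik.sum(axis=2) + self.log_priors_[None, :]

    def predict_proba(self, X):
        jll = self._joint_log_likelihood(X)
        jll -= jll.max(axis=1, keepdims=True)  # subtract row max for numerical stability
        probs = np.exp(jll)
        return probs / probs.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self._joint_log_likelihood(X), axis=1)]


# Synthetic stand-in for acoustic feature vectors of three speakers
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [4.0, 4.0, 0.0], [0.0, 4.0, 4.0]])
X = np.vstack([rng.normal(c, 1.0, size=(50, 3)) for c in centers])
y = np.repeat(np.array([0, 1, 2]), 50)

model = GaussianNaiveBayes().fit(X, y)
probs = model.predict_proba(X)
train_acc = np.mean(model.predict(X) == y)
print(f"training accuracy: {train_acc:.2f}")
```

Each prediction is the argmax of the log prior plus the summed per-feature Gaussian log-likelihoods, so the stored per-speaker means and variances can be read directly, which is one reason models of this kind are considered comparatively interpretable.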