1
Ouwerkerk J, Feleus S, van der Zwaan KF, Li Y, Roos M, van Roon-Mom WMC, de Bot ST, Wolstencroft KJ, Mina E. Machine learning in Huntington's disease: exploring the Enroll-HD dataset for prognosis and driving capability prediction. Orphanet J Rare Dis 2023; 18:218. [PMID: 37501188 PMCID: PMC10375780 DOI: 10.1186/s13023-023-02785-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Accepted: 06/18/2023] [Indexed: 07/29/2023]
Abstract
BACKGROUND In biomedicine, machine learning (ML) has proven beneficial for the prognosis and diagnosis of different diseases, including cancer and neurodegenerative disorders. For rare diseases, however, the requirement for large datasets often prevents this approach. Huntington's disease (HD) is a rare neurodegenerative disorder caused by a CAG repeat expansion in the coding region of the huntingtin gene. The world's largest observational study for HD, Enroll-HD, describes over 21,000 participants. As such, Enroll-HD is amenable to ML methods. In this study, we pre-processed and imputed Enroll-HD with ML methods to maximise the inclusion of participants and variables. With this dataset we developed models to improve the prediction of the age at onset (AAO) and compared them with the well-established Langbehn formula. In addition, we used recurrent neural networks (RNNs) to demonstrate the utility of ML methods for longitudinal datasets, assessing driving capabilities by learning from previous participant assessments.
RESULTS Simple pre-processing imputed around 42% of missing values in Enroll-HD. In addition, imputing with ML allowed 167 variables to be retained. We found that multiple ML models were able to outperform the Langbehn formula. The best ML model (light gradient boosting machine) improved the prognosis of AAO compared to the Langbehn formula by 9.2%, based on root mean squared error in the test set. In addition, our ML model provides a more accurate prognosis for a wider CAG repeat range than the Langbehn formula. Driving capability was predicted with an accuracy of 85.2%. The resulting pre-processing workflow and code to train the ML models are available to be used for related HD predictions at: https://github.com/JasperO98/hdml/tree/main.
CONCLUSIONS Our pre-processing workflow made it possible to resolve the missing values and include most participants and variables in Enroll-HD. We show the added value of an ML approach, which improved AAO predictions and allowed for the development of an advisory model that can assist clinicians and participants in estimating future driving capability.
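
The abstract reports the gain over the Langbehn formula as a percentage based on root mean squared error (RMSE). As a minimal sketch of how such a figure could be computed, the Python below assumes the improvement is the relative reduction in RMSE with respect to the baseline; that definition, the function names, and the toy ages are illustrative assumptions, not taken from the paper or from Enroll-HD.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(rmse_baseline, rmse_model):
    """Percentage reduction in RMSE relative to the baseline (assumed definition)."""
    return 100.0 * (rmse_baseline - rmse_model) / rmse_baseline


# Hypothetical ages at onset in years; NOT Enroll-HD data.
observed = [40.0, 52.0, 35.0, 61.0]
baseline_pred = [44.0, 48.0, 39.0, 57.0]  # stand-in for a formula-based estimate
model_pred = [42.0, 50.0, 37.0, 59.0]     # stand-in for an ML model estimate

rmse_baseline = rmse(observed, baseline_pred)
rmse_model = rmse(observed, model_pred)
improvement = relative_improvement(rmse_baseline, rmse_model)
print(f"baseline RMSE={rmse_baseline:.2f}, model RMSE={rmse_model:.2f}, "
      f"improvement={improvement:.1f}%")
```

Under this definition, a model whose RMSE is half that of the baseline scores a 50% improvement; the 9.2% reported in the abstract would correspond to a much smaller gap between the two error values.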
Affiliation(s)
- Jasper Ouwerkerk
  - Department of Pathology and Clinical Bioinformatics, Erasmus Medical Center (EMC), Wytemaweg, 3015 CN, Rotterdam, The Netherlands
- Stephanie Feleus
  - Department of Neurology, Leiden University Medical Center (LUMC), PO Box 9600, 2300 RC, Leiden, The Netherlands
  - Department of Clinical Epidemiology, Leiden University Medical Center (LUMC), PO Box 9600, 2300 RC, Leiden, The Netherlands
- Kasper F van der Zwaan
  - Department of Neurology, Leiden University Medical Center (LUMC), PO Box 9600, 2300 RC, Leiden, The Netherlands
- Yunlei Li
  - Department of Pathology and Clinical Bioinformatics, Erasmus Medical Center (EMC), Wytemaweg, 3015 CN, Rotterdam, The Netherlands
- Marco Roos
  - Department of Human Genetics, Leiden University Medical Center (LUMC), PO Box 9600, 2300 RC, Leiden, The Netherlands
- Willeke M C van Roon-Mom
  - Department of Human Genetics, Leiden University Medical Center (LUMC), PO Box 9600, 2300 RC, Leiden, The Netherlands
- Susanne T de Bot
  - Department of Neurology, Leiden University Medical Center (LUMC), PO Box 9600, 2300 RC, Leiden, The Netherlands
- Katherine J Wolstencroft
  - Leiden Institute of Advanced Computer Science (LIACS), Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands
- Eleni Mina
  - Department of Human Genetics, Leiden University Medical Center (LUMC), PO Box 9600, 2300 RC, Leiden, The Netherlands
2
Prediction of Hypertension Outcomes Based on Gain Sequence Forward Tabu Search Feature Selection and XGBoost. Diagnostics (Basel) 2021; 11:diagnostics11050792. [PMID: 33925766 PMCID: PMC8146551 DOI: 10.3390/diagnostics11050792] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/22/2021] [Revised: 04/23/2021] [Accepted: 04/26/2021] [Indexed: 01/30/2023]
Abstract
For patients with hypertension, serious complications such as myocardial infarction, a common cause of heart failure, occur in the late stage of the disease. Hypertension outcomes can lead to further complications, including death; they threaten patients' lives and need to be predicted. In our research, we reviewed hypertension medical data from a grade A tertiary hospital in Beijing and established a hypertension outcome prediction model using machine learning. We first proposed a gain sequence forward tabu search feature selection (GSFTS-FS) method, which searches for the optimal combination of medical variables that affect hypertension outcomes. Based on this, we built the prediction model with the XGBoost algorithm, chosen for its good stability. We verified the proposed method by comparing it with other commonly used models in similar works. The proposed GSFTS-FS improved the performance by about 10%. The proposed prediction method performed best: its AUC, accuracy, F1 score, and recall in 10-fold cross-validation were 0.96, 0.95, 0.88, and 0.82, respectively. It also performed well on test datasets, with 0.92, 0.94, 0.87, and 0.80 for AUC, accuracy, F1 score, and recall, respectively. Therefore, XGBoost with GSFTS-FS can accurately and effectively predict the occurrence of outcomes for patients with hypertension, and can provide guidance for doctors in clinical diagnoses and medical decision-making.
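
The abstract reports accuracy, recall, and F1 score for a binary outcome. As a rough illustration of how those three metrics relate to one another, the sketch below computes them from true and predicted labels using their standard definitions. The function name and the toy labels are hypothetical and are not drawn from the study's data; AUC is omitted because it requires predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = outcome)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 3 patients with an outcome, 5 without.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

When outcomes are rare, accuracy can stay high even if several positive cases are missed, which is one reason recall and F1 are typically reported alongside it, as in the abstract.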