1. Massi MC, Gasperoni F, Ieva F, Paganoni AM. Feature selection for imbalanced data with deep sparse autoencoders ensemble. Stat Anal Data Min 2022. [DOI: 10.1002/sam.11567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Michela Carlotta Massi
  - MOX Laboratory for Modeling and Scientific Computing, Department of Mathematics, Politecnico di Milano, Milano, Italy
  - CHDS ‐ Center for Health Data Science, Human Technopole, Milano, Italy
- Francesca Ieva
  - MOX Laboratory for Modeling and Scientific Computing, Department of Mathematics, Politecnico di Milano, Milano, Italy
  - CHDS ‐ Center for Health Data Science, Human Technopole, Milano, Italy
- Anna Maria Paganoni
  - MOX Laboratory for Modeling and Scientific Computing, Department of Mathematics, Politecnico di Milano, Milano, Italy
  - CHDS ‐ Center for Health Data Science, Human Technopole, Milano, Italy
2. Chung CW, Hsiao TH, Huang CJ, Chen YJ, Chen HH, Lin CH, Chou SC, Chen TS, Chung YF, Yang HI, Chen YM. Machine learning approaches for the genomic prediction of rheumatoid arthritis and systemic lupus erythematosus. BioData Min 2021; 14:52. [PMID: 34895289 PMCID: PMC8666017 DOI: 10.1186/s13040-021-00284-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/13/2021] [Accepted: 11/21/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Rheumatoid arthritis (RA) and systemic lupus erythematosus (SLE) are autoimmune rheumatic diseases that share a complex genetic background and common clinical features. This study's purpose was to construct machine learning (ML) models for the genomic prediction of RA and SLE.

METHODS: A total of 2,094 patients with RA and 2,190 patients with SLE were enrolled from the Taichung Veterans General Hospital cohort of the Taiwan Precision Medicine Initiative. Genome-wide single nucleotide polymorphism (SNP) data were obtained using the Taiwan Biobank version 2 array. The ML methods used were logistic regression (LR), random forest (RF), support vector machine (SVM), gradient tree boosting (GTB), and extreme gradient boosting (XGB). SHapley Additive exPlanation (SHAP) values were calculated to clarify the contribution of each SNP. Human leukocyte antigen (HLA) imputation was performed using the HLA Genotype Imputation with Attribute Bagging package.

RESULTS: Compared with LR (area under the curve [AUC] = 0.8247), the RF approach (AUC = 0.9844), SVM (AUC = 0.9828), GTB (AUC = 0.9932), and XGB (AUC = 0.9919) exhibited significantly better prediction performance. The top 20 genes by feature importance and SHAP values included HLA class II alleles. We found that imputed HLA-DQA1*05:01, DQB1*0201, and DRB1*0301 were associated with SLE, whereas HLA-DQA1*03:03, DQB1*0401, and DRB1*0405 were more frequently observed in patients with RA.

CONCLUSIONS: We established ML methods for genomic prediction of RA and SLE. Genetic variations at HLA-DQA1, HLA-DQB1, and HLA-DRB1 were crucial for differentiating RA from SLE. Future studies are required to verify our results and explore their mechanistic explanation.
Affiliation(s)
- Chih-Wei Chung
  - Department of Information Management, National Taiwan University, Taipei, Taiwan
- Tzu-Hung Hsiao
  - Department of Medical Research, Taichung Veterans General Hospital, Taichung, Taiwan
- Chih-Jen Huang
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Yen-Ju Chen
  - Department of Medical Research, Taichung Veterans General Hospital, Taichung, Taiwan
  - Division of Allergy, Immunology and Rheumatology, Taichung Veterans General Hospital, Taichung, Taiwan
- Hsin-Hua Chen
  - Department of Medical Research, Taichung Veterans General Hospital, Taichung, Taiwan
  - Division of Allergy, Immunology and Rheumatology, Taichung Veterans General Hospital, Taichung, Taiwan
  - Rong Hsing Research Center for Translational Medicine & Ph.D. Program in Translational Medicine, National Chung Hsing University, Taichung, Taiwan
  - School of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Ching-Heng Lin
  - Department of Medical Research, Taichung Veterans General Hospital, Taichung, Taiwan
- Seng-Cho Chou
  - Department of Information Management, National Taiwan University, Taipei, Taiwan
- Tzer-Shyong Chen
  - Department of Information Management, Tunghai University, Taichung, Taiwan
- Yu-Fang Chung
  - Department of Electrical Engineering, Tunghai University, Taichung, Taiwan
- Hwai-I Yang
  - Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Yi-Ming Chen
  - Department of Medical Research, Taichung Veterans General Hospital, Taichung, Taiwan
  - Division of Allergy, Immunology and Rheumatology, Taichung Veterans General Hospital, Taichung, Taiwan
  - Rong Hsing Research Center for Translational Medicine & Ph.D. Program in Translational Medicine, National Chung Hsing University, Taichung, Taiwan
  - School of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
  - College of Medicine, National Chung Hsing University, 40227, Taichung City, Taiwan
3. Basu S, Johnson KT, Berkowitz SA. Use of Machine Learning Approaches in Clinical Epidemiological Research of Diabetes. Curr Diab Rep 2020; 20:80. [PMID: 33270183 DOI: 10.1007/s11892-020-01353-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Accepted: 10/26/2020] [Indexed: 12/12/2022]
Abstract
PURPOSE OF REVIEW: Machine learning approaches, which seek to predict outcomes or classify patient features by recognizing patterns in large datasets, are increasingly applied to clinical epidemiology research on diabetes. Given its novelty and emergence in fields outside of biomedical research, machine learning terminology, techniques, and research findings may be unfamiliar to diabetes researchers. Our aim was to present the use of machine learning approaches in an approachable way, drawing from clinical epidemiological research in diabetes published from 1 January 2017 to 1 June 2020.

RECENT FINDINGS: Machine learning approaches using tree-based learners, which produce decision trees to help guide clinical interventions, frequently have higher sensitivity and specificity than traditional regression models for risk prediction. Machine learning approaches using neural networks and "deep learning" can be applied to medical image data, particularly for the identification and staging of diabetic retinopathy and skin ulcers. Among the machine learning approaches reviewed, researchers identified new strategies to develop standard datasets for rigorous comparisons across older and newer approaches, methods to illustrate how a machine learner was treating underlying data, and approaches to improve the transparency of the machine learning process. Machine learning approaches have the potential to improve risk stratification and outcome prediction for clinical epidemiology applications. Achieving this potential would be facilitated by use of universal open-source datasets for fair comparisons. More work remains in the application of strategies to communicate how the machine learners are generating their predictions.
Affiliation(s)
- Sanjay Basu
  - Center for Primary Care, Harvard Medical School, Boston, MA, USA
  - Research and Population Health, Collective Health, San Francisco, CA, USA
  - School of Public Health, Imperial College London, London, SW7, UK
- Karl T Johnson
  - General Medicine and Clinical Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Seth A Berkowitz
  - General Medicine and Clinical Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA