1
Al-Hussaini I, White B, Varmeziar A, Mehra N, Sanchez M, Lee J, DeGroote NP, Miller TP, Mitchell CS. An Interpretable Machine Learning Framework for Rare Disease: A Case Study to Stratify Infection Risk in Pediatric Leukemia. J Clin Med 2024; 13:1788. [PMID: 38542012 PMCID: PMC10970787 DOI: 10.3390/jcm13061788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Revised: 03/08/2024] [Accepted: 03/12/2024] [Indexed: 04/18/2024] [Open Access]
Abstract
Background: Datasets on rare diseases, like pediatric acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL), have small sample sizes that hinder machine learning (ML). The objective was to develop an interpretable ML framework to elucidate actionable insights from small tabular rare disease datasets. Methods: The comprehensive framework employed optimized data imputation and sampling, supervised and unsupervised learning, and literature-based discovery (LBD). The framework was deployed to assess treatment-related infection in pediatric AML and ALL. Results: An interpretable decision tree classified the risk of infection as either "high risk" or "low risk" in pediatric ALL (n = 580) and AML (n = 132) with accuracy of ∼79%. Interpretable regression models predicted the discrete number of developed infections with a mean absolute error (MAE) of 2.26 for bacterial infections and an MAE of 1.29 for viral infections. Features that best explained the development of infection were the chemotherapy regimen, cancer cells in the central nervous system at initial diagnosis, chemotherapy course, leukemia type, Down syndrome, race, and National Cancer Institute risk classification. Finally, SemNet 2.0, an open-source LBD software that links relationships from 33+ million PubMed articles, identified additional features for the prediction of infection, like glucose, iron, neutropenia-reducing growth factors, and systemic lupus erythematosus (SLE). Conclusions: The developed ML framework enabled state-of-the-art, interpretable predictions using rare disease tabular datasets. ML model performance baselines were successfully produced to predict infection in pediatric AML and ALL.
Affiliation(s)
- Irfan Al-Hussaini
  - Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
  - Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Brandon White
  - Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
  - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Armon Varmeziar
  - Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
  - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Nidhi Mehra
  - Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
  - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Milagro Sanchez
  - Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
  - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Judy Lee
  - Aflac Cancer and Blood Disorders Center, Children’s Healthcare of Atlanta, Atlanta, GA 30322, USA
- Nicholas P. DeGroote
  - Aflac Cancer and Blood Disorders Center, Children’s Healthcare of Atlanta, Atlanta, GA 30322, USA
- Tamara P. Miller
  - Aflac Cancer and Blood Disorders Center, Children’s Healthcare of Atlanta, Atlanta, GA 30322, USA
  - Department of Pediatrics, Division of Pediatric Hematology/Oncology, Emory University, Atlanta, GA 30332, USA
- Cassie S. Mitchell
  - Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
  - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
  - Machine Learning Center at Georgia Tech, Georgia Institute of Technology, Atlanta, GA 30332, USA
2
Perumalraja R, Felcia Logan's Deshna B, Swetha N. Statistical performance review on diagnosis of leukemia, glaucoma and diabetes mellitus using AI. Stat Med 2024; 43:1227-1237. [PMID: 38247116 DOI: 10.1002/sim.10004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 12/28/2023] [Accepted: 12/29/2023] [Indexed: 01/23/2024]
Abstract
The growth of artificial intelligence (AI) in the healthcare industry tremendously improves patient outcomes by reshaping the way we diagnose, treat, and monitor patients. AI-based innovations in healthcare include exploration of drugs, personalized medicine, clinical diagnosis investigations, robotic-assisted surgery, verified prescriptions, pregnancy care for women, radiology, and reviewed patient information analytics. However, the predictions of AI-based solutions depend mainly on the implementation of the statistical algorithms and the input data set. This article presents a statistical performance review of various algorithms used to diagnose leukemia, glaucoma, and diabetes mellitus, reported in terms of accuracy, precision, recall, and F1-score. The review of the algorithms' performance for each disease diagnosis gives a complete picture of the research efforts of the last two decades. At the end of the review for each disease, we discuss our inferences, which give new researchers future directions on the selection of the AI statistical algorithm as well as the input data set.
Affiliation(s)
- Rengaraju Perumalraja
  - Department of Information Technology, Velammal College of Engineering and Technology, Madurai, India
- B Felcia Logan's Deshna
  - Department of Information Technology, Velammal College of Engineering and Technology, Madurai, India
- N Swetha
  - Department of Information Technology, Velammal College of Engineering and Technology, Madurai, India
3
Ghahramani Almanghadim H, Karimi B, Poursalehi N, Sanavandi M, Atefi Pourfardin S, Ghaedi K. The biological role of lncRNAs in the acute lymphocytic leukemia: An updated review. Gene 2024; 898:148074. [PMID: 38104953 DOI: 10.1016/j.gene.2023.148074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 11/29/2023] [Accepted: 12/08/2023] [Indexed: 12/19/2023]
Abstract
The cause of leukemia, a common malignancy of the hematological system, is unknown. Long non-coding RNAs (lncRNAs) are structurally similar to mRNA but lack the ability to encode proteins. Numerous malignancies, including different forms of leukemia, are linked to lncRNAs. Aberrant lncRNA expression has been shown to significantly influence the carcinogenesis and growth of a variety of human malignancies. The body of evidence linking various types of lncRNAs to the etiology of leukemia has increased dramatically during the past ten years. Some lncRNAs are therefore anticipated to function as novel therapeutic targets, diagnostic biomarkers, and predictors of clinical outcome. Additionally, these lncRNAs may provide new therapeutic options and insight into the pathophysiology of diseases, particularly leukemia. This review therefore outlines the current understanding of leukemia-associated lncRNAs.
Affiliation(s)
- Bahareh Karimi
  - Department of Cellular and Molecular Biology and Microbiology, Faculty of Biological Science and Technology, University of Isfahan, Isfahan, Iran
- Negareh Poursalehi
  - Department of Medical Biotechnology, School of Medicine, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
- Kamran Ghaedi
  - Department of Cell and Molecular Biology and Microbiology, Faculty of Biological Science and Technology, University of Isfahan, Hezar Jerib Ave., Azadi Sq., 81746-73441 Isfahan, Iran
4
Misra SC, Mukhopadhyay K. Data harnessing to nurture the human mind for a tailored approach to the child. Pediatr Res 2023; 93:357-65. [PMID: 36180585 DOI: 10.1038/s41390-022-02320-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2022] [Revised: 07/06/2022] [Accepted: 09/12/2022] [Indexed: 11/08/2022]
Abstract
Big data in pediatrics is an ocean of structured and unstructured data. Big data analysis helps to dive into the ocean of data to filter out information that can guide pediatricians in their decision making, precision diagnosis, and targeted therapy. In addition, big data and its analysis have helped in the surveillance, prevention, and performance of the health system. There has been a considerable amount of work in pediatrics that we have tried to highlight in this review, and some of it has already been incorporated into the health system. Work in specialties of pediatrics is still forthcoming with the creation of a common data model and amalgamation of the huge "omics" database. The physicians entrusted with the care of children must be aware of the outcome so that they can play a role to ensure that big data algorithms have a clinically relevant effect in improving the health of their patients. They will apply the outcome of big data and its analysis in patient care through clinical algorithms or with the help of embedded clinical support alerts from the electronic medical records. IMPACT: Big data in pediatrics includes structured and unstructured data, waveform data, and biological and social data. Big data analytics has unraveled significant information from these databases. This is changing how pediatricians will look at the body of available evidence and translate it into their clinical practice. Data harnessed so far is implemented in certain fields, while in others it is in the process of development to become a clinical adjunct to the physician. Common databases are being prepared for future work. Diagnostic and prediction models, when incorporated into the health system, will guide the pediatrician to a targeted approach to diagnosis and therapy.
5
Mäkinen VP, Rehn J, Breen J, Yeung D, White DL. Multi-Cohort Transcriptomic Subtyping of B-Cell Acute Lymphoblastic Leukemia. Int J Mol Sci 2022; 23:4574. [PMID: 35562965 PMCID: PMC9099612 DOI: 10.3390/ijms23094574] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/11/2022] [Revised: 04/13/2022] [Accepted: 04/19/2022] [Indexed: 11/17/2022] [Open Access]
Abstract
RNA sequencing provides a snapshot of the functional consequences of genomic lesions that drive acute lymphoblastic leukemia (ALL). The aims of this study were to elucidate diagnostic associations (via machine learning) between mRNA-seq profiles, independently verify ALL lesions and develop easy-to-interpret transcriptome-wide biomarkers for ALL subtyping in the clinical setting. A training dataset of 1279 ALL patients from six North American cohorts was used for developing machine learning models. Results were validated in 767 patients from Australia with a quality control dataset across 31 tissues from 1160 non-ALL donors. A novel batch correction method was introduced and applied to adjust for cohort differences. Out of 18,503 genes with usable expression, 11,830 (64%) were confounded by cohort effects and excluded. Six ALL subtypes (ETV6::RUNX1, KMT2A, DUX4, PAX5 P80R, TCF3::PBX1, ZNF384) that covered 32% of patients were robustly detected by mRNA-seq (positive predictive value ≥ 87%). Five other frequent subtypes (CRLF2, hypodiploid, hyperdiploid, PAX5 alterations and Ph-positive) were distinguishable in 40% of patients at lower accuracy (52% ≤ positive predictive value ≤ 73%). Based on these findings, we introduce the Allspice R package to predict ALL subtypes and driver genes from unadjusted mRNA-seq read counts as encountered in real-world settings. Two examples of Allspice applied to previously unseen ALL patient samples with atypical lesions are included.
Affiliation(s)
- Ville-Petteri Mäkinen
  - Computational and Systems Biology Program, Precision Medicine Theme, South Australian Health and Medical Research Institute, Adelaide, SA 5000, Australia
  - Australian Centre for Precision Health, UniSA Clinical & Health Sciences, University of South Australia, Adelaide, SA 5000, Australia
  - Computational Medicine, Faculty of Medicine, University of Oulu, FI-90014 Oulu, Finland
  - Center for Life Course Health Research, Faculty of Medicine, University of Oulu, FI-90014 Oulu, Finland
  - Correspondence: Tel.: +61-8-8128-4054
- Jacqueline Rehn
  - Blood Cancer Program, Precision Medicine Theme, South Australian Health and Medical Research Institute, Adelaide, SA 5000, Australia
  - Faculty of Health and Medical Sciences, University of Adelaide, Adelaide, SA 5005, Australia
- James Breen
  - Faculty of Health and Medical Sciences, University of Adelaide, Adelaide, SA 5005, Australia
  - South Australian Genomics Centre, South Australian Health and Medical Research Institute, Adelaide, SA 5000, Australia
  - Robinson Research Institute, University of Adelaide, Adelaide, SA 5005, Australia
- David Yeung
  - Blood Cancer Program, Precision Medicine Theme, South Australian Health and Medical Research Institute, Adelaide, SA 5000, Australia
  - Faculty of Health and Medical Sciences, University of Adelaide, Adelaide, SA 5005, Australia
  - Australian and New Zealand Children’s Oncology Group, Clayton, VIC 3168, Australia
  - Department of Haematology, Royal Adelaide Hospital and SA Pathology, Adelaide, SA 5000, Australia
- Deborah L. White
  - Blood Cancer Program, Precision Medicine Theme, South Australian Health and Medical Research Institute, Adelaide, SA 5000, Australia
  - Faculty of Health and Medical Sciences, University of Adelaide, Adelaide, SA 5005, Australia
  - Australian and New Zealand Children’s Oncology Group, Clayton, VIC 3168, Australia
  - Faculty of Sciences, University of Adelaide, Adelaide, SA 5005, Australia
  - Australian Genomics Health Alliance, Parkville, VIC 3052, Australia
6
Ramesh S, Chokkara S, Shen T, Major A, Volchenboum SL, Mayampurath A, Applebaum MA. Applications of Artificial Intelligence in Pediatric Oncology: A Systematic Review. JCO Clin Cancer Inform 2021; 5:1208-1219. [PMID: 34910588 DOI: 10.1200/cci.21.00102] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/12/2022] [Open Access]
Abstract
PURPOSE: There is a need for an improved understanding of clinical and biologic risk factors in pediatric cancer to improve patient outcomes. Machine learning (ML) represents the application of computational inference from advanced statistical methods that can be applied to the increasing amount of data available for study in pediatric oncology. The goal of this systematic review was to characterize the state of ML in pediatric oncology and highlight advances and opportunities in the field. METHODS: We conducted a systematic review of the Embase, Scopus, and MEDLINE databases for applications of ML in pediatric oncology. Query results from all three databases were aggregated and duplicate studies were removed. RESULTS: A total of 42 unique articles that examined the applications of ML in pediatric oncology met inclusion criteria for review. We identified 20 studies of CNS tumors, 13 of solid tumors, and nine of leukemia. ML tasks included classification, prediction of treatment response, and dose optimization, with a variety of methods being used, including neural network, k-nearest neighbor, random forest, naive Bayes, and support vector machines. Strengths of the identified studies included matching or outperforming physician comparators via automated analysis and predicting therapeutic response. Common limitations included significant heterogeneity in reporting standards, clinical applicability, small sample sizes, and missing external validation cohorts. CONCLUSION: We identified areas where ML can enhance clinical care in ways that may not otherwise be achievable. Although ML promises enormous potential in improving diagnostics, decision making, and monitoring for children with cancer, the field remains in early stages, and future work will be aided by standards and guidelines to ensure rigorous methodologic design and maximize clinical utility.
Affiliation(s)
- Siddhi Ramesh
  - Pritzker School of Medicine, University of Chicago, Chicago, IL
- Sukarn Chokkara
  - Pritzker School of Medicine, University of Chicago, Chicago, IL
- Timothy Shen
  - Pritzker School of Medicine, University of Chicago, Chicago, IL
- Ajay Major
  - Department of Medicine, Section of Hematology/Oncology, University of Chicago, Chicago, IL
- Samuel L Volchenboum
  - Department of Pediatrics, Section of Hematology/Oncology, University of Chicago, Chicago, IL
- Anoop Mayampurath
  - Department of Pediatrics, Section of Hematology/Oncology, University of Chicago, Chicago, IL
- Mark A Applebaum
  - Department of Pediatrics, Section of Hematology/Oncology, University of Chicago, Chicago, IL