1
Wei Z, Iyer MR, Zhao B, Deng J, Mitchell CS. Artificial Intelligence-Assisted Comparative Analysis of the Overlapping Molecular Pathophysiology of Alzheimer's Disease, Amyotrophic Lateral Sclerosis, and Frontotemporal Dementia. Int J Mol Sci 2024; 25:13450. [PMID: 39769215 PMCID: PMC11678588 DOI: 10.3390/ijms252413450] [Received: 10/28/2024] [Revised: 11/27/2024] [Accepted: 12/12/2024] [Indexed: 01/11/2025]
Abstract
The overlapping molecular pathophysiology of Alzheimer's Disease (AD), Amyotrophic Lateral Sclerosis (ALS), and Frontotemporal Dementia (FTD) was analyzed using relationships from a knowledge graph of 33+ million biomedical journal articles. The unsupervised learning rank aggregation algorithm from SemNet 2.0 compared the most important amino acid, peptide, and protein (AAPP) nodes connected to AD, ALS, or FTD. FTD shared 99.9% of its nodes with ALS and AD; AD shared 64.2% of its nodes with FTD and ALS; and ALS shared 68.3% of its nodes with AD and FTD. The results were validated and mapped to functional biological processes using human supervision and an external large language model. The overall percentages of mapped intersecting biological processes were as follows: inflammation and immune response, 19%; synapse and neurotransmission, 19%; cell cycle, 15%; protein aggregation, 12%; membrane regulation, 11%; stress response and regulation, 9%; and gene regulation, 4%. Once normalized for node count, biological mappings for cell cycle regulation and stress response were more prominent in the intersection of AD and FTD. Protein aggregation, gene regulation, and energetics were more prominent in the intersection of ALS and FTD. Synapse and neurotransmission, membrane regulation, and inflammation and immune response were greater at the intersection of AD and ALS. Given the extensive molecular pathophysiology overlap, small differences in regulatory, genetic, or environmental factors likely shape the underlying expressed disease phenotype. The results help prioritize testable hypotheses for future clinical or experimental research.
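The shared-node percentages reported above can be read as set overlaps. The following is a minimal sketch of that arithmetic, not the SemNet 2.0 implementation: the function name `shared_fraction`, the toy node sets, and the choice to count a node as shared when it appears in at least one of the other two diseases are all assumptions made for illustration.

```python
def shared_fraction(target, *others):
    """Fraction of `target` nodes that also appear in at least one other set."""
    if not target:
        return 0.0
    union_of_others = set().union(*others)
    return len(target & union_of_others) / len(target)


# Hypothetical toy node sets for illustration only; not data from the study.
ad = {"APP", "MAPT", "APOE", "TREM2"}
als = {"SOD1", "TARDBP", "MAPT", "TREM2"}
ftd = {"MAPT", "TARDBP", "GRN"}

comparisons = {
    "AD": (ad, als, ftd),
    "ALS": (als, ad, ftd),
    "FTD": (ftd, ad, als),
}
for name, (target, first, second) in comparisons.items():
    pct = 100 * shared_fraction(target, first, second)
    print(f"{name} shares {pct:.1f}% of its nodes with the other two")
```

Because each percentage is taken relative to the size of the target set, the three values need not be symmetric, which is consistent with FTD, AD, and ALS each reporting a different share.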
Affiliation(s)
- Zihan Wei
- Laboratory for Pathology Dynamics, Department of Biomedical Engineering, Georgia Institute of Technology & Emory University School of Medicine, Atlanta, GA 30322, USA
- Meghna R. Iyer
- Laboratory for Pathology Dynamics, Department of Biomedical Engineering, Georgia Institute of Technology & Emory University School of Medicine, Atlanta, GA 30322, USA
- Benjamin Zhao
- Laboratory for Pathology Dynamics, Department of Biomedical Engineering, Georgia Institute of Technology & Emory University School of Medicine, Atlanta, GA 30322, USA
- Jennifer Deng
- Laboratory for Pathology Dynamics, Department of Biomedical Engineering, Georgia Institute of Technology & Emory University School of Medicine, Atlanta, GA 30322, USA
- Cassie S. Mitchell
- Laboratory for Pathology Dynamics, Department of Biomedical Engineering, Georgia Institute of Technology & Emory University School of Medicine, Atlanta, GA 30322, USA
- Center for Machine Learning at Georgia Tech, Atlanta, GA 30332, USA
2
Patidar K, Deng JH, Mitchell CS, Ford Versypt AN. Cross-Domain Text Mining of Pathophysiological Processes Associated with Diabetic Kidney Disease. Int J Mol Sci 2024; 25:4503. [PMID: 38674089 PMCID: PMC11050166 DOI: 10.3390/ijms25084503] [Received: 03/13/2024] [Revised: 04/16/2024] [Accepted: 04/17/2024] [Indexed: 04/28/2024]
Abstract
Diabetic kidney disease (DKD) is the leading cause of end-stage renal disease worldwide. This study's goal was to identify the signaling drivers and pathways that modulate glomerular endothelial dysfunction in DKD via artificial intelligence-enabled literature-based discovery. Cross-domain text mining of 33+ million PubMed articles was performed with SemNet 2.0 to identify and rank multi-scalar and multi-factorial pathophysiological concepts related to DKD. A set of identified relevant genes and proteins that regulate different pathological events associated with DKD were analyzed and ranked using normalized mean HeteSim scores. High-ranking genes and proteins intersected three domains: DKD, the immune response, and glomerular endothelial cells. The top 10% of ranked concepts were mapped to the following biological functions: angiogenesis, apoptotic processes, cell adhesion, chemotaxis, growth factor signaling, vascular permeability, the nitric oxide response, oxidative stress, the cytokine response, macrophage signaling, NFκB factor activity, the TLR pathway, glucose metabolism, the inflammatory response, the ERK/MAPK signaling response, the JAK/STAT pathway, the T-cell-mediated response, the WNT/β-catenin pathway, the renin-angiotensin system, and NADPH oxidase activity. High-ranking genes and proteins were used to generate a protein-protein interaction network. The study results prioritized interactions or molecules involved in dysregulated signaling in DKD, which can be further assessed through biochemical network models or experiments.
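The ranking step described above (normalized mean scores across domains, then a top-10% cut) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the SemNet 2.0 API: the abstract does not specify the normalization, so min-max scaling per domain is assumed here, higher scores are assumed to mean stronger relevance, and the function names, node labels, and score values are hypothetical.

```python
import math


def min_max(scores):
    """Rescale one domain's scores to [0, 1] (assumed normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {k: ((v - lo) / span if span else 0.0) for k, v in scores.items()}


def rank_by_normalized_mean(per_domain):
    """Rank nodes present in every domain by their mean normalized score."""
    normalized = [min_max(scores) for scores in per_domain.values()]
    common = set.intersection(*(set(n) for n in normalized))
    mean = {k: sum(n[k] for n in normalized) / len(normalized) for k in common}
    return sorted(mean.items(), key=lambda kv: kv[1], reverse=True)


def top_fraction(ranked, fraction=0.10):
    """Keep the top `fraction` of a ranked list, but at least one entry."""
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]


# Hypothetical scores for illustration only; not values from the study.
per_domain = {
    "DKD": {"a": 0.9, "b": 0.1, "c": 0.5},
    "immune response": {"a": 0.8, "b": 0.2, "c": 0.5},
}
ranked = rank_by_normalized_mean(per_domain)
top = top_fraction(ranked)
print([name for name, _ in ranked], top)
```

Restricting the ranking to nodes scored in every domain mirrors the abstract's focus on concepts that intersect all three domains; a different treatment of missing nodes would change the ranking.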
Affiliation(s)
- Krutika Patidar
- Department of Chemical and Biological Engineering, University at Buffalo, Buffalo, NY 14260, USA
- Jennifer H. Deng
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Cassie S. Mitchell
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Center for Machine Learning at Georgia Tech, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Ashlee N. Ford Versypt
- Department of Chemical and Biological Engineering, University at Buffalo, Buffalo, NY 14260, USA
- Department of Biomedical Engineering, University at Buffalo, Buffalo, NY 14260, USA
- Institute for Artificial Intelligence and Data Science, University at Buffalo, Buffalo, NY 14260, USA
3
Al-Hussaini I, White B, Varmeziar A, Mehra N, Sanchez M, Lee J, DeGroote NP, Miller TP, Mitchell CS. An Interpretable Machine Learning Framework for Rare Disease: A Case Study to Stratify Infection Risk in Pediatric Leukemia. J Clin Med 2024; 13:1788. [PMID: 38542012 PMCID: PMC10970787 DOI: 10.3390/jcm13061788] [Received: 01/26/2024] [Revised: 03/08/2024] [Accepted: 03/12/2024] [Indexed: 04/18/2024]
Abstract
Background: Datasets on rare diseases, such as pediatric acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL), have small sample sizes that hinder machine learning (ML). The objective was to develop an interpretable ML framework to elucidate actionable insights from small tabular rare disease datasets. Methods: The comprehensive framework employed optimized data imputation and sampling, supervised and unsupervised learning, and literature-based discovery (LBD). The framework was deployed to assess treatment-related infection in pediatric AML and ALL. Results: An interpretable decision tree classified the risk of infection as either "high risk" or "low risk" in pediatric ALL (n = 580) and AML (n = 132) with an accuracy of ∼79%. Interpretable regression models predicted the discrete number of developed infections with a mean absolute error (MAE) of 2.26 for bacterial infections and an MAE of 1.29 for viral infections. Features that best explained the development of infection were the chemotherapy regimen, cancer cells in the central nervous system at initial diagnosis, chemotherapy course, leukemia type, Down syndrome, race, and National Cancer Institute risk classification. Finally, SemNet 2.0, open-source LBD software that links relationships from 33+ million PubMed articles, identified additional features for the prediction of infection, such as glucose, iron, neutropenia-reducing growth factors, and systemic lupus erythematosus (SLE). Conclusions: The developed ML framework enabled state-of-the-art, interpretable predictions using rare disease tabular datasets. ML model performance baselines were successfully produced to predict infection in pediatric AML and ALL.
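For readers unfamiliar with the two metrics quoted above, the sketch below shows how classification accuracy and mean absolute error are computed. It is a generic illustration with hypothetical labels and counts, not the study's models or data; the function names are chosen here for clarity.

```python
def accuracy(true_labels, predicted_labels):
    """Share of cases where the predicted class matches the true class."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def mean_absolute_error(true_counts, predicted_counts):
    """Average absolute gap between predicted and observed counts."""
    gaps = [abs(t - p) for t, p in zip(true_counts, predicted_counts)]
    return sum(gaps) / len(gaps)


# Hypothetical values for illustration only; not data from the study.
risk_true = ["high", "low", "low", "high"]
risk_pred = ["high", "low", "high", "high"]
infections_true = [3, 0, 5]
infections_pred = [2, 1, 5]

print(accuracy(risk_true, risk_pred))
print(mean_absolute_error(infections_true, infections_pred))
```

An MAE of 2.26, for instance, means the predicted number of bacterial infections differed from the observed number by about 2.26 infections on average.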
Affiliation(s)
- Irfan Al-Hussaini
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Brandon White
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Armon Varmeziar
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Nidhi Mehra
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Milagro Sanchez
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Judy Lee
- Aflac Cancer and Blood Disorders Center, Children’s Healthcare of Atlanta, Atlanta, GA 30322, USA
- Nicholas P. DeGroote
- Aflac Cancer and Blood Disorders Center, Children’s Healthcare of Atlanta, Atlanta, GA 30322, USA
- Tamara P. Miller
- Aflac Cancer and Blood Disorders Center, Children’s Healthcare of Atlanta, Atlanta, GA 30322, USA
- Department of Pediatrics, Division of Pediatric Hematology/Oncology, Emory University, Atlanta, GA 30332, USA
- Cassie S. Mitchell
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Machine Learning Center at Georgia Tech, Georgia Institute of Technology, Atlanta, GA 30332, USA