1
Cohen S, Rokach L, Veksler-Lublinsky I. GNNs and ensemble models enhance the prediction of new sRNA-mRNA interactions in unseen conditions. BMC Bioinformatics 2025; 26:131. [PMID: 40399818 PMCID: PMC12093732 DOI: 10.1186/s12859-025-06153-w] [Received: 01/11/2024] [Accepted: 04/30/2025] [Indexed: 05/23/2025] Open
Abstract
Bacterial small RNAs (sRNAs) are pivotal in post-transcriptional regulation, affecting functions like virulence, metabolism, and gene expression by binding specific mRNA targets. Identifying these targets is crucial to understanding sRNA regulation across species. Despite advancements in high-throughput (HT) experimental methods, they remain technically challenging and are limited to detecting sRNA-target interactions under specific environmental conditions. Therefore, computational approaches, especially machine learning (ML), are essential for identifying strong candidates for biological validation. In this paper, we hypothesize that ML models trained on large-scale interaction data from specific conditions can accurately predict new interactions in unseen conditions within the same bacterial strain. To test this, we developed models from two families: (1) graph neural networks (GNNs), including GraphRNA and kGraphRNA, that learn transformed representations of interacting sRNA-mRNA pairs via graph relationships, and (2) decision forests, sInterRF (Random Forest) and sInterXGB (XGBoost), which use various interaction features for prediction. We also proposed Summation Ensemble Models (SEM) that combine scores from multiple models. Across three seen-to-unseen conditions evaluations, our models, particularly kGraphRNA, significantly improved the area under the ROC curve (AUC) and Precision-Recall curve (PR-AUC) compared to sRNARFTarget, CopraRNA, and RNAup. The SEM model combining GraphRNA and CopraRNA outperformed CopraRNA alone on a low-throughput (LT) interactions test set (HT-to-LT evaluation). Beyond enhanced performance, our models enable target prediction for species-specific sRNAs, a capability lacking in some existing tools. Furthermore, GNN models remove the dependency on external tools like RNAplex or RNAup to compute hybridization duplex or energy features, enhancing scalability and runtime efficiency. While this study focuses on E. coli K12 MG1655 interactions, our methods are fully adaptable to predict interactions in other bacterial strains, given sufficient data for training. Our comprehensive feature importance analysis revealed the complexity of sRNA-mRNA interactions across environmental conditions, underscoring the significance of RNA sequence composition and duplex structure characteristics, like base pairing and energy factors; findings that align with biological evidence from previous studies. As HT experiments expand sRNA-target interaction data across conditions in various bacteria, our ML methods with features analysis offer promising advances in sRNA-target prediction and deeper insights into sRNA regulatory mechanisms across diverse species.
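The abstract above describes Summation Ensemble Models that combine scores from multiple models and evaluates them by ROC AUC. The following is a minimal sketch of that general idea, not the authors' implementation. The function names are hypothetical, and the min-max rescaling step is an assumption added here so that scores on different scales can be summed; the abstract does not say how scores are normalized.

```python
from typing import List, Sequence


def min_max_scale(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; an assumed normalization, not from the paper."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def summation_ensemble(*model_scores: Sequence[float]) -> List[float]:
    """Sum the rescaled scores of several models, one value per candidate pair."""
    scaled = [min_max_scale(s) for s in model_scores]
    return [sum(values) for values in zip(*scaled)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Calling `summation_ensemble(scores_a, scores_b)` on two score lists of equal length returns one combined score per candidate interaction, which can then be passed to `roc_auc` together with the true labels. In every model's score list, higher values are assumed to mean a more likely interaction.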
Affiliation(s)
- Shani Cohen
- Department of Software & Information Systems Engineering, Faculty of Engineering, Ben-Gurion University of the Negev, 8410501, Beer-Sheva, Israel
- Lior Rokach
- Department of Software & Information Systems Engineering, Faculty of Engineering, Ben-Gurion University of the Negev, 8410501, Beer-Sheva, Israel
- Isana Veksler-Lublinsky
- Department of Software & Information Systems Engineering, Faculty of Engineering, Ben-Gurion University of the Negev, 8410501, Beer-Sheva, Israel.
2
Altaf A, Kawashima J, Khalil M, Stecko H, Rashid Z, Kalady M, Pawlik TM. Identification of a gene signature and prediction of overall survival of patients with stage IV colorectal cancer using a novel machine learning approach. EUROPEAN JOURNAL OF SURGICAL ONCOLOGY 2025; 51:109718. [PMID: 39987816 DOI: 10.1016/j.ejso.2025.109718] [Received: 08/13/2024] [Accepted: 02/19/2025] [Indexed: 02/25/2025]
Abstract
OBJECTIVE We sought to characterize unique gene signature patterns associated with worse overall survival (OS) among patients with stage IV colorectal cancer (CRC) using a machine learning (ML) approach. METHODS Data from the AACR GENIE registry were analyzed for genetic variations (somatic mutations, structural variants and copy number alterations) among patients with CRC. Adult patients (≥18 years) with histologically confirmed stage IV CRC who underwent next-generation sequencing were included. An eXtreme Gradient Boosting (XGBoost) model was developed to predict OS and the relative importance of different genetic alterations was determined using SHapley Additive exPlanations (SHAP) algorithm. RESULTS Among 688 patients with stage IV CRC, 54.4 % were male (n = 374) with a median age of 55 years (IQR, 46-64). An XGBoost model developed using the 200 most frequent genetic alterations demonstrated good performance to predict OS with a c-index of 0.701 (95 % CI: 0.675-0.726) on 5-fold cross-validation. The model achieved time-dependent AUC of 0.742, 0.757 and 0.793 at 12-, 24- and 36-months, respectively. The SHAP algorithm identified the top 20 genetic alterations most strongly predictive of worse OS among stage IV CRC patients. Based on the 20-gene signature, individuals at high risk had worse 12- and 36-month OS versus low-risk patients (82.6 % vs. 97.1 % and 30.1 % vs. 72.6 %, respectively; p < 0.001). CONCLUSION The XGBoost ML model identified a unique gene signature that accurately risk stratified stage IV CRC patients. ML models that incorporate molecular information represent an opportunity to predict long-term outcomes and potentially identify novel therapeutic targets for stage IV CRC patients.
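The abstract above reports model performance as a c-index. As a reading aid, here is a minimal sketch of Harrell's concordance index for right-censored survival data. It is a generic textbook formulation with hypothetical names, not the code used in the study, and it ignores tied event times for simplicity.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's c-index: share of comparable pairs ordered correctly.

    A pair (i, j) is comparable when patient i had the event (events[i] == 1)
    and a strictly shorter follow-up time than patient j. The pair is
    concordant when patient i also has the higher predicted risk. Equal risk
    scores count as half a concordant pair.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which gives context for the reported value of 0.701.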
Affiliation(s)
- Abdullah Altaf
- Department of Surgery, Division of Surgical Oncology, The Ohio State University Wexner Medical Center and James Comprehensive Cancer Center, Columbus, OH, USA
- Jun Kawashima
- Department of Surgery, Division of Surgical Oncology, The Ohio State University Wexner Medical Center and James Comprehensive Cancer Center, Columbus, OH, USA
- Mujtaba Khalil
- Department of Surgery, Division of Surgical Oncology, The Ohio State University Wexner Medical Center and James Comprehensive Cancer Center, Columbus, OH, USA
- Hunter Stecko
- Department of Surgery, Division of Surgical Oncology, The Ohio State University Wexner Medical Center and James Comprehensive Cancer Center, Columbus, OH, USA
- Zayed Rashid
- Department of Surgery, Division of Surgical Oncology, The Ohio State University Wexner Medical Center and James Comprehensive Cancer Center, Columbus, OH, USA
- Matthew Kalady
- Department of Surgery, Division of Colorectal Surgery, The Ohio State University Wexner Medical Center and James Comprehensive Cancer Center, Columbus, OH, USA
- Timothy M Pawlik
- Department of Surgery, Division of Surgical Oncology, The Ohio State University Wexner Medical Center and James Comprehensive Cancer Center, Columbus, OH, USA.
3
Hernández-Monsalves AH, Letelier P, Morales C, Rojas E, Saez MA, Coña N, Díaz J, San Martín A, Garcés P, Espinal-Enriquez J, Guzmán N. A Machine Learning Model for Predicting Intensive Care Unit Admission in Inpatients with COVID-19 Using Clinical Data and Laboratory Biomarkers. Biomedicines 2025; 13:1025. [PMID: 40426855 PMCID: PMC12109434 DOI: 10.3390/biomedicines13051025] [Received: 03/28/2025] [Revised: 04/19/2025] [Accepted: 04/20/2025] [Indexed: 05/29/2025] Open
Abstract
Background: Artificial intelligence tools can help improve the clinical management of patients with severe COVID-19. The aim of this study was to validate a machine learning model to predict admission to the Intensive Care Unit (ICU) in individuals with COVID-19. Methods: A total of 201 hospitalized patients with COVID-19 were included. Sociodemographic and clinical data as well as laboratory biomarker results were obtained from medical records and the clinical laboratory information system. Three machine learning models were generated, trained, and internally validated: logistic regression (LR), random forest (RF), and extreme gradient boosting (XGBoost). The models were evaluated for sensitivity (Sn), specificity (Sp), area under the curve (AUC), precision (P), SHapley Additive exPlanation (SHAP) values, and the clinical utility of predictive models using decision curve analysis (DCA). Results: The predictive model included the following variables: type 2 diabetes mellitus (T2DM), obesity, absolute neutrophil and basophil counts, the neutrophil-to-lymphocyte ratio (NLR), and D-dimer levels on the day of hospital admission. LR showed an Sn of 0.67, Sp of 0.65, AUC of 0.74, and P of 0.66. RF achieved an Sn of 0.87, Sp of 0.83, AUC of 0.96, and P of 0.85. XGBoost demonstrated an Sn of 0.87, Sp of 0.85, AUC of 0.95, and P of 0.86. Conclusions: Among the evaluated models, XGBoost showed robust predictive performance (Sn = 0.87, Sp = 0.85, AUC = 0.95, P = 0.86) and a favorable net clinical benefit in the decision curve analysis, confirming its suitability for predicting ICU admission in COVID-19 and aiding clinical decision-making.
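The abstract above compares models by sensitivity (Sn), specificity (Sp), precision (P), and decision curve analysis. The sketch below shows how these quantities follow from confusion-matrix counts at one probability threshold. The function names and the choice of threshold are illustrative assumptions; the net benefit formula is the standard one used in decision curve analysis, and nothing here reproduces the study's own code or data.

```python
from typing import Sequence, Tuple


def confusion_counts(
    labels: Sequence[int], probs: Sequence[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) when predicting positive for prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity(tp: int, fn: int) -> float:
    """Share of true ICU admissions that the model flags (assumes tp + fn > 0)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of non-admissions that the model clears (assumes tn + fp > 0)."""
    return tn / (tn + fp)


def precision(tp: int, fp: int) -> float:
    """Share of flagged patients who are true admissions (assumes tp + fp > 0)."""
    return tp / (tp + fp)


def net_benefit(tp: int, fp: int, n: int, threshold: float) -> float:
    """Net benefit at one threshold probability, as plotted in a decision curve.

    NB = TP/n - FP/n * pt / (1 - pt), where pt is the threshold probability
    (requires 0 <= pt < 1).
    """
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))
```

Evaluating `net_benefit` over a range of thresholds and plotting the result against the "treat all" and "treat none" strategies gives the decision curve.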
Affiliation(s)
- Alfonso Heriberto Hernández-Monsalves
- Laboratorio de Investigación en Salud de Precisión, Departamento de Procesos Diagnósticos y Evaluación, Facultad de Ciencias de la Salud, Universidad Católica de Temuco, Temuco 4780000, Chile
- Pablo Letelier
- Laboratorio de Investigación en Salud de Precisión, Departamento de Procesos Diagnósticos y Evaluación, Facultad de Ciencias de la Salud, Universidad Católica de Temuco, Temuco 4780000, Chile
- Camilo Morales
- Departamento de Procesos Terapéuticos, Facultad de Ciencias de la Salud, Universidad Católica de Temuco, Temuco 4780000, Chile
- Eduardo Rojas
- Laboratorio de Investigación en Salud de Precisión, Departamento de Procesos Diagnósticos y Evaluación, Facultad de Ciencias de la Salud, Universidad Católica de Temuco, Temuco 4780000, Chile
- Mauricio Alejandro Saez
- Laboratorio de Investigación en Salud de Precisión, Departamento de Procesos Diagnósticos y Evaluación, Facultad de Ciencias de la Salud, Universidad Católica de Temuco, Temuco 4780000, Chile
- Nicolás Coña
- Laboratorio de Investigación en Salud de Precisión, Departamento de Procesos Diagnósticos y Evaluación, Facultad de Ciencias de la Salud, Universidad Católica de Temuco, Temuco 4780000, Chile
- Javiera Díaz
- Laboratorio de Investigación en Salud de Precisión, Departamento de Procesos Diagnósticos y Evaluación, Facultad de Ciencias de la Salud, Universidad Católica de Temuco, Temuco 4780000, Chile
- Andrés San Martín
- Laboratorio Clínico, Hospital Dr. Hernán Henríquez Aravena, Temuco 4780000, Chile
- Paola Garcés
- Centro Médico AlergoInmuno Araucanía, Temuco 4780000, Chile
- Jesús Espinal-Enriquez
- Computational Genomics Department, National Institute of Genomic Medicine, Mexico City 14610, Mexico
- Neftalí Guzmán
- Laboratorio de Investigación en Salud de Precisión, Departamento de Procesos Diagnósticos y Evaluación, Facultad de Ciencias de la Salud, Universidad Católica de Temuco, Temuco 4780000, Chile
4
Ye Z, Song Y, Zhu M, Zheng F, Qin W, Li X, Wang P, Li Z, Chen K, Li A. Assessing the prognostic and therapeutic value of cuproptosis-related genes in colon adenocarcinoma patients. Front Cell Dev Biol 2025; 13:1550982. [PMID: 40276654 PMCID: PMC12018357 DOI: 10.3389/fcell.2025.1550982] [Received: 12/24/2024] [Accepted: 03/28/2025] [Indexed: 04/26/2025] Open
Abstract
Background Colon adenocarcinoma (COAD) remains a major global health challenge with poor prognosis despite advances in treatment, underscoring the need for new biomarkers. As a novel mode of cell death, cuproptosis is thought to be potentially involved in the development of cancer. However, the role of cuproptosis-related genes (CRGs) in COAD prognosis and therapy remains unclear. Methods We analyzed RNA sequencing data from The Cancer Genome Atlas for COAD, focusing on CRG expression patterns and their clinicopathological correlations. Using the Weighted Gene Co-expression Network Analysis (WGCNA) method, we identified the gene module most strongly linked to cuproptosis and conducted functional enrichment analysis to explore the roles of genes within this module in COAD tumorigenesis. A novel prognostic risk model based on four CRGs (ORC1, PTTG1, DLAT, PDHB) was developed to stratify COAD patients into high-risk and low-risk groups, assessing overall survival, tumor microenvironment, and mutational landscape differences. We also evaluated the therapeutic effects of ferredoxin 1 (FDX1) and elesclomol in promoting cuproptosis in HCT116 and LoVo cell lines through various experiments, including cell proliferation, apoptosis assessment, mitochondrial membrane potential evaluation, and DLAT lipoylation detection via Western blot. Results Certain CRGs showed different expressions in COAD versus normal tissues. WGCNA identified a gene module linked to cuproptosis, crucial for pathways like cell cycle regulation, citrate cycle (TCA cycle), and DNA replication. The novel risk model stratified patients into high and low-risk groups based on risk scores, revealing that high-risk COAD patients had shorter overall survival and distinct immune cell infiltration, while low-risk patients were more sensitive to immunotherapy. Experimental results indicated that FDX1 exerted an inhibitory effect on COAD, and its combination with elesclomol significantly reduced proliferation, promoted apoptosis, increased DLAT lipoylation, and lowered mitochondrial membrane potential in COAD cells. Conclusion The findings of this study provided a new perspective for the research on biomarkers and therapeutic strategies in COAD, evaluated the prognostic and therapeutic value of CRGs in COAD patients, and laid a theoretical foundation for the future clinical application of CRGs.
Affiliation(s)
- Zhanhui Ye
- Guangdong Provincial Key Laboratory of Gastroenterology, Department of Gastroenterology, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Yixian Song
- Guangdong Provincial Key Laboratory of Gastroenterology, Department of Gastroenterology, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Mengqing Zhu
- Guangdong Provincial Key Laboratory of Gastroenterology, Department of Gastroenterology, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Fuying Zheng
- Guangdong Provincial Key Laboratory of Gastroenterology, Department of Gastroenterology, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Wenjie Qin
- Guangdong Provincial Key Laboratory of Gastroenterology, Department of Gastroenterology, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Endoscopy Center, Jiangmen Central Hospital, Jiangmen, China
- Xue Li
- Guangdong Provincial Key Laboratory of Gastroenterology, Department of Gastroenterology, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Pei Wang
- Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
- Zihua Li
- Department of Orthopedics, Shanghai Tongji Hospital, School of Medicine, Tongji University, Shanghai, China
- Kequan Chen
- Department of Gastroenterology, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Aimin Li
- Guangdong Provincial Key Laboratory of Gastroenterology, Department of Gastroenterology, Nanfang Hospital, Southern Medical University, Guangzhou, China
5
Xu Y, Yang W, Qiu J, Zhou K, Yu G, Zhang Y, Wang X, Jiao Y, Wang X, Hu S, Zhang X, Li P, Lu Y, Chen R, Tao T, Yang Z, Xu Y, Xu C. Metabolic marker-assisted genomic prediction improves hybrid breeding. PLANT COMMUNICATIONS 2025; 6:101199. [PMID: 39614617 PMCID: PMC11956108 DOI: 10.1016/j.xplc.2024.101199] [Received: 06/25/2024] [Revised: 10/31/2024] [Accepted: 11/26/2024] [Indexed: 12/01/2024]
Abstract
Hybrid breeding is widely acknowledged as the most effective method for increasing crop yield, particularly in maize and rice. However, a major challenge in hybrid breeding is the selection of desirable combinations from the vast pool of potential crosses. Genomic selection (GS) has emerged as a powerful tool to tackle this challenge, but its success in practical breeding depends on prediction accuracy. Several strategies have been explored to enhance prediction accuracy for complex traits, such as the incorporation of functional markers and multi-omics data. Metabolome-wide association studies (MWAS) help to identify metabolites that are closely linked to phenotypes, known as metabolic markers. However, the use of preselected metabolic markers from parental lines to predict hybrid performance has not yet been explored. In this study, we developed a novel approach called metabolic marker-assisted genomic prediction (MM_GP), which incorporates significant metabolites identified from MWAS into GS models to improve the accuracy of genomic hybrid prediction. In maize and rice hybrid populations, MM_GP outperformed genomic prediction (GP) for all traits, regardless of the method used (genomic best linear unbiased prediction or eXtreme gradient boosting). On average, MM_GP demonstrated 4.6% and 13.6% higher predictive abilities than GP for maize and rice, respectively. MM_GP could also match or even surpass the predictive ability of M_GP (integrated genomic-metabolomic prediction) for most traits. In maize, the integration of only six metabolic markers significantly associated with multiple traits resulted in 5.0% and 3.1% higher average predictive ability compared with GP and M_GP, respectively. With advances in high-throughput metabolomics technologies and prediction models, this approach holds great promise for revolutionizing genomic hybrid breeding by enhancing its accuracy and efficiency.
Affiliation(s)
- Yang Xu
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
- Wenyan Yang
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
- Jie Qiu
- Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai 200234, China
- Kai Zhou
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
- Guangning Yu
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
- Yuxiang Zhang
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
- Xin Wang
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
- Yuxin Jiao
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
- Xinyi Wang
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
- Shujun Hu
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
- Xuecai Zhang
- International Maize and Wheat Improvement Center (CIMMYT), Mexico D.F. 06600, Mexico
- Pengcheng Li
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
- Yue Lu
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
- Rujia Chen
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
- Tianyun Tao
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
- Zefeng Yang
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
- Yunbi Xu
- Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China; BGI Bioverse, Shenzhen 518083, China; MolBreeding Biotechnology Co., Ltd., Shijiazhuang 050035, China.
- Chenwu Xu
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China.
6
Andrew TW, Alrawi M, Plummer R, Reynolds N, Sondak V, Brownell I, Lovat PE, Rose A, Shalhout SZ. A hybrid machine learning approach for the personalized prognostication of aggressive skin cancers. NPJ Digit Med 2025; 8:15. [PMID: 39779875 PMCID: PMC11711377 DOI: 10.1038/s41746-024-01329-9] [Received: 06/24/2024] [Accepted: 11/05/2024] [Indexed: 01/11/2025] Open
Abstract
Accurate prognostication guides optimal clinical management in skin cancer. Merkel cell carcinoma (MCC) is the most aggressive form of skin cancer that often presents in advanced stages and is associated with poor survival rates. There are no personalized prognostic tools in use in MCC. We employed explainability analysis to reveal new insights into mortality risk factors for this highly aggressive cancer. We then combined deep learning feature selection with a modified XGBoost framework, to develop a web-based prognostic tool for MCC termed 'DeepMerkel'. DeepMerkel can make accurate personalised, time-dependent survival predictions for MCC from readily available clinical information. It demonstrated generalizability through high predictive performance in an international clinical cohort, out-performing current population-based prognostic staging systems. MCC and DeepMerkel provide the exemplar model of personalised machine learning prognostic tools in aggressive skin cancers.
Affiliation(s)
- Tom W Andrew
- Translation and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, UK.
- Department of Plastic and Reconstructive Surgery, Royal Victoria Infirmary, Newcastle Upon Tyne Hospital NHS Foundation Trust (NuTH), Newcastle upon Tyne, UK.
- Mogdad Alrawi
- Department of Plastic and Reconstructive Surgery, Royal Victoria Infirmary, Newcastle Upon Tyne Hospital NHS Foundation Trust (NuTH), Newcastle upon Tyne, UK
- Ruth Plummer
- Translation and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, UK
- Department of Oncology, Newcastle University and Northern Centre for Cancer Care, Newcastle upon Tyne, UK
- Nick Reynolds
- Translation and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, UK
- NIHR Newcastle Biomedical Research Centre & Department of Dermatology, Royal Victoria Infirmary, Newcastle Upon Tyne Hospital NHS Foundation Trust (NuTH), Newcastle upon Tyne, UK
- Vern Sondak
- Department of Cutaneous Oncology, Moffitt Cancer Center, and Department of Oncologic Sciences, Morsani College of Medicine, University of South Florida, Tampa, FL, USA
- Isaac Brownell
- Dermatology Branch, National Institute of Arthritis Musculoskeletal and Skin Diseases (NIAMS), National Institutes of Health (NIH), Bethesda, MD, USA
- Penny E Lovat
- Translation and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, UK
- Aidan Rose
- Translation and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, UK
- Department of Plastic and Reconstructive Surgery, Royal Victoria Infirmary, Newcastle Upon Tyne Hospital NHS Foundation Trust (NuTH), Newcastle upon Tyne, UK
- Sophia Z Shalhout
- Mike Toth Head and Neck Cancer Research Center, Division of Surgical Oncology, Department of Otolaryngology-Head and Neck Surgery, Mass Eye and Ear, Boston, MA, USA
- Department of Otolaryngology-Head and Neck Surgery, Harvard Medical School, Boston, MA, USA
7
Su Y, Huang C, Yang C, Lin Q, Chen Z. Prediction of Survival in Patients With Esophageal Cancer After Immunotherapy Based on Small-Size Follow-Up Data. IEEE OPEN JOURNAL OF ENGINEERING IN MEDICINE AND BIOLOGY 2024; 5:769-782. [PMID: 39464488 PMCID: PMC11505867 DOI: 10.1109/ojemb.2024.3452983] [Received: 01/10/2024] [Revised: 02/05/2024] [Accepted: 08/26/2024] [Indexed: 10/29/2024] Open
Abstract
Esophageal cancer (EC) poses a significant health concern, particularly among the elderly, warranting effective treatment strategies. While immunotherapy holds promise in activating the immune response against tumors, its specific impact and associated reactions in EC patients remain uncertain. Precise prognosis prediction becomes crucial for guiding appropriate interventions. This study, based on data from the First Affiliated Hospital of Xiamen University (January 2017 to May 2021), focuses on 113 EC patients undergoing immunotherapy. The primary objectives are to elucidate the effectiveness of immunotherapy in EC treatment and to introduce a stacking ensemble learning method for predicting the survival of EC patients who have undergone immunotherapy, in the context of small sample sizes, addressing the imperative of supporting clinical decision-making for healthcare professionals. Our method incorporates five sub-learners and one meta-learner. Leveraging optimal features from the training dataset, this approach achieved compelling accuracy (89.13%) and AUC (88.83%) in predicting three-year survival status, surpassing conventional techniques. The model proves efficient in guiding clinical decisions, especially in scenarios with small-size follow-up data.
Affiliation(s)
- Yuhan Su
- School of Electronic Science and Engineering, Xiamen University, Xiamen 361005, China
- Shenzhen Research Institute of Xiamen University, Shenzhen 518057, China
- Chaofeng Huang
- Institute of Artificial Intelligence, Xiamen University, Xiamen 361005, China
- Chen Yang
- First Affiliated Hospital of Xiamen University, Xiamen 361000, China
- Qin Lin
- First Affiliated Hospital of Xiamen University, Xiamen 361000, China
- Zhong Chen
- School of Electronic Science and Engineering, Xiamen University, Xiamen 361005, China
- Institute of Artificial Intelligence, Xiamen University, Xiamen 361005, China
8
Abbasi AF, Asim MN, Ahmed S, Vollmer S, Dengel A. Survival prediction landscape: an in-depth systematic literature review on activities, methods, tools, diseases, and databases. Front Artif Intell 2024; 7:1428501. [PMID: 39021434 PMCID: PMC11252047 DOI: 10.3389/frai.2024.1428501] [Received: 05/07/2024] [Accepted: 06/12/2024] [Indexed: 07/20/2024] Open
Abstract
Survival prediction integrates patient-specific molecular information and clinical signatures to forecast the anticipated time of an event, such as recurrence, death, or disease progression. Survival prediction proves valuable in guiding treatment decisions, optimizing resource allocation, and interventions of precision medicine. The wide range of diseases, the existence of various variants within the same disease, and the reliance on available data necessitate disease-specific computational survival predictors. The widespread adoption of artificial intelligence (AI) methods in crafting survival predictors has undoubtedly revolutionized this field. However, the ever-increasing demand for more sophisticated and effective prediction models necessitates the continued creation of innovative advancements. To catalyze these advancements, it is crucial to bring existing survival predictors knowledge and insights into a centralized platform. The paper in hand thoroughly examines 23 existing review studies and provides a concise overview of their scope and limitations. Focusing on a comprehensive set of 90 most recent survival predictors across 44 diverse diseases, it delves into insights of diverse types of methods that are used in the development of disease-specific predictors. This exhaustive analysis encompasses the utilized data modalities along with a detailed analysis of subsets of clinical features, feature engineering methods, and the specific statistical, machine or deep learning approaches that have been employed. It also provides insights about survival prediction data sources, open-source predictors, and survival prediction frameworks.
Affiliation(s)
- Ahtisham Fazeel Abbasi
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, Germany
  - Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
- Muhammad Nabeel Asim
  - Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
- Sheraz Ahmed
  - Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
- Sebastian Vollmer
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, Germany
  - Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
- Andreas Dengel
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, Germany
  - Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
|
9
|
Pan L, Peng Y, Li Y, Wang X, Liu W, Xu L, Liang Q, Peng S. SELECTOR: Heterogeneous graph network with convolutional masked autoencoder for multimodal robust prediction of cancer survival. Comput Biol Med 2024; 172:108301. [PMID: 38492453 DOI: 10.1016/j.compbiomed.2024.108301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 02/03/2024] [Accepted: 03/12/2024] [Indexed: 03/18/2024]
Abstract
Accurately predicting the survival rate of cancer patients is crucial for aiding clinicians in planning appropriate treatment, reducing cancer-related medical expenses, and significantly enhancing patients' quality of life. Multimodal prediction of cancer patient survival offers a more comprehensive and precise approach. However, existing methods still grapple with challenges related to missing multimodal data and information interaction within modalities. This paper introduces SELECTOR, a heterogeneous graph-aware network based on convolutional mask encoders for robust multimodal prediction of cancer patient survival. SELECTOR comprises feature edge reconstruction, convolutional mask encoder, feature cross-fusion, and multimodal survival prediction modules. Initially, we construct a multimodal heterogeneous graph and employ the meta-path method for feature edge reconstruction, ensuring comprehensive incorporation of feature information from graph edges and effective embedding of nodes. To mitigate the impact of missing features within the modality on prediction accuracy, we devised a convolutional masked autoencoder (CMAE) to process the heterogeneous graph post-feature reconstruction. Subsequently, the feature cross-fusion module facilitates communication between modalities, ensuring that output features encompass all features of the modality and relevant information from other modalities. Extensive experiments and analysis on six cancer datasets from TCGA demonstrate that our method significantly outperforms state-of-the-art methods in both modality-missing and intra-modality information-confirmed cases. Our codes are made available at https://github.com/panliangrui/Selector.
Affiliation(s)
- Liangrui Pan
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, Hunan, China.
- Yijun Peng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, Hunan, China.
- Yan Li
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, Hunan, China.
- Xiang Wang
  - Department of Thoracic Surgery, The Second Xiangya Hospital, Central South University, Changsha, 410011, Hunan, China.
- Wenjuan Liu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, Hunan, China.
- Liwen Xu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, Hunan, China.
- Qingchun Liang
  - Department of Pathology, The Second Xiangya Hospital, Central South University, Changsha, 410011, Hunan, China.
- Shaoliang Peng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, Hunan, China.
|
10
|
Tao S, Ravindranath R, Wang SY. Predicting Glaucoma Progression to Surgery with Artificial Intelligence Survival Models. Ophthalmology Science 2023; 3:100336. [PMID: 37415920 PMCID: PMC10320266 DOI: 10.1016/j.xops.2023.100336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Revised: 05/16/2023] [Accepted: 05/17/2023] [Indexed: 07/08/2023]
Abstract
Purpose Prior artificial intelligence (AI) models for predicting glaucoma progression have used traditional classifiers that do not consider the longitudinal nature of patients' follow-up. In this study, we developed survival-based AI models for predicting glaucoma patients' progression to surgery, comparing performance of regression-, tree-, and deep learning-based approaches. Design Retrospective observational study. Subjects Patients with glaucoma seen at a single academic center from 2008 to 2020 identified from electronic health records (EHRs). Methods From the EHRs, we identified 361 baseline features, including demographics, eye examinations, diagnoses, and medications. We trained AI survival models to predict patients' progression to glaucoma surgery using the following: (1) a penalized Cox proportional hazards (CPH) model with principal component analysis (PCA); (2) random survival forests (RSFs); (3) gradient-boosting survival (GBS); and (4) a deep learning model (DeepSurv). The concordance index (C-index) and mean cumulative/dynamic area under the curve (mean AUC) were used to evaluate model performance on a held-out test set. Explainability was investigated using Shapley values for feature importance and visualization of model-predicted cumulative hazard curves for patients with different treatment trajectories. Main Outcome Measures Progression to glaucoma surgery. Results Of the 4512 patients with glaucoma, 748 underwent glaucoma surgery, with a median follow-up of 1038 days. The DeepSurv model performed best overall (C-index, 0.775; mean AUC, 0.802) among the models studied in this article (CPH with PCA: C-index, 0.745; mean AUC, 0.780; RSF: C-index, 0.766; mean AUC, 0.804; GBS: C-index, 0.764; mean AUC, 0.791). Predicted cumulative hazard curves demonstrate how models could distinguish between patients who underwent early surgery and patients who underwent surgery after > 3000 days of follow-up or no surgery. 
Conclusions Artificial intelligence survival models can predict progression to glaucoma surgery using structured data from EHRs. Tree-based and deep learning-based models performed better at predicting glaucoma progression to surgery than the CPH regression model, potentially because of their better suitability for high-dimensional data sets. Future work predicting ophthalmic outcomes should consider using tree-based and deep learning-based survival AI models. Additional research is needed to develop and evaluate more sophisticated deep learning survival models that can incorporate clinical notes or imaging. Financial Disclosures Proprietary or commercial disclosure may be found after the references.
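The concordance index used in this abstract to compare survival models can be illustrated with a minimal sketch. This is a plain-Python version of Harrell's C-index for right-censored data, not the study's evaluation code; the function name `concordance_index`, the skipping of tied observation times, and the toy inputs are assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times       -- observed follow-up times
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- higher score means higher predicted risk (earlier event)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Simplifying assumption: pairs with identical times are skipped.
        if times[i] == times[j]:
            continue
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # The pair is comparable only if the shorter time is an observed event.
        if not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # the model ranks the earlier event as riskier
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the C-index values reported above (0.745 to 0.775) should be read.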
Affiliation(s)
- Shiqi Tao
  - Byers Eye Institute, Department of Ophthalmology, Stanford University, Palo Alto, California
- Rohith Ravindranath
  - Byers Eye Institute, Department of Ophthalmology, Stanford University, Palo Alto, California
- Sophia Y. Wang
  - Byers Eye Institute, Department of Ophthalmology, Stanford University, Palo Alto, California
|
11
|
Guan SW, Lin Q, Wu XD, Yu HB. Weighted gene coexpression network analysis and machine learning reveal oncogenome associated microbiome plays an important role in tumor immunity and prognosis in pan-cancer. J Transl Med 2023; 21:537. [PMID: 37573394 PMCID: PMC10422781 DOI: 10.1186/s12967-023-04411-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/26/2023] [Accepted: 08/02/2023] [Indexed: 08/14/2023] Open
Abstract
BACKGROUND For many years, the role of the microbiome in tumor progression, particularly the tumor microbiome, was largely overlooked. The connection between the tumor microbiome and the tumor genome still requires further investigation. METHODS The TCGA microbiome and genome data were obtained from Haziza et al.'s article and UCSC Xena database, respectively. Separate WGCNA networks were constructed for the tumor microbiome and genomic data after filtering the datasets. Correlation analysis between the microbial and mRNA modules was conducted to identify oncogenome associated microbiome module (OAM) modules, with three microbial modules selected for each tumor type. Reactome analysis was used to enrich biological processes. Machine learning techniques were implemented to explore the tumor type-specific enrichment and prognostic value of OAM, as well as the ability of the tumor microbiome to differentiate TP53 mutations. RESULTS We constructed a total of 182 tumor microbiome and 570 mRNA WGCNA modules. Our results show that there is a correlation between tumor microbiome and tumor genome. Gene enrichment analysis results suggest that the genes in the mRNA module with the highest correlation with the tumor microbiome group are mainly enriched in infection, transcriptional regulation by TP53 and antigen presentation. The correlation analysis of OAM with CD8+ T cells or TAM1 cells suggests the existence of many microbiota that may be involved in tumor immune suppression or promotion, such as Williamsia in breast cancer, Biostraticola in stomach cancer, Megasphaera in cervical cancer and Lottiidibacillus in ovarian cancer. In addition, the results show that the microbiome-genome prognostic model has good predictive value for short-term prognosis. The analysis of tumor TP53 mutations shows that tumor microbiota has a certain ability to distinguish TP53 mutations, with an AUROC value of 0.755. 
The tumor microbiota with high importance scores are Corallococcus, Bacillus and Saezia. Finally, we identified a potential anti-cancer microbiota, Tissierella, which has been shown to be associated with improved prognosis in tumors including breast cancer, lung adenocarcinoma and gastric cancer. CONCLUSION There is an association between the tumor microbiome and the tumor genome, and the existence of this association is not accidental and could change the landscape of tumor research.
Affiliation(s)
- Shi-Wei Guan
  - Department of Hepatobiliary Surgery, Wenzhou Central Hospital, The Dingli Clinical Institute of Wenzhou Medical University, Wenzhou, 325000, Zhejiang, People's Republic of China
- Quan Lin
  - Department of Hepatobiliary Surgery, Wenzhou Central Hospital, The Dingli Clinical Institute of Wenzhou Medical University, Wenzhou, 325000, Zhejiang, People's Republic of China
- Xi-Dong Wu
  - Department of Neurosurgery Surgery, Wenzhou Central Hospital, The Dingli Clinical Institute of Wenzhou Medical University, Wenzhou, 325000, Zhejiang, People's Republic of China
- Hai-Bo Yu
  - Department of Hepatobiliary Surgery, Wenzhou Central Hospital, The Dingli Clinical Institute of Wenzhou Medical University, Wenzhou, 325000, Zhejiang, People's Republic of China.
|
12
|
Zou H, Lu Z, Weng W, Yang L, Yang L, Leng X, Wang J, Lin YF, Wu J, Fu L, Zhang X, Li Y, Wang L, Wu X, Zhou X, Tian T, Huang L, Marra CM, Yang B, Yang TC, Ke W. Diagnosis of neurosyphilis in HIV-negative patients with syphilis: development, validation, and clinical utility of a suite of machine learning models. EClinicalMedicine 2023; 62:102080. [PMID: 37533423 PMCID: PMC10393556 DOI: 10.1016/j.eclinm.2023.102080] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/27/2023] [Revised: 06/19/2023] [Accepted: 06/19/2023] [Indexed: 08/04/2023] Open
Abstract
Background The ability to accurately identify the absolute risk of neurosyphilis diagnosis for patients with syphilis would allow preventative and therapeutic interventions to be delivered to patients at high risk, sparing patients at low risk from unnecessary care. We aimed to develop, validate, and evaluate the clinical utility of simplified clinical diagnostic models for neurosyphilis diagnosis in HIV-negative patients with syphilis. Methods We searched PubMed, China National Knowledge Infrastructure and UpToDate for publications about neurosyphilis diagnostic guidelines in English or Chinese from database inception until March 15, 2023. We developed and validated machine learning models with a uniform set of predictors based on six authoritative diagnostic guidelines across four continents to predict neurosyphilis using routinely collected data from real-world clinical practice in China and the United States (through the Dermatology Hospital of Southern Medical University in Guangzhou [659 recruited between August 2012 and March 2022, treated as Development cohort], the Beijing Youan Hospital of Capital Medical University in Beijing [480 recruited between December 2013 and April 2021, treated as External cohort 1], the Zhongshan Hospital of Xiamen University in Xiamen [493 recruited between November 2005 and November 2021, treated as External cohort 2] from China, and University of Washington School of Medicine in Seattle [16 recruited between September 2002 and April 2014, treated as External cohort 3] from the United States). We included all these patients with syphilis in our analysis, and no patients were further excluded. We trained eXtreme gradient boosting (XGBoost) models to predict the diagnostic outcome of neurosyphilis according to each diagnostic guideline in two scenarios, respectively. 
Model performance was measured through both internal and external validation in terms of discrimination and calibration, and clinical utility was evaluated using decision curve analysis. Findings The final simplified clinical diagnostic models included neurological symptoms, cerebrospinal fluid (CSF) protein, CSF white blood cell, and CSF venereal disease research laboratory test/rapid plasma reagin. The models showed good calibration with rescaled Brier score of 0.99 (95% CI 0.98-1.00) and excellent discrimination (the minimum value of area under the receiver operating characteristic curve, 0.84; 95% CI 0.81-0.88) when externally validated. Decision curve analysis demonstrated that the models were useful across a range of neurosyphilis probability thresholds between 0.33 and 0.66 compared to the alternatives of managing all patients with syphilis as if they do or do not have neurosyphilis. Interpretation The simplified clinical diagnostic models comprised of readily available data show good performance, are generalisable across clinical settings, and have clinical utility over a broad range of probability thresholds. The models with a uniform set of predictors can simplify the sophisticated clinical diagnosis of neurosyphilis, and guide decisions on delivery of neurosyphilis health-care, ultimately, support accurate diagnosis and necessary treatment. Funding The Natural Science Foundation of China General Program, Health Appropriate Technology Promotion Project of Guangdong Medical Research Foundation, Department of Science and technology of Guangdong Province Xinjiang Rural Science and Technology(Special Commissioner)Project, Southern Medical University Clinical Research Nursery Garden Project, Beijing Municipal Administration of Hospitals Incubating Program.
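Decision curve analysis, which this abstract uses to assess clinical utility, compares the net benefit of acting on a model's predictions with the net benefit of treating every patient or no patient across a range of probability thresholds. The sketch below shows the standard net-benefit calculation under stated assumptions: the function names and the toy data are hypothetical, and this is not the authors' code.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    Net benefit = TP/n - (FP/n) * threshold / (1 - threshold),
    where the odds of the threshold weight false positives against
    true positives.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the 'treat everyone' strategy at the same threshold."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = condition present, 0 = absent.
labels = [1, 1, 0, 0]
predicted = [0.9, 0.6, 0.4, 0.1]
for t in (0.33, 0.5, 0.66):
    model_nb = net_benefit(labels, predicted, t)
    all_nb = net_benefit_treat_all(labels, t)
    print(f"threshold={t:.2f} model={model_nb:.3f} treat_all={all_nb:.3f}")
```

The "treat none" strategy always has a net benefit of zero, so a model is considered useful at a given threshold when its net benefit exceeds both zero and the treat-all value.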
Affiliation(s)
- Huachun Zou
  - School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, 518107, China
- Zhen Lu
  - School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, 518107, China
- Wenjia Weng
  - Department of Dermatology, Beijing Youan Hospital, Capital Medical University, Beijing, 100012, China
- Ligang Yang
  - Department of STD Clinic, Dermatology Hospital of Southern Medical University, Guangzhou, 510091, China
- Luoyao Yang
  - School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, 518107, China
- Xinying Leng
  - Department of STD Clinic, Dermatology Hospital of Southern Medical University, Guangzhou, 510091, China
- Junfeng Wang
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
  - Division of Pharmacoepidemiology and Clinical Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Utrecht, the Netherlands
- Yi-Fan Lin
  - School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, 518107, China
- Jiaxin Wu
  - Department of STD Clinic, Dermatology Hospital of Southern Medical University, Guangzhou, 510091, China
- Leiwen Fu
  - School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, 518107, China
- Xiaohui Zhang
  - Department of STD Clinic, Dermatology Hospital of Southern Medical University, Guangzhou, 510091, China
- Yuwei Li
  - School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, 518107, China
- Liuyuan Wang
  - Department of STD Clinic, Dermatology Hospital of Southern Medical University, Guangzhou, 510091, China
- Xinsheng Wu
  - School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, 518107, China
- Xinyi Zhou
  - School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, 518107, China
- Tian Tian
  - School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, 518107, China
- Lixia Huang
  - Department of STD Clinic, Dermatology Hospital of Southern Medical University, Guangzhou, 510091, China
- Christina M. Marra
  - Department of Neurology, University of Washington, Seattle, WA, 98104, USA
- Bin Yang
  - Department of STD Clinic, Dermatology Hospital of Southern Medical University, Guangzhou, 510091, China
- Tian-Ci Yang
  - Center of Clinical Laboratory, Zhongshan Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen, 361004, China
  - Institute of Infectious Disease, School of Medicine, Xiamen University, Xiamen, 361004, China
- Wujian Ke
  - Department of STD Clinic, Dermatology Hospital of Southern Medical University, Guangzhou, 510091, China
|
13
|
Duan M, Wang Y, Zhao D, Liu H, Zhang G, Li K, Zhang H, Huang L, Zhang R, Zhou F. Orchestrating information across tissues via a novel multitask GAT framework to improve quantitative gene regulation relation modeling for survival analysis. Brief Bioinform 2023; 24:bbad238. [PMID: 37427963 DOI: 10.1093/bib/bbad238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2023] [Revised: 05/29/2023] [Accepted: 06/08/2023] [Indexed: 07/11/2023] Open
Abstract
Survival analysis is critical to cancer prognosis estimation. High-throughput technologies facilitate the increase in the dimension of genic features, but the number of clinical samples in cohorts is relatively small due to various reasons, including difficulties in participant recruitment and high data-generation costs. Transcriptome is one of the most abundantly available OMIC (referring to the high-throughput data, including genomic, transcriptomic, proteomic and epigenomic) data types. This study introduced a multitask graph attention network (GAT) framework DQSurv for the survival analysis task. We first used a large dataset of healthy tissue samples to pretrain the GAT-based HealthModel for the quantitative measurement of the gene regulatory relations. The multitask survival analysis framework DQSurv used the idea of transfer learning to initiate the GAT model with the pretrained HealthModel and further fine-tuned this model using two tasks i.e. the main task of survival analysis and the auxiliary task of gene expression prediction. This refined GAT was denoted as DiseaseModel. We fused the original transcriptomic features with the difference vector between the latent features encoded by the HealthModel and DiseaseModel for the final task of survival analysis. The proposed DQSurv model stably outperformed the existing models for the survival analysis of 10 benchmark cancer types and an independent dataset. The ablation study also supported the necessity of the main modules. We released the codes and the pretrained HealthModel to facilitate the feature encodings and survival analysis of transcriptome-based future studies, especially on small datasets. The model and the code are available at http://www.healthinformaticslab.org/supp/.
Affiliation(s)
- Meiyu Duan
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
- Yueying Wang
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
- Dong Zhao
  - School of Biology and Engineering, and Engineering Research Center of Medical Biotechnology, Guizhou Medical University, Guiyang, Guizhou 550025, China
- Hongmei Liu
  - School of Biology and Engineering, and Engineering Research Center of Medical Biotechnology, Guizhou Medical University, Guiyang, Guizhou 550025, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China, 130012
- Gongyou Zhang
  - School of Biology and Engineering, and Engineering Research Center of Medical Biotechnology, Guizhou Medical University, Guiyang, Guizhou 550025, China
- Kewei Li
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
- Haotian Zhang
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
- Lan Huang
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China, 130012
- Ruochi Zhang
  - School of Artificial Intelligence, Jilin University, Changchun, China, 130012
- Fengfeng Zhou
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China, 130012
|
14
|
Xiang T, Li T, Li J, Li X, Wang J. Using machine learning to realize genetic site screening and genomic prediction of productive traits in pigs. FASEB J 2023; 37:e22961. [PMID: 37178007 DOI: 10.1096/fj.202300245r] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2023] [Revised: 03/30/2023] [Accepted: 04/25/2023] [Indexed: 05/15/2023]
Abstract
Genomic prediction, which is based on solving linear mixed-model (LMM) equations, is the most popular method for predicting breeding values or phenotypic performance for economic traits in livestock. With the need to further improve the performance of genomic prediction, nonlinear methods have been considered as an alternative and promising approach. The excellent ability to predict phenotypes in animal husbandry has been demonstrated by machine learning (ML) approaches, which have been rapidly developed. To investigate the feasibility and reliability of implementing genomic prediction using nonlinear models, the performances of genomic predictions for pig productive traits using the linear genomic selection model and nonlinear machine learning models were compared. Then, to reduce the high-dimensional features of genome sequence data, different machine learning algorithms, including the random forest (RF), support vector machine (SVM), extreme gradient boosting (XGBoost) and convolutional neural network (CNN) algorithms, were used to perform genomic feature selection as well as genomic prediction on reduced feature genome data. All of the analyses were processed on two real pig datasets: the published PIC pig dataset and a dataset comprising data from a national pig nucleus herd in Chifeng, North China. Overall, the accuracies of predicted phenotypic performance for traits T1, T2, T3 and T5 in the PIC dataset and average daily gain (ADG) in the Chifeng dataset were higher using the ML methods than the LMM method, while those for trait T4 in the PIC dataset and total number of piglets born (TNB) in the Chifeng dataset were slightly lower using the ML methods than the LMM method. Among all the different ML algorithms, SVM was the most appropriate for genomic prediction. For the genomic feature selection experiment, the most stable and most accurate results across different algorithms were achieved using XGBoost in combination with the SVM algorithm. 
Through feature selection, the number of genomic markers can be reduced to 1 in 20, while the predictive performance on some traits can even be improved compared to using the full genome data. Finally, we developed a new tool that can be used to execute combined XGBoost and SVM algorithms to realize genomic feature selection and phenotypic prediction.
Affiliation(s)
- Tao Xiang
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, Huazhong Agricultural University, Wuhan, China
- Tao Li
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
  - Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Jielin Li
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, Huazhong Agricultural University, Wuhan, China
- Xin Li
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
  - Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Jia Wang
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
  - Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
|
15
|
A Combined Risk Score Model to Assess Prognostic Value in Patients with Soft Tissue Sarcomas. Cells 2022; 11:cells11244077. [PMID: 36552841 PMCID: PMC9776565 DOI: 10.3390/cells11244077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2022] [Revised: 11/13/2022] [Accepted: 12/12/2022] [Indexed: 12/23/2022] Open
Abstract
A study by Tsvetkov et al., recently published in Science, proposed a novel form of copper-induced cell death; however, few studies have looked into the possible mechanism in soft tissue sarcoma (STS). Herein, this study aimed to explore the role of cuproptosis-related genes (CRGs) in the development, tumor-associated immune cells, and the prognosis of sarcoma. METHODS The prognostic model was established via the least absolute shrinkage and selection operator (LASSO) algorithm as well as multivariate Cox regression analysis. The stromal scores, immune scores, ESTIMATE scores, and tumor purity of sarcoma patients were evaluated by the ESTIMATE algorithm. Functional analyses were performed to investigate the underlying mechanisms of immune cell infiltration and the prognosis of CRGs in sarcoma. RESULTS Two molecular subgroups with different CRG expression patterns were recognized, which showed that patients with a higher immune score and more active immune status were prone to have better prognostic survival. Moreover, GO and KEGG analyses showed that these differentially expressed CRGs were mainly enriched in metabolic/ions-related signaling pathways, indicating that CRGs may have impacts on the immune cell infiltration and prognosis of sarcoma via regulating the bioprocess of mitochondria and consequently affecting the immune microenvironment. The expression levels of CRGs were closely correlated to the immunity condition and prognostic survival of sarcoma patients. CONCLUSIONS The interaction between cuproptosis and immunity in sarcoma may provide a novel insight into the study of molecular mechanisms and candidate biomarkers for the prognosis, resulting in effective treatments for sarcoma patients.
|
16
|
Qin X, Yin D, Dong X, Chen D, Zhang S. Survival prediction model for right-censored data based on improved composite quantile regression neural network. Mathematical Biosciences and Engineering 2022; 19:7521-7542. [PMID: 35801434 DOI: 10.3934/mbe.2022354] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/15/2023]
Abstract
With the development of the field of survival analysis, statistical inference of right-censored data is of great importance for the study of medical diagnosis. In this study, a right-censored data survival prediction model based on an improved composite quantile regression neural network framework, called rcICQRNN, is proposed. It incorporates composite quantile regression with the loss function of a multi-hidden layer feedforward neural network, combined with an inverse probability weighting method for survival prediction. Meanwhile, the hyperparameters involved in the neural network are adjusted using the WOA algorithm, integer encoding and One-Hot encoding are implemented to encode the classification features, and the BWOA variable selection method for high-dimensional data is proposed. The rcICQRNN algorithm was tested on a simulated dataset and two real breast cancer datasets, and the performance of the model was evaluated by three evaluation metrics. The results show that the rcICQRNN-5 model is more suitable for analyzing simulated datasets. The One-Hot encoding of the WOA-rcICQRNN-30 model is more applicable to the NKI70 data. The model results are optimal for k=15 after feature selection for the METABRIC dataset. Finally, we implemented the method for cross-dataset validation. On the whole, the C-index results using One-Hot encoding data are more stable, making the proposed rcICQRNN prediction model flexible enough to assist in medical decision making. It has practical applications in areas such as biomedicine, insurance actuarial and financial economics.
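The composite quantile regression loss mentioned in this abstract can be sketched generically. The code below averages the quantile check (pinball) loss over several quantile levels and accepts optional per-sample weights, which stand in for inverse probability weights for censoring. It is an illustrative sketch only, not the rcICQRNN implementation; the function names, the equally spaced quantile levels, and the weighting scheme are assumptions.

```python
def check_loss(residual, tau):
    """Quantile check (pinball) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def equally_spaced_taus(k):
    """Quantile levels tau_q = q / (k + 1) for q = 1..k (assumed spacing)."""
    return [q / (k + 1) for q in range(1, k + 1)]


def composite_quantile_loss(y, preds_by_tau, taus, weights=None):
    """Average check loss over all quantile levels and samples.

    y            -- observed targets, length n
    preds_by_tau -- preds_by_tau[q][i] is the prediction for sample i
                    at quantile level taus[q]
    weights      -- optional per-sample weights (for example, inverse
                    probability of censoring weights); defaults to 1.0
    """
    n = len(y)
    if weights is None:
        weights = [1.0] * n
    total = 0.0
    for tau, preds in zip(taus, preds_by_tau):
        for target, pred, w in zip(y, preds, weights):
            total += w * check_loss(target - pred, tau)
    return total / (len(taus) * n)
```

In a neural-network setting this quantity would be minimized with respect to the network parameters; with a weight of zero, a sample contributes nothing to the loss, which is how weighting can down-weight or exclude censored observations.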
Collapse
Affiliation(s)
- Xiwen Qin
- School of Mathematics and Statistics, Changchun University of Technology, Changchun 130012, China
- Dongmei Yin
- School of Mathematics and Statistics, Changchun University of Technology, Changchun 130012, China
- Xiaogang Dong
- School of Mathematics and Statistics, Changchun University of Technology, Changchun 130012, China
- Dongxue Chen
- School of Mathematics and Statistics, Changchun University of Technology, Changchun 130012, China
- Shuang Zhang
- School of Mathematics and Statistics, Changchun University of Technology, Changchun 130012, China
|
17
|
Dong Y, Zhou S, Xing L, Chen Y, Ren Z, Dong Y, Zhang X. Deep learning methods may not outperform other machine learning methods on analyzing genomic studies. Front Genet 2022; 13:992070. [PMID: 36212148 PMCID: PMC9537734 DOI: 10.3389/fgene.2022.992070] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/12/2022] [Accepted: 08/04/2022] [Indexed: 12/03/2022] Open
Abstract
Deep Learning (DL) has been broadly applied to solve big data problems in biomedical fields, most successfully in image processing. Recently, many DL methods have been applied to analyze genomic studies. However, genomic data usually has too small a sample size to fit a complex network. It also lacks the common structural patterns that images have, so it cannot readily use pre-trained networks or take advantage of convolution layers. The concern of overusing DL methods motivates us to evaluate the performance of DL methods versus popular non-deep Machine Learning (ML) methods for analyzing genomic data with a wide range of sample sizes. In this paper, we conduct a benchmark study using the UK Biobank data and many of its random subsets with different sample sizes. The original UK Biobank data has about 500k participants. Each participant has comprehensive patient characteristics, disease histories, and genomic information, i.e., the genotypes of millions of Single-Nucleotide Polymorphisms (SNPs). We are interested in predicting the risk of three lung diseases: asthma, COPD, and lung cancer. A total of 205,238 participants have recorded disease outcomes for these three diseases. Five prediction models are investigated in this benchmark study, including three non-deep machine learning methods (Elastic Net, XGBoost, and SVM) and two deep learning methods (DNN and LSTM). Besides the most popular performance metrics, such as the F1-score, we promote the hit curve, a visual tool to describe the performance of predicting rare events. We discovered that DL methods frequently fail to outperform non-deep ML in analyzing genomic data, even in large datasets with over 200k samples. The experimental results caution against overusing DL methods in genomic studies, even with biobank-level sample sizes. The performance differences between DL and non-deep ML decrease as the sample size of the data increases. This suggests that when the sample size is large, further increasing it leads to more performance gain in DL methods. Hence, DL methods might perform better on genomic datasets larger than the one in this study.
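The abstract does not define the hit curve, so the following is only a sketch under a common assumption: subjects are ranked by predicted risk from highest to lowest, and the curve records how many true events ("hits") fall among the top-ranked subjects as the cutoff grows. The function name `hit_curve` and its inputs are illustrative.

```python
def hit_curve(scores, labels):
    """Cumulative count of true events among the top-ranked subjects.

    scores : predicted risk per subject (higher means higher risk)
    labels : 1 if the event occurred, 0 otherwise
    Returns a list whose k-th entry (1-based) is the number of events
    among the k subjects with the highest predicted risk.
    """
    # Rank subjects from highest to lowest predicted risk.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    curve = []
    for i in order:
        hits += labels[i]
        curve.append(hits)
    return curve
```

For rare events, a model whose curve rises steeply at the start is capturing most cases within a small screened group, which summary metrics such as the F1-score do not show directly.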
Affiliation(s)
- Yao Dong
- School of Artificial Intelligence, Hebei University of Technology, Tianjin, China; Department of Mathematics and Statistics, University of Victoria, Victoria, BC, Canada; Hebei Province Key Laboratory of Big Data Computing, Tianjin, China
- Shaoze Zhou
- Department of Mathematics and Statistics, University of Victoria, Victoria, BC, Canada
- Li Xing
- Department of Mathematics and Statistics, University of Saskatchewan, Saskatoon, SK, Canada
- Yumeng Chen
- School of Artificial Intelligence, Hebei University of Technology, Tianjin, China; Hebei Province Key Laboratory of Big Data Computing, Tianjin, China
- Ziyu Ren
- School of Artificial Intelligence, Hebei University of Technology, Tianjin, China; Hebei Province Key Laboratory of Big Data Computing, Tianjin, China
- Yongfeng Dong
- School of Artificial Intelligence, Hebei University of Technology, Tianjin, China; Hebei Province Key Laboratory of Big Data Computing, Tianjin, China
- Xuekui Zhang
- Department of Mathematics and Statistics, University of Victoria, Victoria, BC, Canada
|