1
Jilani M, Degras D, Haspel N. Elucidating Cancer Subtypes by Using the Relationship between DNA Methylation and Gene Expression. Genes (Basel) 2024; 15:631. [PMID: 38790260 PMCID: PMC11121157 DOI: 10.3390/genes15050631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Revised: 05/10/2024] [Accepted: 05/14/2024] [Indexed: 05/26/2024] Open
Abstract
Advancements in the field of next-generation sequencing (NGS) have generated vast amounts of data for the same set of subjects. The challenge that arises is how to combine and reconcile results from different omics studies, such as epigenome and transcriptome, to improve the classification of disease subtypes. In this study, we introduce sCClust (sparse canonical correlation analysis with clustering), a technique to combine high-dimensional omics data using sparse canonical correlation analysis (sCCA), such that the correlation between datasets is maximized. This stage is followed by clustering the integrated data in a lower-dimensional space. We apply sCClust to gene expression and DNA methylation data for three cancer genomics datasets from the Cancer Genome Atlas (TCGA) to distinguish between underlying subtypes. We evaluate the identified subtypes using Kaplan-Meier plots and hazard ratio analysis on the three types of cancer: glioblastoma multiforme (GBM), lung cancer and colon cancer. Comparison with subtypes identified by both single- and multi-omics studies implies improved clinical association. We also perform pathway over-representation analysis in order to identify up-regulated and down-regulated genes as tentative drug targets. The main goal of the paper is twofold: the integration of epigenomic and transcriptomic datasets followed by elucidating subtypes in the latent space. The significance of this study lies in the enhanced categorization of cancer data, which is crucial to precision medicine.
Affiliation(s)
- Muneeba Jilani
- Department of Computer Science, University of Massachusetts Boston, Boston, MA 02125, USA
- David Degras
- Department of Mathematics, University of Massachusetts Boston, Boston, MA 02125, USA
- Nurit Haspel
- Department of Computer Science, University of Massachusetts Boston, Boston, MA 02125, USA
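The sCClust abstract above evaluates candidate subtypes with Kaplan-Meier plots. As a reading aid, here is a minimal sketch of the Kaplan-Meier (product-limit) estimator in plain Python. It is not the authors' code: the function name `kaplan_meier` and the toy cohort are assumptions made for illustration, and the numbers are not data from the paper.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event was observed.

    times  -- follow-up time per subject
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # subjects leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after time t.
        at_risk -= exits[t]
    return curve


# Hypothetical toy cohort for one candidate subtype (illustrative only).
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

Computing one such curve per candidate subtype and comparing the curves is how Kaplan-Meier plots of this kind are typically assembled; a log-rank test or hazard ratio would then quantify the separation.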
2
Lan W, Liao H, Chen Q, Zhu L, Pan Y, Chen YPP. DeepKEGG: a multi-omics data integration framework with biological insights for cancer recurrence prediction and biomarker discovery. Brief Bioinform 2024; 25:bbae185. [PMID: 38678587 PMCID: PMC11056029 DOI: 10.1093/bib/bbae185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2024] [Revised: 03/07/2024] [Accepted: 04/09/2024] [Indexed: 05/01/2024] Open
Abstract
Deep learning-based multi-omics data integration methods have the capability to reveal the mechanisms of cancer development, discover cancer biomarkers and identify pathogenic targets. However, current methods ignore the potential correlations between samples in integrating multi-omics data. In addition, providing accurate biological explanations still poses significant challenges due to the complexity of deep learning models. Therefore, there is an urgent need for a deep learning-based multi-omics integration method to explore the potential correlations between samples and provide model interpretability. Herein, we propose a novel interpretable multi-omics data integration method (DeepKEGG) for cancer recurrence prediction and biomarker discovery. In DeepKEGG, a biological hierarchical module is designed for local connections of neuron nodes and model interpretability based on the biological relationship between genes/miRNAs and pathways. In addition, a pathway self-attention module is constructed to explore the correlation between different samples and generate the potential pathway feature representation for enhancing the prediction performance of the model. Lastly, an attribution-based feature importance calculation method is utilized to discover biomarkers related to cancer recurrence and provide a biological interpretation of the model. Experimental results demonstrate that DeepKEGG outperforms other state-of-the-art methods in 5-fold cross validation. Furthermore, case studies also indicate that DeepKEGG serves as an effective tool for biomarker discovery. The code is available at https://github.com/lanbiolab/DeepKEGG.
Affiliation(s)
- Wei Lan
- Guangxi Key Laboratory of Multimedia Communications and Network Technology, School of Computer, Electronic and Information, Guangxi University, No. 100 Daxue Road, Xixiangtang District, Nanning 530004, China
- Haibo Liao
- Guangxi Key Laboratory of Multimedia Communications and Network Technology, School of Computer, Electronic and Information, Guangxi University, No. 100 Daxue Road, Xixiangtang District, Nanning 530004, China
- Qingfeng Chen
- Guangxi Key Laboratory of Multimedia Communications and Network Technology, School of Computer, Electronic and Information, Guangxi University, No. 100 Daxue Road, Xixiangtang District, Nanning 530004, China
- Lingzhi Zhu
- School of Computer and Information Science, Hunan Institute of Technology, No. 18 Henghua Road, Zhuhui District, Hengyang 421002, China
- Yi Pan
- School of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Avenue, Shenzhen University Town, Nanshan District, Shenzhen 518055, China
- Yi-Ping Phoebe Chen
- Department of Computer Science and Information Technology, La Trobe University, Plenty Rd, Bundoora, Melbourne, Victoria 3086, Australia
3
Xu Z, Liao H, Huang L, Chen Q, Lan W, Li S. IBPGNET: lung adenocarcinoma recurrence prediction based on neural network interpretability. Brief Bioinform 2024; 25:bbae080. [PMID: 38557672 PMCID: PMC10982951 DOI: 10.1093/bib/bbae080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 01/31/2024] [Accepted: 02/07/2024] [Indexed: 04/04/2024] Open
Abstract
Lung adenocarcinoma (LUAD) is the most common histologic subtype of lung cancer. Early-stage patients have a 30-50% probability of metastatic recurrence after surgical treatment. Here, we propose a new computational framework, Interpretable Biological Pathway Graph Neural Networks (IBPGNET), based on pathway hierarchy relationships to predict LUAD recurrence and explore the internal regulatory mechanisms of LUAD. IBPGNET can integrate different omics data efficiently and provide global interpretability. In addition, our experimental results show that IBPGNET outperforms other classification methods in 5-fold cross-validation. IBPGNET identified PSMC1 and PSMD11 as genes associated with LUAD recurrence, and their expression levels were significantly higher in LUAD cells than in normal cells. The knockdown of PSMC1 and PSMD11 in LUAD cells increased their sensitivity to afatinib and decreased cell migration, invasion and proliferation. In addition, the cells showed significantly lower EGFR expression, indicating that PSMC1 and PSMD11 may mediate therapeutic sensitivity through EGFR expression.
Affiliation(s)
- Zhanyu Xu
- Department of Thoracic and Cardiovascular Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi Zhuang Autonomous Region 530021, China
- Haibo Liao
- School of Computer, Electronic and Information, Guangxi University, Nanning, Guangxi Zhuang Autonomous Region 530021, China
- Liuliu Huang
- Department of Thoracic and Cardiovascular Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi Zhuang Autonomous Region 530021, China
- Qingfeng Chen
- School of Computer, Electronic and Information, Guangxi University, Nanning, Guangxi Zhuang Autonomous Region 530021, China
- Wei Lan
- School of Computer, Electronic and Information, Guangxi University, Nanning, Guangxi Zhuang Autonomous Region 530021, China
- Shikang Li
- Department of Thoracic and Cardiovascular Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi Zhuang Autonomous Region 530021, China
4
Yan H, Weng D, Li D, Gu Y, Ma W, Liu Q. Prior knowledge-guided multilevel graph neural network for tumor risk prediction and interpretation via multi-omics data integration. Brief Bioinform 2024; 25:bbae184. [PMID: 38670157 PMCID: PMC11052635 DOI: 10.1093/bib/bbae184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/11/2024] [Accepted: 04/06/2024] [Indexed: 04/28/2024] Open
Abstract
The interrelation and complementary nature of multi-omics data can provide valuable insights into the intricate molecular mechanisms underlying diseases. However, challenges such as limited sample size, high data dimensionality and differences in omics modalities pose significant obstacles to fully harnessing the potential of these data. The prior knowledge such as gene regulatory network and pathway information harbors useful gene-gene interaction and gene functional module information. To effectively integrate multi-omics data and make full use of the prior knowledge, here, we propose a Multilevel-graph neural network (GNN): a hierarchically designed deep learning algorithm that sequentially leverages multi-omics data, gene regulatory networks and pathway information to extract features and enhance accuracy in predicting survival risk. Our method achieved better accuracy compared with existing methods. Furthermore, key factors nonlinearly associated with the tumor pathogenesis are prioritized by employing two interpretation algorithms (i.e. GNN-Explainer and IGscore) for neural networks, at gene and pathway level, respectively. The top genes and pathways exhibit strong associations with disease in survival analyses, many of which such as SEC61G and CYP27B1 are previously reported in the literature.
Affiliation(s)
- Hongxi Yan
- Department of Computer Science, Beihang University, XueYuan Road, 100191, Beijing, China
- Dawei Weng
- School of Biomedical Engineering, Capital Medical University, 10 You An Men WaiXi Tou Tiao, 100069, Beijing, China
- Dongguo Li
- School of Biomedical Engineering, Capital Medical University, 10 You An Men WaiXi Tou Tiao, 100069, Beijing, China
- Yu Gu
- School of Biomedical Engineering, Capital Medical University, 10 You An Men WaiXi Tou Tiao, 100069, Beijing, China
- Wenji Ma
- Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, 227 South Chongqing Road, 200025, Shanghai, China
- Qingjie Liu
- Department of Computer Science, Beihang University, XueYuan Road, 100191, Beijing, China
5
Kaynar G, Cakmakci D, Bund C, Todeschi J, Namer IJ, Cicek AE. PiDeeL: metabolic pathway-informed deep learning model for survival analysis and pathological classification of gliomas. Bioinformatics 2023; 39:btad684. [PMID: 37952175 PMCID: PMC10663986 DOI: 10.1093/bioinformatics/btad684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 10/19/2023] [Accepted: 11/10/2023] [Indexed: 11/14/2023] Open
Abstract
MOTIVATION: Online assessment of tumor characteristics during surgery is important and has the potential to establish an intra-operative surgeon feedback mechanism. With the availability of such feedback, surgeons could decide to be more liberal or conservative regarding the resection of the tumor. While there are methods to perform metabolomics-based tumor pathology prediction, their model complexity and predictive performance are limited by the small dataset sizes. Furthermore, the information conveyed by the feedback provided on the tumor tissue could be improved both in terms of content and accuracy. RESULTS: In this study, we propose a metabolic pathway-informed deep learning model (PiDeeL) to perform survival analysis and pathology assessment based on metabolite concentrations. We show that incorporating pathway information into the model architecture substantially reduces parameter complexity and achieves better survival analysis and pathological classification performance. With these design decisions, we show that PiDeeL improves tumor pathology prediction performance of the state-of-the-art in terms of the Area Under the ROC Curve by 3.38% and the Area Under the Precision-Recall Curve by 4.06%. Similarly, with respect to the time-dependent concordance index (c-index), PiDeeL achieves better survival analysis performance (improvement of 4.3%) when compared to the state-of-the-art. Moreover, we show that importance analyses performed on input metabolite features as well as pathway-specific neurons of PiDeeL provide insights into tumor metabolism. We foresee that the use of this model in the surgery room will help surgeons adjust the surgery plan on the fly and will result in better prognosis estimates tailored to surgical procedures. AVAILABILITY AND IMPLEMENTATION: The code is released at https://github.com/ciceklab/PiDeeL. The data used in this study are released at https://zenodo.org/record/7228791.
Affiliation(s)
- Gun Kaynar
- Computer Engineering Department, Bilkent University, 06800 Ankara, Turkey
- Doruk Cakmakci
- School of Computer Science, McGill University, Montreal, QC, H3A 0E9, Canada
- Caroline Bund
- MNMS Platform, University Hospitals of Strasbourg, Strasbourg 67098, France
- ICube, University of Strasbourg, CNRS UMR 7357, Strasbourg 67000, France
- Department of Nuclear Medicine and Molecular Imaging, ICANS, Strasbourg 67000, France
- Julien Todeschi
- Department of Neurosurgery, University Hospitals of Strasbourg, Strasbourg, 67091, France
- Izzie Jacques Namer
- MNMS Platform, University Hospitals of Strasbourg, Strasbourg 67098, France
- ICube, University of Strasbourg, CNRS UMR 7357, Strasbourg 67000, France
- Department of Nuclear Medicine and Molecular Imaging, ICANS, Strasbourg 67000, France
- A Ercument Cicek
- Computer Engineering Department, Bilkent University, 06800 Ankara, Turkey
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, United States
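The PiDeeL abstract above states that building pathway information into the architecture reduces parameter complexity. One common way to realise that idea is to mask a dense layer with a feature-to-pathway membership matrix, so each pathway neuron only connects to its member features. The sketch below illustrates the masking idea in plain Python; it is not taken from the PiDeeL repository, and the metabolite names, pathway memberships and helper names (`pathway_mask`, `masked_forward`) are hypothetical.

```python
def pathway_mask(features, pathways):
    """Binary mask[i][j] = 1 if feature i belongs to pathway j, else 0."""
    names = list(pathways)
    return [[1 if f in pathways[p] else 0 for p in names] for f in features]


def masked_forward(x, weights, mask):
    """One pathway-layer pass: each pathway neuron sums only its member features."""
    n_out = len(mask[0])
    return [
        sum(x[i] * weights[i][j] * mask[i][j] for i in range(len(x)))
        for j in range(n_out)
    ]


# Hypothetical metabolites and pathway memberships, for illustration only.
metabolites = ["glucose", "lactate", "glutamate", "glutamine"]
pathways = {
    "glycolysis": {"glucose", "lactate"},
    "glutamate_metabolism": {"glutamate", "glutamine"},
}

mask = pathway_mask(metabolites, pathways)
dense_params = len(metabolites) * len(pathways)  # fully connected layer
masked_params = sum(map(sum, mask))              # only member connections remain

# All-ones weights make the effect of the mask easy to read off.
weights = [[1.0] * len(pathways) for _ in metabolites]
activations = masked_forward([1.0, 2.0, 3.0, 4.0], weights, mask)
```

In this toy setup the mask halves the number of trainable connections, and each output can be read as a per-pathway score, which is the kind of property that makes pathway-level importance analysis possible.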
6
Liu C, Wan AH, Liang H, Sun L, Li J, Yang R, Li Q, Wu R, Hu K, Yang Y, Cai S, Wan G, He W. Biological informed graph neural network for tumor mutation burden prediction and immunotherapy-related pathway analysis in gastric cancer. Comput Struct Biotechnol J 2023; 21:4540-4551. [PMID: 37810279 PMCID: PMC10550590 DOI: 10.1016/j.csbj.2023.09.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2023] Open
Abstract
Tumor mutation burden (TMB) has emerged as an essential biomarker for assessing the efficacy of cancer immunotherapy. However, due to the inherent complexity of tumors, TMB is not always correlated with the responsiveness of immune checkpoint inhibitors (ICIs). Thus, refining the interpretation and contextualization of TMB is a requisite for enhancing clinical outcomes. In this study, we conducted a comprehensive investigation of the relationship between TMB and multi-omics data across 33 human cancer types. Our analysis revealed distinct biological changes associated with varying TMB statuses in STAD, COAD, and UCEC. While multi-omics data offer an opportunity to dissect the intricacies of tumors, extracting meaningful biological insights from such massive information remains a formidable challenge. To address this, we developed and implemented the PGLCN, a biologically informed graph neural network based on pathway interaction information. This model facilitates the stratification of patients into subgroups with distinct TMB statuses and enables the evaluation of driver biological processes through enhanced interpretability. By integrating multi-omics data for TMB prediction, our PGLCN model outperformed previous traditional machine learning methodologies, demonstrating superior TMB status prediction accuracy (STAD AUC: 0.976 ± 0.007; COAD AUC: 0.994 ± 0.007; UCEC AUC: 0.947 ± 0.023) and enhanced interpretability (BA-House: 1.0; BA-Community: 0.999; BA-Grid: 0.994; Tree-Cycles: 0.917; Tree-Grids: 0.867). Furthermore, the biological interpretability inherent to PGLCN identified the Toll-like receptor family and DNA repair pathways as potential combined biomarkers in conjunction with TMB status in gastric cancer. This finding suggests a potential synergistic targeting strategy with immunotherapy for gastric cancer, thus advancing the field of precision oncology.
Affiliation(s)
- Chuwei Liu
- Department of Gastrointestinal Surgery, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou 510080, China
- Arabella H. Wan
- Department of Pathology, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou 510080, China
- Heng Liang
- National-Local Joint Engineering Laboratory of Druggability and New Drug Evaluation, National Engineering Research Center for New Drug and Druggability (cultivation), Guangdong Province Key Laboratory of New Drug Design and Evaluation, School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou 510006, China
- Lei Sun
- National-Local Joint Engineering Laboratory of Druggability and New Drug Evaluation, National Engineering Research Center for New Drug and Druggability (cultivation), Guangdong Province Key Laboratory of New Drug Design and Evaluation, School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou 510006, China
- Jiarui Li
- National-Local Joint Engineering Laboratory of Druggability and New Drug Evaluation, National Engineering Research Center for New Drug and Druggability (cultivation), Guangdong Province Key Laboratory of New Drug Design and Evaluation, School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou 510006, China
- Ranran Yang
- Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou 510080, China
- Qinghai Li
- Department of Gastrointestinal Surgery, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou 510080, China
- Ruibo Wu
- National-Local Joint Engineering Laboratory of Druggability and New Drug Evaluation, National Engineering Research Center for New Drug and Druggability (cultivation), Guangdong Province Key Laboratory of New Drug Design and Evaluation, School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou 510006, China
- Kunhua Hu
- Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou 510080, China
- Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
- Shirong Cai
- Department of Gastrointestinal Surgery, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou 510080, China
- Guohui Wan
- National-Local Joint Engineering Laboratory of Druggability and New Drug Evaluation, National Engineering Research Center for New Drug and Druggability (cultivation), Guangdong Province Key Laboratory of New Drug Design and Evaluation, School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou 510006, China
- Weiling He
- Department of Gastrointestinal Surgery, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou 510080, China
- Department of Gastrointestinal Surgery, Xiang’an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen, Fujian 361000, China
7
Zhu J, Oh JH, Simhal AK, Elkin R, Norton L, Deasy JO, Tannenbaum A. Geometric graph neural networks on multi-omics data to predict cancer survival outcomes. Comput Biol Med 2023; 163:107117. [PMID: 37329617 PMCID: PMC10638676 DOI: 10.1016/j.compbiomed.2023.107117] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/23/2023] [Revised: 05/25/2023] [Accepted: 05/30/2023] [Indexed: 06/19/2023]
Abstract
The advance of sequencing technologies has enabled a thorough molecular characterization of the genome in human cancers. To improve patient prognosis predictions and subsequent treatment strategies, it is imperative to develop advanced computational methods to analyze large-scale, high-dimensional genomic data. However, traditional machine learning methods face a challenge in handling the high-dimensional, low-sample size problem that is shown in most genomic data sets. To address this, our group has developed geometric network analysis techniques on multi-omics data in connection with prior biological knowledge derived from protein-protein interactions (PPIs) or pathways. Geometric features obtained from the genomic network, such as Ollivier-Ricci curvature and the invariant measure of the associated Markov chain, have been shown to be predictive of survival outcomes in various cancers. In this study, we propose a novel supervised deep learning method called geometric graph neural network (GGNN) that incorporates such geometric features into deep learning for enhanced predictive power and interpretability. More specifically, we utilize a state-of-the-art graph neural network with sparse connections between the hidden layers based on known biology of the PPI network and pathway information. Geometric features along with multi-omics data are then incorporated into the corresponding layers. The proposed approach utilizes a local-global principle in such a manner that highly predictive features are selected at the front layers and fed directly to the last layer for multivariable Cox proportional-hazards regression modeling. The method was applied to multi-omics data from the CoMMpass study of multiple myeloma and ten major cancers in The Cancer Genome Atlas (TCGA). In most experiments, our method showed superior predictive performance compared to other alternative methods.
Affiliation(s)
- Jiening Zhu
- Department of Applied Mathematics & Statistics, Stony Brook University, NY, USA.
- Jung Hun Oh
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, NY, USA.
- Anish K Simhal
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, NY, USA.
- Rena Elkin
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, NY, USA.
- Larry Norton
- Department of Medicine, Memorial Sloan Kettering Cancer Center, NY, USA.
- Joseph O Deasy
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, NY, USA.
- Allen Tannenbaum
- Department of Applied Mathematics & Statistics, Stony Brook University, NY, USA; Department of Computer Science, Stony Brook University, NY, USA.
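The GGNN abstract above feeds selected features into a final layer for Cox proportional-hazards regression. Networks of this kind are commonly trained by minimising the negative log partial likelihood of the Cox model. The following plain-Python sketch shows that loss on a toy cohort; it is an illustration under stated assumptions, not the authors' implementation. The function name and toy values are hypothetical, and tied event times are handled in the style of the Breslow approximation.

```python
import math


def cox_neg_log_partial_likelihood(risk_scores, times, events):
    """Negative log partial likelihood of the Cox proportional-hazards model.

    risk_scores -- model output per subject (higher = higher predicted hazard)
    times       -- follow-up time per subject
    events      -- 1 if the event was observed, 0 if censored
    """
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if e_i != 1:
            continue  # censored subjects only appear in risk sets
        # Risk set: every subject still under observation at time t_i.
        log_denominator = math.log(
            sum(math.exp(r) for r, t in zip(risk_scores, times) if t >= t_i)
        )
        total -= risk_scores[i] - log_denominator
    return total


# Hypothetical toy cohort: events at t=1 and t=2, one subject censored at t=3.
toy_times = [1, 2, 3]
toy_events = [1, 1, 0]

baseline = cox_neg_log_partial_likelihood([0.0, 0.0, 0.0], toy_times, toy_events)
concordant = cox_neg_log_partial_likelihood([2.0, 1.0, 0.0], toy_times, toy_events)
discordant = cox_neg_log_partial_likelihood([0.0, 1.0, 2.0], toy_times, toy_events)
```

Scores that rank earlier events as higher risk (`concordant`) give a lower loss than uninformative scores (`baseline`), while reversed rankings (`discordant`) are penalised, which is the behaviour a survival network exploits during training.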
8
Wysocka M, Wysocki O, Zufferey M, Landers D, Freitas A. A systematic review of biologically-informed deep learning models for cancer: fundamental trends for encoding and interpreting oncology data. BMC Bioinformatics 2023; 24:198. [PMID: 37189058 DOI: 10.1186/s12859-023-05262-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/17/2022] [Accepted: 03/30/2023] [Indexed: 05/17/2023] Open
Abstract
BACKGROUND: There is an increasing interest in the use of Deep Learning (DL) based methods as a supporting analytical framework in oncology. However, most direct applications of DL will deliver models with limited transparency and explainability, which constrain their deployment in biomedical settings. METHODS: This systematic review discusses DL models used to support inference in cancer biology with a particular emphasis on multi-omics analysis. It focuses on how existing models address the need for better dialogue with prior knowledge, biological plausibility and interpretability, fundamental properties in the biomedical domain. For this, we retrieved and analyzed 42 studies focusing on emerging architectural and methodological advances, the encoding of biological domain knowledge and the integration of explainability methods. RESULTS: We discuss the recent evolutionary arc of DL models in the direction of integrating prior biological relational and network knowledge (e.g. pathways or protein-protein interaction networks) to support better generalisation and interpretability. This represents a fundamental functional shift towards models which can integrate mechanistic and statistical inference aspects. We introduce a concept of bio-centric interpretability and, according to its taxonomy, we discuss representational methodologies for the integration of domain prior knowledge in such models. CONCLUSIONS: The paper provides a critical outlook into contemporary methods for explainability and interpretability used in DL for cancer. The analysis points in the direction of a convergence between encoding prior knowledge and improved interpretability. We introduce bio-centric interpretability, which is an important step towards formalisation of biological interpretability of DL models and developing methods that are less problem- or application-specific.
Affiliation(s)
- Magdalena Wysocka
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Oskar Wysocki
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH-1920, Martigny, Switzerland
- Marie Zufferey
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH-1920, Martigny, Switzerland
- Dónal Landers
- DeLondra Oncology Ltd, 38 Carlton Avenue, Wilmslow, SK9 4EP, UK
- André Freitas
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH-1920, Martigny, Switzerland
9
Huang J, Zhao C, Zhang X, Zhao Q, Zhang Y, Chen L, Dai G. Hepatitis B virus pathogenesis relevant immunosignals uncovering amino acids utilization related risk factors guide artificial intelligence-based precision medicine. Front Pharmacol 2022; 13:1079566. [PMID: 36569318 PMCID: PMC9780394 DOI: 10.3389/fphar.2022.1079566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Accepted: 11/30/2022] [Indexed: 12/14/2022] Open
Abstract
Background: Although immune microenvironment-related chemokines, extracellular matrix (ECM), and intrahepatic immune cells are reported to be highly involved in hepatitis B virus (HBV)-related diseases, their roles in diagnosis, prognosis, and drug sensitivity evaluation remain unclear. Here, we aimed to study their clinical use to provide a basis for precision medicine in hepatocellular carcinoma (HCC) via the amalgamation of artificial intelligence. Methods: High-throughput liver transcriptomes from Gene Expression Omnibus (GEO), NODE (https://www.bio.sino.org/node), the Cancer Genome Atlas (TCGA), and our in-house hepatocellular carcinoma patients were collected in this study. Core immunosignals that participated in the entire disease course of hepatitis B were explored using the "Gene set variation analysis" R package. Using ROC curve analysis, the impact of core immunosignals and amino acid utilization-related genes on the clinical outcome of hepatocellular carcinoma patients was calculated. The utility of core immunosignals as a classifier for hepatocellular carcinoma tumor tissue was evaluated using explainable machine-learning methods. A novel deep residual neural network model based on immunosignals was constructed for the long-term overall survival (LS) analysis. In vivo drug sensitivity was calculated by the "oncoPredict" R package. Results: We identified nine genes comprising chemokines and ECM related to hepatitis B virus-induced inflammation and fibrosis as CLST signals. Moreover, CLST was co-enriched with activated CD4+ T cells bearing harmful factors (aCD4) during all stages of hepatitis B virus pathogenesis, which was also verified by our hepatocellular carcinoma data. Unexpectedly, we found that hepatitis B virus-hepatocellular carcinoma patients in the CLST-high/aCD4-high subgroup had the shortest overall survival (OS) and were characterized by a risk gene signature associated with amino acid utilization.
Importantly, characteristic genes specific to CLST/aCD4 showed promising clinical relevance in identifying patients with early-stage hepatocellular carcinoma via explainable machine learning. In addition, the 5-year long-term overall survival of hepatocellular carcinoma patients can be effectively classified by the CLST/aCD4-based GeneSet-ResNet model. Subgroups defined by CLST and aCD4 were significantly involved in the sensitivity of hepatitis B virus-hepatocellular carcinoma patients to chemotherapy treatments. Conclusion: CLST and aCD4 are hepatitis B virus pathogenesis-relevant immunosignals that are highly involved in hepatitis B virus-induced inflammation, fibrosis, and hepatocellular carcinoma. Gene set variation analysis-derived immunogenomic signatures enabled efficient diagnostic and prognostic model construction. The clinical application of CLST and aCD4 as indicators would be beneficial for the precision management of hepatocellular carcinoma.
Affiliation(s)
- Jun Huang
- School of Life Sciences, Zhengzhou University, Zhengzhou, Henan, China; *Correspondence: Jun Huang, Liping Chen, Guifu Dai
- Chunbei Zhao
- School of Life Sciences, Zhengzhou University, Zhengzhou, Henan, China
- Xinhe Zhang
- School of Life Sciences, Zhengzhou University, Zhengzhou, Henan, China
- Qiaohui Zhao
- School of Life Sciences, Zhengzhou University, Zhengzhou, Henan, China
- Yanting Zhang
- School of Life Sciences, Zhengzhou University, Zhengzhou, Henan, China
- Liping Chen
- Key Laboratory of Gastroenterology and Hepatology, State Key Laboratory for Oncogenes and Related Genes, Department of Gastroenterology and Hepatology, Ministry of Health, Shanghai Institute of Digestive Disease, Renji Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
- Shanghai Public Health Clinical Center, Fudan University, Shanghai, China; *Correspondence: Jun Huang, Liping Chen, Guifu Dai
- Guifu Dai
- School of Life Sciences, Zhengzhou University, Zhengzhou, Henan, China; *Correspondence: Jun Huang, Liping Chen, Guifu Dai
10
Alzoubi I, Bao G, Zheng Y, Wang X, Graeber MB. Artificial intelligence techniques for neuropathological diagnostics and research. Neuropathology 2022. [PMID: 36443935 DOI: 10.1111/neup.12880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2022] [Revised: 10/17/2022] [Accepted: 10/23/2022] [Indexed: 12/03/2022]
Abstract
Artificial intelligence (AI) research began in theoretical neurophysiology, and the resulting classical paper on the McCulloch-Pitts mathematical neuron was written in a psychiatry department almost 80 years ago. However, the application of AI in digital neuropathology is still in its infancy. Rapid progress is now being made, which prompted this article. Human brain diseases represent distinct system states that fall outside the normal spectrum. Many differ not only in functional but also in structural terms, and the morphology of abnormal nervous tissue forms the traditional basis of neuropathological disease classifications. However, only a few countries have the medical specialty of neuropathology, and, given the sheer number of newly developed histological tools that can be applied to the study of brain diseases, a tremendous shortage of qualified hands and eyes at the microscope is obvious. Similarly, in neuroanatomy, human observers no longer have the capacity to process the vast amounts of connectomics data. Therefore, it is reasonable to assume that advances in AI technology and, especially, whole-slide image (WSI) analysis will greatly aid neuropathological practice. In this paper, we discuss machine learning (ML) techniques that are important for understanding WSI analysis, such as traditional ML and deep learning, introduce a recently developed neuropathological AI termed PathoFusion, and present thoughts on some of the challenges that must be overcome before the full potential of AI in digital neuropathology can be realized.
Affiliation(s)
- Islam Alzoubi
  - School of Computer Science, The University of Sydney, Sydney, New South Wales, Australia
- Guoqing Bao
  - School of Computer Science, The University of Sydney, Sydney, New South Wales, Australia
- Yuqi Zheng
  - Ken Parker Brain Tumour Research Laboratories, Brain and Mind Centre, Faculty of Medicine and Health, University of Sydney, Camperdown, New South Wales, Australia
- Xiuying Wang
  - School of Computer Science, The University of Sydney, Sydney, New South Wales, Australia
- Manuel B. Graeber
  - Ken Parker Brain Tumour Research Laboratories, Brain and Mind Centre, Faculty of Medicine and Health, University of Sydney, Camperdown, New South Wales, Australia
|
11
|
Sidak D, Schwarzerová J, Weckwerth W, Waldherr S. Interpretable machine learning methods for predictions in systems biology from omics data. Front Mol Biosci 2022; 9:926623. [PMID: 36387282 PMCID: PMC9650551 DOI: 10.3389/fmolb.2022.926623] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/22/2022] [Accepted: 08/15/2022] [Indexed: 12/02/2022] Open
Abstract
Machine learning has become a powerful tool for systems biologists, from diagnosing cancer to optimizing kinetic models and predicting the state, growth dynamics, or type of a cell. Potential predictions from complex biological data sets obtained by “omics” experiments seem endless, but are often not the main objective of biological research. Often we want to understand the molecular mechanisms of a disease to develop new therapies, or we need to justify a crucial decision that is derived from a prediction. In order to gain such knowledge from data, machine learning models need to be extended. A recent trend to achieve this is to design “interpretable” models. However, the notions around interpretability are sometimes ambiguous, and a universal recipe for building well-interpretable models is missing. With this work, we want to familiarize systems biologists with the concept of model interpretability in machine learning. We consider data sets, data preparation, machine learning methods, and software tools relevant to omics research in systems biology. Finally, we try to answer the question: “What is interpretability?” We introduce views from the interpretable machine learning community and propose a scheme for categorizing studies on omics data. We then apply these tools to review and categorize recent studies where predictive machine learning models have been constructed from non-sequential omics data.
Affiliation(s)
- David Sidak
  - Department of Functional and Evolutionary Ecology, Faculty of Life Sciences, Molecular Systems Biology (MOSYS), University of Vienna, Vienna, Austria
- Jana Schwarzerová
  - Department of Functional and Evolutionary Ecology, Faculty of Life Sciences, Molecular Systems Biology (MOSYS), University of Vienna, Vienna, Austria
  - Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology, Brno, Czech Republic
- Wolfram Weckwerth
  - Department of Functional and Evolutionary Ecology, Faculty of Life Sciences, Molecular Systems Biology (MOSYS), University of Vienna, Vienna, Austria
  - Vienna Metabolomics Center (VIME), Faculty of Life Sciences, University of Vienna, Vienna, Austria
- Steffen Waldherr
  - Department of Functional and Evolutionary Ecology, Faculty of Life Sciences, Molecular Systems Biology (MOSYS), University of Vienna, Vienna, Austria
- *Correspondence: Steffen Waldherr
|
12
|
Artificial Intelligence for Outcome Modeling in Radiotherapy. Semin Radiat Oncol 2022; 32:351-364. [DOI: 10.1016/j.semradonc.2022.06.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
|
13
|
Liang B, Gong H, Lu L, Xu J. Risk stratification and pathway analysis based on graph neural network and interpretable algorithm. BMC Bioinformatics 2022; 23:394. [PMID: 36167504 PMCID: PMC9516820 DOI: 10.1186/s12859-022-04950-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/23/2022] [Accepted: 09/19/2022] [Indexed: 12/01/2022] Open
Abstract
Background: Pathway-based analysis of transcriptomic data has shown greater stability and better performance than traditional gene-based analysis. Some pathway-based deep learning models have been developed for bioinformatic analysis, but these models have not fully considered the topological features of pathways, which limits the performance of the final prediction. Results: To address this issue, we propose a novel model, called PathGNN, which constructs a graph neural network (GNN) model that can capture topological features of pathways. As a case study, PathGNN was applied to predict long-term survival in four types of cancer and achieved promising predictive performance when compared to other common methods. Furthermore, the adoption of an interpretation algorithm enabled the identification of plausible pathways associated with survival. Conclusion: PathGNN demonstrates that GNNs can be effectively applied to build a pathway-based model, resulting in promising predictive power. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04950-1.
Affiliation(s)
- Bilin Liang
  - Shanghai Artificial Intelligence Laboratory, Yunjing Road 701, Shanghai, China
- Haifan Gong
  - Shanghai Artificial Intelligence Laboratory, Yunjing Road 701, Shanghai, China
- Lu Lu
  - Shanghai Artificial Intelligence Laboratory, Yunjing Road 701, Shanghai, China
- Jie Xu
  - Shanghai Artificial Intelligence Laboratory, Yunjing Road 701, Shanghai, China
|
14
|
Qin X, Yin D, Dong X, Chen D, Zhang S. Survival prediction model for right-censored data based on improved composite quantile regression neural network. Mathematical Biosciences and Engineering: MBE 2022; 19:7521-7542. [PMID: 35801434 DOI: 10.3934/mbe.2022354] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
With the development of the field of survival analysis, statistical inference of right-censored data is of great importance for the study of medical diagnosis. In this study, a right-censored data survival prediction model based on an improved composite quantile regression neural network framework, called rcICQRNN, is proposed. It incorporates composite quantile regression with the loss function of a multi-hidden layer feedforward neural network, combined with an inverse probability weighting method for survival prediction. The hyperparameters of the neural network are adjusted using the WOA algorithm; integer encoding and One-Hot encoding are used to encode the categorical features; and the BWOA variable selection method for high-dimensional data is proposed. The rcICQRNN algorithm was tested on a simulated dataset and two real breast cancer datasets, and the performance of the model was evaluated by three evaluation metrics. The results show that the rcICQRNN-5 model is more suitable for analyzing simulated datasets. The One-Hot encoding of the WOA-rcICQRNN-30 model is more applicable to the NKI70 data. The model results are optimal for k=15 after feature selection for the METABRIC dataset. Finally, we implemented the method for cross-dataset validation. On the whole, the C-index results using One-Hot encoded data are more stable, making the proposed rcICQRNN prediction model flexible enough to assist in medical decision making. It has practical applications in areas such as biomedicine, actuarial science, and financial economics.
Affiliation(s)
- Xiwen Qin
  - School of Mathematics and Statistics, Changchun University of Technology, Changchun 130012, China
- Dongmei Yin
  - School of Mathematics and Statistics, Changchun University of Technology, Changchun 130012, China
- Xiaogang Dong
  - School of Mathematics and Statistics, Changchun University of Technology, Changchun 130012, China
- Dongxue Chen
  - School of Mathematics and Statistics, Changchun University of Technology, Changchun 130012, China
- Shuang Zhang
  - School of Mathematics and Statistics, Changchun University of Technology, Changchun 130012, China
|
15
|
Adnan N, Zand M, Huang THM, Ruan J. Construction and Evaluation of Robust Interpretation Models for Breast Cancer Metastasis Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1344-1353. [PMID: 34662279 PMCID: PMC9254332 DOI: 10.1109/tcbb.2021.3120673] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/05/2023]
Abstract
Interpretability of machine learning (ML) models represents the extent to which a model's decision-making process can be understood by model developers and/or end users. Transcriptomics-based cancer prognosis models, for example, while achieving good accuracy, are usually hard to interpret, due to the high-dimensional feature space and the complexity of models. As interpretability is critical for the transparency and fairness of ML models, several algorithms have been proposed to improve the interpretability of arbitrary classifiers. However, evaluation of these algorithms often requires substantial domain knowledge. Here, we propose a breast cancer metastasis prediction model using a very small number of biologically interpretable features, and a simple yet novel model interpretation approach that can provide personalized interpretations. In addition, we contributed, to the best of our knowledge, the first method to quantitatively compare different interpretation algorithms. Experimental results show that our model not only achieved competitive prediction accuracy, but also higher inter-classifier interpretation consistency than state-of-the-art interpretation methods. Importantly, our interpretation results can improve the generalizability of the prediction models. Overall, this work provides several novel ideas to construct and evaluate interpretable ML models that can be valuable to both the cancer machine learning community and related application domains.
|
16
|
Wang S, Zhang H, Liu Z, Liu Y. A Novel Deep Learning Method to Predict Lung Cancer Long-Term Survival With Biological Knowledge Incorporated Gene Expression Images and Clinical Data. Front Genet 2022; 13:800853. [PMID: 35368657 PMCID: PMC8964372 DOI: 10.3389/fgene.2022.800853] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/24/2021] [Accepted: 02/01/2022] [Indexed: 01/22/2023] Open
Abstract
Lung cancer is the leading cause of cancer deaths, so predicting the survival status of lung cancer patients is of great value. However, existing methods mainly depend on statistical machine learning (ML) algorithms, which are not well suited to high-dimensional genomics data. Deep learning (DL), with its strong capacity for learning from high-dimensional data, can be used to predict lung cancer survival from genomics data. The Cancer Genome Atlas (TCGA) is a database that contains many kinds of genomics data for 33 cancer types. With this enormous amount of data, researchers can analyze key factors related to cancer therapy. This paper proposes a novel method to predict lung cancer long-term survival using gene expression data from TCGA. First, we select the genes most relevant to the target problem using a supervised feature selection method called the mutual information selector. Second, we propose a method to convert gene expression data into two kinds of images that incorporate KEGG BRITE and KEGG Pathway data, so that a convolutional neural network (CNN) model can learn high-level features. We then design a CNN-based DL model and add two kinds of clinical data to improve performance, yielding a multimodal DL model. The generalized experimental results indicated that our method performed much better than the ML models and unimodal DL models. Furthermore, we conducted survival analysis and observed that our model could better divide the samples into high-risk and low-risk groups.
Affiliation(s)
- Shuo Wang
  - College of Computer Science and Technology, Jilin University, Changchun, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Hao Zhang
  - College of Computer Science and Technology, Jilin University, Changchun, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Zhen Liu
  - College of Computer Science and Technology, Jilin University, Changchun, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
  - Graduate School of Engineering, Nagasaki Institute of Applied Science, Nagasaki, Japan
- Yuanning Liu
  - College of Computer Science and Technology, Jilin University, Changchun, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
|
17
|
Feasibility study of deep learning based radiosensitivity prediction model of National Cancer Institute-60 cell lines using gene expression. Nuclear Engineering and Technology 2022. [DOI: 10.1016/j.net.2021.10.020] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/09/2023]
|