1
Zhang ZH, Du Y, Wei S, Pei W. Multilayered insights: a machine learning approach for personalized prognostic assessment in hepatocellular carcinoma. Front Oncol 2024; 13:1327147. [PMID: 38486931 PMCID: PMC10937467 DOI: 10.3389/fonc.2023.1327147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Accepted: 12/08/2023] [Indexed: 03/17/2024]
Abstract
Background Hepatocellular carcinoma (HCC) is a complex malignancy, and precise prognosis assessment is vital for personalized treatment decisions. Objective This study aimed to develop a multi-level prognostic risk model for HCC, offering individualized prognosis assessment and treatment guidance. Methods By utilizing data from The Cancer Genome Atlas (TCGA) and the Surveillance, Epidemiology, and End Results (SEER) database, we performed differential gene expression analysis to identify genes associated with survival in HCC patients. The HCC Differential Gene Prognostic Model (HCC-DGPM) was developed through multivariate Cox regression. Clinical indicators were incorporated into the HCC-DGPM using Cox regression, leading to the creation of the HCC Multilevel Prognostic Model (HCC-MLPM). Immune function was evaluated using single-sample Gene Set Enrichment Analysis (ssGSEA), and immune cell infiltration was assessed. Patient responsiveness to immunotherapy was evaluated using the Immunophenoscore (IPS). Clinical drug responsiveness was investigated using drug-related information from the TCGA database. Cox regression, Kaplan-Meier analysis, and trend association tests were conducted. Results Seven differentially expressed genes from the TCGA database were used to construct the HCC-DGPM. Additionally, four clinical indicators associated with survival were identified from the SEER database for model adjustment. The adjusted HCC-MLPM showed significantly improved discriminative capacity (AUC=0.819 vs. 0.724). External validation involving 153 HCC patients from the International Cancer Genome Consortium (ICGC) database verified the performance of the HCC-MLPM (AUC=0.776). Significantly, the HCC-MLPM exhibited predictive capacity for patient response to immunotherapy and clinical drug efficacy (P < 0.05). Conclusion This study offers comprehensive insights into HCC prognosis and develops predictive models to enhance patient outcomes. 
The evaluation of immune function, immune cell infiltration, and clinical drug responsiveness enhances our comprehension and management of HCC.
Affiliation(s)
- Yunxiang Du
- Department of Oncology, Huai’an 82 Hospital, China RongTong Medical Healthcare Group Co., Ltd., Chengdu, China
- Shuzhen Wei
- Department of Oncology, Huai’an 82 Hospital, China RongTong Medical Healthcare Group Co., Ltd., Chengdu, China
- Weidong Pei
- Department of Discipline Development, China RongTong Medical Healthcare Group Co., Ltd., Chengdu, China
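The HCC-MLPM abstract lists Kaplan-Meier analysis among its survival methods. As an illustration of that estimator only (the abstract does not publish the authors' pipeline), the following is a minimal pure-Python sketch of the Kaplan-Meier product-limit estimate. The function name `kaplan_meier` and the input layout (one follow-up time and one event flag per patient) are assumptions made for this example.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) steps of the Kaplan-Meier estimate.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    # Sort patients by follow-up time.
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    steps: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing this time point; censored patients at t
        # are still counted as at risk at t (the usual convention).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit update: S(t) = S(t-) * (1 - d_t / n_t)
            survival *= 1.0 - deaths / n_at_risk
            steps.append((t, survival))
        # Both events and censored patients leave the risk set after t.
        n_at_risk -= removed
    return steps
```

For example, with follow-up times [1, 2, 2, 3, 4, 5] and event flags [1, 1, 0, 1, 0, 1], the estimated survival steps to 5/6, 2/3, 4/9 and finally 0 at times 1, 2, 3 and 5; the patient censored at time 4 shrinks the risk set without creating a step. For real analyses, a maintained survival-analysis library such as lifelines is preferable to hand-rolled code.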
2
Li B, Nabavi S. A multimodal graph neural network framework for cancer molecular subtype classification. BMC Bioinformatics 2024; 25:27. [PMID: 38225583 PMCID: PMC10789042 DOI: 10.1186/s12859-023-05622-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Accepted: 12/15/2023] [Indexed: 01/17/2024]
Abstract
BACKGROUND The recent development of high-throughput sequencing has produced a large collection of multi-omics data, enabling researchers to better investigate cancer molecular profiles and cancer taxonomy based on molecular subtypes. Integrating multi-omics data has proven effective for building more precise classification models. Most current multi-omics integrative models, which are mainly based on deep neural networks, use either early fusion in the form of concatenation or late fusion with a separate feature extractor for each omic. Due to the nature of biological systems, graphs are a better structural representation of biomedical data. Although a few graph neural network (GNN)-based multi-omics integrative methods have been proposed, they share three common disadvantages. First, most of them use only one type of connection, either inter-omics or intra-omic; second, they consider only one kind of GNN layer, either graph convolutional network (GCN) or graph attention network (GAT); and third, most of these methods have not been tested on a more complex classification task, such as cancer molecular subtyping. RESULTS In this study, we propose a novel end-to-end multi-omics GNN framework for accurate and robust cancer subtype classification. The proposed model utilizes multi-omics data in the form of heterogeneous multi-layer graphs, which combine both inter-omics and intra-omic connections from established biological knowledge. The proposed model incorporates learned graph features and global genome features for accurate classification. We tested the proposed model on The Cancer Genome Atlas (TCGA) Pan-cancer dataset and the TCGA breast invasive carcinoma (BRCA) dataset for molecular subtype and cancer subtype classification, respectively. The proposed model outperforms four current state-of-the-art baseline models in terms of accuracy, F1 score, precision, and recall.
The comparative analysis of GAT-based models and GCN-based models reveals that GAT-based models are preferred for smaller graphs with less information and GCN-based models are preferred for larger graphs with extra information.
Affiliation(s)
- Bingjun Li
- Department of Computer Science and Engineering, University of Connecticut, Storrs, USA
- Sheida Nabavi
- Department of Computer Science and Engineering, University of Connecticut, Storrs, USA.
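The Li and Nabavi abstract compares GCN-based and GAT-based layers. For orientation, this is a minimal pure-Python sketch of one layer of the standard GCN propagation rule of Kipf and Welling, ReLU(D^-1/2 (A + I) D^-1/2 H W), where A is the adjacency matrix, H the node feature matrix, W a weight matrix and D the degree matrix of A + I. It is not the authors' multimodal framework, and the function names are hypothetical. A GAT layer differs in that the fixed normalised edge weights are replaced by attention coefficients learned from the node features.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product: rows of a times columns of b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each node also keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Degree of each node in the self-looped graph.
    deg = [sum(row) for row in a_hat]
    # Symmetric normalisation of the edge weights.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbour features, apply the linear map, then ReLU.
    out = matmul(matmul(norm, features), weights)
    return [[max(0.0, v) for v in row] for row in out]
```

With two connected nodes carrying features 1.0 and 3.0 and a 1x1 weight of 1.0, each node's output is the normalised average 2.0. Real models would use a tensor library and learn W by gradient descent rather than fixing it by hand.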
3
Pouryahya M, Oh JH, Javanmard P, Mathews JC, Belkhatir Z, Deasy JO, Tannenbaum AR. aWCluster: A Novel Integrative Network-Based Clustering of Multiomics for Subtype Analysis of Cancer Data. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:1472-1483. [PMID: 33226952 PMCID: PMC9518829 DOI: 10.1109/tcbb.2020.3039511] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 06/11/2023]
Abstract
The remarkable growth of multi-platform genomic profiles has led to the challenge of multiomics data integration. In this study, we present a novel network-based multiomics clustering method founded on the Wasserstein distance from optimal mass transport. This distance has many important geometric properties that make it a suitable choice for machine learning and clustering. Our proposed method of aggregating multiomics and Wasserstein distance clustering (aWCluster) is applied to breast carcinoma as well as bladder carcinoma, colorectal adenocarcinoma, renal carcinoma, lung non-small cell adenocarcinoma, and endometrial carcinoma from The Cancer Genome Atlas project. Subtypes were characterized by the concordant effect of mRNA expression, DNA copy number alteration, and DNA methylation of genes and their neighbors in the interaction network. aWCluster successfully clusters all cancer types into classes with significantly different survival rates. In addition, a gene ontology enrichment analysis of significant genes in the low-survival subgroup of breast cancer points to the well-known phenomenon of tumor hypoxia and to the transcription factor ETS1, whose expression is induced by hypoxia. We believe aWCluster has the potential to discover novel subtypes and biomarkers by accentuating genes that have concordant multiomics measurements in their interaction network, which are challenging to find without network inference or with single-omics analysis.
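aWCluster is built on the Wasserstein distance from optimal mass transport, which it computes over an interaction network. The sketch below shows only the much simpler one-dimensional case, in which the W1 (earth mover's) distance between two histograms on the same equally spaced grid reduces to the summed absolute difference of their cumulative distributions. It illustrates the notion of "cost of moving mass" and is not the network-based formulation aWCluster uses; `wasserstein_1d` is a hypothetical helper.

```python
from itertools import accumulate
from typing import Sequence


def wasserstein_1d(p: Sequence[float], q: Sequence[float], spacing: float = 1.0) -> float:
    """W1 distance between two histograms on the same equally spaced 1-D grid.

    Uses the closed form W1 = sum_i |CDF_p(i) - CDF_q(i)| * spacing,
    which only holds on a line, not on a general network.
    """
    if len(p) != len(q):
        raise ValueError("histograms must share the same support")
    total_p, total_q = sum(p), sum(q)
    # Normalise both histograms to probability distributions, then take CDFs.
    cdf_p = accumulate(x / total_p for x in p)
    cdf_q = accumulate(x / total_q for x in q)
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q)) * spacing
```

Moving all mass from the first to the third bin of a three-bin grid costs 2.0, that is, one unit of mass carried two steps. On a network, the ground cost between nodes is no longer a simple line distance, so the general case requires solving an optimal transport problem instead of this closed form.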
4
Zhang G, Peng Z, Yan C, Wang J, Luo J, Luo H. MultiGATAE: A Novel Cancer Subtype Identification Method Based on Multi-Omics and Attention Mechanism. Front Genet 2022; 13:855629. [PMID: 35391797 PMCID: PMC8979770 DOI: 10.3389/fgene.2022.855629] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/15/2022] [Accepted: 02/14/2022] [Indexed: 11/13/2022]
Abstract
Cancer is one of the leading causes of death worldwide, creating an urgent need for effective treatment. However, cancer is highly heterogeneous: a single cancer can be divided into several subtypes with distinct pathogenesis and outcomes. This heterogeneity is considered the main obstacle to precision treatment of cancer. Thus, cancer subtype identification is of great importance for cancer diagnosis and treatment. In this work, we propose a deep learning method based on multi-omics and an attention mechanism to effectively identify cancer subtypes. We first used similarity network fusion to integrate multi-omics data and construct a similarity graph. Then, the similarity graph and the patient feature matrix are input into a graph autoencoder composed of a graph attention network and an omics-level attention mechanism to learn an embedding representation. K-means clustering is applied to the embedding representation to identify cancer subtypes. Experiments on eight TCGA datasets confirmed that our proposed method performs better at cancer subtype identification than other state-of-the-art methods. The source code of our method is available at https://github.com/kataomoi7/multiGATAE.
Affiliation(s)
- Ge Zhang
- School of Computer and Information Engineering, Henan University, Kaifeng, China
- Zhen Peng
- School of Computer and Information Engineering, Henan University, Kaifeng, China
- Chaokun Yan
- School of Computer and Information Engineering, Henan University, Kaifeng, China
- Jianlin Wang
- School of Computer and Information Engineering, Henan University, Kaifeng, China
- Junwei Luo
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
- Huimin Luo
- School of Computer and Information Engineering, Henan University, Kaifeng, China
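The MultiGATAE abstract ends its pipeline by applying K-means to the learned embeddings. A minimal sketch of Lloyd's K-means algorithm in pure Python follows, assuming the embeddings are given as tuples of floats. Initialising the centroids with the first k points is a simplification chosen here for determinism; practical implementations usually use k-means++ seeding and several restarts. The function name `kmeans` is hypothetical and not part of the MultiGATAE code.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's K-means; returns one cluster label per point.

    Centroids are initialised with the first k points (a simplification).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels
```

On the points (0, 0), (0, 1), (10, 10) and (10, 11) with k = 2, the algorithm converges to the labels [0, 0, 1, 1], separating the two well-spaced groups.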
5
Hao W, Pang S, Chen Z. Multi-view spectral clustering via common structure maximization of local and global representations. Neural Netw 2021; 143:595-606. [PMID: 34343774 DOI: 10.1016/j.neunet.2021.07.020] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/09/2020] [Revised: 06/03/2021] [Accepted: 07/16/2021] [Indexed: 11/16/2022]
Abstract
The essential problem of multi-view spectral clustering is to learn a good common representation by effectively utilizing multi-view information. A popular strategy for improving the quality of the common representation is to utilize global and local information jointly. Most existing methods capture local manifold information by graph regularization. However, once the local graphs are constructed, they do not change during the whole optimization process, which may lead to a degenerate common representation when some of the graphs are unreliable. To address this problem, rather than directly using fixed local representations, we propose a dynamic strategy to construct a common local representation. We then impose a fusion term that maximizes the common structure of the local and global representations so that they boost each other in a mutually reinforcing manner. With this fusion term, we integrate local and global representation learning in a unified framework and design an alternating iterative optimization procedure to solve it. Extensive experiments on a number of benchmark datasets support the superiority of our algorithm over several state-of-the-art methods.
Affiliation(s)
- Wenyu Hao
- School of Software Engineering, Xi'an Jiaotong University, Xi'an, 710049, China.
- Shanmin Pang
- School of Software Engineering, Xi'an Jiaotong University, Xi'an, 710049, China.
- Zhikai Chen
- School of Software Engineering, Xi'an Jiaotong University, Xi'an, 710049, China.
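The Hao et al. abstract builds on spectral clustering. For reference, the standard single-view spectral embedding objective is shown below, where W is an n x n affinity matrix, D its diagonal degree matrix, L the graph Laplacian and c the number of clusters. The paper's multi-view fusion term is not reproduced here, since the abstract does not state it.

```latex
L = D - W, \qquad D_{ii} = \sum_{j} W_{ij}

\min_{F \in \mathbb{R}^{n \times c}} \operatorname{Tr}\!\left(F^{\top} L F\right)
\quad \text{s.t.} \quad F^{\top} F = I
```

By a standard trace-minimisation result, the columns of an optimal F are eigenvectors of L associated with its c smallest eigenvalues; running K-means on the rows of F then yields the cluster labels. Multi-view methods replace the single fixed Laplacian with representations learned jointly from several views.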
6
Chierici M, Bussola N, Marcolini A, Francescatto M, Zandonà A, Trastulla L, Agostinelli C, Jurman G, Furlanello C. Integrative Network Fusion: A Multi-Omics Approach in Molecular Profiling. Front Oncol 2020; 10:1065. [PMID: 32714870 PMCID: PMC7340129 DOI: 10.3389/fonc.2020.01065] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 03/31/2020] [Accepted: 05/28/2020] [Indexed: 12/20/2022]
Abstract
Recent technological advances and international efforts, such as The Cancer Genome Atlas (TCGA), have made available several pan-cancer datasets encompassing multiple omics layers with detailed clinical information for a large collection of samples. This has created a need for computational methods aimed at improving cancer subtyping and biomarker identification from multi-modal data. Here we apply the Integrative Network Fusion (INF) pipeline, which combines multiple omics layers by exploiting Similarity Network Fusion (SNF) within a machine learning predictive framework. INF includes a feature ranking scheme (rSNF) on SNF-integrated features, used by a classifier over juxtaposed multi-omics features (juXT). In particular, we show instances of INF implementing Random Forest (RF) and linear Support Vector Machine (LSVM) as the classifier, and two baseline RF and LSVM models are also trained on juXT. Finally, a compact RF model, called rSNFi, is derived by training on the intersection of top-ranked biomarkers from the two approaches, juXT and rSNF. All classifiers are run in a 10x5-fold cross-validation schema to ensure reproducibility, following the guidelines for an unbiased Data Analysis Plan by the US FDA-led initiatives MAQC/SEQC. INF is demonstrated on four classification tasks on three multi-modal TCGA oncogenomics datasets. Gene expression, protein expression, and copy number variants are used to predict estrogen receptor status (BRCA-ER, N = 381) and breast invasive carcinoma subtypes (BRCA-subtypes, N = 305), while gene expression, miRNA expression, and methylation data are used as predictor layers for acute myeloid leukemia and renal clear cell carcinoma survival (AML-OS, N = 157; KIRC-OS, N = 181). In test, INF achieved similar Matthews Correlation Coefficient (MCC) values with 83% to 97% smaller feature sizes (FS) compared with juXT for BRCA-ER (MCC: 0.83 vs. 0.80; FS: 56 vs. 1801) and BRCA-subtypes (0.84 vs. 0.80; 302 vs. 1801), and improved KIRC-OS performance (0.38 vs. 0.31; 111 vs. 2319). INF predictions in test are also generally more accurate than single-layer omics models and use smaller signatures, with transcriptomics consistently playing the leading role. Overall, the INF framework effectively integrates multiple data levels in oncogenomics classification tasks, improving over the performance of single layers alone and naive juxtaposition, and provides compact signature sizes.
Affiliation(s)
- Nicole Bussola
- Fondazione Bruno Kessler, Trento, Italy
- University of Trento, Trento, Italy
- Margherita Francescatto
- Fondazione Bruno Kessler, Trento, Italy
- Department of Medical, Surgical and Health Sciences, University of Trieste, Trieste, Italy
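The INF abstract reports performance as the Matthews Correlation Coefficient (MCC). A minimal sketch of the binary MCC computed from confusion-matrix counts follows, using MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). Returning 0.0 when a marginal sum is zero is a common convention assumed here. Multiclass tasks such as BRCA-subtypes require a multiclass generalisation of MCC, which this sketch does not cover, and the helper names `mcc` and `mcc_from_labels` are hypothetical.

```python
import math
from typing import Sequence


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Binary Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Undefined when a row or column of the confusion matrix is empty;
        # 0.0 is a common convention.
        return 0.0
    return (tp * tn - fp * fn) / denom


def mcc_from_labels(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """MCC for binary labels coded as 1 (positive) and 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return mcc(tp, tn, fp, fn)
```

MCC ranges from -1 (total disagreement) through 0 (chance level) to 1 (perfect prediction), which is one reason it is often preferred to accuracy when classes are imbalanced.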