1
Luo X, Zhang X, Su D, Li H, Zou M, Xiong Y, Yang L. Deep Clustering-Based Metabolic Stratification of Non-Small Cell Lung Cancer Patients Through Integration of Somatic Mutation Profile and Network Propagation Algorithm. Interdiscip Sci 2025:10.1007/s12539-025-00699-2. [PMID: 40100545 DOI: 10.1007/s12539-025-00699-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2024] [Revised: 02/21/2025] [Accepted: 02/22/2025] [Indexed: 03/20/2025]
Abstract
As a common malignancy of the lower respiratory tract, non-small cell lung cancer (NSCLC) represents a major oncological challenge globally, characterized by high incidence and mortality rates. Recent research highlights the critical involvement of somatic mutations in the onset and development of NSCLC. Stratification of NSCLC patients based on somatic mutation data could facilitate the identification of patients likely to respond to personalized therapeutic strategies. However, such stratification is challenging because somatic mutation data are sparse. In this study, based on sparse somatic mutation data from 4581 NSCLC patients from the Memorial Sloan Kettering Cancer Center (MSKCC) database, we systematically evaluate metabolic pathway activity in NSCLC patients by applying a network propagation algorithm and computational biology algorithms. Based on the metabolic pathways associated with prognosis, as identified through univariate Cox regression analysis, NSCLC patients are stratified using a deep clustering algorithm to explore the optimal classification strategy, thereby establishing biologically meaningful metabolic subtypes of NSCLC patients. The precise NSCLC metabolic subtypes obtained from the network propagation and deep clustering algorithms are systematically evaluated and validated for survival benefits of immunotherapy. Our research marks progress towards a universal approach for classifying NSCLC patients based solely on somatic mutation profiles using a deep clustering algorithm. Our research will help to deepen the analysis of NSCLC patients' metabolic subtypes from the perspective of the tumor microenvironment, providing a strong basis for formulating more precise personalized treatment plans.
Affiliation(s)
- Xu Luo
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Xinpeng Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Dongqing Su
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Honghao Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Min Zou
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Yuqiang Xiong
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Lei Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
2
Zhou L, Li J, Tan W. M-NET: Transforming Single Nucleotide Variations Into Patient Feature Images for the Prediction of Prostate Cancer Metastasis and Identification of Significant Pathways. IEEE J Biomed Health Inform 2025; 29:1199-1208. [PMID: 39509309 DOI: 10.1109/jbhi.2024.3493618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2024]
Abstract
High-performance prediction of prostate cancer metastasis based on single nucleotide variations remains a challenge. Therefore, we developed a novel biologically informed deep learning framework, named M-NET, for the prediction of prostate cancer metastasis. Within the framework, we transformed single nucleotide variations into patient feature images that are optimal for fitting convolutional neural networks. Moreover, we identified significant pathways associated with the metastatic status. The experimental results showed that M-NET significantly outperformed other comparison methods based on single nucleotide variations, achieving improvements in accuracy, precision, recall, F1-score, area under the receiver operating characteristics curve, and area under the precision-recall curve by 6.3%, 8.4%, 5.1%, 0.070, 0.041, and 0.026, respectively. Furthermore, M-NET identified some important pathways associated with the metastatic status, such as signaling by the hedgehog pathway. In summary, compared with other comparative methods, M-NET exhibited a better performance in the prediction of prostate cancer metastasis.
3
Zhang C, Li W, Deng M, Jiang Y, Cui X, Chen P. SIG: Graph-Based Cancer Subtype Stratification With Gene Mutation Structural Information. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1752-1764. [PMID: 38875076 DOI: 10.1109/tcbb.2024.3414498] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2024]
Abstract
Somatic tumor data are high-dimensional and sparse, with small sample sizes, making cancer subtype stratification based on somatic genomic data a challenge. Current methods for improving cancer clustering performance focus on dimension reduction, integrating multi-omics data, or generating realistic samples, yet ignore the associations between mutated genes within the patient-gene matrix. We refer to these associations as gene mutation structural information, which implicitly includes cancer subtype information and can enhance subtype clustering. We introduce a novel method for cancer subtype clustering called SIG (Structural Information within Graph). As cancer is driven by a combination of genes, we establish associations between mutated genes within the same patient sample, pair by pair, and use a graph to represent them. An association between two mutated genes corresponds to an edge in the graph. We then merge these associations among all mutated genes to obtain a structural information graph, which enriches the gene network and improves its relevance to cancer clustering. We integrate the somatic tumor genome with the enriched gene network and propagate it to cluster patients with mutations in similar network regions. Our method achieves superior clustering performance compared to state-of-the-art methods, as demonstrated by clustering experiments on ovarian and LUAD datasets.
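Note: the pairwise association step described in this abstract can be illustrated with a minimal sketch. It assumes that every unordered pair of genes mutated in the same patient contributes one edge, and that merging across patients simply counts co-occurrences. The function name, the edge weighting, and the toy patients below are hypothetical and are not taken from the SIG paper.

```python
from collections import Counter
from itertools import combinations


def co_mutation_graph(patient_mutations):
    """Merge per-patient gene pairs into one weighted edge set.

    patient_mutations: dict mapping a patient ID to the set of mutated genes.
    Returns a Counter mapping a sorted (gene_a, gene_b) pair to the number of
    patients in which both genes are mutated (an assumed weighting scheme).
    """
    edges = Counter()
    for genes in patient_mutations.values():
        # Every unordered pair of mutated genes in one patient becomes an edge.
        for pair in combinations(sorted(genes), 2):
            edges[pair] += 1
    return edges


# Toy data: hypothetical patients and mutation calls.
patients = {
    "P1": {"TP53", "KRAS", "EGFR"},
    "P2": {"TP53", "KRAS"},
    "P3": {"EGFR"},
}
graph = co_mutation_graph(patients)
```

In this toy input the KRAS/TP53 edge is seen in two patients, while a patient with a single mutation contributes no edge.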
4
Su D, Xiong Y, Wang S, Wei H, Ke J, Li H, Wang T, Zuo Y, Yang L. Structural deep clustering network for stratification of breast cancer patients through integration of somatic mutation profiles. Computer Methods and Programs in Biomedicine 2023; 242:107808. [PMID: 37716222 DOI: 10.1016/j.cmpb.2023.107808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 08/15/2023] [Accepted: 09/10/2023] [Indexed: 09/18/2023]
Abstract
BACKGROUND AND OBJECTIVE: Breast cancer is among the most malignant tumors that occur in women and is one of the leading causes of death from gynecologic malignancy worldwide. The high degree of heterogeneity that characterizes breast cancer makes it challenging to devise effective therapeutic strategies. Accumulating evidence highlights the crucial role of stratifying breast cancer patients into clinically significant subtypes to achieve better prognoses and treatments. The structural deep clustering network is a graph convolutional network-based clustering algorithm that integrates structural information and has achieved state-of-the-art performance in various applications.
METHODS: In this study, we employed a structural deep clustering network to integrate somatic mutation profiles for stratifying 2526 breast cancer patients from the Memorial Sloan Kettering Cancer Center into two clinically differentiable subtypes.
RESULTS: Breast cancer patients in cluster 1 exhibited better prognosis than breast cancer patients in cluster 2, and the difference between them was statistically significant. The immunogenomic landscape further demonstrated that cluster 1 was associated with remarkable infiltration of tumor-infiltrating lymphocytes. The clustering subtype could be used to evaluate the therapeutic benefit of immunotherapy and chemotherapy in breast cancer patients. Furthermore, our approach effectively classified patients from eight different cancer types, demonstrating its generalizability.
CONCLUSIONS: Our study represents a step towards a generic methodology for classifying cancer patients using only somatic mutation data and structural deep clustering network approaches. Employing a structural deep clustering network to identify breast cancer subtypes is promising and can inform the development of more accurate and personalized therapies.
Affiliation(s)
- Dongqing Su
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Yuqiang Xiong
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Shiyuan Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Haodong Wei
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Jiawei Ke
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Honghao Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Tao Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Yongchun Zuo
  - The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
  - Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd. Hohhot, 010010, China
  - Inner Mongolia International Mongolian Hospital, Hohhot 010065, China
- Lei Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
5
Zou M, Li H, Su D, Xiong Y, Wei H, Wang S, Sun H, Wang T, Xi Q, Zuo Y, Yang L. Integrating somatic mutation profiles with structural deep clustering network for metabolic stratification in pancreatic cancer: a comprehensive analysis of prognostic and genomic landscapes. Brief Bioinform 2023; 25:bbad430. [PMID: 38040491 PMCID: PMC10783866 DOI: 10.1093/bib/bbad430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 09/29/2023] [Accepted: 11/05/2023] [Indexed: 12/03/2023] Open
Abstract
Pancreatic cancer is a globally recognized highly aggressive malignancy, posing a significant threat to human health and characterized by pronounced heterogeneity. In recent years, researchers have uncovered that the development and progression of cancer are often attributed to the accumulation of somatic mutations within cells. However, cancer somatic mutation data exhibit characteristics such as high dimensionality and sparsity, which pose new challenges in utilizing these data effectively. In this study, we propagated the discrete somatic mutation data of pancreatic cancer through a network propagation model based on protein-protein interaction networks. This resulted in smoothed somatic mutation profile data that incorporate protein network information. Based on this smoothed mutation profile data, we obtained the activity levels of different metabolic pathways in pancreatic cancer patients. Subsequently, using the activity levels of various metabolic pathways in cancer patients, we employed a deep clustering algorithm to establish biologically and clinically relevant metabolic subtypes of pancreatic cancer. Our study holds scientific significance in classifying pancreatic cancer based on somatic mutation data and may provide a crucial theoretical basis for the diagnosis and immunotherapy of pancreatic cancer patients.
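Note: the abstract does not state the propagation rule, so the sketch below uses one common random-walk-with-restart form, F(t+1) = alpha * F(t) * A' + (1 - alpha) * F(0), where A' is the degree-normalized adjacency matrix of the interaction network. The value of alpha, the function name, and the four-gene toy network are assumptions made purely for illustration.

```python
def propagate(f0, adj, alpha=0.7, tol=1e-9, max_iter=1000):
    """Smooth a binary mutation vector over a gene network.

    f0: list of 0/1 mutation indicators, one per gene.
    adj: symmetric 0/1 adjacency matrix (list of lists) over the same genes.
    alpha: share of the signal passed to neighbours at each step (assumed).
    """
    n = len(f0)
    degree = [sum(row) for row in adj]
    f = list(f0)
    for _ in range(max_iter):
        nxt = []
        for j in range(n):
            # Each gene i splits its current score equally among its neighbours.
            inflow = sum(
                f[i] * adj[i][j] / degree[i] for i in range(n) if degree[i]
            )
            # Mix the diffused signal with the original mutation profile.
            nxt.append(alpha * inflow + (1 - alpha) * f0[j])
        if max(abs(a - b) for a, b in zip(nxt, f)) < tol:
            return nxt
        f = nxt
    return f


# Hypothetical path network G0 - G1 - G2 - G3 with a single mutation in G0.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = propagate([1, 0, 0, 0], adj)
```

After propagation the unmutated genes carry nonzero scores that fall off with network distance from the mutated gene, which is the sense in which a sparse 0/1 profile becomes a smoothed one.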
Affiliation(s)
- Min Zou
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Honghao Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Dongqing Su
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Yuqiang Xiong
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Haodong Wei
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Shiyuan Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Hongmei Sun
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Tao Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Qilemuge Xi
  - The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Yongchun Zuo
  - The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
  - Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd. Hohhot 010010, China
  - Inner Mongolia International Mongolian Hospital, Hohhot 010065, China
- Lei Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
6
Lim S, Kim Y, Gu J, Lee S, Shin W, Kim S. Supervised chemical graph mining improves drug-induced liver injury prediction. iScience 2022; 26:105677. [PMID: 36654861 PMCID: PMC9840932 DOI: 10.1016/j.isci.2022.105677] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/08/2022] [Revised: 11/11/2022] [Accepted: 11/23/2022] [Indexed: 12/27/2022] Open
Abstract
Drug-induced liver injury (DILI) is the main cause of drug failure in clinical trials. The characterization of toxic compounds in terms of chemical structure is important because compounds can be metabolized to toxic substances in the liver. Traditional machine learning approaches have had limited success in predicting DILI, and emerging deep graph neural network (GNN) models are not yet powerful enough to predict DILI. In this study, we developed a completely different approach, supervised subgraph mining (SSM), a strategy to mine explicit subgraph features by iteratively updating individual graph transitions to maximize DILI fidelity. Our method outperformed previous methods, including state-of-the-art GNN tools, in classifying DILI on two different datasets: DILIst and TDC-benchmark. We also combined the subgraph features by using SMARTS-based frequent structural pattern matching and associated them with the drugs' ATC codes.
Affiliation(s)
- Sangsoo Lim
  - Bioinformatics Institute, Seoul National University, Gwanak-ro 1, Seoul 08826, South Korea
- Youngkuk Kim
  - Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Seoul 08826, South Korea
- Jeonghyeon Gu
  - Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Seoul 08826, South Korea
- Sunho Lee
  - AIGENDRUG Co., Ltd., Gwanak-ro 1, Seoul 08826, South Korea
- Wonseok Shin
  - Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Seoul 08826, South Korea
- Sun Kim
  - Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Seoul 08826, South Korea
  - Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Seoul 08826, South Korea
  - AIGENDRUG Co., Ltd., Gwanak-ro 1, Seoul 08826, South Korea
  - Corresponding author
7
Abstract
Thousands of genes are perturbed by cancer, and these disturbances can be seen in transcriptome, methylation, somatic mutation, and copy number variation omics studies. Understanding their connectivity patterns as an omnigenic neighbourhood in a molecular interaction network (interactome) is a key step towards advancing knowledge of the molecular mechanisms underlying cancers. Here, we introduce a unified connectivity line (CLine) to pinpoint omics-specific omnigenic patterns across 15 curated cancers. Taking advantage of the universality of CLine, we distinguish the peripheral and core genes for each omics aspect. We propose a network-based framework, multi-omics periphery and core (MOPC), to combine peripheral and core genes from different omics into a button-like structure. On the basis of network proximity, we provide evidence that core genes tend to be specifically perturbed in one omics, but the peripheral genes are diversely perturbed in multiple omics. And the core of one omics is regulated by multiple omics peripheries. Finally, we take the MOPC as an omnigenic neighbourhood, describe its characteristics, and explore its relative contribution to network-based mechanisms of cancer. We were able to present how multi-omics perturbations percolate through the human interactome and contribute to an integrated periphery and core.
8
Zhang L, Fan S, Vera J, Lai X. A network medicine approach for identifying diagnostic and prognostic biomarkers and exploring drug repurposing in human cancer. Comput Struct Biotechnol J 2022; 21:34-45. [PMID: 36514340 PMCID: PMC9732137 DOI: 10.1016/j.csbj.2022.11.037] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/30/2022] [Revised: 11/18/2022] [Accepted: 11/18/2022] [Indexed: 12/03/2022] Open
Abstract
Cancer is a heterogeneous disease mainly driven by abnormal gene perturbations in regulatory networks. Therefore, it is appealing to identify the common and specific perturbed genes from multiple cancer networks. We developed an integrative network medicine approach to identify novel biomarkers and investigate drug repurposing across cancer types. We used a network-based method to prioritize genes in cancer-specific networks reconstructed using human transcriptome and interactome data. The prioritized genes show extensive perturbation and strong regulatory interaction with other highly perturbed genes, suggesting their vital contribution to tumorigenesis and tumor progression, and are therefore regarded as cancer genes. The cancer genes detected show remarkable performances in discriminating tumors from normal tissues and predicting survival times of cancer patients. Finally, we developed a network proximity approach to systematically screen drugs and identified dozens of candidates with repurposable potential in several cancer types. Taken together, we demonstrated the power of the network medicine approach to identify novel biomarkers and repurposable drugs in multiple cancer types. We have also made the data and code freely accessible to ensure reproducibility and reusability of the developed computational workflow.
Affiliation(s)
- Le Zhang
  - College of Computer Science, Sichuan University, Chengdu, China
- Shiwei Fan
  - College of Computer Science, Sichuan University, Chengdu, China
- Julio Vera
  - Laboratory of Systems Tumor Immunology, Department of Dermatology, Universitätsklinikum Erlangen and Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
  - Deutsches Zentrum Immuntherapie, Erlangen, Germany
  - Comprehensive Cancer Center Erlangen, Erlangen, Germany
- Xin Lai
  - Laboratory of Systems Tumor Immunology, Department of Dermatology, Universitätsklinikum Erlangen and Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
  - Deutsches Zentrum Immuntherapie, Erlangen, Germany
  - Comprehensive Cancer Center Erlangen, Erlangen, Germany
  - BioMediTech, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - Corresponding author at: Universitätsklinikum Erlangen, Erlangen, Germany; Tampere University, Tampere, Finland
9
Hu S, Zhang Z, Xiong H, Jiang M, Luo Y, Yan W, Zhao B. A tensor-based bi-random walks model for protein function prediction. BMC Bioinformatics 2022; 23:199. [PMID: 35637427 PMCID: PMC9150346 DOI: 10.1186/s12859-022-04747-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2022] [Accepted: 05/24/2022] [Indexed: 11/26/2022] Open
Abstract
Background: The accurate characterization of protein functions is critical to understanding life at the molecular level and has a huge impact on biomedicine and pharmaceuticals. Computational prediction of protein function has been studied for decades. Because protein–protein interaction (PPI) networks are plagued by noise and errors, researchers have in recent years turned to the fusion of multi-omics data. A data model that appropriately integrates network topologies with biological data and preserves their intrinsic characteristics is still a bottleneck and an aspirational goal for protein function prediction.
Results: In this paper, we propose the RWRT (Random Walks with Restart on Tensor) method to accomplish protein function prediction by applying bi-random walks on the tensor. RWRT first constructs a functional similarity tensor by combining protein interaction networks with multi-omics data derived from domain annotation and protein complex information. After this, RWRT extends the bi-random walks algorithm from a two-dimensional matrix to the tensor for scoring functional similarity between proteins. Finally, RWRT filters out possible pretenders based on the concept of cohesiveness coefficient and annotates target proteins with functions of the remaining functional partners. Experimental results indicate that RWRT performs significantly better than the state-of-the-art methods and improves the area under the receiver-operating curve (AUROC) by no less than 18%.
Conclusions: The functional similarity tensor offers us an alternative, in that it is a collection of networks sharing the same nodes; however, the edges belong to different categories or represent interactions of different nature. We demonstrate that the tensor-based random walk model can not only discover more partners with similar functions but is also effectively free from the constraints of errors in protein interaction networks. We believe that the performance of function prediction depends greatly on whether we can extract and exploit proper functional similarity information on protein correlations.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04747-2.
Affiliation(s)
- Sai Hu
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China
- Zhihong Zhang
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China
  - Hunan Provincial Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, 410022, Hunan, China
- Huijun Xiong
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China
- Meiping Jiang
  - Department of Ultrasound, Hunan Provincial Maternal and Child Health Care Hospital, Changsha, 410008, Hunan, China
  - NHC Key Laboratory of Birth Defect for Research and Prevention (Hunan Provincial Maternal and Child Health Care Hospital), Changsha, 410100, Hunan, China
- Yingchun Luo
  - Department of Ultrasound, Hunan Provincial Maternal and Child Health Care Hospital, Changsha, 410008, Hunan, China
  - NHC Key Laboratory of Birth Defect for Research and Prevention (Hunan Provincial Maternal and Child Health Care Hospital), Changsha, 410100, Hunan, China
- Wei Yan
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China
- Bihai Zhao
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China
  - Hunan Provincial Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, 410022, Hunan, China
10
Shen JP. Artificial intelligence, molecular subtyping, biomarkers, and precision oncology. Emerg Top Life Sci 2021; 5:747-756. [PMID: 34881776 PMCID: PMC8786277 DOI: 10.1042/etls20210212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2021] [Revised: 11/23/2021] [Accepted: 11/24/2021] [Indexed: 11/17/2022]
Abstract
A targeted cancer therapy is only useful if there is a way to accurately identify the tumors that are susceptible to that therapy. Thus, the rapid expansion in the number of available targeted cancer treatments has been accompanied by a robust effort to subdivide the traditional histological and anatomical tumor classifications into molecularly defined subtypes. This review highlights the history of the paired evolution of targeted therapies and biomarkers, reviews currently used methods for subtype identification, and discusses challenges to the implementation of precision oncology as well as possible solutions.
Affiliation(s)
- John Paul Shen
  - Department of Gastrointestinal Medical Oncology, University of Texas MD Anderson Cancer Center, Houston, U.S.A
11
Jung I, Kim M, Rhee S, Lim S, Kim S. MONTI: A Multi-Omics Non-negative Tensor Decomposition Framework for Gene-Level Integrative Analysis. Front Genet 2021; 12:682841. [PMID: 34567063 PMCID: PMC8461247 DOI: 10.3389/fgene.2021.682841] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/19/2021] [Accepted: 08/12/2021] [Indexed: 11/13/2022] Open
Abstract
Multi-omics data are frequently measured to enrich the comprehension of biological mechanisms underlying certain phenotypes. However, due to the complex relations and high dimension of multi-omics data, it is difficult to associate omics features with certain biological traits of interest. For example, the clinically valuable breast cancer subtypes are well-defined at the molecular level, but are poorly classified using gene expression data. Here, we propose a multi-omics analysis method called MONTI (Multi-Omics Non-negative Tensor decomposition for Integrative analysis), whose goal is to select multi-omics features that are able to represent trait-specific characteristics. Here, we demonstrate the strength of multi-omics integrated analysis in terms of cancer subtyping. The multi-omics data are first integrated in a biologically meaningful manner to form a three-dimensional tensor, which is then decomposed using a non-negative tensor decomposition method. From the result, MONTI selects highly informative subtype-specific multi-omics features. MONTI was applied to three case studies of 597 breast cancer, 314 colon cancer, and 305 stomach cancer cohorts. For all the case studies, we found that the subtype classification accuracy significantly improved when utilizing all available multi-omics data. MONTI was able to detect subtype-specific gene sets that were shown to be strongly regulated by certain omics, from which correlation between omics types could be inferred. Furthermore, various clinical attributes of nine cancer types were analyzed using MONTI, which showed that some clinical attributes could be well explained using multi-omics data. We demonstrated that integrating multi-omics data in a gene-centric manner improves the detection of cancer subtype-specific features and other clinical features, which may be used to further understand the molecular characteristics of interest. The software and data used in this study are available at: https://github.com/inukj/MONTI.
Affiliation(s)
- Inuk Jung
  - Department of Computer Science and Engineering, Kyungpook National University, Daegu, South Korea
- Minsu Kim
  - Computing and Computational Sciences Directorate, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Sungmin Rhee
  - Department of Computer Science and Engineering, Seoul National University, Seoul, South Korea
- Sangsoo Lim
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-Gu, Seoul, South Korea
- Sun Kim
  - Computing and Computational Sciences Directorate, Oak Ridge National Laboratory, Oak Ridge, TN, United States
  - Department of Computer Science and Engineering, Seoul National University, Seoul, South Korea
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-Gu, Seoul, South Korea
12
Duan R, Gao L, Gao Y, Hu Y, Xu H, Huang M, Song K, Wang H, Dong Y, Jiang C, Zhang C, Jia S. Evaluation and comparison of multi-omics data integration methods for cancer subtyping. PLoS Comput Biol 2021; 17:e1009224. [PMID: 34383739 PMCID: PMC8384175 DOI: 10.1371/journal.pcbi.1009224] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 02/27/2021] [Revised: 08/24/2021] [Accepted: 06/28/2021] [Indexed: 11/18/2022] Open
Abstract
Computational integrative analysis has become a significant approach in the data-driven exploration of biological problems. Many integration methods for cancer subtyping have been proposed, but evaluating these methods has become a complicated problem due to the lack of gold standards. Moreover, questions of practical importance remain to be addressed regarding the impact of selecting appropriate data types and combinations on the performance of integrative studies. Here, we constructed three classes of benchmarking datasets of nine cancers in TCGA by considering all eleven combinations of four multi-omics data types. Using these datasets, we conducted a comprehensive evaluation of ten representative integration methods for cancer subtyping in terms of accuracy (measured by combining clustering accuracy and clinical significance), robustness, and computational efficiency. We subsequently investigated the influence of different omics data on cancer subtyping and the effectiveness of their combinations. Refuting the widely held intuition that incorporating more types of omics data always produces better results, our analyses showed that there are situations where integrating more omics data negatively impacts the performance of integration methods. Our analyses also suggested several effective combinations for most cancers under our studies, which may be of particular interest to researchers in omics data analysis.
Cancer is one of the most heterogeneous diseases, characterized by diverse morphological, phenotypic, and genomic profiles between tumors and their subtypes. Identifying cancer subtypes can help patients receive precise treatments. With the development of high-throughput technologies, genomics, epigenomics, and transcriptomics data have been generated for large cancer patient cohorts. It is believed that the more omics data we use, the more accurate the identification of cancer subtypes will be. To examine this assumption, we first constructed three classes of benchmarking datasets to conduct a comprehensive evaluation and comparison of ten representative multi-omics data integration methods for cancer subtyping by considering their accuracy, robustness, and computational efficiency. Then, we investigated the influence of different omics data and their various combinations on the effectiveness of cancer subtyping. Our analyses showed that there are situations where integrating more omics data negatively impacts the performance of integration methods. We hope that our work may help researchers choose a proper method and an effective data combination when identifying cancer subtypes using data integration methods.
Affiliation(s)
- Ran Duan
  - School of Computer Science and Technology, Xidian University, Xi’an, China
- Lin Gao
  - School of Computer Science and Technology, Xidian University, Xi’an, China
  - * E-mail:
- Yong Gao
  - Department of Computer Science, The University of British Columbia Okanagan, Kelowna, British Columbia, Canada
- Yuxuan Hu
  - School of Computer Science and Technology, Xidian University, Xi’an, China
- Han Xu
  - School of Computer Science and Technology, Xidian University, Xi’an, China
- Mingfeng Huang
  - School of Computer Science and Technology, Xidian University, Xi’an, China
- Kuo Song
  - School of Computer Science and Technology, Xidian University, Xi’an, China
- Hongda Wang
  - School of Computer Science and Technology, Xidian University, Xi’an, China
- Yongqiang Dong
  - School of Computer Science and Technology, Xidian University, Xi’an, China
- Chaoqun Jiang
  - School of Computer Science and Technology, Xidian University, Xi’an, China
- Chenxing Zhang
  - School of Computer Science and Technology, Xidian University, Xi’an, China
- Songwei Jia
  - School of Computer Science and Technology, Xidian University, Xi’an, China
|
13
|
Gumpinger AC, Rieck B, Grimm DG, Borgwardt K. Network-guided search for genetic heterogeneity between gene pairs. Bioinformatics 2021; 37:57-65. [PMID: 32573681 PMCID: PMC8034561 DOI: 10.1093/bioinformatics/btaa581] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/31/2019] [Revised: 05/19/2020] [Accepted: 06/15/2020] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Correlating genetic loci with a disease phenotype is a common approach to improve our understanding of the genetics underlying complex diseases. Standard analyses mostly ignore two aspects, namely genetic heterogeneity and interactions between loci. Genetic heterogeneity, the phenomenon that genetic variants at different loci lead to the same phenotype, promises to increase statistical power by aggregating low-signal variants. Incorporating interactions between loci results in a computational and statistical bottleneck due to the vast amount of candidate interactions. RESULTS We propose a novel method SiNIMin that addresses these two aspects by finding pairs of interacting genes that are, upon combination, associated with a phenotype of interest under a model of genetic heterogeneity. We guide the interaction search using biological prior knowledge in the form of protein-protein interaction networks. Our method controls type I error and outperforms state-of-the-art methods with respect to statistical power. Additionally, we find novel associations for multiple Arabidopsis thaliana phenotypes, and, with an adapted variant of SiNIMin, for a study of rare variants in migraine patients. AVAILABILITY AND IMPLEMENTATION Code available at https://github.com/BorgwardtLab/SiNIMin. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Anja C Gumpinger
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel 4058, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Bastian Rieck
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel 4058, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Dominik G Grimm
  - Technical University of Munich, TUM Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing 94315, Germany
  - Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, Straubing 94315, Germany
- Karsten Borgwardt
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel 4058, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
|
14
|
Liu C, Han Z, Zhang ZK, Nussinov R, Cheng F. A network-based deep learning methodology for stratification of tumor mutations. Bioinformatics 2021; 37:82-88. [PMID: 33416857 PMCID: PMC8034530 DOI: 10.1093/bioinformatics/btaa1099] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/30/2020] [Revised: 11/23/2020] [Accepted: 12/28/2020] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION Tumor stratification has a wide range of biomedical and clinical applications, including diagnosis, prognosis and personalized treatment. However, cancer is always driven by combinations of mutated genes, which are highly heterogeneous across patients. Accurately subdividing tumors into subtypes is therefore challenging. RESULTS We developed a network-embedding based stratification (NES) methodology to identify clinically relevant patient subtypes from large-scale somatic mutation profiles of patients. The central hypothesis of NES is that two tumors should be classified into the same subtype if their somatically mutated genes are located in similar network regions of the human interactome. We encoded the genes on the human protein-protein interactome with a network embedding approach and constructed the patients' vectors by integrating the somatic mutation profiles of 7344 tumor exomes across 15 cancer types. We first trained a LightGBM classifier on the patients' vectors. The AUC value is around 0.89 in the prediction of the patient's cancer type and around 0.78 in the prediction of the tumor stage within a specific cancer type. The high classification accuracy suggests that network embedding-based patient features are reliable for dividing the patients. We conclude that patients with a specific cancer type can be clustered into several subtypes by applying an unsupervised clustering algorithm to the patients' vectors. Among the 15 cancer types, the new patient clusters (subtypes) identified by NES are significantly correlated with patient survival in 12 cancer types. In summary, this study offers a powerful network-based deep learning methodology for personalized cancer medicine. AVAILABILITY AND IMPLEMENTATION Source code and data can be downloaded from https://github.com/ChengF-Lab/NES. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
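One simple way to picture how gene-level network embeddings could be turned into patient vectors is to average the embeddings of each patient's mutated genes. The abstract does not state the exact integration rule, so the averaging, the function name `patient_vector`, and the two-dimensional toy embeddings below are assumptions for illustration, not the NES method itself.

```python
def patient_vector(mutated_genes, gene_embeddings):
    """Average the embeddings of a patient's mutated genes.

    Assumption: a plain mean is used as the integration rule. Genes that
    have no embedding (i.e. are absent from the network) are skipped.
    """
    dim = len(next(iter(gene_embeddings.values())))
    known = [gene_embeddings[g] for g in mutated_genes if g in gene_embeddings]
    if not known:
        # No mutated gene is on the network: return an all-zero vector.
        return [0.0] * dim
    return [sum(vec[k] for vec in known) / len(known) for k in range(dim)]


# Hypothetical 2-dimensional embeddings for three genes.
toy_embeddings = {
    "TP53": [1.0, 0.0],
    "KRAS": [0.0, 1.0],
    "EGFR": [1.0, 1.0],
}

# A patient with mutations in TP53 and KRAS.
example_vector = patient_vector(["TP53", "KRAS"], toy_embeddings)
```

Under this assumption, two patients with no mutated gene in common can still receive similar vectors if their mutated genes lie close together in the embedding space, which matches the central hypothesis stated in the abstract.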
Affiliation(s)
- Chuang Liu
  - Alibaba Research Center for Complexity Sciences, Hangzhou Normal University, Hangzhou 311121, China
- Zhen Han
  - Alibaba Research Center for Complexity Sciences, Hangzhou Normal University, Hangzhou 311121, China
- Zi-Ke Zhang
  - Alibaba Research Center for Complexity Sciences, Hangzhou Normal University, Hangzhou 311121, China
  - College of Media and International Culture, Zhejiang University, Hangzhou 310028, China
- Ruth Nussinov
  - Cancer and Inflammation Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, National Cancer Institute at Frederick, Frederick, MD 21702, USA
  - Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Feixiong Cheng
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA
  - Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
|
15
|
Klein MI, Cannataro VL, Townsend JP, Newman S, Stern DF, Zhao H. Identifying modules of cooperating cancer drivers. Mol Syst Biol 2021; 17:e9810. [PMID: 33769711 PMCID: PMC7995435 DOI: 10.15252/msb.20209810] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 06/23/2020] [Revised: 01/20/2021] [Accepted: 01/26/2021] [Indexed: 12/22/2022] Open
Abstract
Identifying cooperating modules of driver alterations can provide insights into cancer etiology and advance the development of effective personalized treatments. We present Cancer Rule Set Optimization (CRSO) for inferring the combinations of alterations that cooperate to drive tumor formation in individual patients. Application to 19 TCGA cancer types revealed a mean of 11 core driver combinations per cancer, comprising 2-6 alterations per combination and accounting for a mean of 70% of samples per cancer type. CRSO is distinct from methods based on statistical co-occurrence, which we demonstrate is a suboptimal criterion for investigating driver cooperation. CRSO identified well-studied driver combinations that were not detected by other approaches and nominated novel combinations that correlate with clinical outcomes in multiple cancer types. Novel synergies were identified in NRAS-mutant melanomas that may be therapeutically relevant. Core driver combinations involving NFE2L2 mutations were identified in four cancer types, supporting the therapeutic potential of NRF2 pathway inhibition. CRSO is available at https://github.com/mikekleinsgit/CRSO/.
Affiliation(s)
- Michael I Klein
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
  - Bioinformatics R&D, Sema4, Stamford, CT, USA
- Vincent L Cannataro
  - Department of Biology, Emmanuel College, Boston, MA, USA
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Jeffrey P Townsend
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
  - Yale Cancer Center, Yale University, New Haven, CT, USA
- David F Stern
  - Yale Cancer Center, Yale University, New Haven, CT, USA
  - Department of Pathology, Yale School of Medicine, New Haven, CT, USA
- Hongyu Zhao
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
  - Yale Cancer Center, Yale University, New Haven, CT, USA
|
16
|
Lee S, Lim S, Lee T, Sung I, Kim S. Cancer subtype classification and modeling by pathway attention and propagation. Bioinformatics 2020; 36:3818-3824. [PMID: 32207514 DOI: 10.1093/bioinformatics/btaa203] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 09/17/2019] [Revised: 01/13/2020] [Accepted: 03/19/2020] [Indexed: 01/04/2023] Open
Abstract
MOTIVATION Biological pathways are an important form of curated knowledge about biological processes. Thus, cancer subtype classification based on pathways will be very useful for understanding differences in biological mechanisms among cancer subtypes. However, pathways cover only a fraction of the entire gene set (only one-third of human genes in KEGG), and pathways are fragmented. For this reason, there are few computational methods that use pathways for cancer subtype classification. RESULTS We present an explainable deep-learning model with attention mechanism and network propagation for cancer subtype classification. Each pathway is modeled by a graph convolutional network. Then, a multi-attention-based ensemble model combines several hundred pathways in an explainable manner. Lastly, network propagation on a pathway-gene network explains why gene expression profiles differ between subtypes. In experiments with five TCGA cancer datasets, our method achieved very good classification accuracies and, additionally, identified subtype-specific pathways and biological functions. AVAILABILITY AND IMPLEMENTATION The source code is available at http://biohealth.snu.ac.kr/software/GCN_MAE. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sangseon Lee
  - Department of Computer Science and Engineering, Institute of Engineering Research
- Taeheon Lee
  - Department of Computer Science and Engineering, Institute of Engineering Research
- Inyoung Sung
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Republic of Korea
- Sun Kim
  - Department of Computer Science and Engineering, Institute of Engineering Research
  - Bioinformatics Institute
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Republic of Korea
|
17
|
Rohani N, Eslahchi C. Classifying Breast Cancer Molecular Subtypes by Using Deep Clustering Approach. Front Genet 2020; 11:553587. [PMID: 33324444 PMCID: PMC7723873 DOI: 10.3389/fgene.2020.553587] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/19/2020] [Accepted: 08/25/2020] [Indexed: 01/07/2023] Open
Abstract
Cancer is a complex disease with a high rate of mortality. The characteristics of tumor masses are very heterogeneous; thus, the appropriate classification of tumors is a critical point in the effective treatment. A high level of heterogeneity has also been observed in breast cancer. Therefore, detecting the molecular subtypes of this disease is an essential issue for medicine that could be facilitated using bioinformatics. This study aims to discover the molecular subtypes of breast cancer using somatic mutation profiles of tumors. Nonetheless, the somatic mutation profiles are very sparse. Therefore, a network propagation method is used in the gene interaction network to make the mutation profiles dense. Afterward, the deep embedded clustering (DEC) method is used to classify the breast tumors into four subtypes. In the next step, gene signature of each subtype is obtained using Fisher's exact test. Besides the enrichment of gene signatures in numerous biological databases, clinical and molecular analyses verify that the proposed method using mutation profiles can efficiently detect the molecular subtypes of breast cancer. Finally, a supervised classifier is trained based on the discovered subtypes to predict the molecular subtype of a new patient. The code and material of the method are available at: https://github.com/nrohani/MolecularSubtypes.
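The propagation step described in the abstract above can be illustrated with a minimal random-walk-with-restart sketch. This is not the authors' implementation: the function name `propagate`, the neighbour weight `alpha = 0.7`, the column normalization, and the four-gene toy network are all assumptions chosen for illustration.

```python
def propagate(adjacency, seed, alpha=0.7, tol=1e-9, max_iter=1000):
    """Iterate F <- alpha * W @ F + (1 - alpha) * F0 until convergence.

    adjacency: symmetric 0/1 matrix (list of lists) of the gene network.
    seed:      initial scores F0, e.g. a binary mutation profile.
    alpha:     weight given to scores arriving from network neighbours;
               (1 - alpha) is the weight kept on the original profile.
    """
    n = len(adjacency)
    degree = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    # Column-normalised weights: each gene splits its score among neighbours.
    weights = [
        [adjacency[i][j] / degree[j] if degree[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = list(seed)
    for _ in range(max_iter):
        updated = [
            alpha * sum(weights[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * seed[i]
            for i in range(n)
        ]
        change = max(abs(u - s) for u, s in zip(updated, scores))
        scores = updated
        if change < tol:
            break
    return scores


# Toy network: a chain of four genes, 0 - 1 - 2 - 3.
toy_network = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
# Sparse profile: only gene 0 is mutated.
smoothed_profile = propagate(toy_network, [1.0, 0.0, 0.0, 0.0])
```

In this toy run the single mutation on gene 0 spreads to every gene in the chain, with scores decaying with distance from the mutated gene, which is the sense in which propagation makes a sparse profile dense.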
Affiliation(s)
- Narjes Rohani
  - Department of Computer and Data Sciences, Faculty of Mathematics, Shahid Beheshti University, Tehran, Iran
- Changiz Eslahchi
  - Department of Computer and Data Sciences, Faculty of Mathematics, Shahid Beheshti University, Tehran, Iran
  - School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
|
18
|
Kim YA, Sarto Basso R, Wojtowicz D, Liu AS, Hochbaum DS, Vandin F, Przytycka TM. Identifying Drug Sensitivity Subnetworks with NETPHIX. iScience 2020; 23:101619. [PMID: 33089107 PMCID: PMC7566085 DOI: 10.1016/j.isci.2020.101619] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/17/2019] [Revised: 09/08/2020] [Accepted: 09/24/2020] [Indexed: 12/29/2022] Open
Abstract
Phenotypic heterogeneity in cancer is often caused by different patterns of genetic alterations. Understanding such phenotype-genotype relationships is fundamental for the advance of personalized medicine. We develop a computational method, named NETPHIX (NETwork-to-PHenotype association with eXclusivity) to identify subnetworks of genes whose genetic alterations are associated with drug response or other continuous cancer phenotypes. Leveraging interaction information among genes and properties of cancer mutations such as mutual exclusivity, we formulate the problem as an integer linear program and solve it optimally to obtain a subnetwork of associated genes. Applied to a large-scale drug screening dataset, NETPHIX uncovered gene modules significantly associated with drug responses. Utilizing interaction information, NETPHIX modules are functionally coherent and can thus provide important insights into drug action. In addition, we show that modules identified by NETPHIX together with their association patterns can be leveraged to suggest drug combinations.
Affiliation(s)
- Yoo-Ah Kim
  - National Center of Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894, USA
- Rebecca Sarto Basso
  - Department of Industrial Engineering and Operations Research, University of California at Berkeley, Berkeley, CA 94709, USA
- Damian Wojtowicz
  - National Center of Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894, USA
- Amanda S Liu
  - Montgomery Blair High School, Silver Spring, MD 20901, USA
- Dorit S Hochbaum
  - Department of Industrial Engineering and Operations Research, University of California at Berkeley, Berkeley, CA 94709, USA
- Fabio Vandin
  - Department of Information Engineering, University of Padova, Padova 35131, Italy
- Teresa M Przytycka
  - National Center of Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894, USA
|
19
|
Lee W, Huang DS, Han K. Constructing cancer patient-specific and group-specific gene networks with multi-omics data. BMC Med Genomics 2020; 13:81. [PMID: 32854705 PMCID: PMC7450550 DOI: 10.1186/s12920-020-00736-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 05/28/2020] [Accepted: 06/05/2020] [Indexed: 12/26/2022] Open
Abstract
Background Cancer is a complex and heterogeneous disease with many possible genetic and environmental causes. The same treatment for patients of the same cancer type often results in different outcomes in terms of efficacy and side effects of the treatment. Thus, the molecular characterization of individual cancer patients is increasingly important to find an effective treatment. Recently a few methods have been developed to construct cancer sample-specific gene networks based on the difference in the mRNA expression levels between the cancer sample and reference samples. Methods We constructed a patient-specific network with multi-omics data based on the difference between a reference network and a perturbed reference network by the patient. A network specific to a group of patients was obtained using the average change in correlation coefficients and node degree of patient-specific networks of the group. Results In this paper, we present a new method for constructing cancer patient-specific and group-specific gene networks with multi-omics data. The main differences of our method from previous ones are as follows: (1) networks are constructed with multi-omics (mRNA expression, copy number variation, DNA methylation and microRNA expression) data rather than with mRNA expression data alone, (2) background networks are constructed with both normal samples and cancer samples of the specified type to extract cancer-specific gene correlations, and (3) both patient individual-specific networks and patient group-specific networks can be constructed. The results of evaluating our method with several types of cancer show that it constructs more informative and accurate gene networks than previous methods. 
Conclusions The results of evaluating our method with extensive data of seven cancer types show that the difference of gene correlations between the reference samples and a patient sample is a more predictive feature than mRNA expression levels and that gene networks constructed with multi-omics data show a better performance than those with single omics data in predicting cancer for most cancer types. Our approach will be useful for finding genes and gene pairs to tailor treatments to individual characteristics.
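The core idea of a patient-specific network, comparing a reference network with the same network after one patient is added, can be sketched for a single gene pair using Pearson correlation. This is a simplified, single-data-type illustration under assumed names (`pearson`, `edge_perturbation`) and toy values; the paper's multi-omics construction is not reproduced here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def edge_perturbation(ref_x, ref_y, patient_x, patient_y):
    """Change in gene-gene correlation caused by adding one patient.

    ref_x, ref_y: values of two genes across the reference samples.
    patient_x, patient_y: values of the same two genes in one patient.
    A value near zero means the patient follows the reference pattern;
    a large absolute value marks an edge that the patient perturbs.
    """
    reference = pearson(ref_x, ref_y)
    perturbed = pearson(list(ref_x) + [patient_x], list(ref_y) + [patient_y])
    return perturbed - reference


# Hypothetical reference samples in which the two genes are co-expressed.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]

concordant = edge_perturbation(gene_a, gene_b, 5.0, 10.0)  # follows the trend
discordant = edge_perturbation(gene_a, gene_b, 5.0, 0.0)   # breaks the trend
```

Repeating this calculation over all gene pairs would give one perturbation value per edge for each patient, and averaging those values over a group of patients is one plausible reading of the group-specific network described in the Methods.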
Affiliation(s)
- Wook Lee
  - Department of Computer Engineering, Inha University, Incheon, 22212, South Korea
- De-Shuang Huang
  - Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, 201804, China
- Kyungsook Han
  - Department of Computer Engineering, Inha University, Incheon, 22212, South Korea
|
20
|
NPF: network propagation for protein function prediction. BMC Bioinformatics 2020; 21:355. [PMID: 32787776 PMCID: PMC7430911 DOI: 10.1186/s12859-020-03663-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/29/2020] [Accepted: 07/14/2020] [Indexed: 11/29/2022] Open
Abstract
Background The accurate annotation of protein functions is of great significance in elucidating the phenomena of life, treating disease and developing new medicines. Various methods have been developed to facilitate the prediction of these functions by combining protein interaction networks (PINs) with multi-omics data. However, it is still challenging to make full use of multiple biological data to improve the performance of function annotation. Results We present NPF (Network Propagation for Functions prediction), an integrative protein function prediction framework assisted by network propagation and functional module detection, for discovering interacting partners with similar functions to target proteins. NPF leverages knowledge of the protein interaction network architecture and multi-omics data, such as domain annotation and protein complex information, to augment protein-protein functional similarity in a propagation manner. We have verified the great potential of NPF for accurately inferring protein functions. In a comprehensive evaluation, NPF delivered better performance than other competing methods in terms of leave-one-out cross-validation and ten-fold cross-validation. Conclusions We demonstrated that network propagation, together with multi-omics data, can discover more partners with similar functions and is not constrained by the “small-world” feature of protein interaction networks. We conclude that the performance of function prediction depends greatly on whether we can extract and exploit proper functional similarity information from protein correlations.
|
21
|
Liu C, Ma Y, Zhao J, Nussinov R, Zhang YC, Cheng F, Zhang ZK. Computational network biology: Data, models, and applications. Physics Reports 2020; 846:1-66. [DOI: 10.1016/j.physrep.2019.12.004] [Citation(s) in RCA: 86] [Impact Index Per Article: 17.2] [Indexed: 01/03/2025]
|
22
|
Di Nanni N, Bersanelli M, Milanesi L, Mosca E. Network Diffusion Promotes the Integrative Analysis of Multiple Omics. Front Genet 2020; 11:106. [PMID: 32180795 PMCID: PMC7057719 DOI: 10.3389/fgene.2020.00106] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 07/31/2019] [Accepted: 01/29/2020] [Indexed: 02/01/2023] Open
Abstract
The development of integrative methods is one of the main challenges in bioinformatics. Network-based methods for the analysis of multiple gene-centered datasets take into account known and/or inferred relations between genes. In the last decades, the mathematical machinery of network diffusion—also referred to as network propagation—has been exploited in several network-based pipelines, thanks to its ability of amplifying association between genes that lie in network proximity. Indeed, network diffusion provides a quantitative estimation of network proximity between genes associated with one or more different data types, from simple binary vectors to real vectors. Therefore, this powerful data transformation method has also been increasingly used in integrative analyses of multiple collections of biological scores and/or one or more interaction networks. We present an overview of the state of the art of bioinformatics pipelines that use network diffusion processes for the integrative analysis of omics data. We discuss the fundamental ways in which network diffusion is exploited, open issues and potential developments in the field. Current trends suggest that network diffusion is a tool of broad utility in omics data analysis. It is reasonable to think that it will continue to be used and further refined as new data types arise (e.g. single cell datasets) and the identification of system-level patterns will be considered more and more important in omics data analysis.
Affiliation(s)
- Noemi Di Nanni
  - Institute of Biomedical Technologies, National Research Council, Milan, Italy
  - Department of Industrial and Information Engineering, University of Pavia, Pavia, Italy
- Matteo Bersanelli
  - Department of Physics and Astronomy, University of Bologna, Bologna, Italy
  - National Institute of Nuclear Physics (INFN), Bologna, Italy
- Luciano Milanesi
  - Institute of Biomedical Technologies, National Research Council, Milan, Italy
- Ettore Mosca
  - Institute of Biomedical Technologies, National Research Council, Milan, Italy
|
23
|
Network modeling of patients' biomolecular profiles for clinical phenotype/outcome prediction. Sci Rep 2020; 10:3612. [PMID: 32107391 PMCID: PMC7046773 DOI: 10.1038/s41598-020-60235-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/23/2019] [Accepted: 11/05/2019] [Indexed: 12/15/2022] Open
Abstract
Methods for phenotype and outcome prediction are largely based on inductive supervised models that use selected biomarkers to make predictions, without explicitly considering the functional relationships between individuals. We introduce a novel network-based approach named Patient-Net (P-Net) in which biomolecular profiles of patients are modeled in a graph-structured space that represents gene expression relationships between patients. Then a kernel-based semi-supervised transductive algorithm is applied to the graph to explore the overall topology of the graph and to predict the phenotype/clinical outcome of patients. Experimental tests involving several publicly available datasets of patients afflicted with pancreatic, breast, colon and colorectal cancer show that our proposed method is competitive with state-of-the-art supervised and semi-supervised predictive systems. Importantly, P-Net also provides interpretable models that can be easily visualized to gain clues about the relationships between patients, and to formulate hypotheses about their stratification.
|
24
|
Cowman T, Coşkun M, Grama A, Koyutürk M. Integrated querying and version control of context-specific biological networks. Database (Oxford) 2020; 2020:baaa018. [PMID: 32294194 PMCID: PMC7158887 DOI: 10.1093/database/baaa018] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/19/2019] [Revised: 01/13/2020] [Accepted: 02/21/2020] [Indexed: 01/26/2023]
Abstract
MOTIVATION Biomolecular data stored in public databases is increasingly specialized to organisms, context/pathology and tissue type, potentially resulting in significant overhead for analyses. These networks are often specializations of generic interaction sets, presenting opportunities for reducing storage and computational cost. Therefore, it is desirable to develop effective compression and storage techniques, along with efficient algorithms and a flexible query interface capable of operating on compressed data structures. Current graph databases offer varying levels of support for network integration. However, these solutions do not provide efficient methods for the storage and querying of versioned networks. RESULTS We present VerTIoN, a framework consisting of novel data structures and associated query mechanisms for integrated querying of versioned context-specific biological networks. As a use case for our framework, we study network proximity queries in which the user can select and compose a combination of tissue-specific and generic networks. Using our compressed version tree data structure, in conjunction with state-of-the-art numerical techniques, we demonstrate real-time querying of large network databases. CONCLUSION Our results show that it is possible to support flexible queries defined on heterogeneous networks composed at query time while drastically reducing response time for multiple simultaneous queries. The flexibility offered by VerTIoN in composing integrated network versions opens significant new avenues for the utilization of ever increasing volume of context-specific network data in a broad range of biomedical applications. AVAILABILITY AND IMPLEMENTATION VerTIoN is implemented as a C++ library and is available at http://compbio.case.edu/omics/software/vertion and https://github.com/tjcowman/vertion. CONTACT tyler.cowman@case.edu.
Affiliation(s)
- Tyler Cowman
  - Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, USA
- Mustafa Coşkun
  - Department of Computer Engineering, Abdullah Gül University, Kayseri 38080, Turkey
- Ananth Grama
  - Department of Computer Science, Purdue University, West Lafayette, IN 47906, USA
- Mehmet Koyutürk
  - Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, USA
  - Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH 44106, USA
|
25
|
Morais-Rodrigues F, Silvério-Machado R, Kato RB, Rodrigues DLN, Valdez-Baez J, Fonseca V, San EJ, Gomes LGR, Dos Santos RG, Vinicius Canário Viana M, da Cruz Ferraz Dutra J, Teixeira Dornelles Parise M, Parise D, Campos FF, de Souza SJ, Ortega JM, Barh D, Ghosh P, Azevedo VAC, Dos Santos MA. Analysis of the microarray gene expression for breast cancer progression after the application modified logistic regression. Gene 2019; 726:144168. [PMID: 31759986 DOI: 10.1016/j.gene.2019.144168] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 07/05/2019] [Revised: 09/21/2019] [Accepted: 10/11/2019] [Indexed: 01/02/2023]
Abstract
Methods based on statistics and linear algebra have been increasingly used to address emerging questions in the microarray literature. Microarray technology is a long-used tool for the global analysis of gene expression, allowing the simultaneous investigation of hundreds or thousands of genes in a sample. Microarray data are characterized by a small sample size and a large number of features, which produces a non-square, rank-deficient data matrix that can admit countless solutions in classifiers. To avoid the 'curse of dimensionality', many authors have performed feature selection or reduced the size of the data matrix. In this work, we introduce a new logistic regression-based model to classify breast cancer tumor samples from microarray expression data, including all gene expression features and without reducing the microarray data matrix. If the user still deems feature reduction necessary, it can be done after applying the methodology while maintaining good classification. This methodology allowed the correct classification of breast cancer sample data sets from Gene Expression Omnibus (GEO) data series GSE65194, GSE20711, and GSE25055, which contain the microarray data of these breast cancer samples. Classification had a minimum performance of 80% (sensitivity and specificity) and explored all possible data combinations, including breast cancer subtypes. The methodology highlighted genes not yet studied in breast cancer, some of which have been observed in Gene Regulatory Networks (GRNs). In this work we examine the patterns and features of a GRN composed of transcription factors (TFs) in MCF-7 breast cancer cell lines, providing valuable information regarding breast cancer. In particular, some genes have associated αi* parameters with extreme positive or negative values and, as such, can be identified as breast cancer prediction genes.
We indicate that the PKN2, MKL1, MED23, CUL5 and GLI genes demonstrate a tumor suppressor profile, and that the MTR, ITGA2B, TELO2, MRPL9, MTTL1, WIPI1, KLHL20, PI4KB, FOLR1 and SHC1 genes demonstrate an oncogenic profile. We propose that these may serve as potential breast cancer prediction genes and should be prioritized for further clinical studies on breast cancer. This new model allows values to be assigned to the αi* parameters associated with gene expression. Some αi* parameters are associated with genes previously described as breast cancer biomarkers, and others with genes not yet studied in relation to this disease.
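The idea of fitting a classifier on all features when there are more genes than samples, and then reading off genes with extreme positive or negative weights, can be sketched as follows. This is only an illustration under stated assumptions: it uses ordinary L2-regularized logistic regression trained by gradient descent, not the authors' modified model, and the gene names, expression values, and hyperparameters are hypothetical. The fitted weights play a role loosely analogous to the per-gene parameters described in the abstract.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, l2=0.01, lr=0.1, steps=2000):
    """Plain gradient descent on the L2-regularized logistic loss.

    The L2 penalty is an assumption made here so that the fit is unique
    even though there are more features than samples.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        # Prediction error (p - y) for every sample at the current weights.
        errors = [
            sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - yi
            for row, yi in zip(X, y)
        ]
        for j in range(d):
            grad = sum(e * row[j] for e, row in zip(errors, X)) / n + l2 * w[j]
            w[j] -= lr * grad
        b -= lr * sum(errors) / n
    return w, b

# Hypothetical toy data: 4 samples, 6 "genes" (more features than samples).
genes = ["gene_a", "gene_b", "gene_c", "gene_d", "gene_e", "gene_f"]
X = [
    [2.0, 0.1, 0.5, 0.4, 0.5, 0.6],  # label 1
    [1.8, 0.2, 0.4, 0.6, 0.5, 0.4],  # label 1
    [0.2, 1.9, 0.5, 0.5, 0.4, 0.5],  # label 0
    [0.1, 2.1, 0.6, 0.4, 0.6, 0.5],  # label 0
]
y = [1, 1, 0, 0]

w, b = fit_logistic(X, y)
predictions = [
    1 if sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) >= 0.5 else 0
    for row in X
]

# Rank genes by fitted weight: the extremes are the candidate genes.
ranked = sorted(zip(genes, w), key=lambda gw: gw[1])
most_negative = ranked[0][0]
most_positive = ranked[-1][0]
```

In this toy setting the gene that is high only in label-1 samples receives the largest positive weight and the gene that is high only in label-0 samples the most negative one; whether a given sign corresponds to an oncogenic or tumor suppressor profile depends on how the labels are encoded.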
Affiliation(s)
- Francielly Morais-Rodrigues
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil.
| | - Rita Silvério-Machado
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
| | - Rodrigo Bentes Kato
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
| | - Diego Lucas Neres Rodrigues
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
| | - Juan Valdez-Baez
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
| | - Vagner Fonseca
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil; KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban 4001, South Africa
| | - Emmanuel James San
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban 4001, South Africa
| | - Lucas Gabriel Rodrigues Gomes
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
| | - Roselane Gonçalves Dos Santos
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
| | - Marcus Vinicius Canário Viana
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil; Federal University of Pará, UFPA, Brazil
| | - Joyce da Cruz Ferraz Dutra
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
| | - Mariana Teixeira Dornelles Parise
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
| | - Doglas Parise
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
| | - Frederico F Campos
- Department of Computer Science, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
| | | | - José Miguel Ortega
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
| | - Debmalya Barh
- Centre for Genomics and Applied Gene Technology, Institute of Integrative Omics and Applied Biotechnology (IIOAB), Nonakuri, Purba Medinipur, West Bengal 721172, India
| | - Preetam Ghosh
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| | - Vasco A C Azevedo
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
| | - Marcos A Dos Santos
- Department of Computer Science, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
| |
|
26
|
On Complex Network Construction of Rain Gauge Stations Considering Nonlinearity of Observed Daily Rainfall Data. Water 2019. [DOI: 10.3390/w11081578] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 01/18/2023]
Abstract
Rainfall data is frequently used as input and analysis data in the field of hydrology. To obtain adequate rainfall data, there should be a rain gauge network that can cover the relevant region. Therefore, it is necessary to analyze and evaluate the adequacy of rain gauge networks. Currently, complex network analysis is frequently used for this purpose, and in the hydrology field Pearson correlation is commonly used as the link strength when constructing networks. However, Pearson correlation measures only the linear relationship between data series. Therefore, it is not suitable for nonlinear hydrological data (such as rainfall and runoff). A possible solution to this problem is to apply mutual information, which can account for nonlinearity in the data. The present study used the Brock–Dechert–Scheinkman (BDS) statistic to test the nonlinearity of rainfall data from 55 Automated Synoptic Observing System (ASOS) rain gauge stations in South Korea. The results indicated that the data from all rain gauge stations showed nonlinearity. Complex networks of these rain gauge stations were constructed using Pearson correlation and mutual information, and were then compared by computing their centrality values. Comparing the centrality rankings across different correlation thresholds showed that the network based on mutual information yielded consistent rankings, whereas the network based on Pearson correlation exhibited much variability. Thus, it was found that using mutual information is appropriate when constructing a complex network from rainfall data with nonlinear characteristics.
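The contrast between Pearson correlation and mutual information described above can be illustrated with a small sketch. It is not the study's procedure: the series are synthetic, and the histogram-based estimator and the bin count are assumptions chosen for brevity. For a purely nonlinear dependence (here y = x² on a symmetric range), the Pearson coefficient is essentially zero while the mutual information is clearly positive.

```python
import math
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def binned(values, bins):
    """Assign each value to one of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(xs, ys, bins=8):
    """Histogram estimate of mutual information, in nats."""
    bx, by = binned(xs, bins), binned(ys, bins)
    n = len(xs)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )

# Synthetic, purely nonlinear relationship: y depends on x but not linearly.
xs = [i / 50 - 1 for i in range(101)]   # evenly spaced on [-1, 1]
ys = [x * x for x in xs]

r = pearson(xs, ys)
mi = mutual_information(xs, ys)
```

Used as a link strength, `r` would leave these two series unconnected at any reasonable threshold, whereas `mi` would connect them. Histogram estimates depend on the bin count, so a real analysis would need to choose it with care.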
|
27
|
Jiang J, Xing F, Zeng X, Zou Q. Investigating Maize Yield-Related Genes in Multiple Omics Interaction Network Data. IEEE Trans Nanobioscience 2019; 19:142-151. [PMID: 31170079 DOI: 10.1109/tnb.2019.2920419] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]
Abstract
Zea mays (maize) is the highest yielding food crop globally, feeding large numbers of people across the planet. It is thus especially important to use prior knowledge to explore the key genes that affect maize production. Merging multiple datasets of different types can improve the accuracy of candidate gene prediction, so we constructed interaction networks using gene, mRNA, protein, and expression profile datasets. A network propagation scheme was applied in which, following the guilt-by-association principle, each candidate gene receives a combined score that integrates its network score and its significance score. An SVM model was used to optimize the weighting parameters according to the accuracy of label classification, to achieve more reliable results. We found that integrating multiple omics data with more data types improves the reliability of the results. We investigated the GO terms particularly associated with the top 100 candidate genes and the known genes, and analyzed the roles that these genes play in determining the phenotype of maize. We hope that the candidate genes identified here will provide a biological perspective and contribute to maize breeding research.
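The guilt-by-association principle behind network propagation can be sketched with a random walk with restart on a toy graph. This is a generic illustration rather than the authors' scheme: the graph, the node names, and the restart probability are hypothetical, and the integration with significance scores and the SVM-based weighting are not reproduced.

```python
def propagate(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart on an undirected graph.

    adjacency maps each node to a list of its neighbours; seeds are the
    known genes from which the score is propagated.
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: a fixed fraction of the score returns to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        # Walk term: each node spreads the rest equally to its neighbours.
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                continue
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p

# Hypothetical path-shaped interaction network: g1 - g2 - g3 - g4 - g5.
network = {
    "g1": ["g2"],
    "g2": ["g1", "g3"],
    "g3": ["g2", "g4"],
    "g4": ["g3", "g5"],
    "g5": ["g4"],
}
scores = propagate(network, seeds={"g1"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

With `g1` as the only known gene, candidates closer to it in the network receive higher scores, which is the sense in which network neighbours of known genes are prioritized.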
|
28
|
Oughtred R, Stark C, Breitkreutz BJ, Rust J, Boucher L, Chang C, Kolas N, O’Donnell L, Leung G, McAdam R, Zhang F, Dolma S, Willems A, Coulombe-Huntington J, Chatr-aryamontri A, Dolinski K, Tyers M. The BioGRID interaction database: 2019 update. Nucleic Acids Res 2019; 47:D529-D541. [PMID: 30476227 PMCID: PMC6324058 DOI: 10.1093/nar/gky1079] [Citation(s) in RCA: 947] [Impact Index Per Article: 157.8] [Received: 09/23/2018] [Revised: 10/15/2018] [Accepted: 11/22/2018] [Indexed: 12/17/2022] Open
Abstract
The Biological General Repository for Interaction Datasets (BioGRID: https://thebiogrid.org) is an open access database dedicated to the curation and archival storage of protein, genetic and chemical interactions for all major model organism species and humans. As of September 2018 (build 3.4.164), BioGRID contains records for 1 598 688 biological interactions manually annotated from 55 809 publications for 71 species, as classified by an updated set of controlled vocabularies for experimental detection methods. BioGRID also houses records for >700 000 post-translational modification sites. BioGRID now captures chemical interaction data, including chemical-protein interactions for human drug targets drawn from the DrugBank database and manually curated bioactive compounds reported in the literature. A new dedicated aspect of BioGRID annotates genome-wide CRISPR/Cas9-based screens that report gene-phenotype and gene-gene relationships. An extension of the BioGRID resource called the Open Repository for CRISPR Screens (ORCS) database (https://orcs.thebiogrid.org) currently contains over 500 genome-wide screens carried out in human or mouse cell lines. All data in BioGRID is made freely available without restriction, is directly downloadable in standard formats and can be readily incorporated into existing applications via our web service platforms. BioGRID data are also freely distributed through partner model organism databases and meta-databases.
Affiliation(s)
- Rose Oughtred
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
| | - Chris Stark
- The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada
| | - Bobby-Joe Breitkreutz
- The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada
| | - Jennifer Rust
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
| | - Lorrie Boucher
- The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada
| | - Christie Chang
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
| | - Nadine Kolas
- The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada
| | - Lara O’Donnell
- The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada
| | - Genie Leung
- The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada
| | - Rochelle McAdam
- Arthur and Sonia Labatt Brain Tumor Research Center and Developmental and Stem Cell Biology, The Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada
| | - Frederick Zhang
- Arthur and Sonia Labatt Brain Tumor Research Center and Developmental and Stem Cell Biology, The Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada
| | - Sonam Dolma
- Arthur and Sonia Labatt Brain Tumor Research Center and Developmental and Stem Cell Biology, The Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada
| | - Andrew Willems
- The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada
| | - Jasmin Coulombe-Huntington
- Institute for Research in Immunology and Cancer, Université de Montréal, Montréal, Quebec H3C 3J7, Canada
| | - Andrew Chatr-aryamontri
- Institute for Research in Immunology and Cancer, Université de Montréal, Montréal, Quebec H3C 3J7, Canada
| | - Kara Dolinski
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
| | - Mike Tyers
- The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada
- Institute for Research in Immunology and Cancer, Université de Montréal, Montréal, Quebec H3C 3J7, Canada
| |
|
29
|
Duran‐Frigola M, Fernández‐Torras A, Bertoni M, Aloy P. Formatting biological big data for modern machine learning in drug discovery. Wiley Interdisciplinary Reviews: Computational Molecular Science 2018. [DOI: 10.1002/wcms.1408] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 12/15/2022]
Affiliation(s)
- Miquel Duran‐Frigola
- Joint IRB‐BSC‐CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona) Barcelona Institute of Science and Technology Barcelona Spain
| | - Adrià Fernández‐Torras
- Joint IRB‐BSC‐CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona) Barcelona Institute of Science and Technology Barcelona Spain
| | - Martino Bertoni
- Joint IRB‐BSC‐CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona) Barcelona Institute of Science and Technology Barcelona Spain
| | - Patrick Aloy
- Joint IRB‐BSC‐CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona) Barcelona Institute of Science and Technology Barcelona Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA) Barcelona Spain
| |
|
30
|
Typing tumors using pathways selected by somatic evolution. Nat Commun 2018; 9:4159. [PMID: 30297789 PMCID: PMC6175900 DOI: 10.1038/s41467-018-06464-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 01/16/2018] [Accepted: 09/03/2018] [Indexed: 01/01/2023] Open
Abstract
Many recent efforts to analyze cancer genomes involve aggregation of mutations within reference maps of molecular pathways and protein networks. Here, we find these pathway studies are impeded by molecular interactions that are functionally irrelevant to cancer or the patient’s tumor type, as these interactions diminish the contrast of driver pathways relative to individual frequently mutated genes. This problem can be addressed by creating stringent tumor-specific networks of biophysical protein interactions, identified by signatures of epistatic selection during tumor evolution. Using such an evolutionarily selected pathway (ESP) map, we analyze the major cancer genome atlases to derive a hierarchical classification of tumor subtypes linked to characteristic mutated pathways. These pathways are clinically prognostic and predictive, including the TP53-AXIN-ARHGEF17 combination in liver and CYLC2-STK11-STK11IP in lung cancer, which we validate in independent cohorts. This ESP framework substantially improves the definition of cancer pathways and subtypes from tumor genome data. Informative pathways driving cancer pathogenesis and subtypes can be difficult to identify in the presence of many gene interactions irrelevant to cancer. Here, the authors describe an approach for cancer gene pathway analysis based on key molecular interactions that drive cancer in relevant tissue types, and they assemble a focused map of Evolutionarily Selected Pathways (ESP) with interactions supported by both protein–protein binding and genetic epistasis during somatic tumor evolution.
|