1
Taheri G, Habibi M. Uncovering driver genes in breast cancer through an innovative machine learning mutational analysis method. Comput Biol Med 2024; 171:108234. [PMID: 38430742 DOI: 10.1016/j.compbiomed.2024.108234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 01/25/2024] [Accepted: 02/25/2024] [Indexed: 03/05/2024]
Abstract
Breast cancer has become a severe public health concern and one of the leading causes of cancer-related death in women worldwide. Several genes and mutations in these genes linked to breast cancer have been identified using sophisticated techniques, despite the fact that the exact cause of breast cancer is still unknown. A commonly used feature for identifying driver mutations is the recurrence of a mutation in patients. Nevertheless, some mutations are more likely to occur than others for various reasons. Sequencing analysis has shown that cancer-driving genes operate across complex networks, often with mutations appearing in a modular pattern. In this work, as a retrospective study, we used TCGA data, which is gathered from breast cancer patients. We introduced a new machine-learning approach to examine gene functionality in networks derived from mutation associations, gene-gene interactions, and graph clustering for breast cancer analysis. These networks have uncovered crucial biological components in critical pathways, particularly those that exhibit low-frequency mutations. The statistical strength of the clinical study is significantly boosted by evaluating the network as a whole instead of just single gene effects. Our method successfully identified essential driver genes with diverse mutation frequencies. We then explored the functions of these potential driver genes and their related pathways. By uncovering low-frequency genes, we shed light on understudied pathways associated with breast cancer. Additionally, we present a novel Monte Carlo-based algorithm to identify driver modules in breast cancer. Our findings highlight the significance and role of these modules in critical signaling pathways in breast cancer, providing a comprehensive understanding of breast cancer development. Materials and implementations are available at: [https://github.com/MahnazHabibi/BreastCancer].
Affiliation(s)
- Golnaz Taheri
  - Department of Computer and Systems Sciences, Stockholm University, Stockholm, Sweden; Science for Life Laboratory, Stockholm, Sweden.
- Mahnaz Habibi
  - Department of Mathematics, Qazvin Branch, Islamic Azad University, Qazvin, Iran

2
Pukhta IR, Rout RK. Identification and segregation of genes with improved recurrent neural network trained with optimal gene level and mutation level features. Comput Methods Biomech Biomed Engin 2024:1-16. [PMID: 38424698 DOI: 10.1080/10255842.2024.2311322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 01/20/2024] [Indexed: 03/02/2024]
Abstract
Even though many different approaches have been employed to address the complex mutational heterogeneity of cancer, finding driver genes is still problematic since other genomic factors cannot be fully integrated for combined analyses. This research paper presents a novel gene identification and segregation model with five key processes: (a) pre-processing, (b) treatment of class imbalances, (c) feature extraction, (d) feature selection, and (e) gene classification. To increase the data quality, the gathered initial information is first pre-processed using data cleaning and data normalization. This turns the raw data into something that is both useful and effective. In practice, the sample is skewed against drivers because passenger mutation markers appear in proportionally fewer instances than drivers do. To address the class imbalance problem, an improved K-Means + SMOTE procedure is applied to the pre-processed data. The most crucial characteristics, including those at the gene and mutation levels, are then extracted from the balanced dataset. To lessen the computational load in terms of time, the best of the extracted features are selected using Forensic interpretation tailored hunger food search optimization (FIHFSO). The selected features are used to train the deep learning classifier that performs the segregation. In this research, an Improved Recurrent Neural Network (I-RNN) is used to make the final decision about genes. At 90% of learning percentage, the accuracy of the proposed method achieves 0.98% of 0.83, 0.81, 0.65, 0.80, 0.92 and 0.63% which is compared to the other methods like HGS, FBIO, AOA, AO, GOA and PRO respectively.
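The class-balancing step can be illustrated with plain SMOTE. The following is a minimal sketch, not the authors' improved K-Means + SMOTE procedure: the function names, the neighbour count `k`, and the toy feature vectors are assumptions made for illustration only. Each synthetic sample is an interpolation between a minority-class sample and one of its nearest minority-class neighbours.

```python
import random
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: List[Sequence[float]], n_synthetic: int,
          k: int = 3, seed: int = 0) -> List[List[float]]:
    """Generate n_synthetic samples by interpolating between minority samples.

    Plain SMOTE only (no clustering step); hypothetical helper for illustration.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: squared_distance(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


# Toy minority class with two made-up features per sample.
minority_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for point in smote(minority_samples, n_synthetic=3, k=2, seed=1):
    print(point)
```

Because every synthetic point is a convex combination of two real minority samples, it always lies on the segment between them, which is the property the oversampling relies on.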
Affiliation(s)
- Irfan Rashid Pukhta
  - Assistant Professor, Department of Computer Science and Engineering, National Institute of Technology, Srinagar, Jammu and Kashmir 190006, India
- Ranjeet Kumar Rout
  - Assistant Professor, Department of Computer Science and Engineering, National Institute of Technology, Srinagar, Jammu and Kashmir 190006, India

3
Wu J, Nie Q, Li G, Zhu K. Identifying driver pathways based on a parameter-free model and a partheno-genetic algorithm. BMC Bioinformatics 2023; 24:211. [PMID: 37221474 DOI: 10.1186/s12859-023-05319-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Accepted: 05/04/2023] [Indexed: 05/25/2023] Open
Abstract
BACKGROUND: The tremendous amount of accumulated omics data has made it possible to identify cancer driver pathways through computational methods, which is believed to offer critical information for downstream research such as ascertaining cancer pathogenesis and developing anti-cancer drugs. Identifying cancer driver pathways by integrating multiple omics data remains a challenging problem. RESULTS: In this study, a parameter-free identification model, SMCMN, incorporating both pathway features and gene associations in the Protein-Protein Interaction (PPI) network, is proposed. A novel measurement of mutual exclusivity is devised to exclude some gene sets with an "inclusion" relationship. By introducing gene-clustering-based operators, a partheno-genetic algorithm, CPGA, is put forward for solving the SMCMN model. Experiments were conducted on three real cancer datasets to compare the identification performance of models and methods. The model comparisons demonstrate that the SMCMN model does eliminate the "inclusion" relationship and produces gene sets with better enrichment performance than the classical model MWSM in most cases. CONCLUSIONS: The gene sets recognized by the proposed CPGA-SMCMN method contain more genes involved in known cancer-related pathways and show stronger connectivity in the PPI network. All of this has been demonstrated through extensive comparative experiments between the CPGA-SMCMN method and six state-of-the-art ones.
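For orientation, the classical maximum weight submatrix weight, which balances coverage against mutual exclusivity, can be sketched as follows. This assumes that MWSM in the abstract refers to that formulation, W(M) = |Γ(M)| − ω(M), where Γ(M) is the set of patients with a mutation in at least one gene of M and ω(M) is the coverage overlap. It is not the SMCMN model, whose definition is not given here. The gene names, patient identifiers, and function names are hypothetical.

```python
from typing import Dict, Iterable, Set


def coverage(mutations: Dict[str, Set[str]], genes: Iterable[str]) -> Set[str]:
    """Patients carrying a mutation in at least one gene of the set."""
    covered: Set[str] = set()
    for gene in genes:
        covered |= mutations.get(gene, set())
    return covered


def coverage_overlap(mutations: Dict[str, Set[str]], genes: Iterable[str]) -> int:
    """Sum of per-gene patient counts minus the number of covered patients."""
    genes = list(genes)
    per_gene_total = sum(len(mutations.get(g, set())) for g in genes)
    return per_gene_total - len(coverage(mutations, genes))


def mwsm_weight(mutations: Dict[str, Set[str]], genes: Iterable[str]) -> int:
    """W(M) = |coverage| - overlap; high when the set is covering and exclusive."""
    genes = list(genes)
    return len(coverage(mutations, genes)) - coverage_overlap(mutations, genes)


# Toy mutation data: gene -> set of patients in which it is mutated.
toy = {
    "G1": {"p1", "p2", "p3"},
    "G2": {"p4", "p5"},
    "G3": {"p1", "p2", "p3", "p4"},
}

print(mwsm_weight(toy, ["G1", "G2"]))  # 5 patients covered, no overlap -> 5
print(mwsm_weight(toy, ["G1", "G3"]))  # 4 patients covered, overlap 3 -> 1
```

In the toy data, every patient mutated in G1 is also mutated in G3, which is one way to picture an "inclusion" relationship between genes; the pair scores poorly here because the overlap term penalises the shared patients.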
Affiliation(s)
- Jingli Wu
  - Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, China.
  - Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, China.
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, China.
- Qinghua Nie
  - Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, China
  - Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, China
- Gaoshi Li
  - Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, China
  - Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, China
- Kai Zhu
  - Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, China

4
Habibi M, Taheri G. A new machine learning method for cancer mutation analysis. PLoS Comput Biol 2022; 18:e1010332. [PMID: 36251702 PMCID: PMC9612828 DOI: 10.1371/journal.pcbi.1010332] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/28/2022] [Revised: 10/27/2022] [Accepted: 10/05/2022] [Indexed: 11/23/2022] Open
Abstract
It is complicated to identify cancer-causing mutations. The recurrence of a mutation in patients remains one of the most reliable features of mutation driver status. However, some mutations are more likely to happen than others for various reasons. Different sequencing analysis has revealed that cancer driver genes operate across complex pathways and networks, with mutations often arising in a mutually exclusive pattern. Genes with low-frequency mutations are understudied as cancer-related genes, especially in the context of networks. Here we propose a machine learning method to study the functionality of mutually exclusive genes in the networks derived from mutation associations, gene-gene interactions, and graph clustering. These networks have indicated critical biological components in the essential pathways, especially those mutated at low frequency. Studying the network and not just the impact of a single gene significantly increases the statistical power of clinical analysis. The proposed method identified important driver genes with different frequencies. We studied the function and the associated pathways in which the candidate driver genes participate. By introducing lower-frequency genes, we recognized less studied cancer-related pathways. We also proposed a novel clustering method to specify driver modules. We evaluated each driver module with different criteria, including the terms of biological processes and the number of simultaneous mutations in each cancer. Materials and implementations are available at: https://github.com/MahnazHabibi/MutationAnalysis.
Affiliation(s)
- Mahnaz Habibi
  - Department of Mathematics, Qazvin Branch, Islamic Azad University, Qazvin, Iran
- Golnaz Taheri
  - Department of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden
  - Science for Life Laboratory, Stockholm, Sweden

5
Wang C, Zhang H, Ma H, Wang Y, Cai K, Guo T, Yang Y, Li Z, Zhu Y. Inference of pan-cancer related genes by orthologs matching based on enhanced LSTM model. Front Microbiol 2022; 13:963704. [PMID: 36267181 PMCID: PMC9577021 DOI: 10.3389/fmicb.2022.963704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Accepted: 08/16/2022] [Indexed: 11/13/2022] Open
Abstract
Many disease-related genes have been found to be associated with cancer diagnosis, which is useful for understanding the pathophysiology of cancer, generating targeted drugs, and developing new diagnostic and treatment techniques. With the development of the pan-cancer project and the ongoing expansion of sequencing technology, many scientists are focusing on mining common genes from The Cancer Genome Atlas (TCGA) across various cancer types. In this study, motivated by the benefits of reverse genetics, we attempted to infer pan-cancer associated genes by homology matching in the microbial model organism Saccharomyces cerevisiae (yeast). First, a background network of protein-protein interactions and a pathogenic gene set involving several cancer types in humans and yeast were created. The homology between the human gene and yeast gene was then discovered by homology matching, and its interaction sub-network was obtained. This was undertaken following the principle that the homologous genes of the common ancestor may have similarities in expression. Then, using bidirectional long short-term memory (BiLSTM) in combination with adaptive integration of heterogeneous information, we further explored the topological characteristics of the yeast protein interaction network and presented a node representation score to evaluate the node ability in graphs. Finally, homologous mapping for human genes matched the important genes identified by ensemble classifiers for yeast, which may be thought of as genes connected to all types of cancer. One way to assess the performance of the BiLSTM model is through experiments on the database. On the other hand, enrichment analysis, survival analysis, and other outcomes can be used to confirm the biological importance of the prediction results. The complete experimental protocols and programs are available at https://github.com/zhuyuan-cug/AI-BiLSTM/tree/master.
Affiliation(s)
- Chao Wang
  - Department of Surgery, Hepatic Surgery Center, Institute of Hepato-Pancreato-Biliary Surgery, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Houwang Zhang
  - Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong SAR, China
- Haishu Ma
  - School of Automation, China University of Geosciences, Wuhan, China
  - Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan, China
  - Engineering Research Center of Intelligent Technology for Geo-Exploration, Wuhan, China
- Yawen Wang
  - School of Mathematics and Physics, China University of Geosciences, Wuhan, China
- Ke Cai
  - School of Automation, China University of Geosciences, Wuhan, China
  - Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan, China
  - Engineering Research Center of Intelligent Technology for Geo-Exploration, Wuhan, China
- Tingrui Guo
  - School of Automation, China University of Geosciences, Wuhan, China
  - Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan, China
  - Engineering Research Center of Intelligent Technology for Geo-Exploration, Wuhan, China
- Yuanhang Yang
  - School of Mathematics and Physics, China University of Geosciences, Wuhan, China
- Zhen Li
  - School of Mathematics and Physics, China University of Geosciences, Wuhan, China
- Yuan Zhu
  - School of Automation, China University of Geosciences, Wuhan, China
  - Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan, China
  - Engineering Research Center of Intelligent Technology for Geo-Exploration, Wuhan, China
  - Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Shanghai, China
  - *Correspondence: Yuan Zhu

6
A nonlinear model and an algorithm for identifying cancer driver pathways. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]

7
Wu J, Wu C, Li G. Identifying common driver modules by equilibrating coverage and mutual exclusivity across pan-cancer data. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.04.050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]

8
Zhu Y, Zhang H, Yang Y, Zhang C, Ou-Yang L, Bai L, Deng M, Yi M, Liu S, Wang C. Discovery of pan-cancer related genes via integrative network analysis. Brief Funct Genomics 2022; 21:325-338. [PMID: 35760070 DOI: 10.1093/bfgp/elac012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/01/2022] [Revised: 05/14/2022] [Accepted: 05/25/2022] [Indexed: 01/02/2023] Open
Abstract
Identification of cancer-related genes is helpful for understanding the pathogenesis of cancer, developing targeted drugs and creating new diagnostic and therapeutic methods. Considering the complexity of the biological laboratory methods, many network-based methods have been proposed to identify cancer-related genes at the global perspective with the increasing availability of high-throughput data. Some studies have focused on the tissue-specific cancer networks. However, cancers from different tissues may share common features, and those methods may ignore the differences and similarities across cancers during the establishment of modeling. In this work, in order to make full use of global information of the network, we first establish the pan-cancer network via differential network algorithm, which not only contains heterogeneous data across multiple cancer types but also contains heterogeneous data between tumor samples and normal samples. Second, the node representation vectors are learned by network embedding. In contrast to ranking analysis-based methods, with the help of integrative network analysis, we transform the cancer-related gene identification problem into a binary classification problem. The final results are obtained via ensemble classification. We further applied these methods to the most commonly used gene expression data involving six tissue-specific cancer types. As a result, an integrative pan-cancer network and several biologically meaningful results were obtained. As examples, nine genes were ultimately identified as potential pan-cancer-related genes. Most of these genes have been reported in published studies, thus showing our method's potential for application in identifying driver gene candidates for further biological experimental verification.
Affiliation(s)
- Yuan Zhu
  - School of Automation, China University of Geosciences, Lumo Road, 430074, Wuhan, China
  - Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Lumo Road, 430074, Wuhan, China
  - Engineering Research Center of Intelligent Technology for Geo-Exploration, Lumo Road, 430074, Wuhan, China
  - Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Handan Road, 200433, Shanghai, China
- Houwang Zhang
  - Electrical Engineering, City University of Hong Kong, Kowloon, 999077, Hong Kong, China
- Yuanhang Yang
  - School of Mathematics and Physics, China University of Geosciences, Lumo Road, 430074, Wuhan, China
- Chaoyang Zhang
  - School of Computing Sciences and Computer Engineering, The University of Southern Mississippi, Hattiesburg, USA
- Le Ou-Yang
  - Guangdong Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, Shenzhen University, Nanhai Avenue, 518060, Shenzhen, China
- Litai Bai
  - School of Automation, China University of Geosciences, Lumo Road, 430074, Wuhan, China
  - Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Lumo Road, 430074, Wuhan, China
  - Engineering Research Center of Intelligent Technology for Geo-Exploration, Lumo Road, 430074, Wuhan, China
- Minghua Deng
  - School of Mathematical Sciences, Peking University, No.5 Yiheyuan Road, 100871, Beijing, China
- Ming Yi
  - School of Mathematics and Physics, China University of Geosciences, Lumo Road, 430074, Wuhan, China
- Song Liu
  - School of Automation, China University of Geosciences, Lumo Road, 430074, Wuhan, China
  - Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Lumo Road, 430074, Wuhan, China
  - Engineering Research Center of Intelligent Technology for Geo-Exploration, Lumo Road, 430074, Wuhan, China
- Chao Wang
  - Hepatic Surgery Center, Institute of Hepato-Pancreato-Biliary Surgery, Department of Surgery, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Jiefang Avenue, 430030, Wuhan, China

9
Winkler S, Winkler I, Figaschewski M, Tiede T, Nordheim A, Kohlbacher O. De novo identification of maximally deregulated subnetworks based on multi-omics data with DeRegNet. BMC Bioinformatics 2022; 23:139. [PMID: 35439941 PMCID: PMC9020058 DOI: 10.1186/s12859-022-04670-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/05/2021] [Accepted: 03/29/2022] [Indexed: 12/14/2022] Open
Abstract
Background: With a growing amount of (multi-)omics data being available, the extraction of knowledge from these datasets is still a difficult problem. Classical enrichment-style analyses require predefined pathways or gene sets that are tested for significant deregulation to assess whether the pathway is functionally involved in the biological process under study. De novo identification of these pathways can reduce the bias inherent in predefined pathways or gene sets. At the same time, the definition and efficient identification of these pathways de novo from large biological networks is a challenging problem. Results: We present a novel algorithm, DeRegNet, for the identification of maximally deregulated subnetworks on directed graphs based on deregulation scores derived from (multi-)omics data. DeRegNet can be interpreted as maximum likelihood estimation given a certain probabilistic model for de-novo subgraph identification. We use fractional integer programming to solve the resulting combinatorial optimization problem. We can show that the approach outperforms related algorithms on simulated data with known ground truths. On a publicly available liver cancer dataset we can show that DeRegNet can identify biologically meaningful subgraphs suitable for patient stratification. DeRegNet can also be used to find explicitly multi-omics subgraphs which we demonstrate by presenting subgraphs with consistent methylation-transcription patterns. DeRegNet is freely available as open-source software. Conclusion: The proposed algorithmic framework and its available implementation can serve as a valuable heuristic hypothesis generation tool contextualizing omics data within biomolecular networks.
Affiliation(s)
- Sebastian Winkler
  - Applied Bioinformatics, Department of Computer Science, University of Tuebingen, Tübingen, Germany
  - International Max Planck Research School (IMPRS) "From Molecules to Organism", Tübingen, Germany
- Ivana Winkler
  - International Max Planck Research School (IMPRS) "From Molecules to Organism", Tübingen, Germany
  - Interfaculty Institute for Cell Biology (IFIZ), University of Tuebingen, Tübingen, Germany
  - German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Mirjam Figaschewski
  - Applied Bioinformatics, Department of Computer Science, University of Tuebingen, Tübingen, Germany
- Thorsten Tiede
  - Applied Bioinformatics, Department of Computer Science, University of Tuebingen, Tübingen, Germany
- Alfred Nordheim
  - Interfaculty Institute for Cell Biology (IFIZ), University of Tuebingen, Tübingen, Germany
  - Leibniz Institute on Aging (FLI), Jena, Germany
- Oliver Kohlbacher
  - Applied Bioinformatics, Department of Computer Science, University of Tuebingen, Tübingen, Germany
  - Institute for Bioinformatics and Medical Informatics, University of Tuebingen, Tübingen, Germany
  - Translational Bioinformatics, University Hospital Tuebingen, Tübingen, Germany

10
Andrades R, Recamonde-Mendoza M. Machine learning methods for prediction of cancer driver genes: a survey paper. Brief Bioinform 2022; 23:6551145. [PMID: 35323900 DOI: 10.1093/bib/bbac062] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 10/06/2021] [Revised: 02/06/2022] [Accepted: 02/08/2022] [Indexed: 12/21/2022] Open
Abstract
Identifying the genes and mutations that drive the emergence of tumors is a critical step to improving our understanding of cancer and identifying new directions for disease diagnosis and treatment. Despite the large volume of genomics data, the precise detection of driver mutations and their carrying genes, known as cancer driver genes, from the millions of possible somatic mutations remains a challenge. Computational methods play an increasingly important role in discovering genomic patterns associated with cancer drivers and developing predictive models to identify these elements. Machine learning (ML), including deep learning, has been the engine behind many of these efforts and provides excellent opportunities for tackling remaining gaps in the field. Thus, this survey aims to perform a comprehensive analysis of ML-based computational approaches to identify cancer driver mutations and genes, providing an integrated, panoramic view of the broad data and algorithmic landscape within this scientific problem. We discuss how the interactions among data types and ML algorithms have been explored in previous solutions and outline current analytical limitations that deserve further attention from the scientific community. We hope that by helping readers become more familiar with significant developments in the field brought by ML, we may inspire new researchers to address open problems and advance our knowledge towards cancer driver discovery.
Affiliation(s)
- Renan Andrades
  - Institute of Informatics, Universidade Federal do Rio Grande do Sul, Porto Alegre/RS, Brazil
  - Bioinformatics Core, Hospital de Clínicas de Porto Alegre, Porto Alegre/RS, Brazil
- Mariana Recamonde-Mendoza
  - Institute of Informatics, Universidade Federal do Rio Grande do Sul, Porto Alegre/RS, Brazil
  - Bioinformatics Core, Hospital de Clínicas de Porto Alegre, Porto Alegre/RS, Brazil

11
gcMECM: graph clustering of mutual exclusivity of cancer mutations. BMC Bioinformatics 2021; 22:592. [PMID: 34906079 PMCID: PMC8670134 DOI: 10.1186/s12859-021-04505-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/01/2021] [Accepted: 11/30/2021] [Indexed: 11/29/2022] Open
Abstract
Background: Next-generation sequencing platforms allow us to sequence millions of small fragments of DNA simultaneously, revolutionizing cancer research. Sequence analysis has revealed that cancer driver genes operate across multiple intricate pathways and networks with mutations often occurring in a mutually exclusive pattern. Currently, low-frequency mutations are understudied as cancer-relevant genes, especially in the context of networks. Results: Here we describe a tool, gcMECM, that enables us to visualize the functionality of mutually exclusive genes in the subnetworks derived from mutation associations, gene–gene interactions, and graph clustering. These subnetworks have revealed crucial biological components in the canonical pathway, especially those mutated at low frequency. Examining the subnetwork, and not just the impact of a single gene, significantly increases the statistical power of clinical analysis and enables us to build models to better predict how and why cancer develops. Conclusions: gcMECM uses a computationally efficient and scalable algorithm to identify subnetworks in a canonical pathway with mutually exclusive mutation patterns and distinct biological functions.

12
Ahmed R, Erten C, Houdjedj A, Kazan H, Yalcin C. A Network-Centric Framework for the Evaluation of Mutual Exclusivity Tests on Cancer Drivers. Front Genet 2021; 12:746495. [PMID: 34899838 PMCID: PMC8664367 DOI: 10.3389/fgene.2021.746495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2021] [Accepted: 10/27/2021] [Indexed: 12/03/2022] Open
Abstract
One of the key concepts employed in cancer driver gene identification is that of mutual exclusivity (ME); a driver mutation is less likely to occur in case of an earlier mutation that has common functionality in the same molecular pathway. Several ME tests have been proposed recently; however, the current protocols to evaluate ME tests have two main limitations. First, the evaluations are mostly with respect to simulated data, and second, the evaluation metrics lack a network-centric view. The latter is especially crucial as the notion of common functionality can be achieved through searching for interaction patterns in relevant networks. We propose a network-centric framework to evaluate the pairwise significances found by statistical ME tests. It has three main components. The first component consists of metrics employed in the network-centric ME evaluations. Such metrics are designed so that network knowledge and the reference set of known cancer genes are incorporated in ME evaluations under a careful definition of proper control groups. The other two components are designed as further mechanisms to avoid confounders inherent in ME detection on top of the network-centric view. To this end, our second objective is to dissect the side effects caused by mutation load artifacts where mutations driving tumor subtypes with low mutation load might be incorrectly diagnosed as mutually exclusive. Finally, as part of the third main component, the confounding issue stemming from the use of nonspecific interaction networks generated as combinations of interactions from different tissues is resolved through the creation and use of tissue-specific networks in the proposed framework. The data, the source code and useful scripts are available at: https://github.com/abu-compbio/NetCentric.
Affiliation(s)
- Rafsan Ahmed
  - Electrical and Computer Engineering Graduate Program, Antalya Bilim University, Antalya, Turkey
- Cesim Erten
  - Department of Computer Engineering, Antalya Bilim University, Antalya, Turkey
- Aissa Houdjedj
  - Department of Computer Engineering, Antalya Bilim University, Antalya, Turkey
- Hilal Kazan
  - Department of Computer Engineering, Antalya Bilim University, Antalya, Turkey
- Cansu Yalcin
  - Department of Computer Engineering, Antalya Bilim University, Antalya, Turkey

13
Gao B, Zhao Y, Gao Y, Li G, Wu L. Identification of Common Driver Gene Modules and Associations between Cancers through Integrated Network Analysis. Global Challenges (Hoboken, NJ) 2021; 5:2100006. [PMID: 34504716 PMCID: PMC8414517 DOI: 10.1002/gch2.202100006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/20/2021] [Revised: 04/26/2021] [Indexed: 05/12/2023]
Abstract
High-throughput biological data has created an unprecedented opportunity for illuminating the mechanisms of tumor emergence and evolution. An important and challenging problem in deciphering cancers is to investigate the commonalities of driver genes and pathways and the associations between cancers. Aiming at this problem, a tool ComCovEx is developed to identify common cancer driver gene modules between two cancers by searching for the candidates in local signaling networks using an exclusivity-coverage iteration strategy and outputting those with significant coverage and exclusivity for both cancers. The associations of the cancer pairs are further evaluated by Fisher's exact test. Being applied to 11 TCGA cancer datasets, ComCovEx identifies 13 significantly associated cancer pairs with plenty of biologically significant common gene modules. The novel results of cancer relationship and common gene modules reveal the relevant pathological basis of different cancer types and provide new clues to diagnosis and drug treatment in associated cancers.
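The abstract states that cancer-pair associations are evaluated with Fisher's exact test but does not say how the contingency table is built. The sketch below therefore shows only the test itself, one-sided, on a hypothetical 2x2 table; the function name and the example counts are assumptions and are not taken from ComCovEx.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with all margins fixed,
    i.e. the probability of an association at least as strong as observed.
    """
    row1 = a + b          # size of the first row
    col1 = a + c          # size of the first column
    n = a + b + c + d     # grand total
    total = comb(n, row1)
    tail = 0
    for k in range(a, min(row1, col1) + 1):
        # number of tables with k in the top-left cell and the same margins
        tail += comb(col1, k) * comb(n - col1, row1 - k)
    return tail / total


# Hypothetical table: rows and columns are two made-up binary properties.
p_value = fisher_exact_greater(3, 1, 1, 3)
print(round(p_value, 3))  # 17/70, about 0.243
```

A small p-value would indicate that the two properties co-occur more often than the fixed margins alone would explain; with these toy counts the evidence is weak.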
Affiliation(s)
- Bo Gao
- IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- School of Mathematics, Shandong University, Jinan 250100, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- School of Public Health, Capital Medical University, Beijing 100069, China
- Beijing Municipal Key Laboratory of Clinical Epidemiology, Beijing 100069, China
| | - Yue Zhao
- IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yonghang Gao
- IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Guojun Li
- School of Mathematics, Shandong University, Jinan 250100, China
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
| | - Ling‐Yun Wu
- IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| |
|
14
|
Yang Z, Yu G, Guo M, Yu J, Zhang X, Wang J. CDPath: Cooperative Driver Pathways Discovery Using Integer Linear Programming and Markov Clustering. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1384-1395. [PMID: 31581094 DOI: 10.1109/tcbb.2019.2945029] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Discovering driver pathways is an essential task for understanding the pathogenesis of cancer and designing precise treatments for cancer patients. Increasing evidence indicates that multiple pathways often function cooperatively in carcinogenesis. In this study, we propose an approach called CDPath to discover cooperative driver pathways. CDPath first uses Integer Linear Programming to explore driver core modules from mutation profiles by enforcing co-occurrence and functional interaction relations between modules, and by maximizing the mutual exclusivity and coverage within modules. Next, to enforce cooperation of pathways and support the subsequent exact discovery of cooperative driver pathways, it performs Markov clustering on a pathway-pathway interaction network to cluster pathways. After that, it identifies pathways in different modules but in the same clusters as cooperative driver pathways. We apply CDPath to two TCGA datasets: breast cancer (BRCA) and endometrial cancer (UCEC). The results show that CDPath can identify known (e.g., TP53) and potential driver genes (e.g., SPTBN2). In addition, the identified cooperative driver pathways are related to the target cancer, and they are involved in carcinogenesis and several key biological processes. CDPath can uncover more potential biological associations between pathways (over 100 percent) and more cooperative driver pathways (over 200 percent) than competing approaches. The demo code of CDPath is available at http://mlda.swu.edu.cn/codes.php?name=CDPath.
|
15
|
Erten C, Houdjedj A, Kazan H. Ranking cancer drivers via betweenness-based outlier detection and random walks. BMC Bioinformatics 2021; 22:62. [PMID: 33568049 PMCID: PMC7877041 DOI: 10.1186/s12859-021-03989-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2020] [Accepted: 01/31/2021] [Indexed: 12/04/2022] Open
Abstract
Background: Recent cancer genomic studies have generated detailed molecular data on a large number of cancer patients. A key remaining problem in cancer genomics is the identification of driver genes. Results: We propose BetweenNet, a computational approach that integrates genomic data with a protein-protein interaction network to identify cancer driver genes. BetweenNet utilizes a measure based on betweenness centrality on patient-specific networks to identify the so-called outlier genes that correspond to dysregulated genes for each patient. Setting up the relationship between the mutated genes and the outliers through a bipartite graph, it employs a random-walk process on the graph, which provides the final prioritization of the mutated genes. We compare BetweenNet against state-of-the-art cancer gene prioritization methods on lung, breast, and pan-cancer datasets. Conclusions: Our evaluations show that BetweenNet is better at recovering known cancer genes based on multiple reference databases. Additionally, we show that the GO terms and the reference pathways enriched in BetweenNet-ranked genes and those that are enriched in known cancer genes overlap significantly when compared to the overlaps achieved by the rankings of the alternative methods.
Affiliation(s)
- Cesim Erten
- Department of Computer Engineering, Antalya Bilim University, Antalya, Turkey
| | - Aissa Houdjedj
- Electrical and Computer Engineering Graduate Program, Antalya Bilim University, Antalya, Turkey
| | - Hilal Kazan
- Department of Computer Engineering, Antalya Bilim University, Antalya, Turkey.
| |
|
16
|
Ahmed R, Baali I, Erten C, Hoxha E, Kazan H. MEXCOwalk: mutual exclusion and coverage based random walk to identify cancer modules. Bioinformatics 2020; 36:872-879. [PMID: 31432076 DOI: 10.1093/bioinformatics/btz655] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2019] [Revised: 07/03/2019] [Accepted: 08/18/2019] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION: Genomic analyses from large cancer cohorts have revealed the mutational heterogeneity problem which hinders the identification of driver genes based only on mutation profiles. One way to tackle this problem is to incorporate the fact that genes act together in functional modules. The connectivity knowledge present in existing protein-protein interaction (PPI) networks together with mutation frequencies of genes and the mutual exclusivity of cancer mutations can be utilized to increase the accuracy of identifying cancer driver modules. RESULTS: We present a novel edge-weighted random walk-based approach that incorporates connectivity information in the form of protein-protein interactions (PPIs), mutual exclusivity and coverage to identify cancer driver modules. MEXCOwalk outperforms several state-of-the-art computational methods on TCGA pan-cancer data in terms of recovering known cancer genes, providing modules that are capable of classifying normal and tumor samples and that are enriched for mutations in specific cancer types. Furthermore, the risk scores determined with output modules can stratify patients into low-risk and high-risk groups in multiple cancer types. MEXCOwalk identifies modules containing both well-known cancer genes and putative cancer genes that are rarely mutated in the pan-cancer data. The data, the source code and useful scripts are available at: https://github.com/abu-compbio/MEXCOwalk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rafsan Ahmed
- Electrical and Computer Engineering Graduate Program, Department of Computer Engineering, Antalya Bilim University, Antalya 07190, Turkey
| | - Ilyes Baali
- Electrical and Computer Engineering Graduate Program, Department of Computer Engineering, Antalya Bilim University, Antalya 07190, Turkey
| | - Cesim Erten
- Department of Computer Engineering, Antalya Bilim University, Antalya 07190, Turkey
| | - Evis Hoxha
- Department of Computer Engineering, Antalya Bilim University, Antalya 07190, Turkey
| | - Hilal Kazan
- Department of Computer Engineering, Antalya Bilim University, Antalya 07190, Turkey
| |
|
17
|
Cutigi JF, Evangelista AF, Simao A. Approaches for the identification of driver mutations in cancer: A tutorial from a computational perspective. J Bioinform Comput Biol 2020; 18:2050016. [PMID: 32698724 DOI: 10.1142/s021972002050016x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
Abstract
Cancer is a complex disease caused by the accumulation of genetic alterations during an individual's life. Such alterations are called genetic mutations and can be divided into two groups: (1) passenger mutations, which are not responsible for cancer, and (2) driver mutations, which are significant for cancer and responsible for its initiation and progression. Cancer cells undergo a large number of mutations, of which most are passengers and few are drivers. The identification of driver mutations is a key point and one of the biggest challenges in cancer genomics. Many computational methods for this purpose have been developed in cancer bioinformatics. Such methods are complex and are usually described at a high level of abstraction. This tutorial details some classical computational methods from a computational perspective, transcribing them into an algorithmic format so that researchers can access them easily.
Affiliation(s)
- Jorge Francisco Cutigi
- Federal Institute of São Paulo (IFSP), São Carlos, SP, Brazil; University of São Paulo (USP), São Carlos, SP, Brazil
| | | | | |
|
18
|
Song J, Peng W, Wang F. An Entropy-Based Method for Identifying Mutual Exclusive Driver Genes in Cancer. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:758-768. [PMID: 30763245 DOI: 10.1109/tcbb.2019.2897931] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Cancer is in essence a complex disease of genomic alteration caused by somatic mutations acquired during a lifetime. According to previous research, the first step to overcoming cancer is to identify driver genes, which can promote carcinogenesis. However, it is still a big challenge to extract cancer-related driver genes precisely and efficiently, because cancer is heterogeneous and there is a tremendous number of irrelevant passenger mutations that have no functional impact on cancer development. In this work, we propose a novel entropy-based method, EntroRank, to identify driver genes by integrating subcellular localization information and the mutual exclusivity of variation frequency into the network. EntroRank takes the different properties of driver genes into full consideration. Considering the modularity of driver genes, the mutated genes in the network are first clustered into subgroups according to the compartments in which they are located. After that, the structural entropy of a gene in its subgroup is employed to measure its indispensability. Considering the mutual exclusivity between driver genes in the modules, relative entropy is used to measure the degree of mutual exclusivity between two mutated genes in terms of their variation frequency. We applied our method to three different cancers: lung, prostate, and breast cancer. The results show that our method not only detects the well-known important drivers but also prioritizes rare unknown driver genes. In addition, EntroRank can identify driver genes that are mutually exclusive. Compared with other existing methods, our method achieves better performance for most cancer types in terms of precision, recall, and F-score.
|
19
|
Wei PJ, Wu FX, Xia J, Su Y, Wang J, Zheng CH. Prioritizing Cancer Genes Based on an Improved Random Walk Method. Front Genet 2020; 11:377. [PMID: 32411180 PMCID: PMC7198854 DOI: 10.3389/fgene.2020.00377] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2019] [Accepted: 03/26/2020] [Indexed: 12/18/2022] Open
Abstract
Identifying driver genes that contribute to cancer progression among numerous passenger genes is a central goal and a major challenge. The protein-protein interaction network provides convenient and reasonable assistance for driver gene discovery. Random walk-based methods have been widely used to prioritize nodes in social or biological networks. However, most studies select the next node uniformly from the random walker's neighbors. Few consider a transition preference according to the degree of the random walker's neighbors. In this study, based on the random walk method, we propose a novel approach named Driver_IRW (Driver genes discovery with Improved Random Walk method) to prioritize cancer genes in a cancer-related network. The key idea of Driver_IRW is to assign different transition probabilities to different edges of a constructed cancer-related network in accordance with the degree of the nodes' neighbors. Furthermore, the global centrality (here, betweenness centrality) and Katz feedback centrality are incorporated into the framework to evaluate the probability of walking to the seed nodes. Experimental results on four cancer types indicate that Driver_IRW performs more efficiently than some previously published methods for uncovering known cancer-related genes. In conclusion, our method can aid in prioritizing cancer-related genes and complement traditional frequency and network-based methods.
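To make the contrast between uniform and degree-dependent transitions concrete, here is a minimal random-walk-with-restart sketch in plain Python. It is not the Driver_IRW implementation: the rule "move to a neighbor with probability proportional to that neighbor's degree" is one assumed reading of the abstract, the centrality-based terms are omitted, and the toy graph, function names, and restart value are all hypothetical.

```python
def transition_probs(adj, degree_weighted=False):
    """Row-normalized transition probabilities for an undirected graph.

    adj maps each node to the set of its neighbors (no isolated nodes).
    With degree_weighted=True, a neighbor's weight is its own degree
    (an assumed scheme, for illustration only).
    """
    probs = {}
    for node, nbrs in adj.items():
        weights = {v: (len(adj[v]) if degree_weighted else 1.0) for v in nbrs}
        total = sum(weights.values())
        probs[node] = {v: w / total for v, w in weights.items()}
    return probs


def random_walk_with_restart(adj, seeds, restart=0.3, degree_weighted=False,
                             tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * P^T p + restart * p0 until convergence."""
    probs = transition_probs(adj, degree_weighted)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in adj}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in adj}
        for u, row in probs.items():
            for v, w in row.items():
                nxt[v] += (1.0 - restart) * p[u] * w
        if max(abs(nxt[n] - p[n]) for n in adj) < tol:
            return nxt
        p = nxt
    return p


# Toy network (hypothetical): node degrees are a=1, b=3, c=2, d=2.
graph = {"a": {"b"}, "b": {"a", "c", "d"}, "c": {"b", "d"}, "d": {"b", "c"}}
uniform = transition_probs(graph)
weighted = transition_probs(graph, degree_weighted=True)
scores = random_walk_with_restart(graph, seeds={"a"}, degree_weighted=True)
print(sorted(scores, key=scores.get, reverse=True))
```

Under the uniform rule the walker leaves `b` toward each neighbor with probability 1/3; under the assumed degree-weighted rule it favors `c` and `d` (2/5 each) over `a` (1/5).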
Affiliation(s)
- Pi-Jing Wei
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, College of Computer Science and Technology, Anhui University, Hefei, China
| | - Fang-Xiang Wu
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
- Department of Computer Sciences, University of Saskatchewan, Saskatoon, SK, Canada
- Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
| | - Junfeng Xia
- Institutes of Physical Science and Information Technology, Anhui University, Hefei, China
| | - Yansen Su
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, College of Computer Science and Technology, Anhui University, Hefei, China
| | - Jing Wang
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, College of Computer Science and Technology, Anhui University, Hefei, China
- College of Computer and Information Engineering, Fuyang Normal University, Fuyang, China
| | - Chun-Hou Zheng
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, College of Computer Science and Technology, Anhui University, Hefei, China
| |
|
20
|
Zhang W, Zeng Y, Wang L, Liu Y, Cheng YN. An Effective Graph Clustering Method to Identify Cancer Driver Modules. Front Bioeng Biotechnol 2020; 8:271. [PMID: 32318558 PMCID: PMC7154174 DOI: 10.3389/fbioe.2020.00271] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2020] [Accepted: 03/16/2020] [Indexed: 12/15/2022] Open
Abstract
Identifying the molecular modules that drive cancer progression can greatly deepen the understanding of cancer mechanisms and provide useful information for targeted therapies. Most methods currently addressing this issue primarily use mutual exclusivity without making full use of the extra layer of module property. In this paper, we propose MCLCluster to identify cancer driver modules. It uses somatic mutation data, Cancer Cell Fraction (CCF) data, a gene functional interaction network and a protein-protein interaction (PPI) network to derive module properties: mutual exclusivity, connectivity in the PPI network and functional similarity of genes. We take three measures to ensure the effectiveness of our algorithm. First, we use CCF data to choose stronger signals and more confident mutations. Second, the weighted gene functional interaction network is used to quantify gene functional similarity in the PPI network. Third, a Markov-based graph clustering method is used to extract candidate modules. MCLCluster is tested on two TCGA datasets (GBM and BRCA), and identifies several well-known oncogene driver modules and some modules with functionally associated driver genes. In addition, we compare it with Multi-Dendrix, FSME Cluster and RME on a simulated dataset with background noise and passenger rate; MCLCluster outperforms all of these methods.
Affiliation(s)
- Wei Zhang
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China; Hunan Province Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, China
| | - Yifu Zeng
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China; Hunan Province Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, China
| | - Lei Wang
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China; Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, China
| | - Yue Liu
- College of Computer Science and Electronics Engineering, Hunan University, Changsha, China
| | - Yi-Nan Cheng
- College of Science, Southern University of Science and Technology, Shenzhen, China
| |
|
21
|
Li F, Gao L, Wang B. Detection of Driver Modules with Rarely Mutated Genes in Cancers. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:390-401. [PMID: 29994261 DOI: 10.1109/tcbb.2018.2846262] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Identifying driver modules or pathways is a key challenge in interpreting the molecular mechanisms and pathogenesis underlying cancer. An increasing number of studies suggest that rarely mutated genes are important for the development of cancer. However, driver modules consisting of mutated genes with low-frequency driver mutations are not well characterized. To identify driver modules with rarely mutated genes, we propose a functional similarity index to quantify the functional relationship between rarely mutated genes and other genes in the same module. Then, we develop a method to detect Driver Modules with Rarely mutated Genes (DMRG) by incorporating functional similarity, coverage and mutual exclusivity. By applying DMRG to a TCGA cancer dataset on three networks (HINT+HI2012, iRefIndex and MultiNet), we detect driver modules intersecting with well-known signalling pathways and protein complexes, such as the cell cycle pathway and the mediator complex. DMRG can also detect driver modules effectively with 20, 40, 60 and 80 percent of samples chosen by random selection. When compared with HotNet2, DMRG detects more rarely mutated cancer genes and has higher pathway enrichment. Overall, DMRG provides an effective method for the identification of driver modules with rarely mutated genes.
|
22
|
Hui Y, Wei PJ, Xia J, Wang YT, Zheng CH. MECoRank: cancer driver genes discovery simultaneously evaluating the impact of SNVs and differential expression on transcriptional networks. BMC Med Genomics 2019; 12:140. [PMID: 31888623 PMCID: PMC6936061 DOI: 10.1186/s12920-019-0582-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2019] [Accepted: 09/10/2019] [Indexed: 01/09/2023] Open
Abstract
Background: Although there are huge volumes of genomic data, deciphering them and identifying driver events remains a challenge. Current network-based methods typically use the relationship between genomic events and consequent changes in gene expression to nominate putative driver genes. However, relationships may also exist within the transcriptional network. Methods: We developed MECoRank, a novel method that improves the recognition accuracy of driver genes. MECoRank uses a bipartite graph to propagate scores via an iterative process. After iteration, we obtain a ranked gene list for each patient sample. Then, we apply the Condorcet voting method to determine the most impactful drivers in a population. Results: We applied MECoRank to three cancer datasets to reveal candidate driver genes that have a greater impact on gene expression. Experimental results show that our method not only identifies more validated driver genes than other methods, but also recognizes some impactful novel genes that have been reported as important in the literature. Conclusions: We propose a novel approach named MECoRank to prioritize driver genes based on their impact on expression in the molecular interaction network. This method assesses not only the effect of mutations on the transcriptional network, but also the effect of differential expression within the transcriptional network. The results demonstrate that MECoRank performs better than competing approaches in identifying driver genes.
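The step that merges per-patient ranked lists into a population-level ranking can be illustrated with a small pairwise-majority sketch. This is a Copeland-style scoring of Condorcet pairwise contests, chosen here for brevity; the abstract does not specify which Condorcet variant MECoRank uses, and the function name, the assumption that every patient list ranks the same genes, and the example rankings are all hypothetical.

```python
from itertools import combinations


def condorcet_ranking(rankings):
    """Aggregate ranked lists by counting pairwise majority wins.

    Each ranking is a list of the same genes, best first. A gene earns one
    point for every other gene it beats in a head-to-head majority vote.
    Ties in points are broken alphabetically for determinism.
    """
    genes = sorted(rankings[0])
    wins = {g: 0 for g in genes}
    for g, h in combinations(genes, 2):
        g_pref = sum(1 for r in rankings if r.index(g) < r.index(h))
        h_pref = len(rankings) - g_pref
        if g_pref > h_pref:
            wins[g] += 1
        elif h_pref > g_pref:
            wins[h] += 1
    return sorted(genes, key=lambda g: (-wins[g], g))


# Hypothetical per-patient rankings (not data from the paper).
patient_rankings = [
    ["TP53", "PIK3CA", "GATA3"],
    ["TP53", "GATA3", "PIK3CA"],
    ["PIK3CA", "TP53", "GATA3"],
]
consensus = condorcet_ranking(patient_rankings)
print(consensus)
```

Here TP53 wins both of its head-to-head contests and PIK3CA wins one, so the consensus order is TP53, PIK3CA, GATA3.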
Affiliation(s)
- Ying Hui
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, College of Computer Science and Technology, Anhui University, Hefei, China
| | - Pi-Jing Wei
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, College of Computer Science and Technology, Anhui University, Hefei, China
| | - Junfeng Xia
- Institute of Physical Science and Information Technology, Anhui University, Hefei, China
| | - Yu-Tian Wang
- School of Software Engineering, Qufu Normal University, Qufu, China
| | - Chun-Hou Zheng
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, College of Computer Science and Technology, Anhui University, Hefei, China.
| |
|
23
|
Identifying Mutually Exclusive Gene Sets with Prognostic Value and Novel Potential Driver Genes in Patients with Glioblastoma. BIOMED RESEARCH INTERNATIONAL 2019; 2019:4860367. [PMID: 31815141 PMCID: PMC6878817 DOI: 10.1155/2019/4860367] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/20/2019] [Revised: 06/15/2019] [Accepted: 10/01/2019] [Indexed: 12/12/2022]
Abstract
The pathogenesis and prognosis of glioblastoma (GBM) remain poorly understood. Mutual exclusivity analysis can distinguish driver genes and pathways from passenger ones. The purpose of this study was to identify mutually exclusive gene sets (MEGSs) that have prognostic value and to detect novel driver genes in GBM. The genomic alteration profile and clinical information were derived from The Cancer Genome Atlas, and the MEGSA method was used to identify the MEGSs. Next, we performed survival analysis and constructed a risk prediction model for prognostic stratification. Leave-one-out cross-validation and a permutation test were used to evaluate its performance. Finally, we identified 21 statistically significant MEGSs. We found that the MEGS in the RB pathway was significantly associated with poor prognosis, after adjusting for age and gender (HR = 1.837, 95% CI: 1.192-2.831). Based on the risk prediction model, 208 (80.9%) and 49 (19.1%) patients were assigned to high- and low-risk groups, respectively (log-rank: p < 0.001, adjusted p = 0.001). Additionally, we found that SPTA1, a novel gene involved in the MEGS, was mutually exclusive with members of the cell cycle, P53, and RB pathways. In conclusion, the MEGS in the RB pathway had considerable clinical value for GBM prognostic stratification. Mutated SPTA1 may be involved in GBM development.
|
24
|
Song J, Peng W, Wang F. A random walk-based method to identify driver genes by integrating the subcellular localization and variation frequency into bipartite graph. BMC Bioinformatics 2019; 20:238. [PMID: 31088372 PMCID: PMC6518800 DOI: 10.1186/s12859-019-2847-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2019] [Accepted: 04/24/2019] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND: Cancer, a worldwide problem, is driven by genomic alterations. With the advent of high-throughput sequencing technology, a huge amount of genomic data is generated every second, which offers much valuable information about cancer while posing a big challenge to investigators. The major characteristic of cancer is heterogeneity, and most alterations are thought to be passenger mutations that make no contribution to cancer progression. Hence, finding driver genes that confer a selective growth advantage on tumor cells within such large and noisy data is still an urgent task. RESULTS: Considering that previous network-based methods ignore some important biological properties of driver genes and that gene interaction networks have low reliability, we propose a random walk method named Subdyquency that integrates information on subcellular localization, variation frequency and interaction with other dysregulated genes to improve the prediction accuracy of driver genes. We applied our model to three different cancers: lung, prostate and breast cancer. The results show that our model can not only identify the well-known important driver genes but also prioritize rare unknown driver genes. In addition, compared with other existing methods, our method improves precision, recall and F-score for most cancer types. CONCLUSIONS: The final results imply that driver genes tend to have higher variation frequency and to impact more dysregulated genes in the common significant compartment. AVAILABILITY: The source code can be obtained at https://github.com/weiba/Subdyquency.
Affiliation(s)
- Junrong Song
- Faculty of Management and Economics/Computer center/Faculty of Information Engineering and Automation/Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Lianhua Road, 650050, Kunming, People's Republic of China
| | - Wei Peng
- Faculty of Management and Economics/Computer center/Faculty of Information Engineering and Automation/Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Lianhua Road, 650050, Kunming, People's Republic of China.
| | - Feng Wang
- Faculty of Management and Economics/Computer center/Faculty of Information Engineering and Automation/Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Lianhua Road, 650050, Kunming, People's Republic of China
| |
|
25
|
Adaptively Weighted and Robust Mathematical Programming for the Discovery of Driver Gene Sets in Cancers. Sci Rep 2019; 9:5959. [PMID: 30976053 PMCID: PMC6459865 DOI: 10.1038/s41598-019-42500-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2018] [Accepted: 03/28/2019] [Indexed: 12/14/2022] Open
Abstract
High coverage and mutual exclusivity (HCME), which are considered two combinatorial properties of mutations in a collection of driver genes in cancers, have been used to develop mathematical programming models for distinguishing cancer driver gene sets. In this paper, we summarize a weak HCME pattern to justify the description of practical mutation datasets. We then present AWRMP, a method for identifying driver gene sets through the adaptive assignment of appropriate weights to gene candidates to tune the balance between coverage and mutual exclusivity. It embeds a genetic algorithm into a subsampling strategy to make the optimization results robust against uncertainty and noise in the data. Using biological datasets, we show that AWRMP can identify driver gene sets that satisfy the weak HCME pattern and outperform state-of-the-art methods in terms of robustness.
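The two properties named here can be computed directly from a gene-to-patients mutation map. The sketch below uses the widely used unweighted score "coverage minus coverage overlap", which is an assumed baseline for illustration and not the adaptively weighted AWRMP objective; the function names and the toy mutation data are hypothetical.

```python
def coverage_and_overlap(mutations, gene_set):
    """Return (coverage, overlap) for a candidate gene set.

    coverage = number of patients with a mutation in at least one gene;
    overlap  = number of extra mutations beyond one per covered patient,
               so overlap == 0 means the set is perfectly mutually exclusive.
    """
    covered = set().union(*(mutations[g] for g in gene_set))
    total_mutations = sum(len(mutations[g]) for g in gene_set)
    return len(covered), total_mutations - len(covered)


def exclusivity_coverage_score(mutations, gene_set):
    """Unweighted score: reward coverage, penalize co-occurring mutations."""
    coverage, overlap = coverage_and_overlap(mutations, gene_set)
    return coverage - overlap


# Hypothetical mutation data: gene -> set of patient identifiers.
mutations = {
    "TP53": {"p1", "p2", "p3"},
    "PIK3CA": {"p4", "p5"},
    "GATA3": {"p3", "p6"},
}
candidate = ["TP53", "PIK3CA", "GATA3"]
print(coverage_and_overlap(mutations, candidate))
print(exclusivity_coverage_score(mutations, candidate))
```

In this toy data the three genes cover six patients with one co-occurrence (patient `p3`), so the set is close to, but not exactly, mutually exclusive.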
|
26
|
Wu J, Cai Q, Wang J, Liao Y. Identifying mutated driver pathways in cancer by integrating multi-omics data. Comput Biol Chem 2019; 80:159-167. [PMID: 30959272 DOI: 10.1016/j.compbiolchem.2019.03.019] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2019] [Accepted: 03/23/2019] [Indexed: 10/27/2022]
Abstract
Since driver pathways play a crucial role in the formation and progression of cancer, it is imperative to identify them, as they offer important information for precision or personalized medicine. In this paper, an improved maximum weight submatrix problem model is proposed by integrating three kinds of omics data: somatic mutations, copy number variations, and gene expressions. The model adjusts coverage and mutual exclusivity with the average weight of genes in a pathway, and simultaneously considers the correlation among genes, so that pathways with high coverage but moderate mutual exclusivity can be identified. By introducing a short chromosome code and a greedy recombination operator, a parthenogenetic algorithm, PGA-MWS, is presented to solve the model. Experimental comparisons among the algorithms GA, MOGA, iMCMC and PGA-MWS were performed on biological and simulated data sets. The experimental results show that, compared with the other three algorithms, PGA-MWS based on the improved model can identify gene sets with high coverage but moderate mutual exclusivity, and that it scales well. Many of the identified gene sets are involved in known signaling pathways, and most of the implicated genes are oncogenes or tumor suppressors previously reported in the literature. The experimental results indicate that the proposed approach may become a useful complementary tool for detecting cancer pathways.
Affiliation(s)
- Jingli Wu
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin 541004, China; College of Computer Science and Information Technology, Guangxi Normal University, Guilin 541004, China.
| | - Qirong Cai
- College of Computer Science and Information Technology, Guangxi Normal University, Guilin 541004, China.
| | - Jinyan Wang
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin 541004, China; College of Computer Science and Information Technology, Guangxi Normal University, Guilin 541004, China.
| | - Yuanxiu Liao
- College of Computer Science and Information Technology, Guangxi Normal University, Guilin 541004, China.
| |
|
27
|
Zhang W, Wang SL. An Integrated Framework for Identifying Mutated Driver Pathway and Cancer Progression. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:455-464. [PMID: 29990286 DOI: 10.1109/tcbb.2017.2788016] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Next-generation sequencing (NGS) technologies provide large amounts of somatic mutation data from a large number of patients. Identifying mutated driver pathways and cancer progression from these data is challenging because of interpatient heterogeneity. In addition, modeling cancer progression at the pathway level has been shown to be more reasonable than at the gene level. In this paper, we introduce an integrated framework to identify mutated driver pathways and cancer progression (iMDPCP) at the pathway level from somatic mutation data. First, we use the uncertainty coefficient to quantify mutual exclusivity among genes in driver pathways and develop a computational framework to identify mutated driver pathways based on the adaptive discrete differential evolution algorithm. Then, we construct a cancer progression model for driver pathways based on a Bayesian network. Finally, we evaluate the performance of iMDPCP on real cancer somatic mutation datasets. The experimental results indicate that iMDPCP is more accurate than state-of-the-art methods according to the enrichment of KEGG pathways, and it also provides new insights into identifying cancer progression at the pathway level.
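The uncertainty coefficient mentioned in this abstract is a standard information-theoretic measure, U(X|Y) = (H(X) − H(X|Y)) / H(X). The sketch below computes it for two binary mutation vectors. How iMDPCP combines it into a pathway-level score is not stated in the abstract, so the function names and toy vectors are assumptions for illustration only. Note that U measures dependence in general: it is high for perfectly exclusive and for perfectly co-occurring genes alike, so it must be paired with a direction check in practice.

```python
import math
from collections import Counter
from typing import Sequence


def entropy(xs: Sequence[int]) -> float:
    """Shannon entropy H(X) in bits."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def conditional_entropy(xs: Sequence[int], ys: Sequence[int]) -> float:
    """Conditional entropy H(X | Y) in bits."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    y_counts = Counter(ys)
    return -sum((c / n) * math.log2(c / y_counts[y]) for (_, y), c in joint.items())


def uncertainty_coefficient(xs: Sequence[int], ys: Sequence[int]) -> float:
    """Fraction of the uncertainty in X explained by Y (0 = none, 1 = all)."""
    if len(xs) != len(ys):
        raise ValueError("mutation vectors must have the same length")
    hx = entropy(xs)
    if hx == 0:
        return 0.0  # X is constant, so there is nothing to explain
    return (hx - conditional_entropy(xs, ys)) / hx


# Hypothetical per-sample mutation indicators (1 = mutated) for three genes.
gene_a = [1, 1, 0, 0]
gene_b = [0, 0, 1, 1]  # perfectly exclusive with gene_a
gene_c = [1, 0, 1, 0]  # statistically independent of gene_a

u_exclusive = uncertainty_coefficient(gene_a, gene_b)
u_independent = uncertainty_coefficient(gene_a, gene_c)
```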
|
28
|
Gao B, Zhao Y, Li Y, Liu J, Wang L, Li G, Su Z. Prediction of Driver Modules via Balancing Exclusive Coverages of Mutations in Cancer Samples. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2019; 6:1801384. [PMID: 30828525 PMCID: PMC6382311 DOI: 10.1002/advs.201801384] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 08/19/2018] [Revised: 10/04/2018] [Indexed: 05/07/2023]
Abstract
Mutual exclusivity of cancer-driving mutations is a frequently observed phenomenon in the mutational landscape of cancer. The long tail of rare mutations complicates the discovery of mutually exclusive driver modules. Existing methods usually suffer from the problem that only a few genes in some identified modules cover most of the cancer samples. To overcome this hurdle, an efficient method, UniCovEx, is presented that identifies mutually exclusive driver modules with balanced exclusive coverage. UniCovEx first searches for candidate driver modules with a strong topological relationship in signaling networks using a greedy strategy. It then evaluates the candidate modules by considering their coverage, exclusivity, and balance of coverage, using a novel metric termed exclusive entropy of modules, which measures how balanced the modules are. Finally, UniCovEx predicts sample-specific driver modules by solving a minimum set cover problem using a greedy strategy. When tested on 12 The Cancer Genome Atlas datasets of different cancer types, UniCovEx shows significant superiority over previous methods. The software is available at: https://sourceforge.net/projects/cancer-pathway/files/.
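The final step described in this abstract, choosing modules by greedily solving a minimum set cover problem, can be illustrated with the textbook greedy heuristic. This is a generic sketch under stated assumptions, not UniCovEx's implementation: the function name, module labels, and sample sets are hypothetical, and ties are broken alphabetically only to keep the result deterministic.

```python
from typing import Dict, List, Set


def greedy_set_cover(samples: Set[str], modules: Dict[str, Set[str]]) -> List[str]:
    """Pick modules one at a time, each covering the most uncovered samples."""
    uncovered = set(samples)
    chosen: List[str] = []
    while uncovered:
        # Sorted iteration makes ties resolve to the alphabetically first module.
        best = max(
            sorted(modules),
            key=lambda name: len(modules[name] & uncovered),
            default=None,
        )
        if best is None or not (modules[best] & uncovered):
            break  # remaining samples cannot be covered by any module
        chosen.append(best)
        uncovered -= modules[best]
    return chosen


# Hypothetical candidate modules mapped to the samples they cover.
toy_modules = {
    "M1": {"s1", "s2", "s3"},
    "M2": {"s3", "s4"},
    "M3": {"s4", "s5"},
    "M4": {"s1"},
}
toy_samples = {"s1", "s2", "s3", "s4", "s5"}

selected = greedy_set_cover(toy_samples, toy_modules)
```

The greedy heuristic does not guarantee a minimum cover, but it is fast and carries a well-known logarithmic approximation bound, which is a common reason to prefer it on cohort-sized data.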
Affiliation(s)
- Bo Gao
  - School of Mathematics, Shandong University, Jinan 250100, China
  - State Key Laboratory of Microbial Technology, Shandong University, Jinan 250100, China
- Yue Zhao
  - IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Yang Li
  - School of Mathematics, Shandong University, Jinan 250100, China
  - State Key Laboratory of Microbial Technology, Shandong University, Jinan 250100, China
- Juntao Liu
  - School of Mathematics, Shandong University, Jinan 250100, China
- Lushan Wang
  - State Key Laboratory of Microbial Technology, Shandong University, Jinan 250100, China
- Guojun Li
  - School of Mathematics, Shandong University, Jinan 250100, China
  - State Key Laboratory of Microbial Technology, Shandong University, Jinan 250100, China
- Zhengchang Su
  - Department of Bioinformatics and Genomics, College of Computing and Informatics, The University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC 28223, USA
|
29
|
Identifying Cancer Specific Driver Modules Using a Network-Based Method. Molecules 2018; 23:molecules23051114. [PMID: 29738475 PMCID: PMC6100049 DOI: 10.3390/molecules23051114] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/12/2018] [Revised: 04/26/2018] [Accepted: 05/07/2018] [Indexed: 02/01/2023]
Abstract
Detecting driver modules is a key challenge for understanding the mechanisms of carcinogenesis at the pathway level. Identifying cancer-specific driver modules helps explain how the underlying mechanisms differ among cancer types. However, most methods are designed to identify driver modules in a single cancer, and few detect cancer-specific driver modules. We propose a network-based method to detect cancer-specific driver modules (CSDM) of a given cancer type relative to other cancer types. We construct the specific network of a cancer by combining specific coverage and mutual exclusivity across all cancer types, to capture the specificity of the cancer at the pathway level. To illustrate the performance of the method, we apply CSDM to 12 TCGA cancer types. Compared with SpeMDP and HotNet2 with regard to specific coverage and the enrichment of GO terms and KEGG pathways, CSDM is more accurate. We find that the specific driver modules of two different cancers have little overlap, which indicates that the driver modules detected by CSDM are specific. Finally, we analyze three specific driver modules of BRCA, BLCA, and LAML that intersect with well-known pathways. The source code of CSDM is freely accessible at https://github.com/fengli28/CSDM.git.
|
30
|
Gao B, Li G, Liu J, Li Y, Huang X. Identification of driver modules in pan-cancer via coordinating coverage and exclusivity. Oncotarget 2018; 8:36115-36126. [PMID: 28415609 PMCID: PMC5482642 DOI: 10.18632/oncotarget.16433] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 02/04/2017] [Accepted: 03/13/2017] [Indexed: 12/30/2022]
Abstract
It is widely accepted that cancer is driven by somatic mutations accumulated during the lifetime of an individual. Cancer mutations may target a relatively small number of cellular functional modules. The heterogeneity among cancer patients makes it difficult to identify driver mutations or functional modules related to cancer. It is biologically desirable to identify cancer pathway modules by coordinating coverage and exclusivity. A few approaches have been developed for this purpose, but they all have practical limitations in computational complexity and prediction accuracy. We present a network-based approach, CovEx, to predict specific patient-oriented modules by 1) discovering candidate modules for each considered gene, 2) extracting significant candidates by harmonizing coverage and exclusivity, and 3) further selecting the patient-oriented modules based on a set cover model. Applied to pan-cancer datasets spanning 12 cancer types collected from the public database TCGA, CovEx demonstrates significant superiority in performance over the current leading competitors. It is published under the GNU General Public License, and the source code is available at: https://sourceforge.net/projects/cancer-pathway/files/
Affiliation(s)
- Bo Gao
  - School of Mathematics, Shandong University, Jinan, Shandong, 250100, China; Department of Computer Science, Arkansas State University, Jonesboro, Arkansas, 72401, USA
- Guojun Li
  - School of Mathematics, Shandong University, Jinan, Shandong, 250100, China; Department of Computer Science, Arkansas State University, Jonesboro, Arkansas, 72401, USA
- Juntao Liu
  - School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
- Yang Li
  - School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
- Xiuzhen Huang
  - Department of Computer Science, Arkansas State University, Jonesboro, Arkansas, 72401, USA; Molecular Biosciences Program, Arkansas State University, Jonesboro, Arkansas, 72401, USA
|
31
|
Zhang J, Zhang S. Discovery of cancer common and specific driver gene sets. Nucleic Acids Res 2017; 45:e86. [PMID: 28168295 PMCID: PMC5449640 DOI: 10.1093/nar/gkx089] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 12/07/2016] [Revised: 01/20/2017] [Accepted: 01/31/2017] [Indexed: 12/31/2022]
Abstract
Cancer is known as a disease mainly caused by gene alterations. Discovering mutated driver pathways or gene sets is becoming an important step in understanding the molecular mechanisms of carcinogenesis. However, systematically investigating the commonalities and specificities of driver gene sets among multiple cancer types remains a great challenge, although such an investigation would help decipher cancers and support personalized therapy and precision medicine in cancer treatment. In this study, we propose two optimization models for de novo discovery of common driver gene sets among multiple cancer types (ComMDP) and of driver gene sets specific to one or multiple cancer types relative to other cancers (SpeMDP), respectively. We first apply ComMDP and SpeMDP to simulated data to validate their efficiency. We then apply these methods to 12 cancer types from The Cancer Genome Atlas (TCGA) and obtain several biologically meaningful driver pathways. As examples, we construct a common cancer pathway model for BRCA and OV, infer a complex driver pathway model for BRCA carcinogenesis based on common driver gene sets of BRCA with eight cancer types, and investigate specific driver pathways of the liquid cancer lymphoblastic acute myeloid leukemia (LAML) versus other solid cancer types. In the process, more candidate cancer genes are also found.
Affiliation(s)
- Junhua Zhang
  - National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Shihua Zhang
  - National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
|