1
A nonlinear model and an algorithm for identifying cancer driver pathways. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
2
A model and algorithm for identifying driver pathways based on weighted non-binary mutation matrix. Appl Intell 2022. [DOI: 10.1007/s10489-021-02330-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
It is generally acknowledged that driver pathways play a decisive role in the occurrence and progression of tumors, and identifying them has become imperative for precision or personalized medicine. Because sequencing errors are inevitable, the noise in single-omics cancer data usually hampers identification. Now that large amounts of multi-omics cancer data are available, exploiting multi-omics data rather than a single data type is a feasible alternative, and identifying driver pathways by integrating multi-omics cancer data has recently attracted attention in bioinformatics. In this paper, a weighted non-binary mutation matrix is constructed by integrating copy number variations, somatic mutations and gene expression. Based on this matrix, a new identification model is proposed by defining new measures of coverage and exclusivity. Then, a cooperative coevolutionary algorithm, CGA-MWS, is put forward to solve the model. Both real and simulated cancer data were used to compare Dendrix, GA, iMCMC, MOGA, PGA-MWS and CGA-MWS. In most cases, more of the genes in the pathway identified by CGA-MWS are enriched in a known signaling pathway than is the case for the pathways identified by the other five methods. In addition, the high efficiency of CGA-MWS makes it practical for realistic applications. These findings were verified through a number of experiments.
3
ChiPPI: a novel method for mapping chimeric protein-protein interactions uncovers selection principles of protein fusion events in cancer. Nucleic Acids Res 2017; 45:7094-7105. [PMID: 28549153 PMCID: PMC5499553 DOI: 10.1093/nar/gkx423] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 10/19/2016] [Accepted: 05/07/2017] [Indexed: 12/20/2022]
Abstract
Fusion proteins, comprising peptides deriving from the translation of two parental genes, are produced in cancer by chromosomal aberrations. The expressed fusion protein incorporates domains of both parental proteins. Using a methodology that treats discrete protein domains as binding sites for specific domains of interacting proteins, we have cataloged the protein interaction networks for 11 528 cancer fusions (ChiTaRS-3.1). Here, we present our novel method, chimeric protein–protein interactions (ChiPPI) that uses the domain–domain co-occurrence scores in order to identify preserved interactors of chimeric proteins. Mapping the influence of fusion proteins on cell metabolism and pathways reveals that ChiPPI networks often lose tumor suppressor proteins and gain oncoproteins. Furthermore, fusions often induce novel connections between non-interactors skewing interaction networks and signaling pathways. We compared fusion protein PPI networks in leukemia/lymphoma, sarcoma and solid tumors finding distinct enrichment patterns for each disease type. While certain pathways are enriched in all three diseases (Wnt, Notch and TGF β), there are distinct patterns for leukemia (EGFR signaling, DNA replication and CCKR signaling), for sarcoma (p53 pathway and CCKR signaling) and solid tumors (FGFR and EGFR signaling). Thus, the ChiPPI method represents a comprehensive tool for studying the anomaly of skewed cellular networks produced by fusion proteins in cancer.
4
A novel and efficient algorithm for de novo discovery of mutated driver pathways in cancer. Ann Appl Stat 2017; 11:1481-1512. [PMID: 29479394 DOI: 10.1214/17-aoas1042] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 12/31/2022]
Abstract
Next-generation sequencing studies of cancer somatic mutations have found that driver mutations tend to appear in most tumor samples but barely overlap within any single tumor sample, presumably because a single driver mutation can perturb the whole pathway. Based on the corresponding concepts of coverage and mutual exclusivity, new methods can be designed for de novo discovery of mutated driver pathways in cancer. Because the computational problem is a combinatorial optimization whose objective function involves a discontinuous indicator function in high dimension, many existing optimization algorithms, such as brute-force enumeration, gradient descent and Newton's method, are practically infeasible or directly inapplicable. We develop a new algorithm based on a novel formulation of the problem as non-convex programming with non-convex regularization. The method is computationally more efficient, effective and scalable than existing Monte Carlo search and several other algorithms that have been applied to The Cancer Genome Atlas (TCGA) project. We also extend the new method to integrative analysis of both mutation and gene expression data. We demonstrate the promising performance of the new methods by applying them to three cancer datasets to discover mutated driver pathways de novo.
5
Discovery of cancer common and specific driver gene sets. Nucleic Acids Res 2017; 45:e86. [PMID: 28168295 PMCID: PMC5449640 DOI: 10.1093/nar/gkx089] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 12/07/2016] [Revised: 01/20/2017] [Accepted: 01/31/2017] [Indexed: 12/31/2022]
Abstract
Cancer is known to be a disease caused mainly by gene alterations. Discovering mutated driver pathways or gene sets is becoming an important step in understanding the molecular mechanisms of carcinogenesis. However, systematically investigating the commonalities and specificities of driver gene sets among multiple cancer types is still a great challenge, even though such an investigation will undoubtedly help decipher cancers and support personalized therapy and precision medicine in cancer treatment. In this study, we propose two optimization models to discover de novo common driver gene sets among multiple cancer types (ComMDP) and driver gene sets specific to one or more cancer types relative to other cancers (SpeMDP), respectively. We first apply ComMDP and SpeMDP to simulated data to validate their efficiency. We then apply these methods to 12 cancer types from The Cancer Genome Atlas (TCGA) and obtain several biologically meaningful driver pathways. As examples, we construct a common cancer pathway model for BRCA and OV, infer a complex driver pathway model for BRCA carcinogenesis based on the common driver gene sets of BRCA with eight cancer types, and investigate specific driver pathways of the liquid cancer acute myeloid leukemia (LAML) versus other solid cancer types. In the process, more candidate cancer genes are also found.
6
7
Identification of mutated driver pathways in cancer using a multi-objective optimization model. Comput Biol Med 2016; 72:22-9. [PMID: 26995027 DOI: 10.1016/j.compbiomed.2016.03.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 10/19/2015] [Revised: 03/04/2016] [Accepted: 03/04/2016] [Indexed: 11/17/2022]
Abstract
New-generation high-throughput technologies, including next-generation sequencing, have been extensively applied to solve biological problems. As a result, large cancer genomics projects such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium are producing large amounts of rich and diverse data on multiple cancer types. Identifying mutated driver genes and driver pathways from these data is a significant challenge. Genome aberrations in cancer cells can be divided into two types: random 'passenger mutations' and functional 'driver mutations'. In this paper, we introduce a Multi-objective Optimization model based on a Genetic Algorithm (MOGA) to solve the maximum weight submatrix problem, which can be employed to identify driver genes and driver pathways that promote cancer proliferation. The maximum weight submatrix problem for finding mutated driver pathways is based on two specific properties: high coverage and high exclusivity. The multi-objective optimization model can adjust the trade-off between them. We also propose an integrative model that combines gene expression data and mutation data to improve the performance of the MOGA algorithm in a biological context.
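The MOGA abstract frames driver-pathway search as a trade-off between coverage and exclusivity. The Python sketch below is not the MOGA algorithm; it only illustrates the nondominated (Pareto) filtering that multi-objective genetic algorithms build on, applied exhaustively to a tiny input. The exclusivity measure |Γ(M)| / Σ|Γ(g)| (equal to 1.0 when no patient carries two mutations in M) and the toy data are assumptions chosen for illustration.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Mutations = Dict[str, Set[int]]  # gene -> set of mutated patient ids
Score = Tuple[int, float]        # (coverage, exclusivity)


def objectives(mutations: Mutations, genes: Tuple[str, ...]) -> Score:
    """Return (coverage, exclusivity) for gene set M, using assumed definitions."""
    covered = set().union(*(mutations[g] for g in genes))
    total = sum(len(mutations[g]) for g in genes)
    exclusivity = len(covered) / total if total else 0.0
    return len(covered), exclusivity


def dominates(a: Score, b: Score) -> bool:
    """a dominates b if it is no worse on both objectives and better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(mutations: Mutations, k: int) -> List[Tuple[Tuple[str, ...], Score]]:
    """Score every k-gene set exhaustively and keep the nondominated ones."""
    scored = [(combo, objectives(mutations, combo))
              for combo in combinations(sorted(mutations), k)]
    return [(c, s) for c, s in scored
            if not any(dominates(t, s) for _, t in scored)]


toy = {"A": {1, 2, 3, 4}, "B": {3, 4, 5, 6}, "C": {7}}
for genes, (cov, exc) in pareto_front(toy, 2):
    print(genes, cov, round(exc, 2))
```

Here {A, B} covers the most patients but overlaps on patients 3 and 4, while {A, C} and {B, C} are perfectly exclusive with lower coverage, so all three survive as trade-offs.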
8
9
Discovery of co-occurring driver pathways in cancer. BMC Bioinformatics 2014; 15:271. [PMID: 25106096 PMCID: PMC4133618 DOI: 10.1186/1471-2105-15-271] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.1] [Received: 03/02/2014] [Accepted: 08/01/2014] [Indexed: 01/08/2023]
Abstract
Background It has been widely recognized that pathways rather than individual genes govern the course of carcinogenesis. Therefore, discovering driver pathways is becoming an important step in understanding the molecular mechanisms underlying cancer and in designing efficient treatments for cancer patients. Previous studies have focused mainly on alterations in cancer genomes at the level of individual genes or single pathways. However, a great deal of evidence indicates that multiple pathways often function cooperatively in carcinogenesis and other key biological processes. Results In this study, an exact mathematical programming method was proposed to identify de novo co-occurring mutated driver pathways (CoMDP) in carcinogenesis without any prior information beyond mutation profiles. Two possible properties of mutations in cooperative pathways were exploited to achieve this: (1) each individual pathway has high coverage and high exclusivity; and (2) the mutations in the two pathways of a pair show statistically significant co-occurrence. The efficiency of CoMDP was first validated on simulated data and compared with a previous method. CoMDP was then applied to several real biological datasets, including glioblastoma, lung adenocarcinoma, and ovarian carcinoma. The discovered co-occurring driver pathways were found to be involved in several key biological processes, such as cell survival and protein synthesis. Moreover, CoMDP was modified to (1) identify an extra pathway co-occurring with a known pathway and (2) detect multiple significant co-occurring driver pathways for carcinogenesis. Conclusions The present method can identify gene sets with more biological relevance than those found by current methods for discovering single driver pathways.
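CoMDP requires the mutations of two pathways to co-occur significantly. One standard way to quantify co-occurrence, shown below as an assumption rather than CoMDP's exact statistic, is a one-sided hypergeometric (Fisher's exact) test on the sets of patients covered by each pathway: if the two pathways were mutated independently, the number of patients covered by both would follow a hypergeometric distribution.

```python
from math import comb
from typing import Set


def cooccurrence_pvalue(n_patients: int, covered1: Set[int], covered2: Set[int]) -> float:
    """One-sided P(overlap >= observed) under a hypergeometric null.

    covered1 and covered2 are the patients with at least one mutation in each
    pathway; the null treats pathway 2's patients as a random subset of all patients.
    """
    a, b = len(covered1), len(covered2)
    observed = len(covered1 & covered2)
    tail = sum(comb(a, i) * comb(n_patients - a, b - i)
               for i in range(observed, min(a, b) + 1))
    return tail / comb(n_patients, b)


# Two pathways that each cover the same 5 of 10 patients: strong co-occurrence.
print(cooccurrence_pvalue(10, {0, 1, 2, 3, 4}, {0, 1, 2, 3, 4}))  # 1/252, about 0.00397
```

Integer arithmetic via `math.comb` keeps the tail exact until the final division, which matters when p-values are very small.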
10
Simulated annealing based algorithm for identifying mutated driver pathways in cancer. Biomed Res Int 2014; 2014:375980. [PMID: 24982873 PMCID: PMC4058194 DOI: 10.1155/2014/375980] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 03/20/2014] [Accepted: 05/13/2014] [Indexed: 12/17/2022]
Abstract
With the development of next-generation DNA sequencing technologies, large-scale cancer genomics projects can be carried out to help researchers identify driver genes, driver mutations, and driver pathways that promote cancer proliferation in large numbers of cancer patients. Hence, one of the remaining challenges is to distinguish functional mutations vital for cancer development and to filter out nonfunctional, random "passenger mutations." In this study, we introduce a modified method to solve the so-called maximum weight submatrix problem, which is used to identify mutated driver pathways in cancer. The problem is based on two combinatorial properties, namely coverage and exclusivity. In particular, we enhance an integrative model that combines gene mutation and expression data. Experimental results on simulated data show that our method is more efficient than the other methods compared. Finally, we apply the proposed method to two real biological datasets, and the results show that it is also applicable in practice.
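The simulated-annealing approach searches for the gene set M of size k that maximizes the maximum-weight-submatrix objective W(M) = 2|Γ(M)| - Σ_{g in M} |Γ(g)| introduced with Dendrix, where Γ(g) is the set of patients with a mutation in gene g and Γ(M) is their union. The sketch below is a generic annealer, not the authors' modified method: the single-gene swap move, the Metropolis acceptance rule, the starting temperature and the geometric cooling schedule are all assumptions.

```python
import math
import random
from typing import Dict, List, Set, Tuple

Mutations = Dict[str, Set[int]]  # gene -> patients with a mutation in that gene


def weight(mutations: Mutations, genes: List[str]) -> int:
    """W(M) = 2|Gamma(M)| - sum over g in M of |Gamma(g)|."""
    covered = set().union(*(mutations[g] for g in genes))
    return 2 * len(covered) - sum(len(mutations[g]) for g in genes)


def anneal(mutations: Mutations, k: int, n_iter: int = 5000, t0: float = 2.0,
           cooling: float = 0.999, seed: int = 0) -> Tuple[List[str], int]:
    """Generic simulated annealing over k-gene sets (assumed move and schedule)."""
    rng = random.Random(seed)
    all_genes = sorted(mutations)
    current = rng.sample(all_genes, k)
    cur_w = weight(mutations, current)
    best, best_w = list(current), cur_w
    t = t0
    for _ in range(n_iter):
        outside = [g for g in all_genes if g not in current]
        if not outside:  # k equals the number of genes: nothing to swap
            break
        # Move: replace one randomly chosen member with a random non-member.
        candidate = list(current)
        candidate[rng.randrange(k)] = rng.choice(outside)
        cand_w = weight(mutations, candidate)
        delta = cand_w - cur_w
        # Metropolis rule: always accept improvements, sometimes accept worse sets.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, cur_w = candidate, cand_w
            if cur_w > best_w:
                best, best_w = list(current), cur_w
        t = max(t * cooling, 1e-6)  # geometric cooling with a floor
    return sorted(best), best_w


toy = {"A": {1, 2, 3}, "B": {4, 5}, "C": {1, 4}, "D": {2, 5}}
print(anneal(toy, 2))
```

Tracking the best set seen, rather than returning the final state, means early high-temperature exploration is never lost as the search cools.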
11
Abstract
MOTIVATION At the core of transcriptome analyses of cancer is a challenge to detect molecular differences affiliated with disease phenotypes. This approach has led to remarkable progress in identifying molecular signatures and in stratifying patients into clinical groups. Yet, despite this progress, many of the identified signatures are not robust enough to be clinically used and not consistent enough to provide a follow-up on molecular mechanisms. RESULTS To address these issues, we introduce PhenoNet, a novel algorithm for the identification of pathways and networks associated with different phenotypes. PhenoNet uses two types of input data: gene expression data (RMA, RPKM, FPKM, etc.) and phenotypic information, and integrates these data with curated pathways and protein-protein interaction information. Comprehensive iterations across all possible pathways and subnetworks result in the identification of key pathways or subnetworks that distinguish between the two phenotypes. AVAILABILITY AND IMPLEMENTATION Matlab code is available upon request. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
12
Identification of mutated core cancer modules by integrating somatic mutation, copy number variation, and gene expression data. BMC Syst Biol 2013; 7 Suppl 2:S4. [PMID: 24565034 PMCID: PMC3851989 DOI: 10.1186/1752-0509-7-s2-s4] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Indexed: 12/30/2022]
Abstract
MOTIVATION Understanding the molecular mechanisms underlying cancer is an important step toward the effective diagnosis and treatment of cancer patients. With the huge volume of data from large-scale cancer genomics projects, an open challenge is to distinguish the driver mutations, pathways, and gene sets (or core modules) that contribute to cancer formation and progression from random passenger mutations that accumulate in somatic cells but do not contribute to tumorigenesis. Because of mutational heterogeneity, current analyses are often restricted to testing known pathways and functional modules for enrichment of somatic mutations. Therefore, discovering new pathways and functional modules is a pressing need. RESULTS In this study, we propose a novel method to identify Mutated Core Modules in Cancer (iMCMC) without any prior information other than cancer genomic data from patients with tumors. This is a network-based approach that integrates three kinds of data: somatic mutations, copy number variations (CNVs), and gene expression. First, the somatic mutation and CNV data are merged to obtain a mutation matrix, from which a weighted mutation network is constructed where the vertex weight corresponds to gene coverage and the edge weight corresponds to the mutual exclusivity between gene pairs. Similarly, a weighted expression network is generated from the expression matrix, where the vertex and edge weights correspond to the influence of a gene mutation on other genes and the Pearson correlation of gene mutation-correlated expressions, respectively. An integrative network is then obtained by combining these two networks, and the most coherent subnetworks are identified using an optimization model. Finally, the core modules for tumors are obtained by filtering with significance and exclusivity tests.
We applied iMCMC to The Cancer Genome Atlas (TCGA) glioblastoma multiforme (GBM) and ovarian carcinoma data and identified several mutated core modules, some of which are involved in known pathways. Most of the implicated genes are oncogenes or tumor suppressors previously reported to be related to carcinogenesis. For comparison, we also ran iMCMC on two of the three kinds of data at a time, i.e., somatic mutations combined with CNVs, and somatic mutations combined with gene expression. The results indicate that gene expression or CNVs indeed provide useful information beyond the original data for identifying core modules in cancer. CONCLUSIONS This study demonstrates the utility of iMCMC for integrating multiple data sources to identify mutated core modules in cancer. In addition to presenting a generally applicable methodology, our findings provide several candidate pathways or core modules recurrently perturbed in GBM or ovarian carcinoma for further study.
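iMCMC first merges somatic mutations and CNVs into one mutation matrix, then weights vertices by gene coverage and edges by pairwise mutual exclusivity. The abstract does not give the formulas, so the sketch below uses hypothetical ones: a gene counts as altered in a patient if it has either a somatic mutation or a CNV (element-wise OR), a vertex weight is the fraction of patients covered, and an edge weight is |Γ(g1) ∪ Γ(g2)| / (|Γ(g1)| + |Γ(g2)|), which equals 1.0 for a perfectly exclusive pair. The expression network and the subnetwork optimization are not shown, and the gene names are placeholders.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Matrix = List[List[int]]  # rows = patients, columns = genes, entries 0/1


def merge_alterations(somatic: Matrix, cnv: Matrix) -> Matrix:
    """Hypothetical merge: a gene is altered if it is mutated OR copy-number altered."""
    return [[int(s or c) for s, c in zip(srow, crow)]
            for srow, crow in zip(somatic, cnv)]


def mutation_network(matrix: Matrix, genes: List[str]
                     ) -> Tuple[Dict[str, float], Dict[Tuple[str, str], float]]:
    """Return (vertex weights, edge weights) using the assumed formulas."""
    n_patients = len(matrix)
    altered = {g: {p for p, row in enumerate(matrix) if row[j]}
               for j, g in enumerate(genes)}
    vertex = {g: len(pts) / n_patients for g, pts in altered.items()}
    edge: Dict[Tuple[str, str], float] = {}
    for g1, g2 in combinations(genes, 2):
        total = len(altered[g1]) + len(altered[g2])
        if total:  # skip pairs where neither gene is altered
            edge[(g1, g2)] = len(altered[g1] | altered[g2]) / total
    return vertex, edge


somatic = [[1, 0, 0], [0, 1, 0], [0, 0, 0], [1, 0, 0]]
cnv = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 0]]
merged = merge_alterations(somatic, cnv)
vertex, edge = mutation_network(merged, ["G1", "G2", "G3"])
print(vertex)
print(edge)
```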
13
Abstract
MOTIVATION The first step toward clinical diagnostics, prognostics and targeted therapeutics of cancer is to comprehensively understand its molecular mechanisms. Large-scale cancer genomics projects are providing a large volume of data about genomic, epigenomic and gene expression aberrations in multiple cancer types. One of the remaining challenges is to identify the driver mutations, driver genes and driver pathways that promote cancer proliferation and to filter out the nonfunctional passenger ones. RESULTS In this study, we propose two methods to solve the so-called maximum weight submatrix problem, which is designed to identify mutated driver pathways de novo from mutation data in cancer. The first is an exact method that can help assess other approximate and/or heuristic algorithms. The second is a stochastic and flexible method that can incorporate other types of information to improve on the first. In particular, we propose an integrative model to combine mutation and expression data. We first apply our methods to simulated data to show their efficiency. We further apply them to several real biological datasets, such as the mutation profiles of 74 head and neck squamous cell carcinoma samples, 90 glioblastoma tumor samples and 313 ovarian carcinoma samples. Gene expression profiles were also considered for the latter two datasets. The results show that our integrative model can identify more biologically relevant gene sets. We have implemented all these methods in a package called Mutated Driver Pathway Finder (MDPFinder), which other researchers can easily use. AVAILABILITY A MATLAB package of MDPFinder is available at http://zhangroup.aporc.org/ShiHuaZhang. CONTACT zsh@amss.ac.cn. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
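The first MDPFinder method is exact and serves as a yardstick for heuristics. The paper's exact formulation is not reproduced here; as a plainly swapped-in stand-in, the sketch below enumerates every k-gene set and returns one with maximum W(M) = 2|Γ(M)| - Σ_{g in M} |Γ(g)|. Exhaustive enumeration grows as C(n, k) and is only practical for small toy inputs, but it gives a ground truth against which approximate or stochastic solvers can be checked. The toy data are made up.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Mutations = Dict[str, Set[int]]  # gene -> patients with a mutation in that gene


def weight(mutations: Mutations, genes: Iterable[str]) -> int:
    """W(M) = 2|Gamma(M)| - sum over g in M of |Gamma(g)|."""
    genes = list(genes)
    covered = set().union(*(mutations[g] for g in genes))
    return 2 * len(covered) - sum(len(mutations[g]) for g in genes)


def exact_best(mutations: Mutations, k: int) -> Tuple[Tuple[str, ...], int]:
    """Enumerate all k-gene sets; ties go to the first set in sorted order."""
    best = max(combinations(sorted(mutations), k),
               key=lambda m: weight(mutations, m))
    return best, weight(mutations, best)


toy = {"A": {1, 2, 3}, "B": {4, 5}, "C": {1, 4}, "D": {2, 5}}
print(exact_best(toy, 2))  # (('A', 'B'), 5)
```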
14
Finding driver pathways in cancer: models and algorithms. Algorithms Mol Biol 2012; 7:23. [PMID: 22954134 PMCID: PMC3544164 DOI: 10.1186/1748-7188-7-23] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 01/03/2012] [Accepted: 07/26/2012] [Indexed: 12/20/2022]
Abstract
Background Cancer sequencing projects are now measuring somatic mutations in large numbers of cancer genomes. A key challenge in interpreting these data is to distinguish driver mutations, mutations important for cancer development, from passenger mutations that have accumulated in somatic cells but without functional consequences. A common approach to identify genes harboring driver mutations is a single gene test that identifies individual genes that are recurrently mutated in a significant number of cancer genomes. However, the power of this test is reduced by: (1) the necessity of estimating the background mutation rate (BMR) for each gene; (2) the mutational heterogeneity in most cancers meaning that groups of genes (e.g. pathways), rather than single genes, are the primary target of mutations. Results We investigate the problem of discovering driver pathways, groups of genes containing driver mutations, directly from cancer mutation data and without prior knowledge of pathways or other interactions between genes. We introduce two generative models of somatic mutations in cancer and study the algorithmic complexity of discovering driver pathways in both models. We show that a single gene test for driver genes is highly sensitive to the estimate of the BMR. In contrast, we show that an algorithmic approach that maximizes a straightforward measure of the mutational properties of a driver pathway successfully discovers these groups of genes without an estimate of the BMR. Moreover, this approach is also successful in the case when the observed frequencies of passenger and driver mutations are indistinguishable, a situation where single gene tests fail. Conclusions Accurate estimation of the BMR is a challenging task. Thus, methods that do not require an estimate of the BMR, such as the ones we provide here, can give increased power for the discovery of driver genes.
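To make the sensitivity to the BMR concrete, here is a minimal single-gene test under simplifying assumptions that are not the paper's generative models: each of n samples independently carries at least one passenger mutation in a gene of length L with probability 1 - (1 - BMR)^L, and the p-value is the binomial upper tail for the observed number of mutated samples. All parameter values are hypothetical.

```python
from math import comb


def binomial_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def single_gene_pvalue(mutated_samples: int, n_samples: int,
                       bmr_per_base: float, gene_length: int) -> float:
    # Assumed model: probability that one sample has >= 1 passenger mutation in the gene.
    p = 1 - (1 - bmr_per_base) ** gene_length
    return binomial_sf(mutated_samples, n_samples, p)


# Same observation (6 of 100 samples mutated, 1,500 bp gene), two BMR estimates.
for bmr in (1e-6, 5e-6):
    print(bmr, single_gene_pvalue(6, 100, bmr, 1500))
```

A fivefold change in the assumed BMR moves the p-value by several orders of magnitude for identical data, which is the dependence the abstract warns about.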
15
Biomarker robustness reveals the PDGF network as driving disease outcome in ovarian cancer patients in multiple studies. BMC Syst Biol 2012; 6:3. [PMID: 22236809 PMCID: PMC3298526 DOI: 10.1186/1752-0509-6-3] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 08/13/2011] [Accepted: 01/11/2012] [Indexed: 12/27/2022]
Abstract
Background Ovarian cancer causes more deaths than any other gynecological cancer. Identifying the molecular mechanisms that drive disease progression in ovarian cancer is a critical step in providing therapeutics, improving diagnostics, and associating clinical behavior with disease etiology. Identifying molecular interactions that stratify prognosis is key to facilitating a clinical-molecular perspective. Results The Cancer Genome Atlas has recently made available the molecular characteristics of more than 500 patients. We used the TCGA multi-analysis study, two additional datasets, and a set of computational algorithms that we developed. The computational algorithms are based on methods that identify network alterations and quantify network behavior through gene expression. We identify a network biomarker that significantly stratifies survival rates in ovarian cancer patients. Interestingly, expression levels of single genes or gene sets do not explain the prognostic stratification. The discovered biomarker is composed of the network around the PDGF pathway and enables prognosis stratification. Conclusion Through the power of gene-expression networks, the work presented here demonstrates the criticality of the PDGF network in driving disease course. In uncovering the specific interactions within the network that drive the phenotype, we catalyze targeted treatment, facilitate prognosis and offer a novel perspective on hidden disease heterogeneity.
16
Abstract
Next-generation DNA sequencing technologies are enabling genome-wide measurements of somatic mutations in large numbers of cancer patients. A major challenge in the interpretation of these data is to distinguish functional "driver mutations" important for cancer development from random "passenger mutations." A common approach for identifying driver mutations is to find genes that are mutated at significant frequency in a large cohort of cancer genomes. This approach is confounded by the observation that driver mutations target multiple cellular signaling and regulatory pathways. Thus, each cancer patient may exhibit a different combination of mutations that are sufficient to perturb these pathways. This mutational heterogeneity presents a problem for predicting driver mutations solely from their frequency of occurrence. We introduce two combinatorial properties, coverage and exclusivity, that distinguish driver pathways, or groups of genes containing driver mutations, from groups of genes with passenger mutations. We derive two algorithms, called Dendrix, to find driver pathways de novo from somatic mutation data. We apply Dendrix to analyze somatic mutation data from 623 genes in 188 lung adenocarcinoma patients, 601 genes in 84 glioblastoma patients, and 238 known mutations in 1000 patients with various cancers. In all data sets, we find groups of genes that are mutated in large subsets of patients and whose mutations are approximately exclusive. Our Dendrix algorithms scale to whole-genome analysis of thousands of patients and thus will prove useful for larger data sets to come from The Cancer Genome Atlas (TCGA) and other large-scale cancer genome sequencing projects.
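Dendrix combines coverage and exclusivity into a single weight for a gene set M: W(M) = |Γ(M)| - ω(M), where Γ(g) is the set of patients with a mutation in gene g, Γ(M) is their union (coverage), and ω(M) = Σ_{g in M} |Γ(g)| - |Γ(M)| counts the extra mutations that break exclusivity. The Python sketch below only evaluates this weight on a made-up toy dataset; it does not implement either Dendrix search algorithm.

```python
from typing import Dict, Iterable, Set

Mutations = Dict[str, Set[str]]  # gene -> patients with a mutation in that gene


def coverage(mutations: Mutations, genes: Iterable[str]) -> Set[str]:
    """Gamma(M): patients with a mutation in at least one gene of M."""
    covered: Set[str] = set()
    for g in genes:
        covered |= mutations.get(g, set())
    return covered


def dendrix_weight(mutations: Mutations, genes: Iterable[str]) -> int:
    """W(M) = |Gamma(M)| - omega(M) = 2|Gamma(M)| - sum over g of |Gamma(g)|."""
    genes = list(genes)
    covered = len(coverage(mutations, genes))
    overlap = sum(len(mutations.get(g, set())) for g in genes) - covered  # omega(M)
    return covered - overlap


toy = {"A": {"p1", "p2", "p3"}, "B": {"p4", "p5"}, "C": {"p1", "p4"}}
print(dendrix_weight(toy, ["A", "B"]))  # 5: covers 5 patients, no overlap
print(dendrix_weight(toy, ["A", "C"]))  # 3: covers 4 patients, p1 mutated twice
```

The weight rewards coverage and subtracts one point for every redundant mutation, so a set that is both widely mutated and mutually exclusive scores highest.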
17
PathScan: a tool for discerning mutational significance in groups of putative cancer genes. Bioinformatics 2011; 27:1595-602. [PMID: 21498403 DOI: 10.1093/bioinformatics/btr193] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.5] [Indexed: 11/14/2022]
Abstract
MOTIVATION The expansion of cancer genome sequencing continues to stimulate development of analytical tools for inferring relationships between somatic changes and tumor development. Pathway associations are especially consequential, but existing algorithms are demonstrably inadequate. METHODS Here, we propose the PathScan significance test for the scenario where pathway mutations collectively contribute to tumor development. Its design addresses two aspects that established methods neglect. First, we account for variations in gene length and the consequent differences in their mutation probabilities under the standard null hypothesis of random mutation. The associated spike in computational effort is mitigated by accurate convolution-based approximation. Second, we combine individual probabilities into a multiple-sample value using Fisher-Lancaster theory, thereby improving differentiation between a few highly mutated genes and many genes having only a few mutations apiece. We investigate accuracy, computational effort and power, reporting acceptable performance for each. RESULTS As an example calculation, we re-analyze KEGG-based lung adenocarcinoma pathway mutations from the Tumor Sequencing Project. Our test recapitulates the most significant pathways and finds that others for which the original test battery was inconclusive are not actually significant. It also identifies the focal adhesion pathway as being significantly mutated, a finding consistent with earlier studies. We also expand this analysis to other databases: Reactome, BioCarta, Pfam, PID and SMART, finding additional hits in ErbB and EPHA signaling pathways and regulation of telomerase. All have implications and plausible mechanistic roles in cancer. Finally, we discuss aspects of extending the method to integrate gene-specific background rates and other types of genetic anomalies. 
AVAILABILITY PathScan is implemented in Perl and is available from the Genome Institute at: http://genome.wustl.edu/software/pathscan.
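PathScan combines per-sample probabilities using Fisher-Lancaster theory. The sketch below shows only Fisher's original method, the special case of Lancaster's generalization with two degrees of freedom per p-value: X = -2 Σ ln p_i is referred to a chi-square distribution with 2k degrees of freedom. PathScan's own weighting and its convolution-based null for gene lengths are not reproduced. Because the degrees of freedom are even, the chi-square upper tail has a closed form, so no statistics library is needed.

```python
import math
from typing import Sequence


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-square(df), valid for positive even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def fisher_combine(pvalues: Sequence[float]) -> float:
    """Fisher's method: combine k independent p-values into one."""
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    return chi2_sf_even_df(stat, 2 * len(pvalues))


print(fisher_combine([0.05]))        # a single p-value is returned unchanged
print(fisher_combine([0.01, 0.01]))  # two weak signals combine into a stronger one
```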