1
Liu P, Page D, Ahlquist P, Ong IM, Gitter A. MPAC: a computational framework for inferring pathway activities from multi-omic data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2024.06.15.599113. [PMID: 38948762 PMCID: PMC11212914 DOI: 10.1101/2024.06.15.599113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
Fully capturing cellular state requires examining genomic, epigenomic, transcriptomic, proteomic, and other assays for a biological sample and comprehensive computational modeling to reason with the complex and sometimes conflicting measurements. Modeling these so-called multi-omic data is especially beneficial in disease analysis, where observations across omic data types may reveal unexpected patient groupings and inform clinical outcomes and treatments. We present Multi-omic Pathway Analysis of Cells (MPAC), a computational framework that interprets multi-omic data through prior knowledge from biological pathways. MPAC uses network relationships encoded in pathways using a factor graph to infer consensus activity levels for proteins and associated pathway entities from multi-omic data, runs permutation testing to eliminate spurious activity predictions, and groups biological samples by pathway activities to prioritize proteins with potential clinical relevance. Using DNA copy number alteration and RNA-seq data from head and neck squamous cell carcinoma patients from The Cancer Genome Atlas as an example, we demonstrate that MPAC predicts a patient subgroup related to immune responses not identified by analysis with either input omic data type alone. Key proteins identified via this subgroup have pathway activities related to clinical outcome as well as immune cell compositions. Our MPAC R package, available at https://bioconductor.org/packages/MPAC, enables similar multi-omic analyses on new datasets.
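The permutation-testing step described above can be illustrated with a generic empirical p-value filter. This is a minimal sketch under stated assumptions, not MPAC's implementation: MPAC is an R package, and its permutation scheme may differ. The pathway score (mean of member input values), the random-gene-set null, and the names `empirical_p_value` and `permutation_filter` are all hypothetical.

```python
import random
from typing import Mapping, Sequence, Tuple


def empirical_p_value(observed: float, null_scores: Sequence[float]) -> float:
    """Two-sided empirical p-value; the +1 terms keep p strictly above 0."""
    extreme = sum(1 for s in null_scores if abs(s) >= abs(observed))
    return (extreme + 1) / (len(null_scores) + 1)


def permutation_filter(
    values: Mapping[str, float],
    pathway: Sequence[str],
    n_perm: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, bool]:
    """Hypothetical filter: keep a pathway score only if it beats random gene sets."""
    rng = random.Random(seed)
    genes = list(values)

    def score(members: Sequence[str]) -> float:
        # Hypothetical activity score: mean input value of the member genes.
        return sum(values[g] for g in members) / len(members)

    observed = score(pathway)
    # Null distribution: scores of random gene sets of the same size.
    null = [score(rng.sample(genes, len(pathway))) for _ in range(n_perm)]
    p = empirical_p_value(observed, null)
    return observed, p, p < alpha
```

With 97 background genes at 0.0 and a three-gene pathway at 5.0, the pathway score is rarely matched by random gene sets, so it is kept; a score no larger than typical null scores would be discarded as spurious.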
Affiliation(s)
- Peng Liu
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Carbone Cancer Center, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- David Page
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Carbone Cancer Center, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Paul Ahlquist
- John and Jeanne Rowe Center for Research in Virology, Morgridge Institute for Research, Madison, Wisconsin, United States of America
- McArdle Laboratory for Cancer Research, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Institute for Molecular Virology, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Irene M Ong
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Carbone Cancer Center, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Department of Obstetrics and Gynecology, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Center for Human Genomics and Precision Medicine, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- John and Jeanne Rowe Center for Research in Virology, Morgridge Institute for Research, Madison, Wisconsin, United States of America
2
Geraghty S, Boyer JA, Fazel-Zarandi M, Arzouni N, Ryseck RP, McBride MJ, Parsons LR, Rabinowitz JD, Singh M. Integrative Computational Framework, Dyscovr, Links Mutated Driver Genes to Expression Dysregulation Across 19 Cancer Types. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.11.20.624509. [PMID: 39605479 PMCID: PMC11601522 DOI: 10.1101/2024.11.20.624509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2024]
Abstract
Though somatic mutations play a critical role in driving cancer initiation and progression, the systems-level functional impacts of these mutations (particularly how they alter expression across the genome and give rise to cancer hallmarks) are not yet well understood, even for well-studied cancer driver genes. To address this, we designed an integrative machine learning model, Dyscovr, that leverages mutation, gene expression, copy number alteration (CNA), methylation, and clinical data to uncover putative relationships between nonsynonymous mutations in key cancer driver genes and transcriptional changes across the genome. We applied Dyscovr pan-cancer and within 19 individual cancer types, finding both broadly relevant and cancer type-specific links between driver genes and putative targets, including a subset we further identify as exhibiting negative genetic relationships. Our work newly implicates KBTBD2 and mutant PIK3CA as putative synthetic lethal partners in breast cancer, validates this relationship in cell lines, and suggests a novel combinatorial treatment approach.
Affiliation(s)
- Sara Geraghty
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- Jacob A. Boyer
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- Ludwig Cancer Institute, Princeton Branch, Princeton University, Princeton, NJ 08554
- Mahya Fazel-Zarandi
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- Nibal Arzouni
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- Rolf-Peter Ryseck
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- Matthew J. McBride
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- Department of Chemical Biology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, NJ 08854
- Lance R. Parsons
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- Joshua D. Rabinowitz
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- Ludwig Cancer Institute, Princeton Branch, Princeton University, Princeton, NJ 08554
- Department of Chemistry, Princeton University, Princeton, NJ 08544
- Mona Singh
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- Department of Computer Science, Princeton University, Princeton, NJ 08544
- Lead Contact
3
Wang Y, Zhou B, Ru J, Meng X, Wang Y, Liu W. Advances in computational methods for identifying cancer driver genes. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:21643-21669. [PMID: 38124614 DOI: 10.3934/mbe.2023958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Cancer driver genes (CDGs) are crucial in cancer prevention, diagnosis and treatment. This study reviewed computational methods for identifying CDGs, categorized them into four groups, and summarized the major frameworks for each category. Additionally, we systematically gathered data from public databases and biological networks and described computational methods for identifying CDGs using these databases. Further, we summarized the algorithms, mainly involving statistics and machine learning, used for identifying CDGs. Notably, we compared the performance of nine typical identification methods on eight types of cancer to analyze where each method is applicable. Finally, we discussed the challenges and prospects associated with methods for identifying CDGs. The comparison showed that network-based algorithms and machine learning-based methods performed best.
Affiliation(s)
- Ying Wang
- School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
- Bohao Zhou
- School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
- Jidong Ru
- School of Textile Garment and Design, Changshu Institute of Technology, Changshu 215500, China
- Xianglian Meng
- School of Computer Information and Engineering, Changzhou Institute of Technology, Changzhou 213032, China
- Yundong Wang
- School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
- Wenjie Liu
- School of Computer Information and Engineering, Changzhou Institute of Technology, Changzhou 213032, China
4
Huang M, Ma J, An G, Ye X. Unravelling cancer subtype-specific driver genes in single-cell transcriptomics data with CSDGI. PLoS Comput Biol 2023; 19:e1011450. [PMID: 38096269 PMCID: PMC10754467 DOI: 10.1371/journal.pcbi.1011450] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2023] [Revised: 12/28/2023] [Accepted: 12/05/2023] [Indexed: 12/29/2023]
Abstract
Cancer is a heterogeneous disease. Cancer driver genes (CDGs) need to be inferred to understand tumor heterogeneity in cancer. However, existing computational methods have mainly identified common CDGs. A key challenge in exploring cancer progression is to infer cancer subtype-specific driver genes (CSDGs), which provide guidance for the diagnosis, treatment and prognosis of cancer. Significant advances in single-cell RNA-sequencing (scRNA-seq) technologies have opened up new possibilities for studying human cancers at the individual cell level. In this study, we develop a novel unsupervised method, CSDGI (Cancer Subtype-specific Driver Gene Inference), which applies an encoder-decoder framework consisting of low-rank residual neural networks to infer driver genes corresponding to potential cancer subtypes at the single-cell level. To infer CSDGs, we apply CSDGI to tumor single-cell transcriptomics data. To filter out redundant genes before driver gene inference, we first select differentially expressed genes (DEGs). The experimental results demonstrate that CSDGI effectively infers driver genes that are cancer subtype-specific. Functional and disease enrichment analysis shows that the inferred CSDGs are involved in key biological processes and disease pathways. CSDGI is the first method to explore cancer driver genes at the cancer subtype level. We believe that it can be a useful method for understanding the mechanisms of cell transformation that drive tumours.
Affiliation(s)
- Meng Huang
- Department of Automation, Xiamen University, Xiamen, China
- Department of Computer Science, University of Tsukuba, Tsukuba, Japan
- Jiangtao Ma
- Department of Automation, Xiamen University, Xiamen, China
- School of Engineering, Dali University, Dali, Yunnan, China
- Guangqi An
- Graduate School of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Japan
- Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba, Japan
5
Pavel AB, Garrison C, Luo L, Liu G, Taub D, Xiao J, Juan-Guardela B, Tedrow J, Alekseyev YO, Yang IV, Geraci MW, Sciurba F, Schwartz DA, Kaminski N, Beane J, Spira A, Lenburg ME, Campbell JD. Integrative genetic and genomic networks identify microRNA associated with COPD and ILD. Sci Rep 2023; 13:13076. [PMID: 37567908 PMCID: PMC10421936 DOI: 10.1038/s41598-023-39751-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Accepted: 07/30/2023] [Indexed: 08/13/2023]
Abstract
Chronic obstructive pulmonary disease (COPD) and interstitial lung disease (ILD) are clinically and molecularly heterogeneous diseases. We utilized clustering and integrative network analyses to elucidate roles for microRNAs (miRNAs) and miRNA isoforms (isomiRs) in COPD and ILD pathogenesis. Short RNA sequencing was performed on 351 lung tissue samples of COPD (n = 145), ILD (n = 144) and controls (n = 64). Five distinct subclusters of samples were identified, including 1 COPD-predominant cluster and 2 ILD-predominant clusters that were associated with different clinical measurements of disease severity. Utilizing 262 samples with gene expression and SNP microarrays, we built disease-specific genetic and expression networks to predict key miRNA regulators of gene expression. Members of the miR-449/34 family, known to promote airway differentiation by repressing the Notch pathway, were among the top connected miRNAs in both COPD and ILD networks. Genes associated with miR-449/34 members in the disease networks were enriched among genes whose expression increases with airway differentiation at an air-liquid interface. A highly expressed isomiR containing a novel seed sequence was identified at the miR-34c-5p locus. Of the anticorrelated predicted targets for this isomiR, 47% were distinct from those of the canonical miR-34c-5p seed sequence. Overexpression of both the canonical miR-34c-5p and the miR-34c-5p isomiR with an alternative seed sequence down-regulated NOTCH1 and NOTCH4. However, only overexpression of the isomiR down-regulated genes involved in Ras signaling, such as CRKL and GRB2. Overall, these findings elucidate the molecular heterogeneity inherent across COPD and ILD patients and further suggest roles for miR-34c in regulating disease-associated gene expression.
Affiliation(s)
- Ana B Pavel
- Department of Medicine, Boston University School of Medicine, 72 East Concord St, Boston, MA, 02118, USA
- Bioinformatics Graduate Program, Boston University, Boston, MA, USA
- Carly Garrison
- Department of Medicine, Boston University School of Medicine, 72 East Concord St, Boston, MA, 02118, USA
- Lingqi Luo
- Department of Medicine, Boston University School of Medicine, 72 East Concord St, Boston, MA, 02118, USA
- Gang Liu
- Department of Medicine, Boston University School of Medicine, 72 East Concord St, Boston, MA, 02118, USA
- Daniel Taub
- Department of Medicine, Boston University School of Medicine, 72 East Concord St, Boston, MA, 02118, USA
- Ji Xiao
- Department of Medicine, Boston University School of Medicine, 72 East Concord St, Boston, MA, 02118, USA
- Brenda Juan-Guardela
- Department of Medicine, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- John Tedrow
- Department of Medicine, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- Norman Regional Medical Center, Norman, Oklahoma, USA
- Yuriy O Alekseyev
- Department of Pathology and Laboratory Medicine, Boston University School of Medicine, Boston, MA, USA
- Ivana V Yang
- Department of Medicine, University of Colorado, Aurora, CO, USA
- Mark W Geraci
- Department of Medicine, University of Colorado, Aurora, CO, USA
- Department of Medicine, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- Frank Sciurba
- Department of Medicine, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- David A Schwartz
- Department of Pathology and Laboratory Medicine, Boston University School of Medicine, Boston, MA, USA
- Naftali Kaminski
- Department of Medicine, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- Department of Medicine, Yale School of Medicine, New Haven, CT, USA
- Jennifer Beane
- Department of Medicine, Boston University School of Medicine, 72 East Concord St, Boston, MA, 02118, USA
- Bioinformatics Graduate Program, Boston University, Boston, MA, USA
- Avrum Spira
- Department of Medicine, Boston University School of Medicine, 72 East Concord St, Boston, MA, 02118, USA
- Bioinformatics Graduate Program, Boston University, Boston, MA, USA
- Marc E Lenburg
- Department of Medicine, Boston University School of Medicine, 72 East Concord St, Boston, MA, 02118, USA
- Bioinformatics Graduate Program, Boston University, Boston, MA, USA
- Department of Pathology and Laboratory Medicine, Boston University School of Medicine, Boston, MA, USA
- Joshua D Campbell
- Department of Medicine, Boston University School of Medicine, 72 East Concord St, Boston, MA, 02118, USA
- Bioinformatics Graduate Program, Boston University, Boston, MA, USA
6
Tao Y, Ma X, Palmer D, Schwartz R, Lu X, Osmanbeyoglu H. Interpretable deep learning for chromatin-informed inference of transcriptional programs driven by somatic alterations across cancers. Nucleic Acids Res 2022; 50:10869-10881. [PMID: 36243974 PMCID: PMC9638905 DOI: 10.1093/nar/gkac881] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2021] [Revised: 09/23/2022] [Accepted: 09/29/2022] [Indexed: 11/14/2022]
Abstract
Cancer is a disease of gene dysregulation, where cells acquire somatic and epigenetic alterations that drive aberrant cellular signaling. These alterations adversely impact transcriptional programs and cause profound changes in gene expression. Interpreting somatic alterations within context-specific transcriptional programs will facilitate personalized therapeutic decisions but is a monumental task. Toward this goal, we develop a partially interpretable neural network model called Chromatin-informed Inference of Transcriptional Regulators Using Self-attention mechanism (CITRUS). CITRUS models the impact of somatic alterations on transcription factors and downstream transcriptional programs. Our approach employs a self-attention mechanism to model the contextual impact of somatic alterations. Furthermore, CITRUS uses a layer of hidden nodes to explicitly represent the state of transcription factors (TFs) to learn the relationships between TFs and their target genes based on TF binding motifs in the open chromatin regions of tumor samples. We apply CITRUS to genomic, transcriptomic, and epigenomic data from 17 cancer types profiled by The Cancer Genome Atlas. CITRUS predicts patient-specific TF activities and reveals transcriptional program variations between and within tumor types. We show that CITRUS yields biological insights into delineating TFs associated with somatic alterations in individual tumors. Thus, CITRUS is a promising tool for precision oncology.
Affiliation(s)
- Yifeng Tao
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Xiaojun Ma
- Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, USA
- Drake Palmer
- UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, USA
- Russell Schwartz
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
- Xinghua Lu
- Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Pharmaceutical Science, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Hatice Ulku Osmanbeyoglu
- Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Bioengineering, School of Engineering, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Biostatistics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
7
Wu Q, Wang L, Tsui SKW. Mutational signatures representative transcriptomic perturbations in hepatocellular carcinoma. Front Genet 2022; 13:970907. [PMID: 36081995 PMCID: PMC9445436 DOI: 10.3389/fgene.2022.970907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Accepted: 07/27/2022] [Indexed: 11/17/2022]
Abstract
Hepatocellular carcinoma (HCC) is a primary malignancy with increasing incidence and poor prognosis. Heterogeneity originating from genomic instability is one of the critical causes of poor outcomes. However, the underlying mechanisms and the pathways affected by mutations remain poorly understood. Integrative molecular-level studies using multiomics approaches now enable comprehensive analysis of cancers, which is pivotal for personalized therapy and mortality reduction. In this study, genomic and transcriptomic data of HCC are obtained from The Cancer Genome Atlas (TCGA) to investigate the coding and non-coding RNAs, and their regulatory networks, that are affected by certain mutational signatures of HCC. In mutational signature-specific HCCs, different types of RNA are enriched for specific biological functions: upregulated coding RNAs are predominantly associated with lipid metabolism-related pathways, and downregulated coding RNAs are enriched in axonogenesis for tumor microenvironment generation. Additionally, differentially expressed miRNAs are concentrated in cancer-related signaling pathways. Some of these RNAs also serve as prognostic factors that help predict the survival outcome of HCCs with certain mutational signatures. Furthermore, deregulation of the competing endogenous RNA (ceRNA) regulatory network is identified, which suggests a potential therapy via interference with miRNA activity for mutational signature-specific HCC. This study proposes a projection approach to reduce therapeutic complexity from genomic mutations to transcriptomic alterations. Through this method, we identify genes and pathways critical for mutational signature-specific HCC and further discover a series of prognostic markers indicating patient survival outcome.
8
Identification of a Five-Gene Panel to Assess Prognosis for Gastric Cancer. BIOMED RESEARCH INTERNATIONAL 2022; 2022:5593619. [PMID: 35187167 PMCID: PMC8850031 DOI: 10.1155/2022/5593619] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/18/2021] [Revised: 12/30/2021] [Accepted: 01/04/2022] [Indexed: 11/25/2022]
Abstract
Methods: Two datasets were used as training and validation cohorts to establish the predictive model. We used three types of screening criteria: background analysis, pathway analysis, and functional analysis provided by the cBioPortal website. Fisher's exact test and multivariable logistic regression were performed to screen out related genes. Furthermore, we performed receiver operating characteristic (ROC) and Kaplan–Meier curve analyses to evaluate the correlation between the selected genes and overall survival. Results: We identified five genes (KNL1, NRXN1, C6, CCDC169-SOHLH2, and TTN) that were highly related to recurrence of gastric cancer (GC). The area under the ROC curve (AUC) was 0.813, higher than that of the baseline model (AUC = 0.699). This result suggested that mutations in the five selected genes contributed significantly to predicting recurrence compared with other factors (age, stage, history, etc.). The Kaplan–Meier analysis also showed that mutation of the five genes was positively correlated with patient survival. Conclusions: Patients with mutations in these five genes may survive longer than those without them. This five-gene panel may be a practical tool for prognostic evaluation and offers clinicians another possible basis for choosing therapy.
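The two evaluation tools named in this abstract, the area under the ROC curve and the Kaplan–Meier survival estimator, can be sketched in plain Python. This is a generic illustration, not the authors' code: the function names `roc_auc` and `kaplan_meier` are hypothetical, and the example inputs are toy values rather than data from the study.

```python
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier curve; events[i] is 1 for an observed event, 0 if censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects that share this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after t.
        at_risk -= removed
    return curve
```

For example, `kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])` steps down to roughly 0.8, 0.6 and 0.3 at times 1, 2 and 3; the censored subject at time 2 shrinks the risk set without lowering the curve.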
9
Tang YY, Wei PJ, Zhao JP, Xia J, Cao RF, Zheng CH. Identification of driver genes based on gene mutational effects and network centrality. BMC Bioinformatics 2021; 22:457. [PMID: 34560840 PMCID: PMC8461858 DOI: 10.1186/s12859-021-04377-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/15/2021] [Accepted: 08/23/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND As one of the deadliest diseases in the world, cancer is driven by a few somatic mutations that disrupt normal cell growth and lead to abnormal proliferation and tumor development. The vast majority of somatic mutations do not affect the occurrence and development of cancer; thus, identifying the mutations responsible for tumor occurrence and development is a central goal of current cancer treatment. RESULTS To effectively identify driver genes, we adopted a semi-local centrality measure and a gene mutation effect function to assess the effect of gene mutations on changes in gene expression patterns. First, we calculated the mutation score for each gene. Second, we identified differentially expressed genes (DEGs) in the cohort by comparing the expression profiles of tumor samples and normal samples, and then constructed a local network for each mutated gene from DEGs and mutated genes based on the protein-protein interaction network. Finally, we calculated the score of each mutated gene according to the objective function. The top-ranking mutated genes were selected as driver genes. We name the proposed method Mutations Effect and Network Centrality. CONCLUSIONS We tested the method on data from four cancer types in The Cancer Genome Atlas. The experimental results showed that our method outperformed the existing network-centric method and was able to quickly and easily identify driver genes and rare driver factors.
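The semi-local centrality measure mentioned above is commonly defined in three layers: N(w) counts the nodes within two hops of w, Q(u) sums N(w) over the neighbours of u, and C_L(v) sums Q(u) over the neighbours of v. A minimal sketch of that standard definition follows, assuming an undirected protein-protein interaction network stored as an adjacency dict; the paper's mutation effect function and objective function are not reproduced, and `semi_local_centrality` is a hypothetical name.

```python
from typing import Dict, Set


def semi_local_centrality(adj: Dict[str, Set[str]]) -> Dict[str, int]:
    """Semi-local centrality C_L(v) for an undirected graph given as an adjacency dict."""

    def two_hop_count(w: str) -> int:
        # N(w): distinct nodes within distance 2 of w, excluding w itself.
        reach = set(adj[w])
        for u in adj[w]:
            reach |= adj[u]
        reach.discard(w)
        return len(reach)

    n = {w: two_hop_count(w) for w in adj}
    # Q(u): sum of N(w) over the neighbours w of u.
    q = {u: sum(n[w] for w in adj[u]) for u in adj}
    # C_L(v): sum of Q(u) over the neighbours u of v.
    return {v: sum(q[u] for u in adj[v]) for v in adj}
```

On the path a-b-c-d, the two inner nodes score 8 and the endpoints score 5, reflecting their larger semi-local neighbourhoods. Unlike degree, the measure looks up to four hops away, yet it stays much cheaper to compute than betweenness on large interaction networks.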
Affiliation(s)
- Yun-Yun Tang
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, College of Computer Science and Technology, Anhui University, Hefei, China
- Pi-Jing Wei
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, College of Computer Science and Technology, Anhui University, Hefei, China
- Jian-Ping Zhao
- College of Mathematics and System Sciences, Xinjiang University, Urumqi, China
- Junfeng Xia
- Institute of Physical Science and Information Technology, Anhui University, Hefei, China
- Rui-Fen Cao
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, College of Computer Science and Technology, Anhui University, Hefei, China
- Engineering Research Center of Big Data Application in Private Health Medicine, Fujian Province University, Putian, Fujian, China
- Chun-Hou Zheng
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, College of Computer Science and Technology, Anhui University, Hefei, China
- College of Mathematics and System Sciences, Xinjiang University, Urumqi, China
10
Guo WF, Zhang SW, Zeng T, Akutsu T, Chen L. Network control principles for identifying personalized driver genes in cancer. Brief Bioinform 2021; 21:1641-1662. [PMID: 31711128 DOI: 10.1093/bib/bbz089] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 04/10/2019] [Revised: 06/26/2019] [Accepted: 06/27/2019] [Indexed: 02/02/2023]
Abstract
To understand tumor heterogeneity in cancer, personalized driver genes (PDGs) need to be identified to unravel the genotype-phenotype associations of particular patients. However, most existing driver-focused methods concentrate on cohort information rather than individual information. Recently developed computational approaches based on network control principles are opening a new way to discover driver genes in cancer, particularly at the individual level. To provide comprehensive perspectives on network control methods for this timely topic, we first framed cancer progression as a network control problem, in which the expected PDGs are genes altered by oncogene activation signals that can shift an individual's molecular network from a healthy state to a disease state. Then, we reviewed methods for reconstructing networks from single samples and introduced novel network control methods on single-sample networks to identify PDGs in cancer. In particular, we assessed the performance of network structure control-based PDG identification methods on multiple cancer datasets from TCGA, and made the data and evaluation package publicly available. Finally, we discussed future directions for applying network control methods to identify PDGs in cancer and in diverse biological processes.
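One widely used structure-based control formulation finds a minimum set of driver nodes of a directed network through maximum matching: nodes whose incoming side is left unmatched must receive an external control signal, and a perfectly matched network still needs one driver. The sketch below implements only that generic step with Kuhn's augmenting-path algorithm. It is an assumption that this matches any specific method assessed in the review; the single-sample network construction is not shown, and `driver_nodes` is a hypothetical name.

```python
from typing import Dict, Hashable, List, Sequence, Set, Tuple


def driver_nodes(nodes: Sequence[Hashable],
                 edges: Sequence[Tuple[Hashable, Hashable]]) -> List[Hashable]:
    """Minimum driver-node set of a directed network via maximum matching."""
    succ: Dict[Hashable, List[Hashable]] = {v: [] for v in nodes}
    for u, v in edges:
        succ[u].append(v)
    # Maps a target node to the node whose outgoing edge "controls" it.
    matched_by: Dict[Hashable, Hashable] = {}

    def augment(u: Hashable, seen: Set[Hashable]) -> bool:
        # Kuhn's augmenting-path search in the out-copy/in-copy bipartite graph.
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in matched_by or augment(matched_by[v], seen):
                matched_by[v] = u
                return True
        return False

    for u in nodes:
        augment(u, set())
    unmatched = [v for v in nodes if v not in matched_by]
    # With a perfect matching, one driver node is still required.
    return unmatched if unmatched else [nodes[0]]
```

A chain a→b→c needs only its source as a driver, while a hub fanning out to three targets needs the hub plus two of the targets, because one node can steer only one independent direction through its outgoing edges.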
Affiliation(s)
- Wei-Feng Guo
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Shao-Wu Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Tao Zeng
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, 200031, China
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, 611-0011, Japan
- Luonan Chen
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, 200031, China
- School of Life Science and Technology, ShanghaiTech University, 201210 Shanghai, China
- Shanghai Research Center for Brain Science and Brain-Inspired Intelligence, Shanghai 201210, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
11
Tan K, Huang W, Liu X, Hu J, Dong S. A Hierarchical Graph Convolution Network for Representation Learning of Gene Expression Data. IEEE J Biomed Health Inform 2021; 25:3219-3229. [PMID: 33449889 DOI: 10.1109/jbhi.2021.3052008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022]
Abstract
The curse of dimensionality, caused by high dimensionality and low sample size, is a major challenge in gene expression data analysis. The real situation is even worse: labelling data is laborious and time-consuming, so only a small part of the limited samples will be labelled. Having so few labelled samples further increases the difficulty of training deep learning models. Interpretability is also an important requirement in biomedicine. Many existing deep learning methods attempt to provide interpretability but are rarely applied to gene expression data. Recent semi-supervised graph convolution network methods try to address these problems by smoothing the label information over a graph. However, to the best of our knowledge, these methods utilize graphs in only the feature space or the sample space, which restricts their performance. We propose a transductive semi-supervised representation learning method called a hierarchical graph convolution network (HiGCN) to aggregate the information of gene expression data in both feature and sample spaces. HiGCN first utilizes external knowledge to construct a feature graph and a similarity kernel to construct a sample graph. Then, two spatial-based GCNs are used to aggregate information on these graphs. To validate the model's performance, we evaluate it on synthetic and real datasets. Compared with two recent models and three traditional models, HiGCN learns better representations of gene expression data, and these representations improve the performance of downstream tasks, especially when the model is trained on a few labelled samples. Important features can be extracted from our model to provide reliable interpretability.
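The abstract does not give HiGCN's layer equations, so the sketch below uses the widely known symmetric-normalized graph convolution rule, ReLU(D^-1/2 (A + I) D^-1/2 X W), as a stand-in for one aggregation step on a single graph. Applying such a step once on a feature graph and once on a sample graph would loosely mirror the two-graph design described above, but that composition, the exact normalization, and the name `gcn_layer` are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def gcn_layer(adj: Matrix, features: Matrix, weight: Matrix) -> Matrix:
    """One graph convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalisation of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    n_feat = len(features[0])
    # Aggregate neighbour features: norm @ features.
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n)) for f in range(n_feat)]
           for i in range(n)]
    n_out = len(weight[0])
    # Linear transform followed by a ReLU non-linearity.
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(n_feat))) for o in range(n_out)]
            for i in range(n)]
```

For two connected nodes with features 1.0 and 3.0 and an identity weight, both outputs become 2.0: the normalized aggregation averages each node with its neighbour. This smoothing is what lets label information from a few labelled samples spread across a sample graph.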

12
Identification of Breast Cancer Subtype-Specific Biomarkers by Integrating Copy Number Alterations and Gene Expression Profiles. Medicina (Kaunas) 2021; 57:261. [PMID: 33809336 PMCID: PMC7998437 DOI: 10.3390/medicina57030261] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/07/2021] [Revised: 03/01/2021] [Accepted: 03/09/2021] [Indexed: 12/20/2022]
Abstract
Background and Objectives: Breast cancer is a heterogeneous disease categorized into four subtypes. Previous studies have shown that copy number alterations of several genes are implicated in the development and progression of many cancers. This study evaluates the effects of DNA copy number alterations on gene expression levels in different breast cancer subtypes. Materials and Methods: We performed a computational analysis integrating copy number alterations and gene expression profiles in 1024 breast cancer samples grouped into four molecular subtypes: luminal A, luminal B, HER2, and basal. Results: Our analyses identified several genes correlated in all subtypes, such as KIAA1967 and MCPH1. In addition, several subtype-specific genes that showed a significant correlation between copy number and gene expression profiles were detected: SMARCB1, AZIN1, MTDH in luminal A, PPP2R5E, APEX1, GCN5 in luminal B, TNFAIP1, PCYT2, DIABLO in HER2, and FAM175B, SENP5, SCAF1 in basal subtype. Conclusions: This study showed that computational analyses integrating copy number and gene expression can help unveil the molecular mechanisms of cancer and identify new subtype-specific biomarkers.
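The core integration step described here, relating copy number to expression gene by gene within one subtype, can be sketched as a per-gene Pearson correlation across the samples of that subtype. The 0.5 cutoff, the placeholder gene names, and the toy values are hypothetical, and the sketch omits the significance testing and multiple-testing control a real analysis would need; it is not the study's exact procedure.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns NaN when either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return float("nan")
    return cov / (sx * sy)


def cnv_expression_hits(
    cnv: Dict[str, List[float]], expr: Dict[str, List[float]], min_r: float = 0.5
) -> Dict[str, float]:
    """Keep genes whose copy number and expression correlate at or above min_r (hypothetical cutoff)."""
    hits = {}
    for gene, copy_numbers in cnv.items():
        r = pearson(copy_numbers, expr[gene])
        if r >= min_r:  # NaN compares False, so zero-variance genes are dropped
            hits[gene] = r
    return hits


if __name__ == "__main__":
    # Hypothetical samples from a single subtype; GENE_A and GENE_B are placeholder names.
    cnv = {"GENE_A": [-1, 0, 1, 2], "GENE_B": [0, 1, 0, 1]}
    expr = {"GENE_A": [2.0, 3.1, 3.9, 5.2], "GENE_B": [4.0, 3.0, 4.1, 2.9]}
    print(cnv_expression_hits(cnv, expr))
```

Running the same function separately on each subtype's samples, and comparing the resulting gene sets, is one way to separate shared from subtype-specific hits.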

13
Urbanek-Trzeciak MO, Galka-Marciniak P, Nawrocka PM, Kowal E, Szwec S, Giefing M, Kozlowski P. Pan-cancer analysis of somatic mutations in miRNA genes. EBioMedicine 2020; 61:103051. [PMID: 33038763 PMCID: PMC7648123 DOI: 10.1016/j.ebiom.2020.103051] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 07/01/2020] [Revised: 09/16/2020] [Accepted: 09/16/2020] [Indexed: 02/08/2023]
Abstract
Background: miRNAs are considered important players in oncogenesis, serving either as oncomiRs or suppressormiRs. Although the accumulation of somatic alterations is an intrinsic aspect of cancer development and many important cancer-driving mutations have been identified in protein-coding genes, the area of functional somatic mutations in miRNA genes is heavily understudied. Methods: Here, based on the analysis of large genomic datasets, mostly the whole-exome sequencing of over 10,000 cancer/normal sample pairs deposited within the TCGA repository, we undertook an analysis of somatic mutations in miRNA genes. Findings: We identified and characterized over 10,000 somatic mutations and showed that some of the miRNA genes are overmutated in Pan-Cancer and/or specific cancers. Nonrandom occurrence of the identified mutations was confirmed by a strong association of overmutated miRNA genes with KEGG pathways, most of which were related to specific cancer types or cancer-related processes. Additionally, we showed that mutations in some of the overmutated genes correlate with miRNA expression, cancer staging, and patient survival. Interpretation: Our study is the first comprehensive Pan-Cancer study of cancer somatic mutations in miRNA genes. It may help to understand the consequences of mutations in miRNA genes and to identify functional miRNA mutations. The results may also be the first step (forming the basis and providing the resources) in the development of computational and/or statistical approaches/tools dedicated to the identification of cancer-driver miRNA genes. Funding: This work was supported by research grants from the Polish National Science Centre 2016/22/A/NZ2/00184 and 2015/17/N/NZ3/03629.
Affiliation(s)
- Paulina M Nawrocka
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Ewelina Kowal
- Institute of Human Genetics, Polish Academy of Sciences, Poznan, Poland
- Sylwia Szwec
- Institute of Human Genetics, Polish Academy of Sciences, Poznan, Poland
- Maciej Giefing
- Institute of Human Genetics, Polish Academy of Sciences, Poznan, Poland
- Piotr Kozlowski
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland

14
Jia P, Pei G, Zhao Z. CNet: a multi-omics approach to detecting clinically associated, combinatory genomic signatures. Bioinformatics 2020; 35:5207-5215. [PMID: 31141125 DOI: 10.1093/bioinformatics/btz441] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/14/2018] [Revised: 02/15/2019] [Accepted: 05/23/2019] [Indexed: 01/17/2023]
Abstract
MOTIVATION Genome-wide multi-omics profiling of complex diseases provides valuable resources and opportunities to discover associations between various measures of genes and diseases. Currently, a pressing challenge is how to effectively detect functional genes associated with or causing phenotypic outcomes. We developed CNet to identify groups of genomic signatures whose combinatory effect is significantly associated with clinical and phenotypical outcomes. RESULTS CNet builds on a generalized sequential feedforward method, augmented by a down-sampling bootstrap strategy to reduce random hitchhiking signatures. It further applies a dynamic trimming procedure to remove relatively less informative signatures at every step. CNet can manage heterogeneous genomic signature profiles simultaneously and select the best signature to represent a specific gene. To deal with various forms of clinical and phenotypical measurements, we introduced four models that handle continuous, categorical and censored data. We tested CNet using drug-response data, multidimensional cancer genomics data and genome-wide association study data for multiple traits. Our results demonstrated that in various scenarios, CNet could effectively identify signatures that are associated with the outcomes. In addition, we applied CNet to identify likely disease-causing chains involving somatic mutations, pathway activities and patient outcomes. With appropriate settings, CNet can be applied in many biological conditions. AVAILABILITY AND IMPLEMENTATION CNet can be downloaded at https://github.com/bsml320/CNet. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Peilin Jia
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Guangsheng Pei
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Zhongming Zhao
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA

15
Dinstag G, Shamir R. PRODIGY: personalized prioritization of driver genes. Bioinformatics 2020; 36:1831-1839. [PMID: 31681944 PMCID: PMC7703777 DOI: 10.1093/bioinformatics/btz815] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 06/11/2019] [Revised: 09/03/2019] [Accepted: 10/30/2019] [Indexed: 12/12/2022]
Abstract
MOTIVATION Evolution of cancer is driven by a small number of somatic mutations that disrupt cellular processes, causing abnormal proliferation and tumor development, whereas most somatic mutations have no impact on progression. Distinguishing those mutated genes that drive tumorigenesis in a patient is a primary goal in cancer therapy: knowledge of these genes and the pathways on which they operate can illuminate disease mechanisms and indicate potential therapies and drug targets. Current research focuses mainly on cohort-level driver gene identification, but patient-specific driver gene identification remains a challenge. METHODS We developed a new algorithm for patient-specific ranking of driver genes. The algorithm, called PRODIGY, analyzes the expression and mutation profiles of the patient along with data on known pathways and protein-protein interactions. Prodigy quantifies the impact of each mutated gene on every deregulated pathway using the prize-collecting Steiner tree model. Mutated genes are ranked by their aggregated impact on all deregulated pathways. RESULTS In testing on five TCGA cancer cohorts spanning >2500 patients and comparison to validated driver genes, Prodigy outperformed extant methods and rankings based on network centrality measures. Our results pinpoint the pleiotropic effect of driver genes and show that Prodigy is capable of identifying even very rare drivers. Hence, Prodigy takes a step further toward personalized medicine and treatment. AVAILABILITY AND IMPLEMENTATION The Prodigy R package is available at: https://github.com/Shamir-Lab/PRODIGY. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gal Dinstag
- Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv 6997801, Israel
- Ron Shamir
- Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv 6997801, Israel

16
Zia A, Rashid S. Systems Biology and Integrated Computational Methods for Cancer-Associated Mutation Analysis. In: Essentials of Cancer Genomic, Computational Approaches and Precision Medicine 2020:335-362. [DOI: 10.1007/978-981-15-1067-0_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]

17
Sharma A, Jiang C, De S. Dissecting the sources of gene expression variation in a pan-cancer analysis identifies novel regulatory mutations. Nucleic Acids Res 2019; 46:4370-4381. [PMID: 29672706 PMCID: PMC5961375 DOI: 10.1093/nar/gky271] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 10/01/2017] [Accepted: 03/29/2018] [Indexed: 02/07/2023]
Abstract
Although the catalog of cancer-associated mutations in protein-coding regions is nearly complete for all major cancer types, an assessment of regulatory changes in cancer genomes and their clinical significance remains largely preliminary. Adopting a bottom-up approach, we quantify the effects of different sources of gene expression variation in a cohort of 3899 samples from 10 cancer types. We find that copy number alterations, epigenetic changes, transcription factors and microRNAs collectively explain, on average, only 31–38% and 18–26% of the expression variation for cancer-associated and other genes, respectively, and that among these factors copy number alteration has the largest effect. We show that the genes with systematic, large expression variation that could not be attributed to these factors are enriched for pathways related to cancer hallmarks. Integrating whole-genome sequencing data and focusing on genes with systematic expression variation, we identify novel, recurrent regulatory mutations affecting known cancer genes such as NKX2-1 and GRIN2D in multiple cancer types. Nonetheless, at a genome-wide scale, the proportion of gene expression variation attributed to recurrent point mutations appears to be modest so far, especially when compared to that attributed to copy number changes, a pattern different from that observed for other complex diseases and traits. We suspect that, owing to plasticity and redundancy in biological pathways, regulatory alterations show complex combinatorial patterns, modulating gene expression in cancer genomes at a finer scale.
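The "proportion of expression variation explained" reported here is an R²-type quantity. As a simplified illustration with a single regulatory factor (the study combines several), the sketch below fits an ordinary least-squares line of one gene's expression on its copy number and reports the fraction of variance the fit explains. The single-predictor model and the toy values are assumptions, not the study's model.

```python
from typing import List, Tuple


def variance_explained(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Fit y = intercept + slope * x by least squares and return (intercept, slope, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_total = sum((yi - my) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # R^2: share of the total variance in y accounted for by the fitted line.
    return intercept, slope, 1.0 - ss_resid / ss_total


if __name__ == "__main__":
    # Hypothetical copy number (x) and expression (y) values for one gene.
    copy_number = [0.0, 1.0, 2.0, 3.0]
    expression = [1.0, 2.0, 1.0, 2.0]
    print(variance_explained(copy_number, expression))
```

With several factors, the same idea generalizes to multiple regression, where R² summarizes how much variation all factors explain jointly.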
Affiliation(s)
- Anchal Sharma
- Center for Systems and Computational Biology, Rutgers Cancer Institute of New Jersey, Rutgers the State University of New Jersey, New Brunswick, NJ 08901, USA
- Chuan Jiang
- Center for Systems and Computational Biology, Rutgers Cancer Institute of New Jersey, Rutgers the State University of New Jersey, New Brunswick, NJ 08901, USA
- Subhajyoti De
- Center for Systems and Computational Biology, Rutgers Cancer Institute of New Jersey, Rutgers the State University of New Jersey, New Brunswick, NJ 08901, USA

18
Rappoport N, Shamir R. Multi-omic and multi-view clustering algorithms: review and cancer benchmark. Nucleic Acids Res 2019; 46:10546-10562. [PMID: 30295871 PMCID: PMC6237755 DOI: 10.1093/nar/gky889] [Citation(s) in RCA: 259] [Impact Index Per Article: 43.2] [Received: 07/18/2018] [Accepted: 09/20/2018] [Indexed: 12/18/2022]
Abstract
Recent high throughput experimental methods have been used to collect large biomedical omics datasets. Clustering of single omic datasets has proven invaluable for biological and medical research. The decreasing cost and development of additional high throughput methods now enable measurement of multi-omic data. Clustering multi-omic data has the potential to reveal further systems-level insights, but raises computational and biological challenges. Here, we review algorithms for multi-omics clustering, and discuss key issues in applying these algorithms. Our review covers methods developed specifically for omic data as well as generic multi-view methods developed in the machine learning community for joint clustering of multiple data types. In addition, using cancer data from TCGA, we perform an extensive benchmark spanning ten different cancer types, providing the first systematic comparison of leading multi-omics and multi-view clustering algorithms. The results highlight key issues regarding the use of single- versus multi-omics, the choice of clustering strategy, the power of generic multi-view methods and the use of approximated p-values for gauging solution quality. Due to the growing use of multi-omics data, we expect these issues to be important for future progress in the field.
Affiliation(s)
- Nimrod Rappoport
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Ron Shamir
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel

19
Zhang W, Wang SL. A Novel Method for Identifying the Potential Cancer Driver Genes Based on Molecular Data Integration. Biochem Genet 2019; 58:16-39. [PMID: 31115714 DOI: 10.1007/s10528-019-09924-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 01/06/2019] [Accepted: 05/02/2019] [Indexed: 12/17/2022]
Abstract
The identification of cancer driver genes is essential for personalized therapy. The mutation frequency of most driver genes is in the middle (2-20%) or even lower range, which makes it difficult to find driver genes with low-frequency mutations. Other forms of genomic aberration, such as copy number variations (CNVs) and epigenetic changes, may also reflect cancer progression. In this work, a method for identifying potential cancer driver genes (iPDG) based on molecular data integration is proposed. DNA copy number variation, somatic mutation, and gene expression data of matched cancer samples are integrated. In combination with the iKEEG method, the "key genes" of cancer are identified, and the change in their expression levels is used as an auxiliary criterion for evaluating whether the mutated genes are potential drivers. For a mutated gene, the concept of mutational effect is defined, which takes into account the effects of copy number variation, of the mutated gene itself, and of its neighboring genes. The method comprises two main steps. The first step is data preprocessing: DNA copy number variation and somatic mutation data are first integrated; the integrated data are then mapped to a given interaction network, and a diffusion kernel is used to form the mutation effect matrix. The second step obtains the key genes using the iKGGE method and constructs the connection matrix from the gene expression data of the key genes and the mutation effect matrix of the mutated genes. Experiments on TCGA breast cancer and glioblastoma multiforme datasets demonstrate that iPDG is effective not only at identifying known cancer driver genes but also at discovering rare potential driver genes. Functional enrichment analysis shows that these genes are clearly associated with the two cancer types.
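The diffusion-kernel step can be illustrated with the standard graph diffusion kernel K = exp(-beta * L), where L = D - A is the Laplacian of the interaction network; multiplying K by a 0/1 mutation vector spreads each mutation's effect onto nearby genes. The sketch approximates the matrix exponential with a truncated Taylor series. This kernel form, beta = 0.5, the 30 series terms, and the three-gene network are assumptions; how iPDG combines copy number with mutations is not reproduced.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))] for i in range(len(a))]


def laplacian(adj: Matrix) -> Matrix:
    """Graph Laplacian L = D - A for a symmetric adjacency matrix without self-loops."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)] for i in range(n)]


def diffusion_kernel(adj: Matrix, beta: float = 0.5, terms: int = 30) -> Matrix:
    """Approximate exp(-beta * L) by the truncated Taylor series sum_{k < terms} (-beta L)^k / k!."""
    n = len(adj)
    m = [[-beta * v for v in row] for row in laplacian(adj)]
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # k = 0: identity
    kernel = [row[:] for row in term]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, m)]  # now (-beta L)^k / k!
        kernel = [[kernel[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return kernel


def diffuse(kernel: Matrix, signal: List[float]) -> List[float]:
    """Spread a per-gene signal (e.g. a 0/1 mutation vector) over the network: K @ signal."""
    return [sum(k_ij * s_j for k_ij, s_j in zip(row, signal)) for row in kernel]


if __name__ == "__main__":
    # Hypothetical 3-gene path network g0 - g1 - g2 with only g0 mutated.
    adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
    print(diffuse(diffusion_kernel(adj), [1.0, 0.0, 0.0]))
```

Because L has zero row sums, each row of K sums to 1, so diffusion redistributes a mutation's weight across the network rather than creating or losing it; larger beta spreads it further.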
Affiliation(s)
- Wei Zhang
- College of Computer Science and Electronics Engineering, Hunan University, Changsha, 410082, Hunan, China
- Shu-Lin Wang
- College of Computer Science and Electronics Engineering, Hunan University, Changsha, 410082, Hunan, China

20
Larmuseau M, Verbeke LPC, Marchal K. Associating expression and genomic data using co-occurrence measures. Biol Direct 2019; 14:10. [PMID: 31072345 PMCID: PMC6507230 DOI: 10.1186/s13062-019-0240-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/16/2018] [Accepted: 04/10/2019] [Indexed: 12/11/2022]
Abstract
Recent technological advances have led to an exponential increase in data in all the omics fields. It is expected that integration of these different data sources will drastically enhance our knowledge of the biological mechanisms behind genomic diseases such as cancer. However, the integration of different omics data remains a challenge. In this work we propose an intuitive workflow for the integrative analysis of expression, mutation and copy number data taken from the METABRIC study on breast cancer. First, we present evidence that the expression profile of many important breast cancer genes consists of two modes or ‘regimes’, which contain important clinical information. Then, we show how the co-occurrence of these expression regimes can be used as an association measure between genes and validate our findings on the TCGA-BRCA study. Finally, we demonstrate how these co-occurrence measures can also be applied to link expression regimes to genomic aberrations, providing a more complete, integrative view of breast cancer. As a case study, an integrative analysis of the identified MLPH-FOXA1 association is performed, illustrating that the obtained expression associations are intimately linked to the underlying genomic changes. Reviewers: This article was reviewed by Dirk Walther, Francisco Garcia and Isabel Nepomuceno. Electronic supplementary material: The online version of this article (10.1186/s13062-019-0240-2) contains supplementary material, which is available to authorized users.
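The co-occurrence idea can be sketched once each gene's expression in each sample has been assigned to one of its two regimes (for example, high or low): two genes are then compared through the 2x2 table of regime co-occurrence across samples. The log odds ratio with a 0.5 pseudocount used below is an assumed stand-in, not necessarily the measure defined in the paper, and the regime labels are toy data.

```python
import math
from typing import List, Tuple


def contingency(a: List[bool], b: List[bool]) -> Tuple[int, int, int, int]:
    """Count samples in each cell of the 2x2 table: (both, a only, b only, neither)."""
    n11 = sum(1 for x, y in zip(a, b) if x and y)
    n10 = sum(1 for x, y in zip(a, b) if x and not y)
    n01 = sum(1 for x, y in zip(a, b) if not x and y)
    n00 = sum(1 for x, y in zip(a, b) if not x and not y)
    return n11, n10, n01, n00


def log_odds_ratio(a: List[bool], b: List[bool], pseudo: float = 0.5) -> float:
    """Positive when the two regimes tend to co-occur, negative when they tend to exclude each other."""
    n11, n10, n01, n00 = contingency(a, b)
    return math.log(((n11 + pseudo) * (n00 + pseudo)) / ((n10 + pseudo) * (n01 + pseudo)))


if __name__ == "__main__":
    # Hypothetical "high regime" labels for three genes across six samples.
    gene_x = [True, True, True, False, False, False]
    gene_y = [True, True, False, False, False, False]
    gene_z = [False, False, False, True, True, True]
    print(log_odds_ratio(gene_x, gene_y), log_odds_ratio(gene_x, gene_z))
```

The same table works when one of the two vectors is a binary genomic aberration (e.g. mutated or not) rather than an expression regime, which mirrors the abstract's step of linking regimes to aberrations.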
Affiliation(s)
- Maarten Larmuseau
- Department of Information Technology, Ghent University - Imec, Technologiepark-Zwijnaarde 126, 9052, Ghent, Belgium
- Lieven P C Verbeke
- Department of Plant Biotechnology and Bioinformatics, Ghent University - Imec, Technologiepark-Zwijnaarde 126, 9052, Ghent, Belgium
- Kathleen Marchal
- Department of Plant Biotechnology and Bioinformatics, Ghent University - Imec, Technologiepark-Zwijnaarde 126, 9052, Ghent, Belgium

21
Kartha VK, Sebastiani P, Kern JG, Zhang L, Varelas X, Monti S. CaDrA: A Computational Framework for Performing Candidate Driver Analyses Using Genomic Features. Front Genet 2019; 10:121. [PMID: 30838036 PMCID: PMC6390206 DOI: 10.3389/fgene.2019.00121] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 10/07/2018] [Accepted: 02/04/2019] [Indexed: 12/12/2022]
Abstract
The identification of genetic alteration combinations as drivers of a given phenotypic outcome, such as drug sensitivity, gene or protein expression, and pathway activity, is a challenging task that is essential to gaining new biological insights and to discovering therapeutic targets. Existing methods designed to predict complementary drivers of such outcomes lack analytical flexibility, including the support for joint analyses of multiple genomic alteration types, such as somatic mutations and copy number alterations, multiple scoring functions, and rigorous significance and reproducibility testing procedures. To address these limitations, we developed Candidate Driver Analysis or CaDrA, an integrative framework that implements a step-wise heuristic search approach to identify functionally relevant subsets of genomic features that, together, are maximally associated with a specific outcome of interest. We show CaDrA's overall high sensitivity and specificity for typically sized multi-omic datasets using simulated data, and demonstrate CaDrA's ability to identify known mutations linked with sensitivity of cancer cells to drug treatment using data from the Cancer Cell Line Encyclopedia (CCLE). We further apply CaDrA to identify novel regulators of oncogenic activity mediated by Hippo signaling pathway effectors YAP and TAZ in primary breast cancer tumors using data from The Cancer Genome Atlas (TCGA), which we functionally validate in vitro. Finally, we use pan-cancer TCGA protein expression data to show the high reproducibility of CaDrA's search procedure. Collectively, this work demonstrates the utility of our framework for supporting the fast querying of large, publicly available multi-omics datasets, including but not limited to TCGA and CCLE, for potential drivers of a given target profile of interest.
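A minimal sketch of a step-wise heuristic search over binary alteration features: start from the single feature most associated with the outcome, then repeatedly add the feature whose logical OR with the current set most improves the score, stopping when no addition helps or a size limit is reached. The scoring function here (difference in mean outcome between altered and unaltered samples), the size limit, and the toy data are assumptions; CaDrA's own scoring functions and its significance and reproducibility testing are not reproduced.

```python
from typing import Dict, List, Tuple


def score(altered: List[bool], outcome: List[float]) -> float:
    """Mean outcome in altered samples minus mean outcome in unaltered samples (assumed scoring rule)."""
    hit = [o for a, o in zip(altered, outcome) if a]
    miss = [o for a, o in zip(altered, outcome) if not a]
    if not hit or not miss:
        return float("-inf")  # undefined when every sample, or no sample, is altered
    return sum(hit) / len(hit) - sum(miss) / len(miss)


def union(a: List[bool], b: List[bool]) -> List[bool]:
    """Sample-wise logical OR of two alteration profiles."""
    return [x or y for x, y in zip(a, b)]


def stepwise_search(
    features: Dict[str, List[bool]], outcome: List[float], max_size: int = 5
) -> Tuple[List[str], float]:
    """Greedy forward search for a feature set whose OR-union best separates the outcome."""
    chosen: List[str] = []
    current = [False] * len(outcome)
    best = float("-inf")
    while len(chosen) < max_size:
        candidates = [
            (score(union(current, profile), outcome), name)
            for name, profile in features.items()
            if name not in chosen
        ]
        if not candidates:
            break
        new_score, name = max(candidates)  # ties go to the lexicographically larger name
        if new_score <= best:
            break  # no remaining feature improves the score
        chosen.append(name)
        current = union(current, features[name])
        best = new_score
    return chosen, best


if __name__ == "__main__":
    # Hypothetical outcome that is high in the first four samples.
    outcome = [5.0, 5.0, 5.0, 5.0, 0.0, 0.0, 0.0, 0.0]
    features = {
        "f1": [True, True, False, False, False, False, False, False],
        "f2": [False, False, True, True, False, False, False, False],
        "f3": [True, False, False, False, True, True, False, False],
    }
    print(stepwise_search(features, outcome))
```

On this toy input the search selects f1 and f2, whose union covers exactly the high-outcome samples, and rejects f3 because adding it lowers the score.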
Affiliation(s)
- Vinay K. Kartha
- Bioinformatics Program, Boston University, Boston, MA, United States
- Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA, United States
- Paola Sebastiani
- Bioinformatics Program, Boston University, Boston, MA, United States
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, United States
- Joseph G. Kern
- Department of Biochemistry, Boston University School of Medicine, Boston, MA, United States
- Liye Zhang
- School of Life Sciences and Technology, ShanghaiTech University, Shanghai, China
- Xaralabos Varelas
- Department of Biochemistry, Boston University School of Medicine, Boston, MA, United States
- Stefano Monti
- Bioinformatics Program, Boston University, Boston, MA, United States
- Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA, United States
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, United States

22
Frost HR, Amos CI. A multi-omics approach for identifying important pathways and genes in human cancer. BMC Bioinformatics 2018; 19:479. [PMID: 30541428 PMCID: PMC6292115 DOI: 10.1186/s12859-018-2476-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 08/01/2017] [Accepted: 11/09/2018] [Indexed: 12/15/2022]
Abstract
Background: Cancer develops when pathways controlling cell survival, cell fate or genome maintenance are disrupted by the somatic alteration of key driver genes. Understanding how pathway disruption is driven by somatic alterations is thus essential for an accurate characterization of cancer biology and identification of therapeutic targets. Unfortunately, current cancer pathway analysis methods fail to fully model the relationship between somatic alterations and pathway activity. Results: To address these limitations, we developed a multi-omics method for identifying biologically important pathways and genes in human cancer. Our approach combines single-sample pathway analysis with multi-stage, lasso-penalized regression to find pathways whose gene expression can be explained largely in terms of gene-level somatic alterations in the tumor. Importantly, this method can analyze case-only data sets, does not require information regarding pathway topology and supports personalized pathway analysis using just somatic alteration data for a limited number of cancer-associated genes. The practical effectiveness of this technique is illustrated through an analysis of data from The Cancer Genome Atlas using gene sets from the Molecular Signatures Database. Conclusions: Novel insights into the pathophysiology of human cancer can be obtained from statistical models that predict expression-based pathway activity in terms of non-silent somatic mutations and copy number variation. These models enable the identification of biologically important pathways and genes and support personalized pathway analysis in cases where gene expression data is unavailable. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2476-8) contains supplementary material, which is available to authorized users.
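The lasso-penalized regression step can be illustrated with cyclic coordinate descent on the objective (1/2n)·||y - Xb||² + lambda·||b||₁, here regressing a pathway activity score on gene-level alteration features; the L1 penalty sets uninformative coefficients exactly to zero. The objective scaling, the lambda values, the absence of an intercept (data assumed centred), and the toy data are assumptions; the paper's multi-stage procedure is not reproduced.

```python
from typing import List


def soft_threshold(z: float, t: float) -> float:
    """Shrink z toward zero by t, returning exactly 0.0 inside [-t, t]."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X: List[List[float]], y: List[float], lam: float, iters: int = 200) -> List[float]:
    """Minimise (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent (no intercept; centre data first)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = y[:]  # y - Xb with b = 0
    for _ in range(iters):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            norm = sum(v * v for v in col) / n
            if norm == 0.0:
                continue
            # Correlation of column j with the partial residual that excludes feature j.
            rho = sum(col[i] * (residual[i] + col[i] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / norm
            for i in range(n):
                residual[i] -= col[i] * (new_bj - b[j])
            b[j] = new_bj
    return b


if __name__ == "__main__":
    # Hypothetical centred features: column 0 drives the score, column 1 is irrelevant.
    X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
    y = [2.0, -2.0, 2.0, -2.0]
    print(lasso_cd(X, y, lam=0.1))
```

Raising lam to 3.0 on the same toy data shrinks both coefficients to zero, which is the sparsity behavior the L1 penalty is chosen for.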
Affiliation(s)
- H Robert Frost
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH 03755, USA
- Christopher I Amos
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH 03755, USA

23
Ha MJ, Banerjee S, Akbani R, Liang H, Mills GB, Do KA, Baladandayuthapani V. Personalized Integrated Network Modeling of the Cancer Proteome Atlas. Sci Rep 2018; 8:14924. [PMID: 30297783 PMCID: PMC6175854 DOI: 10.1038/s41598-018-32682-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 02/16/2018] [Accepted: 09/04/2018] [Indexed: 12/20/2022]
Abstract
Personalized (patient-specific) approaches have recently emerged with a precision medicine paradigm that acknowledges the fact that molecular pathway structures and activity might be considerably different within and across tumors. The functional cancer genome and proteome provide rich sources of information to identify patient-specific variations in signaling pathways and activities within and across tumors; however, current analytic methods lack the ability to exploit the diverse and multi-layered architecture of these complex biological networks. We assessed pan-cancer pathway activities for >7700 patients across 32 tumor types from The Cancer Proteome Atlas by developing a personalized cancer-specific integrated network estimation (PRECISE) model. PRECISE is a general Bayesian framework for integrating existing interaction databases, data-driven de novo causal structures, and upstream molecular profiling data to estimate cancer-specific integrated networks, infer patient-specific networks and elicit interpretable pathway-level signatures. PRECISE-based pathway signatures can delineate pan-cancer commonalities and differences in proteomic network biology within and across tumors, demonstrate robust tumor stratification that is both biologically and clinically informative, and show superior prognostic power compared to existing approaches. Towards establishing the translational relevance of the functional proteome in research and clinical settings, we provide an online, publicly available, comprehensive database and visualization repository of our findings (https://mjha.shinyapps.io/PRECISE/).
Affiliation(s)
- Min Jin Ha
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Sayantan Banerjee
- Operations Management and Quantitative Techniques Area, Indian Institute of Management, Indore, India
- Rehan Akbani
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Han Liang
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA; Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Gordon B Mills
- Oregon Health and Science University, Portland, OR, 97239, USA
- Kim-Anh Do
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA

24
Ozturk K, Dow M, Carlin DE, Bejar R, Carter H. The Emerging Potential for Network Analysis to Inform Precision Cancer Medicine. J Mol Biol 2018; 430:2875-2899. [PMID: 29908887 PMCID: PMC6097914 DOI: 10.1016/j.jmb.2018.06.016] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Received: 03/12/2018] [Revised: 05/30/2018] [Accepted: 06/06/2018] [Indexed: 12/19/2022]
Abstract
Precision cancer medicine promises to tailor clinical decisions to patients using genomic information. Indeed, successes of drugs targeting genetic alterations in tumors, such as imatinib that targets BCR-ABL in chronic myelogenous leukemia, have demonstrated the power of this approach. However, biological systems are complex, and patients may differ not only by the specific genetic alterations in their tumor, but also by more subtle interactions among such alterations. Systems biology and more specifically, network analysis, provides a framework for advancing precision medicine beyond clinical actionability of individual mutations. Here we discuss applications of network analysis to study tumor biology, early methods for N-of-1 tumor genome analysis, and the path for such tools to the clinic.
Affiliation(s)
- Kivilcim Ozturk
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
- Michelle Dow
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
- Daniel E Carlin
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA
- Rafael Bejar
- Moores Cancer Center, Division of Hematology and Oncology, University of California San Diego, La Jolla, CA 92093, USA
- Hannah Carter
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA; Moores Cancer Center and Institute for Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA; CIFAR, MaRS Centre, West Tower, 661 University Ave., Suite 505, Toronto, ON M5G 1M1, Canada

25
FusionPathway: Prediction of pathways and therapeutic targets associated with gene fusions in cancer. PLoS Comput Biol 2018; 14:e1006266. [PMID: 30040819 PMCID: PMC6075785 DOI: 10.1371/journal.pcbi.1006266] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 12/19/2017] [Revised: 08/03/2018] [Accepted: 06/05/2018] [Indexed: 12/03/2022]
Abstract
Numerous gene fusions have been uncovered across multiple cancer types. Although the ability to target several of these fusions has led to the development of some successful anti-cancer drugs, most fusions are not druggable. Understanding the molecular pathways of a fusion is important in determining its function in oncogenesis and in developing therapeutic strategies for patients harboring the fusion. However, the molecular pathways have been elucidated for only a few fusions, in part because of the labor-intensive nature of the required functional assays. Therefore, we developed a domain-based network approach to infer the pathways of a fusion. Molecular interactions of a fusion are first predicted from its protein domain composition, and its associated pathways are then inferred from these molecular interactions. We demonstrated the capabilities of this approach primarily by applying it to the well-studied BCR-ABL1 fusion. The approach was also applied to two undruggable fusions in sarcoma, EWS-FLI1 and FUS-DDIT3. We successfully identified known genes and pathways associated with these fusions and validated these predictions using several benchmark sets. The predictions for EWS-FLI1 and FUS-DDIT3 also correlate with results of high-throughput drug screening. To the best of our knowledge, this is the first approach for inferring the pathways of fusions. We present a computational framework, FusionPathway, to infer the oncogenesis pathways of a fusion and to help develop therapeutic strategies targeting these pathways for patients harboring the fusion. In this work, we validated the capabilities of this approach through its application to the well-studied BCR-ABL1 fusion and two undruggable fusions in sarcoma, EWS-FLI1 and FUS-DDIT3. In particular, the predictions for EWS-FLI1 and FUS-DDIT3 correlate well with results of high-throughput drug screening in sarcoma cells.
Therefore, FusionPathway can be an effective method to infer pathways and potential therapeutic targets associated with undruggable fusions. Our BCR-ABL1 results also suggest that FusionPathway may help elucidate pathway-dependent mechanisms of resistance to kinase fusion-targeting therapies and guide strategies to overcome that resistance. In addition, the FusionPathway R package (https://github.com/perwu/FusionPathway/) lets others apply our approach to study other important fusions in cancer.
26
Sinkala M, Mulder N, Martin DP. Integrative landscape of dysregulated signaling pathways of clinically distinct pancreatic cancer subtypes. Oncotarget 2018; 9:29123-29139. [PMID: 30018740 PMCID: PMC6044387 DOI: 10.18632/oncotarget.25632] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 04/30/2018] [Accepted: 06/04/2018] [Indexed: 12/29/2022]
Abstract
Despite modern therapeutic advances, the survival prospects of pancreatic cancer patients have remained poor. Besides being highly metastatic, pancreatic cancer is challenging to treat because it is caused by a heterogeneous array of somatic mutations that impact a variety of signaling pathways and cellular regulatory systems. Here we use publicly available transcriptomic, copy number alteration and mutation profiling datasets from pancreatic cancer patients together with data on disease outcomes to show that the three major pancreatic cancer subtypes each display distinctive aberrations in cell signaling and metabolic pathways. Importantly, patients afflicted with these different pancreatic cancer subtypes also exhibit distinctive survival profiles. Within these patients, we find that dysregulation of the phosphoinositide 3-kinase and mitogen-activated protein kinase pathways, and p53 mediated disruptions of cell cycle processes are apparently drivers of disease. Further, we identify for the first time the molecular perturbations of signalling networks that are likely the primary causes of oncogenesis in each of the three pancreatic cancer subtypes.
Affiliation(s)
- Musalula Sinkala
- University of Cape Town, School of Health Sciences, Department of Integrative Biomedical Sciences, Computational Biology Division, Observatory, 7925, South Africa
- Nicola Mulder
- University of Cape Town, School of Health Sciences, Department of Integrative Biomedical Sciences, Computational Biology Division, Observatory, 7925, South Africa
- Darren Patrick Martin
- University of Cape Town, School of Health Sciences, Department of Integrative Biomedical Sciences, Computational Biology Division, Observatory, 7925, South Africa
27
Way GP, Sanchez-Vega F, La K, Armenia J, Chatila WK, Luna A, Sander C, Cherniack AD, Mina M, Ciriello G, Schultz N, Sanchez Y, Greene CS. Machine Learning Detects Pan-cancer Ras Pathway Activation in The Cancer Genome Atlas. Cell Rep 2018; 23:172-180.e3. [PMID: 29617658 PMCID: PMC5918694 DOI: 10.1016/j.celrep.2018.03.046] [Citation(s) in RCA: 94] [Impact Index Per Article: 13.4] [Received: 10/02/2017] [Revised: 02/23/2018] [Accepted: 03/12/2018] [Indexed: 12/25/2022]
Abstract
Precision oncology uses genomic evidence to match patients with treatment but often fails to identify all patients who may respond. The transcriptome of these "hidden responders" may reveal responsive molecular states. We describe and evaluate a machine-learning approach to classify aberrant pathway activity in tumors, which may aid in hidden responder identification. The algorithm integrates RNA-seq, copy number, and mutations from 33 different cancer types across The Cancer Genome Atlas (TCGA) PanCanAtlas project to predict aberrant molecular states in tumors. Applied to the Ras pathway, the method detects Ras activation across cancer types and identifies phenocopying variants. The model, trained on human tumors, can predict response to MEK inhibitors in wild-type Ras cell lines. We also present data that suggest that multiple hits in the Ras pathway confer increased Ras activity. The transcriptome is underused in precision oncology and, combined with machine learning, can aid in the identification of hidden responders.
Affiliation(s)
- Gregory P Way
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Francisco Sanchez-Vega
- Marie-Josée & Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Konnor La
- Marie-Josée & Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Joshua Armenia
- Marie-Josée & Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Walid K Chatila
- Marie-Josée & Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Augustin Luna
- cBio Center, Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Cell Biology, Harvard Medical School, Boston, MA 02115, USA
- Chris Sander
- cBio Center, Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Cell Biology, Harvard Medical School, Boston, MA 02115, USA
- Andrew D Cherniack
- The Eli and Edythe L. Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Marco Mina
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Giovanni Ciriello
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Nikolaus Schultz
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Yolanda Sanchez
- Department of Molecular Systems Biology, Norris Cotton Cancer Center, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA.
28
Kotelnikova EA, Pyatnitskiy M, Paleeva A, Kremenetskaya O, Vinogradov D. Practical aspects of NGS-based pathways analysis for personalized cancer science and medicine. Oncotarget 2016; 7:52493-52516. [PMID: 27191992 PMCID: PMC5239569 DOI: 10.18632/oncotarget.9370] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 10/02/2015] [Accepted: 04/18/2016] [Indexed: 12/17/2022]
Abstract
Nowadays, the personalized approach to health care, and to cancer care in particular, is becoming increasingly popular and is taking an important place in the translational medicine paradigm. In some cases, detection of patient-specific individual mutations that point to a targeted therapy has already become routine practice for clinical oncologists. Wider panels of genetic markers that cover a greater number of possible oncogenes, including those with lower reliability of the resulting medical conclusions, are also on the market. Given the wide availability of high-throughput technologies, it is very tempting to use complete patient-specific Next-Generation Sequencing (NGS) or other "omics" data for cancer treatment guidance. However, there are still no gold-standard methods and protocols to evaluate them. Here we discuss the clinical utility of each of the data types and describe a systems biology approach adapted for single-patient measurements. We summarize the current state of the field, focusing on clinically relevant case studies and practical aspects of data processing.
Affiliation(s)
- Ekaterina A Kotelnikova
- Personal Biomedicine, Moscow, Russia; A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia; Institute Biomedical Research August Pi Sunyer (IDIBAPS), Hospital Clinic of Barcelona, Barcelona, Spain
- Mikhail Pyatnitskiy
- Personal Biomedicine, Moscow, Russia; Orekhovich Institute of Biomedical Chemistry, Moscow, Russia; Pirogov Russian National Research Medical University, Moscow, Russia
- Olga Kremenetskaya
- Personal Biomedicine, Moscow, Russia; Center for Theoretical Problems of Physicochemical Pharmacology, Russian Academy of Sciences, Moscow, Russia
- Dmitriy Vinogradov
- Personal Biomedicine, Moscow, Russia; A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia; Lomonosov Moscow State University, Moscow, Russia
29
Identification of cancer genes that are independent of dominant proliferation and lineage programs. Proc Natl Acad Sci U S A 2017; 114:E11276-E11284. [PMID: 29229826 PMCID: PMC5748209 DOI: 10.1073/pnas.1714877115] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 01/10/2023]
Abstract
Large, multidimensional “landscaping” projects have provided datasets that can be mined to identify potential targets for subgroups of tumors. Here, we analyzed genomic and transcriptomic data from human breast tumors to identify genes whose expression is enriched in tumors harboring specific genetic alterations. However, this analysis revealed that two other factors, proliferation rate and tumor lineage, are more dominant factors in shaping tumor transcriptional programs than genetic alterations. This discovery shifted our attention to identifying genes that are independent of the dominant proliferation and lineage programs. A small subset of these genes represents candidate targets for combination cancer therapies because they are druggable, maintained after treatment with chemotherapy, essential for cell line survival, and elevated in drug-resistant stem-like cancer cells. Large, multidimensional cancer datasets provide a resource that can be mined to identify candidate therapeutic targets for specific subgroups of tumors. Here, we analyzed human breast cancer data to identify transcriptional programs associated with tumors bearing specific genetic driver alterations. Using an unbiased approach, we identified thousands of genes whose expression was enriched in tumors with specific genetic alterations. However, expression of the vast majority of these genes was not enriched if associations were analyzed within individual breast tumor molecular subtypes, across multiple tumor types, or after gene expression was normalized to account for differences in proliferation or tumor lineage. Together with linear modeling results, these findings suggest that most transcriptional programs associated with specific genetic alterations in oncogenes and tumor suppressors are highly context-dependent and are predominantly linked to differences in proliferation programs between distinct breast cancer subtypes. 
We demonstrate that such proliferation-dependent gene expression dominates tumor transcriptional programs relative to matched normal tissues. However, we also identified a relatively small group of cancer-associated genes that are both proliferation- and lineage-independent. A subset of these genes are attractive candidate targets for combination therapy because they are essential in breast cancer cell lines, druggable, enriched in stem-like breast cancer cells, and resistant to chemotherapy-induced down-regulation.
30
Jia P, Zhao Z. Impacts of somatic mutations on gene expression: an association perspective. Brief Bioinform 2017; 18:413-425. [PMID: 27127206 DOI: 10.1093/bib/bbw037] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 12/07/2015] [Indexed: 12/28/2022]
Abstract
Assessing the functional impacts of somatic mutations in cancer genomes is critical both for identifying driver mutations and for developing molecular targeted therapies. It remains a fundamental challenge to distinguish the patterns through which mutations exert their biological effects and to infer the biological mechanisms underlying these patterns. To this end, we systematically studied the association between somatic mutations in protein-coding regions and expression profiles, which serve as an indirect measurement of mutation impact. We defined mutation features (mutation type, cluster and status) and built linear regression models to assess mutation associations with mRNA expression and protein expression. Our results present a comprehensive landscape of the associations between mutation features and expression profiles in multiple cancer types, including 62 genes showing mutation type-associated expression changes, 21 genes showing mutation cluster associations and 51 genes showing mutation status associations. We revealed four characteristics of the patterns by which mutations affect expression. First, mutation type (truncating versus amino acid-altering mutations) was the most important determinant of expression levels. Second, we detected mutation clusters in well-studied oncogenes that were associated with gene expression. Third, association patterns showed both similarities and differences within and across cancer types. Fourth, although many of the observed associations were stable at both the mRNA and protein expression levels, some novel associations were observed only at the protein level, which warrant future investigation. Taken together, our findings have implications for cancer driver gene prioritization and provide insights into the functional consequences of somatic mutations.
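The mutation-status association described in this abstract amounts to regressing expression on a binary mutated/wild-type indicator. The sketch below is a minimal illustration under stated assumptions: one gene, a 0/1 mutation status per tumor, and no covariates. It is not the authors' model; their covariates, mutation-type and cluster features, and significance testing are not reproduced, and the function name and toy values are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_mutation_status_model(
    status: Sequence[int], expression: Sequence[float]
) -> Tuple[float, float]:
    """Hypothetical helper: ordinary least squares fit of
    expression ~ intercept + beta * status.

    status holds 1 for tumors with a mutation in the gene and 0 otherwise;
    expression holds the matching mRNA (or protein) levels.
    Returns (intercept, beta).
    """
    if len(status) != len(expression):
        raise ValueError("status and expression must have the same length")
    x_bar = mean(status)
    y_bar = mean(expression)
    # Sum of squares of the predictor; zero if every tumor has the same status.
    sxx = sum((x - x_bar) ** 2 for x in status)
    if sxx == 0:
        raise ValueError("need both mutated and wild-type samples")
    # Cross-products of centered predictor and response.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(status, expression))
    beta = sxy / sxx
    intercept = y_bar - beta * x_bar
    return intercept, beta


# Toy example: two wild-type and two mutated tumors.
intercept, beta = fit_mutation_status_model([0, 0, 1, 1], [1.0, 2.0, 4.0, 6.0])
print(intercept, beta)
```

With a single 0/1 predictor, the intercept is the wild-type mean (1.5 here) and the slope equals the difference between the mutated and wild-type group means (5.0 minus 1.5, i.e. 3.5), which is why a nonzero slope can be read as a mutation-associated expression change.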
31
Xi J, Wang M, Li A. DGPathinter: a novel model for identifying driver genes via knowledge-driven matrix factorization with prior knowledge from interactome and pathways. PEERJ COMPUTER SCIENCE 2017; 3:e133. [DOI: 10.7717/peerj-cs.133] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 01/03/2025]
Abstract
Distinguishing mutated driver genes, which confer a selective growth advantage on tumor cells, from sporadic passenger mutations is a critical problem in cancer genomic research. Previous studies have reported that some driver genes are mutated infrequently and therefore cannot be detected as statistically significant, which complicates driver gene identification. To address this issue, some existing approaches incorporate prior knowledge from an interactome to detect driver genes that may be dysregulated in their interaction network context. However, altered operation of many pathways has frequently been observed in cancer progression, and prior knowledge from pathways has not been exploited in the driver gene identification task. In this paper, we introduce a driver gene prioritization method called driver gene identification through pathway and interactome information (DGPathinter), which is based on a knowledge-based matrix factorization model that incorporates prior knowledge from both interactome and pathways. When DGPathinter is applied to somatic mutation datasets from three cancer types and evaluated against known driver genes, it prioritizes driver genes better than existing interactome-driven methods. The top-ranked genes detected by DGPathinter are also significantly enriched for known driver genes. Moreover, most of the top-ranked pathways reported by DGPathinter are cancer progression-associated pathways. These results suggest that DGPathinter is a useful tool for identifying potential driver genes.
Affiliation(s)
- Jianing Xi
- School of Information Science and Technology, University of Science and Technology of China, Hefei, China
- Minghui Wang
- School of Information Science and Technology, University of Science and Technology of China, Hefei, China
- Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, China
- Ao Li
- School of Information Science and Technology, University of Science and Technology of China, Hefei, China
- Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, China
32
Yi S, Lin S, Li Y, Zhao W, Mills GB, Sahni N. Functional variomics and network perturbation: connecting genotype to phenotype in cancer. Nat Rev Genet 2017; 18:395-410. [PMID: 28344341 DOI: 10.1038/nrg.2017.8] [Citation(s) in RCA: 68] [Impact Index Per Article: 8.5] [Indexed: 02/08/2023]
Abstract
Proteins interact with other macromolecules in complex cellular networks for signal transduction and biological function. In cancer, genetic aberrations have been traditionally thought to disrupt the entire gene function. It has been increasingly appreciated that each mutation of a gene could have a subtle but unique effect on protein function or network rewiring, contributing to diverse phenotypic consequences across cancer patient populations. In this Review, we discuss the current understanding of cancer genetic variants, including the broad spectrum of mutation classes and the wide range of mechanistic effects on gene function in the context of signalling networks. We highlight recent advances in computational and experimental strategies to study the diverse functional and phenotypic consequences of mutations at the base-pair resolution. Such information is crucial to understanding the complex pleiotropic effect of cancer genes and provides a possible link between genotype and phenotype in cancer.
Affiliation(s)
- Song Yi
- Department of Systems Biology, University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
- Shengda Lin
- Department of Medicine, Stanford University School of Medicine, Stanford, California 94305, USA
- Yongsheng Li
- Department of Systems Biology, University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
- Wei Zhao
- Department of Systems Biology, University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
- Gordon B Mills
- Department of Systems Biology, University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
- Nidhi Sahni
- Department of Systems Biology, University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA; Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas 77030, USA
33
Pan Y, Yan C, Hu Y, Fan Y, Pan Q, Wan Q, Torcivia-Rodriguez J, Mazumder R. Distribution bias analysis of germline and somatic single-nucleotide variations that impact protein functional site and neighboring amino acids. Sci Rep 2017; 7:42169. [PMID: 28176830 PMCID: PMC5296879 DOI: 10.1038/srep42169] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/07/2016] [Accepted: 01/05/2017] [Indexed: 01/13/2023]
Abstract
Single nucleotide variations (SNVs) can result in loss or gain of protein functional sites. We analyzed the effects of SNVs on enzyme active sites, ligand binding sites, and various types of post-translational modification (PTM) sites. We found that, for most types of protein functional sites, the SNV pattern differs between germline and somatic mutations as well as between synonymous and non-synonymous mutations. From a total of 51,138 protein functional site-affecting SNVs (pfsSNVs), a pan-cancer analysis revealed 142 somatic pfsSNVs in five or more cancer types. By leveraging patient information for somatic pfsSNVs, we identified 17 loss of functional site SNVs and 60 gain of functional site SNVs that are significantly enriched in patients with specific cancer types. Of the key pfsSNVs identified in our analysis above, we highlight 132 key pfsSNVs within 17 genes that are found in well-established cancer-associated gene lists. To illustrate how key pfsSNVs can be further prioritized, we provide a use case in which survival analysis shows that a loss of phosphorylation site pfsSNV at position 105 in MEF2A is significantly associated with decreased pancreatic cancer patient survival. These 132 pfsSNVs can be used in developing genetic testing pipelines.
Affiliation(s)
- Yang Pan
- The Department of Biochemistry & Molecular Medicine, The George Washington University Medical Center, Washington, DC 20037, United States of America
- Cheng Yan
- The Department of Biochemistry & Molecular Medicine, The George Washington University Medical Center, Washington, DC 20037, United States of America
- Yu Hu
- The Department of Biochemistry & Molecular Medicine, The George Washington University Medical Center, Washington, DC 20037, United States of America
- Yu Fan
- The Department of Biochemistry & Molecular Medicine, The George Washington University Medical Center, Washington, DC 20037, United States of America
- Qing Pan
- The Department of Statistics, The George Washington University, Washington, DC 20037, United States of America
- Quan Wan
- The Department of Biochemistry & Molecular Medicine, The George Washington University Medical Center, Washington, DC 20037, United States of America
- John Torcivia-Rodriguez
- The Department of Biochemistry & Molecular Medicine, The George Washington University Medical Center, Washington, DC 20037, United States of America
- Raja Mazumder
- The Department of Biochemistry & Molecular Medicine, The George Washington University Medical Center, Washington, DC 20037, United States of America; McCormick Genomic and Proteomic Center, The George Washington University, Washington, DC 20037, United States of America
34
Xi J, Wang M, Li A. Discovering potential driver genes through an integrated model of somatic mutation profiles and gene functional information. MOLECULAR BIOSYSTEMS 2017; 13:2135-2144. [DOI: 10.1039/c7mb00303j] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 01/03/2025]
Abstract
An integrated approach to identify driver genes based on information of somatic mutations, the interaction network and Gene Ontology similarity.
Affiliation(s)
- Jianing Xi
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH 230027, People’s Republic of China
- Minghui Wang
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH 230027, People’s Republic of China
- Centers for Biomedical Engineering
- Ao Li
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH 230027, People’s Republic of China
- Centers for Biomedical Engineering
35
Dong C, Guo Y, Yang H, He Z, Liu X, Wang K. iCAGES: integrated CAncer GEnome Score for comprehensively prioritizing driver genes in personal cancer genomes. Genome Med 2016; 8:135. [PMID: 28007024 PMCID: PMC5180414 DOI: 10.1186/s13073-016-0390-0] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 06/07/2016] [Accepted: 12/05/2016] [Indexed: 12/31/2022]
Abstract
Cancer results from the acquisition of somatic driver mutations. Several computational tools can predict driver genes from population-scale genomic data, but tools for analyzing personal cancer genomes are underdeveloped. Here we developed iCAGES, a novel statistical framework that infers driver variants by integrating contributions from coding, non-coding, and structural variants, identifies driver genes by combining genomic information and prior biological knowledge, then generates prioritized drug treatment. Analysis on The Cancer Genome Atlas (TCGA) data showed that iCAGES predicts whether patients respond to drug treatment (P = 0.006 by Fisher's exact test) and long-term survival (P = 0.003 from Cox regression). iCAGES is available at http://icages.wglab.org .
Affiliation(s)
- Chengliang Dong
- Zilkha Neurogenetic Institute, University of Southern California, Los Angeles, CA, 90089, USA
- Biostatistics Graduate Program, Department of Preventive Medicine, University of Southern California, Los Angeles, CA, 90089, USA
- Yunfei Guo
- Zilkha Neurogenetic Institute, University of Southern California, Los Angeles, CA, 90089, USA
- Biostatistics Graduate Program, Department of Preventive Medicine, University of Southern California, Los Angeles, CA, 90089, USA
- Hui Yang
- Zilkha Neurogenetic Institute, University of Southern California, Los Angeles, CA, 90089, USA
- Neuroscience Graduate Program, University of Southern California, Los Angeles, CA, 90089, USA
- Zeyu He
- Department of Computer Science, New York University, New York, NY, 10012, USA
- Xiaoming Liu
- Human Genetics Center, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Division of Epidemiology, Human Genetics and Environmental Sciences, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Kai Wang
- Zilkha Neurogenetic Institute, University of Southern California, Los Angeles, CA, 90089, USA.
- Institute for Genomic Medicine, Columbia University, 630 W. 168th St, Room 11-451, New York, NY, 10032, USA.
36
Pulido-Tamayo S, Weytjens B, De Maeyer D, Marchal K. SSA-ME Detection of cancer driver genes using mutual exclusivity by small subnetwork analysis. Sci Rep 2016; 6:36257. [PMID: 27808240 PMCID: PMC5093737 DOI: 10.1038/srep36257] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 12/15/2015] [Accepted: 10/11/2016] [Indexed: 11/21/2022]
Abstract
Because of its clonal evolution, a tumor rarely contains multiple genomic alterations in the same pathway, as disrupting the pathway through one gene is often sufficient to confer the complete fitness advantage. As a result, many cancer driver genes display mutual exclusivity across tumors. However, searching for mutually exclusive gene sets requires analyzing all possible combinations of genes, a problem that is typically too computationally complex to solve without stringent a priori filtering that restricts the mutations included in the analysis. To overcome this problem, we present SSA-ME, a network-based method that detects cancer driver genes by independently scoring small subnetworks for mutual exclusivity using a reinforced learning approach. Because of this algorithmic efficiency, no stringent upfront filtering is required. Analysis of TCGA cancer datasets illustrates the added value of SSA-ME: both well-known recurrently mutated drivers and rarely mutated drivers are prioritized. We show that using mutual exclusivity to detect cancer driver genes is complementary to state-of-the-art approaches. This framework, in which a large number of small subnetworks are analyzed in order to solve a computationally complex problem (SSA), can be applied generically to any problem in which local neighborhoods in a network hold useful information.
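The mutual exclusivity signal that this abstract relies on can be illustrated by scoring a single candidate gene set on how many tumors it covers versus how often its mutations pile up in the same tumor. This is a minimal sketch under stated assumptions, not SSA-ME's subnetwork sampling or reinforced learning procedure: the function name, the coverage-minus-overlap weight, and the toy gene and tumor IDs are all hypothetical.

```python
from typing import Dict, Iterable, Set


def exclusivity_stats(mutated: Dict[str, Set[str]], genes: Iterable[str]) -> Dict[str, float]:
    """Hypothetical helper: summarize how mutually exclusive mutations in a gene set are.

    mutated maps each gene to the set of tumor IDs carrying a mutation in it.
    coverage    = number of tumors with at least one mutation in the set
    overlap     = extra mutations that land in tumors already covered
    weight      = coverage - overlap (higher means broader and more exclusive)
    exclusivity = coverage / total mutations (1.0 means perfectly exclusive)
    """
    covered: Set[str] = set()
    total_hits = 0
    for gene in genes:
        tumors = mutated.get(gene, set())
        covered |= tumors  # updates our own set, not the caller's
        total_hits += len(tumors)
    coverage = len(covered)
    overlap = total_hits - coverage
    return {
        "coverage": coverage,
        "overlap": overlap,
        "weight": coverage - overlap,
        "exclusivity": coverage / total_hits if total_hits else 0.0,
    }


# Toy mutation calls (hypothetical genes and tumors).
toy = {
    "GENE_A": {"t1", "t2", "t3"},
    "GENE_B": {"t4", "t5"},
    "GENE_C": {"t1", "t4", "t6"},
}
print(exclusivity_stats(toy, ["GENE_A", "GENE_B"]))  # never co-mutated
print(exclusivity_stats(toy, ["GENE_A", "GENE_C"]))  # co-mutated in t1
```

In the toy data, GENE_A and GENE_B never co-occur, so every mutation adds a newly covered tumor and exclusivity is 1.0, whereas GENE_A and GENE_C share tumor t1, which lowers the weight. Enumerating and scoring every such gene combination is what becomes intractable at scale, which is the problem the abstract says SSA-ME addresses by scoring many small subnetworks instead.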
Affiliation(s)
- Sergio Pulido-Tamayo
- Department of Information Technology, iGent Toren, Technologiepark 15, 9052 Gent, Belgium; Department of Plant Biotechnology and Bioinformatics, UGent, Technologiepark 927, 9052 Gent, Belgium; Bioinformatics Institute Ghent, Technologiepark 927, 9052 Gent, Belgium; Dept. of Microbial and Molecular Systems, KU Leuven, Kasteelpark Arenberg 20, B-3001 Leuven, Belgium; Grupo de Investigación en Ciencias Biológicas y Bioprocesos (Cibiop), Universidad EAFIT, Carrera 49 N° 7 Sur-50, Medellín, Colombia
- Bram Weytjens
- Department of Information Technology, iGent Toren, Technologiepark 15, 9052 Gent, Belgium; Department of Plant Biotechnology and Bioinformatics, UGent, Technologiepark 927, 9052 Gent, Belgium; Bioinformatics Institute Ghent, Technologiepark 927, 9052 Gent, Belgium; Dept. of Microbial and Molecular Systems, KU Leuven, Kasteelpark Arenberg 20, B-3001 Leuven, Belgium
- Dries De Maeyer
- Department of Information Technology, iGent Toren, Technologiepark 15, 9052 Gent, Belgium; Department of Plant Biotechnology and Bioinformatics, UGent, Technologiepark 927, 9052 Gent, Belgium; Bioinformatics Institute Ghent, Technologiepark 927, 9052 Gent, Belgium; Dept. of Microbial and Molecular Systems, KU Leuven, Kasteelpark Arenberg 20, B-3001 Leuven, Belgium
- Kathleen Marchal
- Department of Information Technology, iGent Toren, Technologiepark 15, 9052 Gent, Belgium; Department of Plant Biotechnology and Bioinformatics, UGent, Technologiepark 927, 9052 Gent, Belgium; Bioinformatics Institute Ghent, Technologiepark 927, 9052 Gent, Belgium; Department of Genetics, University of Pretoria, Hatfield Campus, Pretoria 0028, South Africa
| |
Collapse
|
37
Si H, Lu H, Yang X, Mattox A, Jang M, Bian Y, Sano E, Viadiu H, Yan B, Yau C, Ng S, Lee SK, Romano RA, Davis S, Walker RL, Xiao W, Sun H, Wei L, Sinha S, Benz CC, Stuart JM, Meltzer PS, Van Waes C, Chen Z. TNF-α modulates genome-wide redistribution of ΔNp63α/TAp73 and NF-κB cREL interactive binding on TP53 and AP-1 motifs to promote an oncogenic gene program in squamous cancer. Oncogene 2016; 35:5781-5794. [PMID: 27132513 PMCID: PMC5093089 DOI: 10.1038/onc.2016.112] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 04/14/2015] [Revised: 01/11/2016] [Accepted: 01/19/2016] [Indexed: 12/11/2022]
Abstract
The Cancer Genome Atlas (TCGA) network study of 12 cancer types (PanCancer 12) revealed frequent mutation of TP53, and amplification and expression of related TP63 isoform ΔNp63 in squamous cancers. Further, aberrant expression of inflammatory genes and TP53/p63/p73 targets were detected in the PanCancer 12 project, reminiscent of gene programs comodulated by cREL/ΔNp63/TAp73 transcription factors we uncovered in head and neck squamous cell carcinomas (HNSCCs). However, how inflammatory gene signatures and cREL/p63/p73 targets are comodulated genome wide is unclear. Here, we examined how the inflammatory factor tumor necrosis factor-α (TNF-α) broadly modulates redistribution of cREL with ΔNp63α/TAp73 complexes and signatures genome wide in the HNSCC model UM-SCC46 using chromatin immunoprecipitation sequencing (ChIP-seq). TNF-α enhanced genome-wide co-occupancy of cREL with ΔNp63α on TP53/p63 sites, while unexpectedly promoting redistribution of TAp73 from TP53 to activator protein-1 (AP-1) sites. cREL, ΔNp63α and TAp73 binding and oligomerization on NF-κB-, TP53- or AP-1-specific sequences were independently validated by ChIP-qPCR (quantitative PCR), oligonucleotide-binding assays and analytical ultracentrifugation. Function of the binding activity was confirmed using TP53-, AP-1- and NF-κB-specific REs or p21, SERPINE1 and IL-6 promoter luciferase reporter activities. Concurrently, TNF-α regulated a broad gene network with cobinding activities for cREL, ΔNp63α and TAp73 observed upon array profiling and reverse transcription-PCR. Overlapping target gene signatures were observed in squamous cancer subsets and in inflamed skin of transgenic mice overexpressing ΔNp63α. Furthermore, multiple target genes identified in this study were linked to TP63 and TP73 activity and increased gene expression in large squamous cancer samples from PanCancer 12 TCGA by CircleMap. 
PARADIGM-inferred pathway analysis revealed the network connection of TP63 and NF-κB complexes through an AP-1 hub, further supporting our findings. Thus, the inflammatory cytokine TNF-α mediates genome-wide redistribution of the cREL/p63/p73 and AP-1 interactome to diminish TAp73 tumor suppressor function and reciprocally activate NF-κB and AP-1 gene programs implicated in malignancy.
Affiliation(s)
- Han Si
- Tumor Biology Section, Head and Neck Surgery Branch, National Institute on Deafness and Other Communication Disorders, NIH, Bethesda, Maryland, USA
- Hai Lu
- Orthopaedic Center, Zhujiang Hospital Guangzhou, Guangdong, China
- Xinping Yang
- Tumor Biology Section, Head and Neck Surgery Branch, National Institute on Deafness and Other Communication Disorders, NIH, Bethesda, Maryland, USA
- Austin Mattox
- Tumor Biology Section, Head and Neck Surgery Branch, National Institute on Deafness and Other Communication Disorders, NIH, Bethesda, Maryland, USA
- Minyoung Jang
- Tumor Biology Section, Head and Neck Surgery Branch, National Institute on Deafness and Other Communication Disorders, NIH, Bethesda, Maryland, USA
- Yansong Bian
- Tumor Biology Section, Head and Neck Surgery Branch, National Institute on Deafness and Other Communication Disorders, NIH, Bethesda, Maryland, USA
- Eleanor Sano
- Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA
- Hector Viadiu
- Instituto de Química, Universidad Nacional Autónoma de México (UNAM), Circuito Exterior, Ciudad Universitaria, Mexico City, D.F. 04510, México
- Bin Yan
- LKS Faculty of Medicine and School of Biomedical Sciences, LKS Faculty of Medicine and Center of Genome Sciences, The University of Hong Kong, Hong Kong, China
- Sam Ng
- Department of Biomolecular Engineering, Center for Biomolecular Sciences and Engineering, University of California, Santa Cruz, Santa Cruz, CA
- Steven K. Lee
- Tumor Biology Section, Head and Neck Surgery Branch, National Institute on Deafness and Other Communication Disorders, NIH, Bethesda, Maryland, USA
- Rose-Anne Romano
- Department of Biochemistry, State University of New York at Buffalo, Center for Excellence in Bioinformatics and Life Sciences, Buffalo, New York, USA
- Sean Davis
- Cancer Genetics Branch, National Cancer Institute, Bethesda, Maryland, USA
- Robert L. Walker
- Cancer Genetics Branch, National Cancer Institute, Bethesda, Maryland, USA
- Wenming Xiao
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas
- Hongwei Sun
- Biodata Mining and Discovery Section, National Institute of Arthritis, Musculoskeletal and Skin Diseases, Bethesda, Maryland, USA
- Lai Wei
- Clinical Immunology Section, National Eye Institute, NIH, Bethesda, Maryland, USA
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Satrajit Sinha
- Department of Biochemistry, State University of New York at Buffalo, Center for Excellence in Bioinformatics and Life Sciences, Buffalo, New York, USA
- Joshua M. Stuart
- Department of Biomolecular Engineering, Center for Biomolecular Sciences and Engineering, University of California, Santa Cruz, Santa Cruz, CA
- Paul S. Meltzer
- Cancer Genetics Branch, National Cancer Institute, Bethesda, Maryland, USA
- Carter Van Waes
- Tumor Biology Section, Head and Neck Surgery Branch, National Institute on Deafness and Other Communication Disorders, NIH, Bethesda, Maryland, USA
- Zhong Chen
- Tumor Biology Section, Head and Neck Surgery Branch, National Institute on Deafness and Other Communication Disorders, NIH, Bethesda, Maryland, USA
38
Abstract
Systems pharmacology aims to holistically understand the mechanisms of drug action to support drug discovery and clinical practice. Systems pharmacology modeling (SPM) is data driven: it integrates an exponentially growing amount of data at multiple scales (genetic, molecular, cellular, organismal, and environmental). The goal of SPM is to develop mechanistic or predictive multiscale models that are interpretable and actionable. The current explosion in genomics and other omics data, together with tremendous advances in big data technologies, has already enabled biologists to generate novel hypotheses and gain new knowledge through computational models of genome-wide, heterogeneous, and dynamic data sets. More work is needed to interpret and predict drug response phenotypes, which depend on many known and unknown factors. To gain a comprehensive understanding of drug actions, SPM requires close collaboration between domain experts from diverse fields and integration of heterogeneous models from biophysics, mathematics, statistics, machine learning, and the semantic web. This creates challenges in model management, model integration, model translation, and knowledge integration. In this review, we discuss several emergent issues in SPM and potential solutions using big data technology and analytics. The concurrent development of high-throughput techniques, cloud computing, data science, and the semantic web will likely allow SPM to be findable, accessible, interoperable, reusable, reliable, interpretable, and actionable.
Affiliation(s)
- Lei Xie
- Department of Computer Science, Hunter College, The City University of New York, New York, NY 10065; The Graduate Center, The City University of New York, New York, NY 10016
- Eli J Draizen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894; Program in Bioinformatics, Boston University, Boston, Massachusetts 02215
- Philip E Bourne
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894; Office of the Director, National Institutes of Health, Bethesda, Maryland 20894
39
Razi A, Banerjee N, Dimitrova N, Varadan V. Non-linear Bayesian framework to determine the transcriptional effects of cancer-associated genomic aberrations. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2016; 2015:6514-8. [PMID: 26737785 DOI: 10.1109/embc.2015.7319885] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]
Abstract
While the tumorigenic effects of specific recurrent mutations in known cancer driver genes are well characterized, little is known about the functional relevance of the vast majority of recurrent mutations observed across cancers. Prior studies have attempted to identify functional genomic aberrations by integrating multi-omics measurements in cancer samples with community-curated biological pathway networks. However, most of these approaches overlook the following biological considerations: i) signaling pathway networks are highly tissue-specific, and their regulatory interactions differ across tissue types; ii) regulatory factors exert heterogeneous influence on downstream gene transcription; iii) epigenetic and genomic alterations have a nonlinear impact on gene transcription. To accommodate these biological effects, we propose a hybrid Bayesian method to learn tissue-specific pairwise influence models among genes and to predict a gene's expression level as a nonlinear function of its epigenetic and regulatory influences. We employ a novel tree-based depth-penalization mechanism to capture the higher regulatory impact of closer neighbors in the regulatory network. Using a breast cancer multi-omics dataset (N=1190), we show that our proposed method has superior predictive power over optimization-based regression models, with the additional advantage of revealing gene deregulation potentially driven by somatic mutations.
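The depth-penalization idea can be sketched in a few lines. The sketch walks upstream from a target gene, discounts each regulator's influence geometrically with its depth, and passes the total through a nonlinearity. The geometric `decay`, the `tanh` link and all function names are assumptions for illustration. The paper's hybrid Bayesian model is not reproduced here.

```python
import math
from collections import deque


def regulator_depths(regulators_of, target, max_depth=3):
    """Breadth-first walk upstream from target.

    regulators_of maps gene -> list of its direct regulators.
    Returns {regulator: shortest depth}, where direct regulators have depth 1.
    """
    depths = {}
    seen = {target}
    queue = deque([(target, 0)])
    while queue:
        gene, d = queue.popleft()
        if d == max_depth:
            continue  # do not expand beyond the depth limit
        for reg in regulators_of.get(gene, ()):
            if reg not in seen:
                seen.add(reg)
                depths[reg] = d + 1
                queue.append((reg, d + 1))
    return depths


def predict_expression(regulators_of, target, activity, decay=0.5, max_depth=3):
    """Nonlinear prediction from depth-penalized regulator activity.

    A regulator at depth k contributes activity * decay ** (k - 1), so closer
    neighbors weigh more. The summed drive is squashed with tanh.
    """
    depths = regulator_depths(regulators_of, target, max_depth)
    drive = sum(activity.get(reg, 0.0) * decay ** (k - 1) for reg, k in depths.items())
    return math.tanh(drive)
```

For example, with `regulators_of = {"G": ["A", "B"], "A": ["C"]}`, regulators A and B sit at depth 1 and C at depth 2, so C's activity is halved before it is added.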
40
Hou JP, Emad A, Puleo GJ, Ma J, Milenkovic O. A new correlation clustering method for cancer mutation analysis. Bioinformatics 2016; 32:3717-3728. [PMID: 27540270 DOI: 10.1093/bioinformatics/btw546] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 05/04/2016] [Revised: 06/14/2016] [Accepted: 08/16/2016] [Indexed: 01/17/2023]
Abstract
MOTIVATION Cancer genomes exhibit a large number of different alterations that affect many genes in diverse ways. An improved understanding of the generative mechanisms behind the mutation rules and their influence on gene community behavior is of great importance for the study of cancer. RESULTS To expand our capability to analyze combinatorial patterns of cancer alterations, we developed a rigorous methodology for cancer mutation pattern discovery based on a new, constrained form of correlation clustering. Our new algorithm, named C3 (Cancer Correlation Clustering), leverages the principles of mutual exclusivity of mutations, patient coverage and driver network concentration. To test C3, we performed a detailed analysis of TCGA breast cancer and glioblastoma data and showed that our algorithm outperforms the state-of-the-art CoMEt method in discovering mutually exclusive gene modules and identifying biologically relevant driver genes. The proposed agnostic clustering method represents a unique tool for efficient and reliable identification of mutation patterns and driver pathways in large-scale cancer genomics studies, and it may also be used for other clustering problems on biological graphs. AVAILABILITY AND IMPLEMENTATION The source code for the C3 method can be found at https://github.com/jackhou2/C3 CONTACT jianma@cs.cmu.edu or milenkov@illinois.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
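C3's constrained formulation is not reproduced here. As a point of reference, the following sketch runs the classic pivot algorithm for unconstrained correlation clustering (KwikCluster) on a signed gene graph. In this graph, a '+' edge means two genes are mutated in (nearly) disjoint patient sets. The `exclusive` edge rule, its `max_overlap` parameter and the function names are hypothetical. Patient coverage and network concentration are ignored.

```python
import random


def exclusive(mutations, max_overlap=0):
    """Return a '+'/'-' edge rule: two genes get a '+' edge (belong together)
    when they are co-mutated in at most max_overlap patients."""
    def similar(u, v):
        return len(mutations[u] & mutations[v]) <= max_overlap
    return similar


def kwik_cluster(nodes, similar, seed=0):
    """Pivot-based correlation clustering (KwikCluster) on a complete signed graph.

    Repeatedly picks a random pivot among the unclustered nodes and places it
    in a new cluster together with every unclustered node joined to it by a
    '+' edge.
    """
    rng = random.Random(seed)
    remaining = list(nodes)
    clusters = []
    while remaining:
        pivot = remaining[rng.randrange(len(remaining))]
        cluster = [pivot] + [v for v in remaining if v != pivot and similar(pivot, v)]
        clusters.append(cluster)
        chosen = set(cluster)
        remaining = [v for v in remaining if v not in chosen]
    return clusters
```

Every gene ends up in exactly one cluster, and each member is mutually exclusive with its cluster's pivot. Members need not be exclusive with one another, which is one gap a constrained formulation can close.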
Affiliation(s)
- Jack P Hou
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Medical Scholars Program, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Amin Emad
- Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Coordinated Science Lab, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Gregory J Puleo
- Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Coordinated Science Lab, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Jian Ma
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Olgica Milenkovic
- Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Coordinated Science Lab, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
41
Chen T, Wang Z, Zhou W, Chong Z, Meric-Bernstam F, Mills GB, Chen K. Hotspot mutations delineating diverse mutational signatures and biological utilities across cancer types. BMC Genomics 2016; 17 Suppl 2:394. [PMID: 27356755 PMCID: PMC4928158 DOI: 10.1186/s12864-016-2727-x] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 12/19/2022]
Abstract
Background An important step towards personalizing cancer treatment is to integrate heterogeneous evidence to catalog mutational hotspots that are biologically and therapeutically relevant and thus indicate where targeted therapy would likely be beneficial. However, existing methods do not sufficiently delineate the varying functionality of individual mutations within the same genes. Results We observed a large discordance of mutation rates across different mutation subtypes and tumor types, and nominated 702 hotspot mutations in 549 genes in the Catalog of Somatic Mutations in Cancer (COSMIC) by considering context-specific mutation characteristics such as genes, cancer types, mutation rates, mutation subtypes and sequence contexts. We observed that hotspot mutations were highly prevalent in non-CpG-island C/G transition and transversion sequence contexts in 10 tumor types, and that specific insertion hotspot mutations were enriched in breast cancer and deletion hotspot mutations in colorectal cancer. We found that the hotspot mutations nominated by our approach were significantly more conserved than non-hotspot mutations in the corresponding cancer genes. We also examined the biological significance and pharmacogenomic properties of these hotspot mutations using data from The Cancer Genome Atlas (TCGA) and the Cancer Cell Line Encyclopedia (CCLE), and found that 53 hotspot mutations are independently associated with diverse functional evidence in 1) mRNA and protein expression, 2) pathway activity, or 3) drug sensitivity, and that 82 were highly enriched in specific tumor types. We highlighted the distinct functional indications of hotspot mutations under different contexts and nominated novel hotspot mutations such as MAP3K4 A1199 deletion, NR1H2 Q175 insertion, and GATA3 P409 insertion as potential biomarkers or drug targets.
Conclusion We identified a set of hotspot mutations across 17 tumor types by considering variation in background mutation rates among genes, tumor subtypes, mutation subtypes, and sequence contexts. We illustrated the common and distinct mutational signatures of hotspot mutations among different tumor types and investigated their variable functional relevance under different contexts. These hotspot mutations could serve as a resource for explicitly selecting targets for diagnosis, drug development, and patient management.
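A hotspot call against a background rate can be sketched as a one-sided binomial test with a Bonferroni correction. The test compares each position's mutation count with an assumed per-position background rate. The test, the threshold, the function names and the toy numbers are illustrative assumptions, not the paper's procedure. In a fuller model, the rates would vary with gene, cancer type, mutation subtype and sequence context, as the abstract describes.

```python
from math import comb


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def call_hotspots(position_counts, background_rates, alpha=0.01):
    """Return positions whose mutation count is improbably high.

    position_counts: {position: tumors mutated at that position}
    background_rates: {position: assumed probability that any one mutation
        in this gene lands at that position}
    Uses a Bonferroni correction over the positions tested.
    """
    n = sum(position_counts.values())
    threshold = alpha / len(position_counts)
    return sorted(
        pos
        for pos, k in position_counts.items()
        if binom_sf(k, n, background_rates[pos]) < threshold
    )
```

With toy counts `{10: 20, 20: 2, 30: 2, 40: 1}` and a uniform rate of 0.01, only position 10 falls below the corrected threshold. Raising the rate for a mutation-prone sequence context would make the same count less surprising.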
Affiliation(s)
- Tenghui Chen
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA; Biostatistics, Bioinformatics and Systems Biology Program, The University of Texas Graduate School of Biomedical Sciences, Houston, TX, 77030, USA
- Zixing Wang
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Wanding Zhou
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Zechen Chong
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Funda Meric-Bernstam
- Department of Investigational Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA; Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Gordon B Mills
- Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA; Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA; Biostatistics, Bioinformatics and Systems Biology Program, The University of Texas Graduate School of Biomedical Sciences, Houston, TX, 77030, USA
- Ken Chen
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA; Biostatistics, Bioinformatics and Systems Biology Program, The University of Texas Graduate School of Biomedical Sciences, Houston, TX, 77030, USA
42
Grechkin M, Logsdon BA, Gentles AJ, Lee SI. Identifying Network Perturbation in Cancer. PLoS Comput Biol 2016; 12:e1004888. [PMID: 27145341 PMCID: PMC4856318 DOI: 10.1371/journal.pcbi.1004888] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 02/09/2015] [Accepted: 03/25/2016] [Indexed: 01/08/2023]
Abstract
We present a computational framework, called DISCERN (DIfferential SparsE Regulatory Network), to identify informative topological changes in gene-regulator dependence networks inferred from mRNA expression datasets within distinct biological states. DISCERN takes two expression datasets as input: an expression dataset of diseased tissues from patients with a disease of interest and another expression dataset from matching normal tissues. DISCERN estimates the extent to which each gene is perturbed, that is, has distinct regulator connectivity in the inferred gene-regulator dependencies between the disease and normal conditions. This approach has distinct advantages over existing methods. First, DISCERN infers conditional dependencies between candidate regulators and genes; conditional dependence relationships distinguish evidence for direct interactions from indirect interactions more precisely than pairwise correlation. Second, DISCERN uses a new likelihood-based scoring function to alleviate concerns about the accuracy of the specific edges inferred in a particular network. In synthetic data, DISCERN identifies perturbed genes more accurately than existing methods for identifying perturbed genes between distinct states. In expression datasets from patients with acute myeloid leukemia (AML), breast cancer and lung cancer, genes with high DISCERN scores in each cancer are enriched for known tumor drivers, for genes associated with biological processes known to be important in the disease, and for genes associated with patient prognosis in the respective cancer. Finally, we show that DISCERN can uncover potential mechanisms underlying network perturbation: based on the available epigenomic data from the ENCODE project, it explains observed epigenomic activity patterns in cancer and normal tissue types more accurately than alternative methods.
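The cross-condition intuition behind a per-gene perturbation score can be sketched as follows. Regress a gene on candidate regulators separately in normal and disease samples, then measure how much worse each condition's model predicts the other condition's data. This sketch swaps DISCERN's sparse estimator and likelihood-based score for closed-form ridge regression and mean squared error. `perturbation_score` and its normalization are assumptions, not the published DISCERN score. Data should be centered, because no intercept is fitted.

```python
import numpy as np


def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge coefficients: solve (X^T X + lam * I) b = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def mse(X, y, beta):
    """Mean squared prediction error of coefficients beta on (X, y)."""
    return float(np.mean((y - X @ beta) ** 2))


def perturbation_score(X_normal, y_normal, X_disease, y_disease, lam=1.0):
    """Cross-condition prediction penalty for one gene.

    X_*: samples x candidate-regulator expression; y_*: the gene's expression.
    Close to 0 when both conditions share the same regulator -> gene weights,
    and large when the inferred dependencies diverge.
    """
    b_n = ridge_fit(X_normal, y_normal, lam)
    b_d = ridge_fit(X_disease, y_disease, lam)
    own = mse(X_normal, y_normal, b_n) + mse(X_disease, y_disease, b_d)
    cross = mse(X_normal, y_normal, b_d) + mse(X_disease, y_disease, b_n)
    return (cross - own) / own
```

Dividing by the within-condition error keeps genes that are simply noisy in both conditions from ranking highly.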
Affiliation(s)
- Maxim Grechkin
- Department of Computer Science & Engineering, University of Washington, Seattle, Washington, United States of America
- Andrew J. Gentles
- Center for Cancer Systems Biology, Department of Radiology, Stanford University, Stanford, California, United States of America
- Su-In Lee
- Department of Computer Science & Engineering, University of Washington, Seattle, Washington, United States of America
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
43
Yu MK, Kramer M, Dutkowski J, Srivas R, Licon K, Kreisberg J, Ng CT, Krogan N, Sharan R, Ideker T. Translation of Genotype to Phenotype by a Hierarchy of Cell Subsystems. Cell Syst 2016; 2:77-88. [PMID: 26949740 PMCID: PMC4772745 DOI: 10.1016/j.cels.2016.02.003] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Indexed: 11/26/2022]
Abstract
Accurately translating genotype to phenotype requires accounting for the functional impact of genetic variation at many biological scales. Here we present a strategy for genotype-phenotype reasoning based on existing knowledge of cellular subsystems. These subsystems and their hierarchical organization are defined by the Gene Ontology or a complementary ontology inferred directly from previously published datasets. Guided by the ontology's hierarchical structure, we organize genotype data into an "ontotype," that is, a hierarchy of perturbations representing the effects of genetic variation at multiple cellular scales. The ontotype is then interpreted using logical rules generated by machine learning to predict phenotype. This approach substantially outperforms previous, non-hierarchical methods for translating yeast genotype to cell growth phenotype, and it accurately predicts the growth outcomes of two new screens of 2,503 double gene knockouts impacting DNA repair or nuclear lumen. Ontotypes also generalize to larger knockout combinations, setting the stage for interpreting the complex genetics of disease.
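One simple way to turn a knockout genotype into an ontotype is to count, for every ontology term, how many disrupted genes fall under that term or its descendants. The toy hierarchy, term names, gene labels and counting rule below are assumptions for illustration. The hierarchy is assumed to be acyclic. The machine-learned rules that map ontotypes to growth phenotypes are omitted.

```python
def term_genes(children, direct_genes, term):
    """All genes annotated to term or to any of its descendant terms."""
    genes = set(direct_genes.get(term, ()))
    for child in children.get(term, ()):
        genes |= term_genes(children, direct_genes, child)
    return genes


def ontotype(children, direct_genes, knocked_out):
    """Map every ontology term to the number of knocked-out genes it contains."""
    terms = set(children) | set(direct_genes)
    for kids in children.values():
        terms |= set(kids)
    disrupted = set(knocked_out)
    return {t: len(term_genes(children, direct_genes, t) & disrupted) for t in terms}
```

For example, with a toy hierarchy where `root` has children `dna_repair` and `nuclear_lumen`, and `dna_repair` has child `mismatch_repair`, a double knockout of one mismatch-repair gene and one other DNA-repair gene gives `mismatch_repair` a count of 1 and both `dna_repair` and `root` a count of 2. The genotype is thus summarized at several scales at once.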
Affiliation(s)
- Michael Ku Yu
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla CA 92093, USA
- Department of Medicine, University of California San Diego, La Jolla CA 92093, USA
- Michael Kramer
- Department of Medicine, University of California San Diego, La Jolla CA 92093, USA
- Biomedical Sciences Program, University of California San Diego, La Jolla CA 92093, USA
- Janusz Dutkowski
- Department of Medicine, University of California San Diego, La Jolla CA 92093, USA
- Data4Cure, La Jolla, CA 92037, USA
- Rohith Srivas
- Department of Medicine, University of California San Diego, La Jolla CA 92093, USA
- Department of Bioengineering, University of California San Diego, La Jolla CA 92093, USA
- Katherine Licon
- Department of Medicine, University of California San Diego, La Jolla CA 92093, USA
- Jason Kreisberg
- Department of Medicine, University of California San Diego, La Jolla CA 92093, USA
- Nevan Krogan
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, San Francisco 94143, USA
- Roded Sharan
- Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv 69978, Israel
- Trey Ideker
- Department of Medicine, University of California San Diego, La Jolla CA 92093, USA
44
Pavel AB, Sonkin D, Reddy A. Integrative modeling of multi-omics data to identify cancer drivers and infer patient-specific gene activity. BMC SYSTEMS BIOLOGY 2016; 10:16. [PMID: 26864072 PMCID: PMC4750289 DOI: 10.1186/s12918-016-0260-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 05/04/2015] [Accepted: 01/25/2016] [Indexed: 01/06/2023]
Abstract
Background High-throughput technologies have been used to profile genes in multiple dimensions, such as genetic variation, copy number, gene and protein expression, epigenetics and metabolomics. Computational analyses often treat these data types as independent, which leads to an explosion in the number of features, leaves studies underpowered and, more importantly, fails to provide a comprehensive view of a gene's state. We sought to infer gene activity by integrating the different dimensions using biological knowledge of oncogenes and tumor suppressors. Results This paper proposes an integrative model of oncogene and tumor suppressor activity in cells, which is used to identify cancer drivers and compute patient-specific gene activity scores. We developed a Fuzzy Logic Modeling (FLM) framework to combine biological knowledge with multi-omics data such as somatic mutation, gene expression and copy number measurements. The advantage of a fuzzy logic approach is that it abstracts meaningful biological rules from low-level numerical data. Biological knowledge is often qualitative, so combining it with quantitative numerical measurements may yield new biological insights about a gene's state. We show that the oncogenic and altered tumor-suppressor states of a gene can be better characterized by integrating different molecular measurements with biological knowledge than by each data type alone. We validate the gene activity score using data from the Cancer Cell Line Encyclopedia and drug sensitivity data for five compounds: BYL719 (PIK3CA inhibitor), PLX4720 (BRAF inhibitor), AZD6244 (MEK inhibitor), Erlotinib (EGFR inhibitor), and Nutlin-3 (MDM2 inhibitor). Compared with each data type alone, the integrative score improves prediction of drug sensitivity for the known drug targets of these compounds. The gene activity scores are also used to cluster colorectal cancer (CRC) cell lines.
Two subtypes of CRC were found, and potential cancer drivers and therapeutic targets were identified for each subtype. Conclusions We propose a fuzzy-logic-based approach to infer gene activity in cancer by integrating numerical data with descriptive biological knowledge. We compute general patient-specific gene-level scores that are useful for determining the oncogenic or tumor suppressor status of cancer driver genes and for clustering or classifying patients.
Affiliation(s)
- Ana B Pavel
- Graduate Program in Bioinformatics, Boston University, 24 Cummington Mall, Boston, MA 02215, USA; Section of Computational Biomedicine, Boston University School of Medicine, 72 East Concord Street, Boston, MA 02118, USA
- Dmitriy Sonkin
- Novartis Institutes for Biomedical Research, 250 Massachusetts Ave, Cambridge, MA 02139, USA
- Anupama Reddy
- Duke University Medical Center, Durham, NC 27708, USA
45
Fleck JL, Pavel AB, Cassandras CG. Integrating mutation and gene expression cross-sectional data to infer cancer progression. BMC SYSTEMS BIOLOGY 2016; 10:12. [PMID: 26810975 PMCID: PMC4727329 DOI: 10.1186/s12918-016-0255-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 08/26/2015] [Accepted: 01/11/2016] [Indexed: 01/21/2023]
Abstract
Background A major problem in identifying the best therapeutic targets for cancer is the molecular heterogeneity of the disease. Cancer is often caused by an accumulation of mutations that irreversibly damage the cell's control mechanisms of survival and proliferation. Different mutations may affect these cellular mechanisms through a combination of molecular interactions that may change dynamically during cancer progression. It has been previously shown that cancer accumulates mutations over time. In this paper, we address the problem of cancer heterogeneity by modeling cancer progression using somatic mutation and gene expression cross-sectional data. Results We propose a novel formulation that integrates somatic mutation and gene expression data to infer the temporal sequence of events from cross-sectional data. Using a mixed integer linear program, we model the interaction between groups of different mutated genes and the resulting modifications at the gene expression level. Our approach identifies a partition of mutation events that gradually produce gene expression changes in a partition of genes over time. The proposed formulation is tested using both simulated data and real breast cancer data with matched somatic mutations and gene expression measurements from The Cancer Genome Atlas. First, we classify the genes as oncogenes or tumor suppressors based on the frequency of driver mutations. As expected, the most frequently mutated genes in breast cancer are PIK3CA and TP53. Then, we select the genes with the most frequent driver mutations, together with a set of genes known to play roles in cancer development. Furthermore, we apply the proposed mixed integer linear program to identify the temporal order in which genes mutate and, simultaneously, the changes they produce at the gene expression level during cancer progression.
In addition, we are able to identify known causal relationships between mutations and gene expression changes in the PI3K/AKT and TP53 pathways. Conclusions This paper proposes a new model to infer the temporal sequence in which mutations occur and lead to changes at the gene expression level during cancer progression. The approach is general and can be applied to any dataset with available somatic mutation and gene expression measurements.
Affiliation(s)
- Julia L Fleck
- Division of Systems Engineering, Boston University, 15 Saint Mary's Street, Brookline, MA 02446, USA
- Ana B Pavel
- Graduate Program in Bioinformatics, Boston University, 24 Cummington Mall, Boston, MA 02215, USA; Section of Computational Biomedicine, Boston University School of Medicine, 72 East Concord Street, Boston, MA 02118, USA
- Christos G Cassandras
- Division of Systems Engineering, Boston University, 15 Saint Mary's Street, Brookline, MA 02446, USA
46
Hart T, Xie L. Providing data science support for systems pharmacology and its implications to drug discovery. Expert Opin Drug Discov 2016; 11:241-56. [PMID: 26689499 DOI: 10.1517/17460441.2016.1135126] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Indexed: 12/20/2022]
Abstract
INTRODUCTION The conventional one-drug-one-target-one-disease drug discovery process has been less successful in tracking multi-genic, multi-faceted complex diseases. Systems pharmacology has emerged as a new discipline to tackle the current challenges in drug discovery. The goal of systems pharmacology is to transform huge, heterogeneous, and dynamic biological and clinical data into interpretable and actionable mechanistic models for decision making in drug discovery and patient treatment. Thus, big data technology and data science will play an essential role in systems pharmacology. AREAS COVERED This paper critically reviews the impact of three fundamental concepts of data science on systems pharmacology: similarity inference, overfitting avoidance, and disentangling causality from correlation. The authors then discuss recent advances and future directions in applying the three concepts of data science to drug discovery, with a focus on proteome-wide context-specific quantitative drug target deconvolution and personalized adverse drug reaction prediction. EXPERT OPINION Data science will facilitate reducing the complexity of systems pharmacology modeling, detecting hidden correlations between complex data sets, and distinguishing causation from correlation. The power of data science can only be fully realized when integrated with mechanism-based multi-scale modeling that explicitly takes into account the hierarchical organization of biological systems from nucleic acid to proteins, to molecular interaction networks, to cells, to tissues, to patients, and to populations.
Affiliation(s)
- Thomas Hart
- The Rockefeller University, New York, NY, USA; Department of Biological Sciences, Hunter College, The City University of New York, New York, NY, USA.
- Lei Xie
- Department of Computer Science, Hunter College, The City University of New York, New York, NY, USA; The Graduate Center, The City University of New York, New York, NY, USA.
47
Shi K, Gao L, Wang B. Discovering potential cancer driver genes by an integrated network-based approach. Mol Biosyst 2016; 12:2921-31. [DOI: 10.1039/c6mb00274a] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Indexed: 12/20/2022]
Abstract
An integrated network-based approach is proposed to nominate driver genes. It consists of two steps, a network diffusion step and an aggregated ranking step, which together fuse the correlation between gene mutations and gene expression, the relationships among the mutated genes, and the heterogeneity of patients' mutations.
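The abstract does not state which diffusion model is used. A common choice for network diffusion is random walk with restart (RWR), sketched below under stated assumptions: the gene network is an undirected adjacency matrix, the walk restarts at a patient's mutated (seed) genes with probability `restart` (0.5 here, an arbitrary value), and the steady-state visiting probabilities serve as per-gene scores that a later ranking step could aggregate across patients. The function name, parameters, and toy network are illustrative, not the authors' code.

```python
import numpy as np

def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence,
    where W is the column-normalized adjacency matrix and p0 is uniform
    over the seed genes."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid dividing by zero for isolated genes
    W = adj / col_sums                     # column j divided by the degree of gene j
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)     # restart distribution over the seed genes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:  # stop once the L1 change is negligible
            return p_next
        p = p_next
    return p

# Hypothetical 4-gene path network 0-1-2-3 with gene 0 mutated.
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
scores = random_walk_with_restart(path, seeds=[0])
print(np.round(scores, 3))
```

On this toy network the scores sum to one and decay with distance from the mutated gene, which is the property a diffusion step exploits to propagate mutation signal to network neighbors.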
Affiliation(s)
- Kai Shi
- School of Computer Science and Technology, Xidian University, Xi'an, China; College of Science
- Lin Gao
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Bingbo Wang
- School of Computer Science and Technology, Xidian University, Xi'an, China
48
Liu Z, Li W, Lv J, Xie R, Huang H, Li Y, He Y, Jiang J, Chen B, Guo S, Chen L. Identification of potential COPD genes based on multi-omics data at the functional level. Mol Biosyst 2016; 12:191-204. [PMID: 26575263 DOI: 10.1039/c5mb00577a] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 04/17/2025]
Abstract
Chronic obstructive pulmonary disease (COPD) is a complex disease that involves dysfunction across multiple omic layers. Changes in biological processes such as adherens junctions, signal transduction, transcriptional regulation, and cell proliferation can lead to the onset of COPD. A novel systematic approach, MMMG (Methylation-MicroRNA-mRNA-GO), was proposed to identify potential COPD genes by integrating functional information with a methylation profile, a microRNA expression profile, and an mRNA expression profile. Eight co-functional classes and 102 potential COPD genes were identified. These genes performed well in classifying COPD patients versus normal samples, revealed COPD-related pathways, and were confirmed to be associated with COPD by Matthews correlation coefficient (MCC) values, the literature, an independent data set, and pathways. By analyzing multi-omics data at the functional level, the MMMG method could effectively identify potential COPD genes. These potential COPD genes may provide in-depth insight into the complexity of the COPD genome landscape, improve early diagnosis, and guide future efforts to develop therapeutics.
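The abstract cites Matthews correlation coefficient (MCC) values as one line of evidence. For reference, MCC summarizes a binary confusion matrix (here, COPD versus normal) as $\mathrm{MCC} = \frac{TP\cdot TN - FP\cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$. The minimal sketch below computes it from counts; the counts in the example are hypothetical, and returning 0 when a marginal is empty is a common convention rather than anything stated in the abstract.

```python
import math

def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from confusion-matrix counts; ranges from -1 (total disagreement)
    through 0 (chance level) to +1 (perfect classification)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when any row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

# Hypothetical counts: 8 COPD and 9 normal samples correct,
# 1 false positive and 2 false negatives.
print(round(matthews_corrcoef(tp=8, tn=9, fp=1, fn=2), 3))
```

MCC is often preferred over plain accuracy for this kind of evaluation because it stays informative when the two classes are unbalanced.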
Affiliation(s)
- Zhe Liu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China.
49
Liu M, Watson LT, Zhang L. HMMvar-func: a new method for predicting the functional outcome of genetic variants. BMC Bioinformatics 2015; 16:351. [PMID: 26518340 PMCID: PMC4628267 DOI: 10.1186/s12859-015-0781-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 05/08/2015] [Accepted: 10/16/2015] [Indexed: 11/14/2022]
Abstract
Background Numerous tools have been developed to predict the fitness effects (i.e., neutral, deleterious, or beneficial) of genetic variants on the corresponding proteins. However, predicting whether a variant causes the variant-bearing protein to lose its original function or gain a new function is also needed to better understand how the variant contributes to disease or cancer. To address this problem, the present work introduces and computationally defines four types of functional outcome of a variant: gain, loss, switch, and conservation of function. Multiple hidden Markov models are deployed to computationally classify mutations into these four functional impact types. Results The functional outcome is predicted for over a hundred thyroid stimulating hormone receptor (TSHR) mutations, as well as cancer-related mutations in oncogenes or tumor suppressor genes. The results show that the proposed computational method is effective in fine-grained prediction of the functional outcome of a mutation and can be used to help elucidate the molecular mechanisms of disease- and cancer-causing mutations. The program is freely available at http://bioinformatics.cs.vt.edu/zhanglab/HMMvar/download.php. Conclusion This work is the first to computationally define and predict the functional impact of mutations as loss, switch, gain, or conservation of function. These fine-grained predictions can be especially useful for identifying mutations that cause or are linked to cancer. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0781-z) contains supplementary material, which is available to authorized users.
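The abstract does not describe how the hidden Markov models are built or compared. The sketch below shows only the generic pattern of classifying with several HMMs: each candidate class has its own discrete-emission HMM, a sequence is scored under each with the scaled forward algorithm, and it is assigned to the highest-likelihood model. The model names, the two-symbol alphabet, and all parameters are hypothetical toy values; HMMvar-func's actual models and decision rule may differ.

```python
import numpy as np

def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm returning log P(obs | model).
    start[i]: initial state probability; trans[i, j]: P(state j | state i);
    emit[i, s]: P(symbol s | state i)."""
    alpha = start * emit[:, obs[0]]
    scale = alpha.sum()
    alpha = alpha / scale          # rescale each step to avoid numerical underflow
    log_lik = np.log(scale)
    for symbol in obs[1:]:
        alpha = (alpha @ trans) * emit[:, symbol]
        scale = alpha.sum()
        alpha = alpha / scale
        log_lik += np.log(scale)   # sum of log scales equals the log-likelihood
    return log_lik

def classify(obs, models):
    """Return the name of the model under which `obs` is most likely."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))

# Hypothetical two-state models over a two-symbol alphabet {0, 1}.
start = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1], [0.1, 0.9]])
models = {
    "model_A": (start, trans, np.array([[0.9, 0.1], [0.8, 0.2]])),  # favors symbol 0
    "model_B": (start, trans, np.array([[0.1, 0.9], [0.2, 0.8]])),  # favors symbol 1
}
print(classify([0, 0, 0, 1], models))
```

Comparing log-likelihoods rather than raw probabilities keeps the decision numerically stable for long protein sequences.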
Affiliation(s)
- Mingming Liu
- Department of Computer Science, Virginia Polytechnic Institute & State University, Blacksburg, USA.
- Layne T Watson
- Department of Computer Science, Virginia Polytechnic Institute & State University, Blacksburg, USA; Department of Mathematics, Virginia Polytechnic Institute & State University, Blacksburg, USA; Department of Aerospace and Ocean Engineering, Virginia Polytechnic Institute & State University, Blacksburg, USA.
- Liqing Zhang
- Department of Computer Science, Virginia Polytechnic Institute & State University, Blacksburg, USA.
50
Creixell P, Reimand J, Haider S, Wu G, Shibata T, Vazquez M, Mustonen V, Gonzalez-Perez A, Pearson J, Sander C, Raphael BJ, Marks DS, Ouellette BFF, Valencia A, Bader GD, Boutros PC, Stuart JM, Linding R, Lopez-Bigas N, Stein LD. Pathway and network analysis of cancer genomes. Nat Methods 2015; 12:615-621. [PMID: 26125594 DOI: 10.1038/nmeth.3440] [Citation(s) in RCA: 229] [Impact Index Per Article: 22.9] [Received: 01/20/2015] [Accepted: 04/27/2015] [Indexed: 12/26/2022]
Abstract
Genomic information on tumors from 50 cancer types cataloged by the International Cancer Genome Consortium (ICGC) shows that only a few well-studied driver genes are frequently mutated, in contrast to many infrequently mutated genes that may also contribute to tumor biology. Hence, there has been great interest in developing pathway and network analysis methods that group genes and illuminate the processes involved. We provide an overview of these analysis techniques and show where they guide mechanistic and translational investigations.
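The review surveys many techniques without prescribing one. As a minimal example of the simplest kind of pathway analysis, an over-representation test asks whether a pathway contains more mutated genes than expected by chance. Assuming a background of $N$ genes, $K$ of them in the pathway, and $n$ mutated genes of which $k$ fall in the pathway, the one-sided p-value is $P(X \ge k) = \sum_{i=k}^{\min(n,K)} \binom{K}{i}\binom{N-K}{n-i} \big/ \binom{N}{n}$. The function name and the toy numbers below are illustrative only.

```python
from math import comb

def overrepresentation_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k): the chance that n genes drawn
    at random from N include at least k of the K genes in the pathway."""
    upper = min(n, K)
    # math.comb returns 0 when n - i exceeds N - K, so impossible terms vanish.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Toy numbers: 20,000 background genes, a 100-gene pathway,
# 200 mutated genes of which 8 fall in the pathway.
print(overrepresentation_pvalue(k=8, n=200, K=100, N=20_000))
```

Exact integer arithmetic keeps the tail sum precise; in practice the p-values across many pathways would also need multiple-testing correction, and such gene-set tests ignore the network topology that the network-based methods in the review exploit.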
Affiliation(s)
- Pau Creixell
- Cellular Signal Integration Group (C-SIG), Technical University of Denmark, Lyngby, Denmark
- Jüri Reimand
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Syed Haider
- Informatics and Biocomputing Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Guanming Wu
- Informatics and Biocomputing Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada; Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
- Tatsuhiro Shibata
- Division of Cancer Genomics, National Cancer Center, Chuo-ku, Tokyo, Japan
- Miguel Vazquez
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre, Madrid, Spain
- Ville Mustonen
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Abel Gonzalez-Perez
- Research Unit on Biomedical Informatics, University Pompeu Fabra, Barcelona, Spain
- John Pearson
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St. Lucia, Brisbane, Queensland, Australia
- Chris Sander
- Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
- Benjamin J Raphael
- Department of Computer Science and Center for Computational Molecular Biology, Brown University, Providence, RI, USA
- Debora S Marks
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- B F Francis Ouellette
- Informatics and Biocomputing Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada; Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, Canada
- Alfonso Valencia
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre, Madrid, Spain
- Gary D Bader
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Paul C Boutros
- Informatics and Biocomputing Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada; Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada; Department of Pharmacology and Toxicology, University of Toronto, Toronto, Ontario, Canada
- Joshua M Stuart
- Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA; Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California, USA
- Rune Linding
- Cellular Signal Integration Group (C-SIG), Technical University of Denmark, Lyngby, Denmark; Biotech Research & Innovation Centre (BRIC), University of Copenhagen (UCPH), DK-2200 Copenhagen, Denmark
- Nuria Lopez-Bigas
- Research Unit on Biomedical Informatics, University Pompeu Fabra, Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain
- Lincoln D Stein
- Informatics and Biocomputing Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada