101
Liluashvili V, Kalayci S, Fluder E, Wilson M, Gabow A, Gümüs ZH. iCAVE: an open source tool for visualizing biomolecular networks in 3D, stereoscopic 3D and immersive 3D. Gigascience 2018; 6:1-13. [PMID: 28814063 PMCID: PMC5554349 DOI: 10.1093/gigascience/gix054] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 02/06/2017] [Accepted: 07/05/2017] [Indexed: 02/02/2023] [Open Access]
Abstract
Visualizations of biomolecular networks assist in systems-level data exploration in many cellular processes. Data generated from high-throughput experiments increasingly inform these networks, yet current tools do not adequately scale with the concomitant increase in their size and complexity. We present an open source software platform, interactome-CAVE (iCAVE), for visualizing large and complex biomolecular interaction networks in 3D. Users can explore networks (i) in 3D using a desktop, (ii) in stereoscopic 3D using 3D-vision glasses and a desktop, or (iii) in immersive 3D within a CAVE environment. iCAVE introduces 3D extensions of known 2D network layout, clustering, and edge-bundling algorithms, as well as new 3D network layout algorithms. Furthermore, users can simultaneously query several built-in databases within iCAVE for network generation or visualize their own networks (e.g., disease, drug, protein, metabolite). iCAVE has a modular structure that allows rapid development by addition of algorithms, datasets, or features without affecting other parts of the code. Overall, iCAVE is the first freely available open source tool that enables 3D (optionally stereoscopic or immersive) visualizations of complex, dense, or multi-layered biomolecular networks. While primarily designed for researchers utilizing biomolecular networks, iCAVE can assist researchers in any field.
Affiliation(s)
- Vaja Liluashvili
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Selim Kalayci
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Eugene Fluder
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Manda Wilson
  - Computational Biology Center, Memorial-Sloan Kettering Cancer Center, New York, NY 10065, USA
- Aaron Gabow
  - Computational Biology Center, Memorial-Sloan Kettering Cancer Center, New York, NY 10065, USA
- Zeynep H Gümüs
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
102
Backenroth D, He Z, Kiryluk K, Boeva V, Pethukova L, Khurana E, Christiano A, Buxbaum JD, Ionita-Laza I. FUN-LDA: A Latent Dirichlet Allocation Model for Predicting Tissue-Specific Functional Effects of Noncoding Variation: Methods and Applications. Am J Hum Genet 2018; 102:920-942. [PMID: 29727691 PMCID: PMC5986983 DOI: 10.1016/j.ajhg.2018.03.026] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 12/04/2017] [Accepted: 03/21/2018] [Indexed: 10/17/2022] [Open Access]
Abstract
We describe a method based on a latent Dirichlet allocation model for predicting functional effects of noncoding genetic variants in a cell-type- and/or tissue-specific way (FUN-LDA). Using this unsupervised approach, we predict tissue-specific functional effects for every position in the human genome in 127 different tissues and cell types. We demonstrate the usefulness of our predictions by using several validation experiments. Using eQTL data from several sources, including the GTEx project, Geuvadis project, and TwinsUK cohort, we show that eQTLs in specific tissues tend to be most enriched among the predicted functional variants in relevant tissues in Roadmap. We further show how these integrated functional scores can be used for (1) deriving the most likely cell or tissue type causally implicated for a complex trait by using summary statistics from genome-wide association studies and (2) estimating a tissue-based correlation matrix of various complex traits. We found large enrichment of heritability in functional components of relevant tissues for various complex traits, and FUN-LDA yielded higher enrichment estimates than existing methods. Finally, using experimentally validated functional variants from the literature and variants possibly implicated in disease by previous studies, we rigorously compare FUN-LDA with state-of-the-art functional annotation methods and show that FUN-LDA has better prediction accuracy and higher resolution than these methods. In particular, our results suggest that tissue- and cell-type-specific functional prediction methods tend to have substantially better prediction accuracy than organism-level prediction methods. Scores for each position in the human genome and for each ENCODE and Roadmap tissue are available online (see Web Resources).
Affiliation(s)
- Daniel Backenroth
  - Department of Biostatistics, Columbia University, New York, NY 10032, USA
- Zihuai He
  - Department of Biostatistics, Columbia University, New York, NY 10032, USA
- Krzysztof Kiryluk
  - Department of Medicine, Columbia University, New York, NY 10032, USA
- Valentina Boeva
  - INSERM, U900, 75005 Paris, France
  - Institut Curie, Mines ParisTech, PSL Research University, 75005 Paris, France
- Lynn Pethukova
  - Department of Epidemiology, Columbia University, New York, NY 10032, USA
  - Department of Dermatology, Columbia University, New York, NY 10032, USA
- Ekta Khurana
  - Department of Physiology and Biophysics, Weill Medical College, Cornell University, New York, NY 10021, USA
- Angela Christiano
  - Department of Dermatology, Columbia University, New York, NY 10032, USA
  - Department of Genetics and Development, Columbia University, New York, NY 10032, USA
- Joseph D Buxbaum
  - Departments of Psychiatry, Neuroscience, and Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Friedman Brain Institute and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
103
Arun G, Diermeier SD, Spector DL. Therapeutic Targeting of Long Non-Coding RNAs in Cancer. Trends Mol Med 2018; 24:257-277. [PMID: 29449148 PMCID: PMC5840027 DOI: 10.1016/j.molmed.2018.01.001] [Citation(s) in RCA: 459] [Impact Index Per Article: 65.6] [Received: 11/10/2017] [Revised: 01/09/2018] [Accepted: 01/14/2018] [Indexed: 02/07/2023]
Abstract
Long non-coding RNAs (lncRNAs) represent a significant population of the human transcriptome. Many lncRNAs exhibit cell- and/or tissue/tumor-specific expression, making them excellent candidates for therapeutic applications. In this review we discuss examples of lncRNAs that demonstrate the diversity of their function in various cancer types. We also discuss recent advances in nucleic acid drug development with a focus on oligonucleotide-based therapies as a novel approach to inhibit tumor progression. The increased success rates of nucleic acid therapeutics provide an outstanding opportunity to explore lncRNAs as viable therapeutic targets to combat various aspects of cancer progression.
Affiliation(s)
- Gayatri Arun
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA; These authors contributed equally
- Sarah D Diermeier
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA; These authors contributed equally
- David L Spector
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
104
Abstract
The 1000 Genomes Project created a valuable, worldwide reference for human genetic variation. Common uses of the 1000 Genomes dataset include genotype imputation supporting Genome-wide Association Studies, mapping expression Quantitative Trait Loci, filtering non-pathogenic variants from exome, whole genome and cancer genome sequencing projects, and genetic analysis of population structure and molecular evolution. In this article, we will highlight some of the multiple ways that the 1000 Genomes data can be and has been utilized for genetic studies.
105
Gao L, Uzun Y, Gao P, He B, Ma X, Wang J, Han S, Tan K. Identifying noncoding risk variants using disease-relevant gene regulatory networks. Nat Commun 2018; 9:702. [PMID: 29453388 PMCID: PMC5816022 DOI: 10.1038/s41467-018-03133-y] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 08/22/2017] [Accepted: 01/22/2018] [Indexed: 02/01/2023] [Open Access]
Abstract
Identifying noncoding risk variants remains a challenging task. Because noncoding variants exert their effects in the context of a gene regulatory network (GRN), we hypothesize that explicit use of disease-relevant GRNs can significantly improve the inference accuracy of noncoding risk variants. We describe Annotation of Regulatory Variants using Integrated Networks (ARVIN), a general computational framework for predicting causal noncoding variants. It employs a set of novel regulatory network-based features, combined with sequence-based features to infer noncoding risk variants. Using known causal variants in gene promoters and enhancers in a number of diseases, we show ARVIN outperforms state-of-the-art methods that use sequence-based features alone. Additional experimental validation using reporter assay further demonstrates the accuracy of ARVIN. Application of ARVIN to seven autoimmune diseases provides a holistic view of the gene subnetwork perturbed by the combinatorial action of the entire set of risk noncoding mutations.
Affiliation(s)
- Long Gao
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Yasin Uzun
  - Division of Oncology and Center for Childhood Cancer Research, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Peng Gao
  - Division of Oncology and Center for Childhood Cancer Research, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Bing He
  - Division of Oncology and Center for Childhood Cancer Research, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Xiaoke Ma
  - School of Computer Science and Technology, Xidian University, Xi'an, 710126, Shaanxi, China
- Jiahui Wang
  - The Jackson Laboratory, Farmington, CT, 06032, USA
- Shizhong Han
  - Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, 21287, USA
- Kai Tan
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
  - Division of Oncology and Center for Childhood Cancer Research, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
  - Department of Cell & Developmental Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
106
Naidoo T, Sjödin P, Schlebusch C, Jakobsson M. Patterns of variation in cis-regulatory regions: examining evidence of purifying selection. BMC Genomics 2018; 19:95. [PMID: 29373957 PMCID: PMC5787233 DOI: 10.1186/s12864-017-4422-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 07/19/2017] [Accepted: 12/27/2017] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND: With only 2% of the human genome consisting of protein coding genes, functionality across the rest of the genome has been the subject of much debate. This has gained further impetus in recent years due to a rapidly growing catalogue of genomic elements, based primarily on biochemical signatures (e.g. the ENCODE project). While the assessment of functionality is a complex task, the presence of selection acting on a genomic region is a strong indicator of importance. In this study, we apply population genetic methods to investigate signals overlaying several classes of regulatory elements.
RESULTS: We disentangle signals of purifying selection acting directly on regulatory elements from the confounding factors of demography and purifying selection linked to e.g. nearby protein coding regions. We confirm the importance of regulatory regions proximal to coding sequence, while also finding differential levels of selection at distal regions. We note differences in purifying selection among transcription factor families. Signals of constraint at some genomic classes were also strongly dependent on their physical location relative to coding sequence. In addition, levels of selection efficacy across genomic classes differed between African and non-African populations.
CONCLUSIONS: In order to assign a valid signal of selection to a particular class of genomic sequence, we show that it is crucial to isolate the signal by accounting for the effects of demography and linked-purifying selection. Our study highlights the intricate interplay of factors affecting signals of selection on functional elements.
Affiliation(s)
- Thijessen Naidoo
  - Department of Organismal Biology, Uppsala University, Uppsala, Sweden
- Per Sjödin
  - Department of Organismal Biology, Uppsala University, Uppsala, Sweden
- Carina Schlebusch
  - Department of Organismal Biology, Uppsala University, Uppsala, Sweden
- Mattias Jakobsson
  - Department of Organismal Biology, Uppsala University, Uppsala, Sweden
  - Science for Life Lab, Uppsala, Sweden
107
Kotelnikova EA, Pyatnitskiy M, Paleeva A, Kremenetskaya O, Vinogradov D. Practical aspects of NGS-based pathways analysis for personalized cancer science and medicine. Oncotarget 2018; 7:52493-52516. [PMID: 27191992 PMCID: PMC5239569 DOI: 10.18632/oncotarget.9370] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 10/02/2015] [Accepted: 04/18/2016] [Indexed: 12/17/2022] [Open Access]
Abstract
The personalized approach to health care, and to cancer care in particular, is becoming increasingly popular and is taking an important place in the translational medicine paradigm. In some cases, detection of the patient-specific individual mutations that point to a targeted therapy has already become a routine practice for clinical oncologists. Wider panels of genetic markers are also on the market; these cover a greater number of possible oncogenes, including those with lower reliability of resulting medical conclusions. In light of the wide availability of high-throughput technologies, it is very tempting to use complete patient-specific next-generation sequencing (NGS) or other "omics" data for cancer treatment guidance. However, there are still no gold standard methods and protocols to evaluate them. Here we discuss the clinical utility of each of the data types and describe a systems biology approach adapted for single patient measurements. We summarize the current state of the field, focusing on the clinically relevant case studies and practical aspects of data processing.
Affiliation(s)
- Ekaterina A Kotelnikova
  - Personal Biomedicine, Moscow, Russia
  - A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
  - Institute Biomedical Research August Pi Sunyer (IDIBAPS), Hospital Clinic of Barcelona, Barcelona, Spain
- Mikhail Pyatnitskiy
  - Personal Biomedicine, Moscow, Russia
  - Orekhovich Institute of Biomedical Chemistry, Moscow, Russia
  - Pirogov Russian National Research Medical University, Moscow, Russia
- Olga Kremenetskaya
  - Personal Biomedicine, Moscow, Russia
  - Center for Theoretical Problems of Physicochemical Pharmacology, Russian Academy of Sciences, Moscow, Russia
- Dmitriy Vinogradov
  - Personal Biomedicine, Moscow, Russia
  - A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
  - Lomonosov Moscow State University, Moscow, Russia
108
Pan-cancer screen for mutations in non-coding elements with conservation and cancer specificity reveals correlations with expression and survival. NPJ Genom Med 2018; 3:1. [PMID: 29354286 PMCID: PMC5765157 DOI: 10.1038/s41525-017-0040-5] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 11/13/2017] [Revised: 11/22/2017] [Accepted: 11/29/2017] [Indexed: 01/01/2023] [Open Access]
Abstract
Cancer develops by accumulation of somatic driver mutations, which impact cellular function. Mutations in non-coding regulatory regions can now be studied genome-wide and further characterized by correlation with gene expression and clinical outcome to identify driver candidates. Using a new two-stage procedure, called ncDriver, we first screened 507 ICGC whole-genomes from 10 cancer types for non-coding elements, in which mutations are both recurrent and have elevated conservation or cancer specificity. This identified 160 significant non-coding elements, including the TERT promoter, a well-known non-coding driver element, as well as elements associated with known cancer genes and regulatory genes (e.g., PAX5, TOX3, PCF11, MAPRE3). However, in some significant elements, mutations appear to stem from localized mutational processes rather than recurrent positive selection. To further characterize the driver potential of the identified elements and shortlist candidates, we identified elements where presence of mutations correlated significantly with expression levels (e.g., TERT and CDH10) and survival (e.g., CDH9 and CDH10) in an independent set of 505 TCGA whole-genome samples. In a larger pan-cancer set of 4128 TCGA exomes with expression profiling, we identified mutational correlation with expression for additional elements (e.g., near GATA3, CDC6, ZNF217, and CTCF transcription factor binding sites). Survival analysis further pointed to MIR122, a known marker of poor prognosis in liver cancer. In conclusion, the screen for significant mutation patterns coupled with correlative mutational analysis identified new individual driver candidates and suggests that some non-coding mutations recurrently affect expression and play a role in cancer development.
Mutations in the “non-coding” part of the genome have been identified that could be involved in driving cancer development. Jakob Pedersen, Henrik Hornshøj and colleagues from Aarhus University Hospital in Denmark and MIT in the United States developed a two-stage procedure to identify elements that could be driving cancer development in the part of DNA that does not code for proteins. They conducted statistical analyses on catalogs of tumor genomes to identify recurrent mutations. They then evaluated how specific these mutations were to different cancer types, their predicted functional impact, and their association with gene expression and patient survival. The analyses identified mutations in the non-coding part of cancer genomes that could be driving tumor development, but further analyses on larger sample sets need to be conducted to validate the results, which could provide a basis for biomarker discovery and precision medical treatment.
109
Lee PH, Lee C, Li X, Wee B, Dwivedi T, Daly M. Principles and methods of in-silico prioritization of non-coding regulatory variants. Hum Genet 2018; 137:15-30. [PMID: 29288389 PMCID: PMC5892192 DOI: 10.1007/s00439-017-1861-0] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 10/24/2017] [Accepted: 12/14/2017] [Indexed: 12/13/2022]
Abstract
Over a decade of genome-wide association studies has made great strides toward the detection of genes and genetic mechanisms underlying complex traits. However, the majority of associated loci reside in non-coding regions that are, in general, functionally uncharacterized. Now, the availability of large-scale tissue and cell type-specific transcriptome and epigenome data enables us to elucidate how non-coding genetic variants can affect gene expression and are associated with phenotypic changes. Here, we provide an overview of this emerging field in human genomics, summarizing available data resources and state-of-the-art analytic methods to facilitate in-silico prioritization of non-coding regulatory mutations. We also highlight the limitations of current approaches and discuss the direction of much-needed future research.
Affiliation(s)
- Phil H Lee
  - Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Simches Research Building, 185 Cambridge St, Boston, MA, 02114, USA
  - Quantitative Genomics Program, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Christian Lee
  - Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Simches Research Building, 185 Cambridge St, Boston, MA, 02114, USA
  - Department of Life Sciences, Harvard University, Cambridge, MA, USA
- Xihao Li
  - Quantitative Genomics Program, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Brian Wee
  - Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Simches Research Building, 185 Cambridge St, Boston, MA, 02114, USA
- Tushar Dwivedi
  - Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Simches Research Building, 185 Cambridge St, Boston, MA, 02114, USA
  - John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
- Mark Daly
  - Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Simches Research Building, 185 Cambridge St, Boston, MA, 02114, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
110
Horn H, Lawrence MS, Chouinard CR, Shrestha Y, Hu JX, Worstell E, Shea E, Ilic N, Kim E, Kamburov A, Kashani A, Hahn WC, Campbell JD, Boehm JS, Getz G, Lage K. NetSig: network-based discovery from cancer genomes. Nat Methods 2018; 15:61-66. [PMID: 29200198 PMCID: PMC5985961 DOI: 10.1038/nmeth.4514] [Citation(s) in RCA: 68] [Impact Index Per Article: 9.7] [Received: 03/29/2017] [Accepted: 10/19/2017] [Indexed: 12/21/2022]
Abstract
Methods that integrate molecular network information and tumor genome data could complement gene-based statistical tests to identify likely new cancer genes; but such approaches are challenging to validate at scale, and their predictive value remains unclear. We developed a robust statistic (NetSig) that integrates protein interaction networks with data from 4,742 tumor exomes. NetSig can accurately classify known driver genes in 60% of tested tumor types and predicts 62 new driver candidates. Using a quantitative experimental framework to determine in vivo tumorigenic potential in mice, we found that NetSig candidates induce tumors at rates that are comparable to those of known oncogenes and are ten-fold higher than those of random genes. By reanalyzing nine tumor-inducing NetSig candidates in 242 patients with oncogene-negative lung adenocarcinomas, we find that two (AKT2 and TFDP2) are significantly amplified. Our study presents a scalable integrated computational and experimental workflow to expand discovery from cancer genomes.
Affiliation(s)
- Heiko Horn
  - Department of Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
  - Broad Institute of MIT and Harvard, Cambridge, Cancer Program, Cambridge, MA 02142, USA
- Michael S. Lawrence
  - Broad Institute of MIT and Harvard, Cambridge, Cancer Program, Cambridge, MA 02142, USA
  - Department of Pathology and MGH Cancer Center, Massachusetts General Hospital, Boston, MA 02114, USA
- Candace R. Chouinard
  - Broad Institute of MIT and Harvard, Cambridge, Cancer Program, Cambridge, MA 02142, USA
- Yashaswi Shrestha
  - Broad Institute of MIT and Harvard, Cambridge, Cancer Program, Cambridge, MA 02142, USA
- Jessica Xin Hu
  - Department of Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
  - Broad Institute of MIT and Harvard, Cambridge, Cancer Program, Cambridge, MA 02142, USA
- Elizabeth Worstell
  - Department of Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
  - Broad Institute of MIT and Harvard, Cambridge, Cancer Program, Cambridge, MA 02142, USA
- Emily Shea
  - Broad Institute of MIT and Harvard, Cambridge, Cancer Program, Cambridge, MA 02142, USA
- Nina Ilic
  - Broad Institute of MIT and Harvard, Cambridge, Cancer Program, Cambridge, MA 02142, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Eejung Kim
  - Broad Institute of MIT and Harvard, Cambridge, Cancer Program, Cambridge, MA 02142, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Atanas Kamburov
  - Broad Institute of MIT and Harvard, Cambridge, Cancer Program, Cambridge, MA 02142, USA
  - Department of Pathology and MGH Cancer Center, Massachusetts General Hospital, Boston, MA 02114, USA
- Alireza Kashani
  - Department of Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
  - Broad Institute of MIT and Harvard, Cambridge, Cancer Program, Cambridge, MA 02142, USA
- William C. Hahn
  - Broad Institute of MIT and Harvard, Cambridge, Cancer Program, Cambridge, MA 02142, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Joshua D. Campbell
  - Broad Institute of MIT and Harvard, Cambridge, Cancer Program, Cambridge, MA 02142, USA
  - Department of Medicine, Boston University School of Medicine, Boston, MA
- Jesse S. Boehm
  - Broad Institute of MIT and Harvard, Cambridge, Cancer Program, Cambridge, MA 02142, USA
- Gad Getz
  - Broad Institute of MIT and Harvard, Cambridge, Cancer Program, Cambridge, MA 02142, USA
  - Department of Pathology and MGH Cancer Center, Massachusetts General Hospital, Boston, MA 02114, USA
- Kasper Lage
  - Department of Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
  - Broad Institute of MIT and Harvard, Cambridge, Cancer Program, Cambridge, MA 02142, USA
111
Cowan JR, Tariq M, Shaw C, Rao M, Belmont JW, Lalani SR, Smolarek TA, Ware SM. Copy number variation as a genetic basis for heterotaxy and heterotaxy-spectrum congenital heart defects. Philos Trans R Soc Lond B Biol Sci 2017; 371:rstb.2015.0406. [PMID: 27821535 DOI: 10.1098/rstb.2015.0406] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Accepted: 06/03/2016] [Indexed: 12/22/2022] [Open Access]
Abstract
Genomic disorders and rare copy number abnormalities are identified in 15-25% of patients with syndromic conditions, but their prevalence in individuals with isolated birth defects is less clear. A spectrum of congenital heart defects (CHDs) is seen in heterotaxy, a highly heritable and genetically heterogeneous multiple congenital anomaly syndrome resulting from failure to properly establish left-right (L-R) organ asymmetry during early embryonic development. To identify novel genetic causes of heterotaxy, we analysed copy number variants (CNVs) in 225 patients with heterotaxy and heterotaxy-spectrum CHDs using array-based genotyping methods. Clinically relevant CNVs were identified in approximately 20% of patients and encompassed both known and putative heterotaxy genes. Patients were carefully phenotyped, revealing a significant association of abdominal situs inversus with pathogenic or likely pathogenic CNVs, while d-transposition of the great arteries was more frequently associated with common CNVs. Identified cytogenetic abnormalities ranged from large unbalanced translocations to smaller, kilobase-scale CNVs, including a rare, single exon deletion in ZIC3, a gene known to cause X-linked heterotaxy. Morpholino loss-of-function experiments in Xenopus support a role for one of these novel candidates, the platelet isoform of phosphofructokinase-1 (PFKP), in heterotaxy. Collectively, our results confirm a high CNV yield for array-based testing in patients with heterotaxy, and support use of CNV analysis for identification of novel biological processes relevant to human laterality. This article is part of the themed issue 'Provocative questions in left-right asymmetry'.
Collapse
Affiliation(s)
- Jason R Cowan
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA.,Department of Pediatrics and Medical and Molecular Genetics, Herman B Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Muhammad Tariq
- Department of Pediatrics and Medical and Molecular Genetics, Herman B Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN 46202, USA.,Department of Clinical Biochemistry, University of Tabuk, Tabuk 71491, Kingdom of Saudi Arabia
| | - Chad Shaw
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Mitchell Rao
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - John W Belmont
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Seema R Lalani
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Teresa A Smolarek
- Cincinnati Children's Hospital Medical Center, Division of Human Genetics, Cincinnati, OH 45229, USA
| | - Stephanie M Ware
- Department of Pediatrics and Medical and Molecular Genetics, Herman B Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| |
|
112
|
Raimondi F, Betts MJ, Lu Q, Inoue A, Gutkind JS, Russell RB. Genetic variants affecting equivalent protein family positions reflect human diversity. Sci Rep 2017; 7:12771. [PMID: 28986545 PMCID: PMC5630595 DOI: 10.1038/s41598-017-12971-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 01/13/2017] [Accepted: 09/13/2017] [Indexed: 12/22/2022] Open
Abstract
Members of diverse protein families often perform overlapping or redundant functions, meaning that different variations within them could reflect differences between individual organisms. We investigated likely functional positions within aligned protein families that contained a significant enrichment of nonsynonymous variants in genomes of healthy individuals. We identified more than a thousand enriched positions across hundreds of family alignments with roles indicative of mammalian individuality, including sensory perception and the immune system. The most significant position is the Arginine from the Olfactory receptor “DRY” motif, which has more variants in healthy individuals than all other positions in the proteome. Odorant binding data suggest that these variants lead to receptor inactivity, and they are mostly mutually exclusive with other loss-of-function (stop/frameshift) variants. Some DRY Arginine variants correlate with smell preferences in sub-populations, and all 2,504 humans studied contain a unique spectrum of active and inactive receptors. The many other variant-enriched positions across hundreds of other families might also provide insights into individual differences.
Affiliation(s)
- Francesco Raimondi
- CellNetworks, Bioquant, Heidelberg University, Im Neuenheimer Feld 267, 69120, Heidelberg, Germany.,Biochemie Zentrum Heidelberg (BZH), Heidelberg University, Im Neuenheimer Feld 328, 69120, Heidelberg, Germany
| | - Matthew J Betts
- CellNetworks, Bioquant, Heidelberg University, Im Neuenheimer Feld 267, 69120, Heidelberg, Germany.,Biochemie Zentrum Heidelberg (BZH), Heidelberg University, Im Neuenheimer Feld 328, 69120, Heidelberg, Germany
| | - Qianhao Lu
- CellNetworks, Bioquant, Heidelberg University, Im Neuenheimer Feld 267, 69120, Heidelberg, Germany.,Biochemie Zentrum Heidelberg (BZH), Heidelberg University, Im Neuenheimer Feld 328, 69120, Heidelberg, Germany
| | - Asuka Inoue
- Graduate School of Pharmaceutical Science, Tohoku University, Sendai, Miyagi, Japan.,Japan Science and Technology Agency (JST), Precursory Research for Embryonic Science and Technology (PRESTO), Kawaguchi, Saitama, Japan
| | - Robert B Russell
- CellNetworks, Bioquant, Heidelberg University, Im Neuenheimer Feld 267, 69120, Heidelberg, Germany.,Biochemie Zentrum Heidelberg (BZH), Heidelberg University, Im Neuenheimer Feld 328, 69120, Heidelberg, Germany
| |
|
113
|
Schubert SA, Ruano D, Elsayed FA, Boot A, Crobach S, Sarasqueta AF, Wolffenbuttel B, van der Klauw MM, Oosting J, Tops CM, van Eijk R, Vasen HFA, Vossen RHAM, Nielsen M, Castellví-Bel S, Ruiz-Ponte C, Tomlinson I, Dunlop MG, Vodicka P, Wijnen JT, Hes FJ, Morreau H, de Miranda NFCC, Sijmons RH, van Wezel T. Evidence for genetic association between chromosome 1q loci and predisposition to colorectal neoplasia. Br J Cancer 2017; 117:1215-1223. [PMID: 28742792 PMCID: PMC5589990 DOI: 10.1038/bjc.2017.240] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 03/21/2017] [Revised: 05/31/2017] [Accepted: 06/30/2017] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND: A substantial fraction of familial colorectal cancer (CRC) and polyposis heritability remains unexplained. This study aimed to identify predisposing loci in patients with these disorders. METHODS: Homozygosity mapping was performed using 222 563 SNPs in 302 index patients with various colorectal neoplasms and 3367 controls. Linkage analysis, exome and whole-genome sequencing were performed in a family affected by microsatellite stable CRCs. Candidate variants were genotyped in 10 554 cases and 21 480 controls. Gene expression was assessed at the mRNA and protein level. RESULTS: Homozygosity mapping revealed a disease-associated region at 1q32.3 which was part of the linkage region 1q32.2-42.2 identified in the CRC family. This includes a region previously associated with risk of CRC. Sequencing identified the p.Asp1432Glu variant in the MIA3 gene (known as TANGO1 or TANGO) and 472 additional rare, shared variants within the linkage region. In both cases and controls the population frequency was 0.02% for this MIA3 variant. The MIA3 mutant allele showed predominant mRNA expression in normal, cancer and precancerous tissues. Furthermore, immunohistochemistry revealed increased expression of MIA3 in adenomatous tissues. CONCLUSIONS: Taken together, our two independent strategies associate genetic variations in chromosome 1q loci and predisposition to familial CRC and polyps, which warrants further investigation.
Affiliation(s)
- Stephanie A Schubert
- Department of Pathology, Leiden University Medical Center, Leiden University, Leiden 2300 RC, The Netherlands
| | - Dina Ruano
- Department of Pathology, Leiden University Medical Center, Leiden University, Leiden 2300 RC, The Netherlands
| | - Fadwa A Elsayed
- Department of Pathology, Leiden University Medical Center, Leiden University, Leiden 2300 RC, The Netherlands
| | - Arnoud Boot
- Department of Pathology, Leiden University Medical Center, Leiden University, Leiden 2300 RC, The Netherlands
| | - Stijn Crobach
- Department of Pathology, Leiden University Medical Center, Leiden University, Leiden 2300 RC, The Netherlands
| | - Arantza Farina Sarasqueta
- Department of Pathology, Leiden University Medical Center, Leiden University, Leiden 2300 RC, The Netherlands
| | - Bruce Wolffenbuttel
- Department of Endocrinology, University of Groningen, University Medical Center Groningen, Groningen 9700 RB, The Netherlands
| | - Melanie M van der Klauw
- Department of Endocrinology, University of Groningen, University Medical Center Groningen, Groningen 9700 RB, The Netherlands
| | - Jan Oosting
- Department of Pathology, Leiden University Medical Center, Leiden University, Leiden 2300 RC, The Netherlands
| | - Carli M Tops
- Department of Clinical Genetics, Leiden University Medical Center, Leiden University, Leiden 2300 RC, The Netherlands
| | - Ronald van Eijk
- Department of Pathology, Leiden University Medical Center, Leiden University, Leiden 2300 RC, The Netherlands
| | - Hans FA Vasen
- Department of Gastroenterology and Hepatology, Leiden University Medical Center, Leiden University, Leiden 2300 RC, The Netherlands
| | - Rolf HAM Vossen
- Department of Human Genetics, Leiden University Medical Center, Leiden University, Leiden 2300 RC, The Netherlands
| | - Maartje Nielsen
- Department of Clinical Genetics, Leiden University Medical Center, Leiden University, Leiden 2300 RC, The Netherlands
| | - Sergi Castellví-Bel
- Department of Gastroenterology, Hospital Clínic, Institut d’Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), University of Barcelona, Barcelona, Catalonia 08036, Spain
| | - Clara Ruiz-Ponte
- Fundación Pública Galega de Medicina Xenómica (FPGMX)-SERGAS, Grupo de Medicina Xenómica-USC, Instituto de Investigación Sanitaria de Santiago (IDIS), Centro de Investigación en Red de Enfermedades Raras (CIBERER), Santiago de Compostela 15706, Spain
| | - Ian Tomlinson
- Oxford Centre for Cancer Gene Research, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
| | - Malcolm G Dunlop
- Colon Cancer Genetics Group, MRC Human Genetics Unit, The University of Edinburgh, Western General Hospital, Edinburgh EH4 2XU, UK
| | - Pavel Vodicka
- Institute of Experimental Medicine, Institute of Biology and Medical Genetics, Prague 142 00, Czech Republic
| | - Juul T Wijnen
- Department of Clinical Genetics, Leiden University Medical Center, Leiden University, Leiden 2300 RC, The Netherlands
| | - Frederik J Hes
- Department of Clinical Genetics, Leiden University Medical Center, Leiden University, Leiden 2300 RC, The Netherlands
| | - Hans Morreau
- Department of Pathology, Leiden University Medical Center, Leiden University, Leiden 2300 RC, The Netherlands
| | - Noel FCC de Miranda
- Department of Pathology, Leiden University Medical Center, Leiden University, Leiden 2300 RC, The Netherlands
| | - Rolf H Sijmons
- Department of Genetics, University of Groningen, University Medical Centre Groningen, Groningen 9700 RB, The Netherlands
| | - Tom van Wezel
- Department of Pathology, Leiden University Medical Center, Leiden University, Leiden 2300 RC, The Netherlands
| |
|
114
|
Balasubramanian S, Fu Y, Pawashe M, McGillivray P, Jin M, Liu J, Karczewski KJ, MacArthur DG, Gerstein M. Using ALoFT to determine the impact of putative loss-of-function variants in protein-coding genes. Nat Commun 2017; 8:382. [PMID: 28851873 PMCID: PMC5575292 DOI: 10.1038/s41467-017-00443-5] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 05/20/2016] [Accepted: 06/29/2017] [Indexed: 11/09/2022] Open
Abstract
Variants predicted to result in the loss of function of human genes have attracted interest because of their clinical impact and surprising prevalence in healthy individuals. Here, we present ALoFT (annotation of loss-of-function transcripts), a method to annotate and predict the disease-causing potential of loss-of-function variants. Using data from Mendelian disease-gene discovery projects, we show that ALoFT can distinguish between loss-of-function variants that are deleterious as heterozygotes and those causing disease only in the homozygous state. Investigation of variants discovered in healthy populations suggests that each individual carries at least two heterozygous premature stop alleles that could potentially lead to disease if present as homozygotes. When applied to de novo putative loss-of-function variants in autism-affected families, ALoFT distinguishes between deleterious variants in patients and benign variants in unaffected siblings. Finally, analysis of somatic variants in >6500 cancer exomes shows that putative loss-of-function variants predicted to be deleterious by ALoFT are enriched in known driver genes. Variants causing loss of function (LoF) of human genes have clinical implications. Here, the authors present a method to predict disease-causing potential of LoF variants, ALoFT (annotation of Loss-of-Function Transcripts), and show its application to interpreting LoF variants in different contexts.
Affiliation(s)
- Suganthi Balasubramanian
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA.
- Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT, 06520, USA.
- Regeneron Genetics Center, Tarrytown, NY, 10591, USA.
| | - Yao Fu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Bina Technologies, Part of Roche Sequencing, Belmont, CA, 94002, USA
| | - Mayur Pawashe
- Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT, 06520, USA
| | - Patrick McGillivray
- Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT, 06520, USA
| | - Mike Jin
- Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT, 06520, USA
| | - Jeremy Liu
- Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT, 06520, USA
| | - Konrad J Karczewski
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, 02142, USA
| | - Daniel G MacArthur
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, 02142, USA
| | - Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA.
- Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT, 06520, USA.
- Department of Computer Science, Yale University, New Haven, CT, 06520, USA.
| |
|
115
|
Przytycki PF, Singh M. Differential analysis between somatic mutation and germline variation profiles reveals cancer-related genes. Genome Med 2017; 9:79. [PMID: 28841835 PMCID: PMC5574113 DOI: 10.1186/s13073-017-0465-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 02/02/2017] [Accepted: 08/07/2017] [Indexed: 12/30/2022] Open
Abstract
A major aim of cancer genomics is to pinpoint which somatically mutated genes are involved in tumor initiation and progression. We introduce a new framework for uncovering cancer genes, differential mutation analysis, which compares the mutational profiles of genes across cancer genomes with their natural germline variation across healthy individuals. We present DiffMut, a fast and simple approach for differential mutational analysis, and demonstrate that it is more effective in discovering cancer genes than considerably more sophisticated approaches. We conclude that germline variation across healthy human genomes provides a powerful means for characterizing somatic mutation frequency and identifying cancer driver genes. DiffMut is available at https://github.com/Singh-Lab/Differential-Mutation-Analysis.
Affiliation(s)
- Pawel F Przytycki
- Department of Computer Science, Princeton University, Princeton, NJ, 08544, USA.,Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, 08544, USA
| | - Mona Singh
- Department of Computer Science, Princeton University, Princeton, NJ, 08544, USA.,Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, 08544, USA
| |
|
116
|
Vermunt MW, Creyghton MP. Transcriptional Dynamics at Brain Enhancers: from Functional Specialization to Neurodegeneration. Curr Neurol Neurosci Rep 2017; 16:94. [PMID: 27628759 PMCID: PMC5023742 DOI: 10.1007/s11910-016-0689-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 12/31/2022]
Abstract
Over the last decade, the noncoding part of the genome has been shown to harbour thousands of cis-regulatory elements, such as enhancers, that activate well-defined gene expression programs. Driven by the development of numerous techniques, many of these elements are now identified in multiple tissues and cell types, and their characteristics as well as importance in development and disease are becoming increasingly clear. Here, we provide an overview of the insights that were gained from the analysis of noncoding gene regulatory elements in the brain and describe their potential contribution to cell type specialization, brain function and neurodegenerative disease.
Affiliation(s)
- Marit W Vermunt
- Hubrecht Institute-KNAW and University Medical Center Utrecht, Uppsalalaan 8, 3584CT, Utrecht, The Netherlands
| | - Menno P Creyghton
- Hubrecht Institute-KNAW and University Medical Center Utrecht, Uppsalalaan 8, 3584CT, Utrecht, The Netherlands.
| |
|
117
|
Dhingra P, Martinez-Fundichely A, Berger A, Huang FW, Forbes AN, Liu EM, Liu D, Sboner A, Tamayo P, Rickman DS, Rubin MA, Khurana E. Identification of novel prostate cancer drivers using RegNetDriver: a framework for integration of genetic and epigenetic alterations with tissue-specific regulatory network. Genome Biol 2017; 18:141. [PMID: 28750683 PMCID: PMC5530464 DOI: 10.1186/s13059-017-1266-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 10/30/2016] [Accepted: 06/27/2017] [Indexed: 11/22/2022] Open
Abstract
We report a novel computational method, RegNetDriver, to identify tumorigenic drivers using the combined effects of coding and non-coding single nucleotide variants, structural variants, and DNA methylation changes in the DNase I hypersensitivity based regulatory network. Integration of multi-omics data from 521 prostate tumor samples indicated a stronger regulatory impact of structural variants, as they affect more transcription factor hubs in the tissue-specific network. Moreover, crosstalk between transcription factor hub expression modulated by structural variants and methylation levels likely leads to the differential expression of target genes. We report known prostate tumor regulatory drivers and nominate novel transcription factors (ERF, CREB3L1, and POU2F2), which are supported by functional validation.
Affiliation(s)
- Priyanka Dhingra
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, 10065, USA
- Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, 10021, USA
| | - Alexander Martinez-Fundichely
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, 10065, USA
- Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, 10021, USA
| | - Adeline Berger
- Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, New York, 10065, USA
| | - Franklin W Huang
- Department of Medical Oncology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, MA, 02215, USA
- Department of Medicine, Harvard Medical School, 25 Shattuck Street, Boston, MA, 02115, USA
- Cancer Program, The Broad Institute of Harvard and MIT, 415 Main Street, Cambridge, MA, 02142, USA
| | - Andre Neil Forbes
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, 10065, USA
- Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, 10021, USA
| | - Eric Minwei Liu
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, 10065, USA
- Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, 10021, USA
| | - Deli Liu
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, 10065, USA
- Department of Urology, Weill Cornell Medical College, New York, New York, 10065, USA
| | - Andrea Sboner
- Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, 10021, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, New York, 10065, USA
- Caryl and Israel Englander Institute for Precision Medicine, New York Presbyterian Hospital-Weill Cornell Medicine, New York, NY, 10065, USA
| | - Pablo Tamayo
- Cancer Program, The Broad Institute of Harvard and MIT, 415 Main Street, Cambridge, MA, 02142, USA
- Department of Medicine, University of California San Diego, La Jolla, California, USA
- Moores Cancer Center, University of California San Diego, La Jolla, California, USA
| | - David S Rickman
- Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, New York, 10065, USA.
- Caryl and Israel Englander Institute for Precision Medicine, New York Presbyterian Hospital-Weill Cornell Medicine, New York, NY, 10065, USA.
- Meyer Cancer Center, Weill Cornell Medical College, New York, New York, 10065, USA.
| | - Mark A Rubin
- Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, New York, 10065, USA
- Caryl and Israel Englander Institute for Precision Medicine, New York Presbyterian Hospital-Weill Cornell Medicine, New York, NY, 10065, USA
- Meyer Cancer Center, Weill Cornell Medical College, New York, New York, 10065, USA
| | - Ekta Khurana
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, 10065, USA.
- Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, 10021, USA.
- Caryl and Israel Englander Institute for Precision Medicine, New York Presbyterian Hospital-Weill Cornell Medicine, New York, NY, 10065, USA.
- Meyer Cancer Center, Weill Cornell Medical College, New York, New York, 10065, USA.
| |
|
118
|
Dewey FE, Murray MF, Overton JD, Habegger L, Leader JB, Fetterolf SN, O'Dushlaine C, Van Hout CV, Staples J, Gonzaga-Jauregui C, Metpally R, Pendergrass SA, Giovanni MA, Kirchner HL, Balasubramanian S, Abul-Husn NS, Hartzel DN, Lavage DR, Kost KA, Packer JS, Lopez AE, Penn J, Mukherjee S, Gosalia N, Kanagaraj M, Li AH, Mitnaul LJ, Adams LJ, Person TN, Praveen K, Marcketta A, Lebo MS, Austin-Tse CA, Mason-Suares HM, Bruse S, Mellis S, Phillips R, Stahl N, Murphy A, Economides A, Skelding KA, Still CD, Elmore JR, Borecki IB, Yancopoulos GD, Davis FD, Faucett WA, Gottesman O, Ritchie MD, Shuldiner AR, Reid JG, Ledbetter DH, Baras A, Carey DJ. Distribution and clinical impact of functional variants in 50,726 whole-exome sequences from the DiscovEHR study. Science 2017; 354:aaf6814. [PMID: 28008009 DOI: 10.1126/science.aaf6814] [Citation(s) in RCA: 391] [Impact Index Per Article: 48.9] [Received: 03/14/2016] [Accepted: 11/16/2016] [Indexed: 11/02/2022]
Abstract
The DiscovEHR collaboration between the Regeneron Genetics Center and Geisinger Health System couples high-throughput sequencing to an integrated health care system using longitudinal electronic health records (EHRs). We sequenced the exomes of 50,726 adult participants in the DiscovEHR study to identify ~4.2 million rare single-nucleotide variants and insertion/deletion events, of which ~176,000 are predicted to result in a loss of gene function. Linking these data to EHR-derived clinical phenotypes, we find clinical associations supporting therapeutic targets, including genes encoding drug targets for lipid lowering, and identify previously unidentified rare alleles associated with lipid levels and other blood level traits. About 3.5% of individuals harbor deleterious variants in 76 clinically actionable genes. The DiscovEHR data set provides a blueprint for large-scale precision medicine initiatives and genomics-guided therapeutic discovery.
Affiliation(s)
- Korey A Kost
- Geisinger Health System, Danville, PA 17822, USA
| | - John Penn
- Regeneron Genetics Center, Tarrytown, NY 10591, USA
| | - Matthew S Lebo
- Laboratory for Molecular Medicine, Cambridge, MA 02139, USA
| | - Scott Mellis
- Regeneron Pharmaceuticals, Tarrytown, NY 10591, USA
| | - Neil Stahl
- Regeneron Pharmaceuticals, Tarrytown, NY 10591, USA
| | - Aris Baras
- Regeneron Genetics Center, Tarrytown, NY 10591, USA
| |
|
119
|
Jusakul A, Cutcutache I, Yong CH, Lim JQ, Huang MN, Padmanabhan N, Nellore V, Kongpetch S, Ng AWT, Ng LM, Choo SP, Myint SS, Thanan R, Nagarajan S, Lim WK, Ng CCY, Boot A, Liu M, Ong CK, Rajasegaran V, Lie S, Lim AST, Lim TH, Tan J, Loh JL, McPherson JR, Khuntikeo N, Bhudhisawasdi V, Yongvanit P, Wongkham S, Totoki Y, Nakamura H, Arai Y, Yamasaki S, Chow PKH, Chung AYF, Ooi LLPJ, Lim KH, Dima S, Duda DG, Popescu I, Broet P, Hsieh SY, Yu MC, Scarpa A, Lai J, Luo DX, Carvalho AL, Vettore AL, Rhee H, Park YN, Alexandrov LB, Gordân R, Rozen SG, Shibata T, Pairojkul C, Teh BT, Tan P. Whole-Genome and Epigenomic Landscapes of Etiologically Distinct Subtypes of Cholangiocarcinoma. Cancer Discov 2017; 7:1116-1135. [PMID: 28667006 DOI: 10.1158/2159-8290.cd-17-0368] [Citation(s) in RCA: 662] [Impact Index Per Article: 82.8] [Received: 04/05/2017] [Revised: 06/07/2017] [Accepted: 06/28/2017] [Indexed: 02/07/2023]
Abstract
Cholangiocarcinoma (CCA) is a hepatobiliary malignancy exhibiting high incidence in countries with endemic liver-fluke infection. We analyzed 489 CCAs from 10 countries, combining whole-genome (71 cases), targeted/exome, copy-number, gene expression, and DNA methylation information. Integrative clustering defined 4 CCA clusters: fluke-positive CCAs (clusters 1/2) are enriched in ERBB2 amplifications and TP53 mutations; conversely, fluke-negative CCAs (clusters 3/4) exhibit high copy-number alterations and PD-1/PD-L2 expression, or epigenetic mutations (IDH1/2, BAP1) and FGFR/PRKA-related gene rearrangements. Whole-genome analysis highlighted FGFR2 3' untranslated region deletion as a mechanism of FGFR2 upregulation. Integration of noncoding promoter mutations with protein-DNA binding profiles demonstrates pervasive modulation of H3K27me3-associated sites in CCA. Clusters 1 and 4 exhibit distinct DNA hypermethylation patterns targeting either CpG islands or shores; mutation signature and subclonality analysis suggests that these reflect different mutational pathways. Our results exemplify how genetics, epigenetics, and environmental carcinogens can interplay across different geographies to generate distinct molecular subtypes of cancer. Significance: Integrated whole-genome and epigenomic analysis of CCA on an international scale identifies new CCA driver genes, noncoding promoter mutations, and structural variants. CCA molecular landscapes differ radically by etiology, underscoring how distinct cancer subtypes in the same organ may arise through different extrinsic and intrinsic carcinogenic processes. Cancer Discov; 7(10); 1116-35. ©2017 AACR. This article is highlighted in the In This Issue feature, p. 1047.
Affiliation(s)
- Apinya Jusakul
- Program in Cancer and Stem Cell Biology, Duke-NUS Medical School, Singapore.,Laboratory of Cancer Epigenome, Division of Medical Science, National Cancer Centre Singapore, Singapore.,The Centre for Research and Development of Medical Diagnostic Laboratories and Department of Clinical Immunology and Transfusion Sciences, Faculty of Associated Medical Sciences, Khon Kaen University, Khon Kaen, Thailand
| | - Ioana Cutcutache
- Centre for Computational Biology, Duke-NUS Medical School, Singapore
| | - Chern Han Yong
- Program in Cancer and Stem Cell Biology, Duke-NUS Medical School, Singapore.,Centre for Computational Biology, Duke-NUS Medical School, Singapore
| | - Jing Quan Lim
- Laboratory of Cancer Epigenome, Division of Medical Science, National Cancer Centre Singapore, Singapore.,Lymphoma Genomic Translational Research Laboratory, National Cancer Centre Singapore, Division of Medical Oncology, Singapore
| | - Mi Ni Huang
- Centre for Computational Biology, Duke-NUS Medical School, Singapore
| | - Nisha Padmanabhan
- Program in Cancer and Stem Cell Biology, Duke-NUS Medical School, Singapore
| | - Vishwa Nellore
- Department of Biostatistics and Bioinformatics, Center for Genomic and Computational Biology, Duke University, Durham, North Carolina
| | - Sarinya Kongpetch
- Laboratory of Cancer Epigenome, Division of Medical Science, National Cancer Centre Singapore, Singapore.,Cholangiocarcinoma Screening and Care Program and Liver Fluke and Cholangiocarcinoma Research Centre, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand.,Department of Pharmacology, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand
| | - Alvin Wei Tian Ng
- NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, Singapore
| | - Ley Moy Ng
- Cancer Science Institute of Singapore, National University of Singapore, Singapore
| | - Su Pin Choo
- Division of Medical Oncology, National Cancer Centre Singapore, Singapore
| | - Swe Swe Myint
- Laboratory of Cancer Epigenome, Division of Medical Science, National Cancer Centre Singapore, Singapore
| | - Raynoo Thanan
- Department of Biochemistry, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand
| | - Sanjanaa Nagarajan
- Laboratory of Cancer Epigenome, Division of Medical Science, National Cancer Centre Singapore, Singapore
| | - Weng Khong Lim
- Program in Cancer and Stem Cell Biology, Duke-NUS Medical School, Singapore.,Laboratory of Cancer Epigenome, Division of Medical Science, National Cancer Centre Singapore, Singapore
| | - Cedric Chuan Young Ng
- Laboratory of Cancer Epigenome, Division of Medical Science, National Cancer Centre Singapore, Singapore
| | - Arnoud Boot
- Program in Cancer and Stem Cell Biology, Duke-NUS Medical School, Singapore.,Centre for Computational Biology, Duke-NUS Medical School, Singapore
| | - Mo Liu
- Program in Cancer and Stem Cell Biology, Duke-NUS Medical School, Singapore.,Centre for Computational Biology, Duke-NUS Medical School, Singapore
| | - Choon Kiat Ong
- Lymphoma Genomic Translational Research Laboratory, National Cancer Centre Singapore, Division of Medical Oncology, Singapore
| | - Vikneswari Rajasegaran
- Laboratory of Cancer Epigenome, Division of Medical Science, National Cancer Centre Singapore, Singapore
| | - Stefanus Lie
- Laboratory of Cancer Epigenome, Division of Medical Science, National Cancer Centre Singapore, Singapore.,Division of Radiation Oncology, National Cancer Centre Singapore, Singapore
| | - Alvin Soon Tiong Lim
- Cytogenetics Laboratory, Department of Molecular Pathology, Singapore General Hospital, Singapore
| | - Tse Hui Lim
- Cytogenetics Laboratory, Department of Molecular Pathology, Singapore General Hospital, Singapore
| | - Jing Tan
- Laboratory of Cancer Epigenome, Division of Medical Science, National Cancer Centre Singapore, Singapore
| | - Jia Liang Loh
- Laboratory of Cancer Epigenome, Division of Medical Science, National Cancer Centre Singapore, Singapore
| | - John R McPherson
- Centre for Computational Biology, Duke-NUS Medical School, Singapore
| | - Narong Khuntikeo
- Cholangiocarcinoma Screening and Care Program and Liver Fluke and Cholangiocarcinoma Research Centre, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand.,Department of Surgery, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand
| | - Puangrat Yongvanit
- Cholangiocarcinoma Screening and Care Program and Liver Fluke and Cholangiocarcinoma Research Centre, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand
| | - Sopit Wongkham
- Department of Biochemistry, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand
| | - Yasushi Totoki
- Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
| | - Hiromi Nakamura
- Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
| | - Yasuhito Arai
- Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
| | - Satoshi Yamasaki
- Laboratory of Molecular Medicine, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Japan
| | - Pierce Kah-Hoe Chow
- Division of Surgical Oncology, National Cancer Center Singapore and Office of Clinical Sciences, Duke-NUS Medical School, Singapore
| | - Alexander Yaw Fui Chung
- Department of Hepatopancreatobiliary/Transplant Surgery, Singapore General Hospital, Singapore
| | | | - Kiat Hon Lim
- Department of Anatomical Pathology, Singapore General Hospital, Singapore
| | - Simona Dima
- Center of Digestive Diseases and Liver Transplantation, Fundeni Clinical Institute, Bucharest, Romania
| | - Dan G Duda
- Edwin L. Steele Laboratories for Tumor Biology, Department of Radiation Oncology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts
| | - Irinel Popescu
- Center of Digestive Diseases and Liver Transplantation, Fundeni Clinical Institute, Bucharest, Romania
| | - Philippe Broet
- DHU Hepatinov, Hôpital Paul Brousse, AP-HP, Villejuif, France
| | - Sen-Yung Hsieh
- Department of Gastroenterology and Hepatology, Chang Gung Memorial Hospital and Chang Gung University, Taoyuan, Taiwan
| | - Ming-Chin Yu
- Department of General Surgery, Chang Gung Memorial Hospital and Chang Gung University, Taoyuan, Taiwan
| | - Aldo Scarpa
- Applied Research on Cancer Centre (ARC-Net), University and Hospital Trust of Verona, Verona, Italy
| | - Jiaming Lai
- Department of Hepatobiliary Surgery, the First Affiliated Hospital of Sun Yat-sen University, Guangzhou, P. R. China
| | - Di-Xian Luo
- National and Local Joint Engineering Laboratory of High-through Molecular Diagnostic Technology, the First People's Hospital of Chenzhou, Southern Medical University, Chenzhou, P. R. China
| | | | - André Luiz Vettore
- Laboratory of Cancer Molecular Biology, Department of Biological Sciences, Federal University of São Paulo, Rua Pedro de Toledo, São Paulo, Brazil
| | - Hyungjin Rhee
- Department of Pathology, Brain Korea 21 PLUS Project for Medical Science, Integrated Genomic Research Center for Metabolic Regulation, Yonsei University College of Medicine, Seoul, Korea
| | - Young Nyun Park
- Department of Pathology, Brain Korea 21 PLUS Project for Medical Science, Integrated Genomic Research Center for Metabolic Regulation, Yonsei University College of Medicine, Seoul, Korea
| | - Ludmil B Alexandrov
- Theoretical Biology and Biophysics (T-6), Los Alamos National Laboratory, and Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico
| | - Raluca Gordân
- Department of Biostatistics and Bioinformatics, Center for Genomic and Computational Biology, Duke University, Durham, North Carolina
- Department of Computer Science, Duke University, Durham, North Carolina
| | - Steven G Rozen
- Program in Cancer and Stem Cell Biology, Duke-NUS Medical School, Singapore
- Centre for Computational Biology, Duke-NUS Medical School, Singapore
- SingHealth/Duke-NUS Institute of Precision Medicine, National Heart Centre, Singapore
| | - Tatsuhiro Shibata
- Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
- Laboratory of Molecular Medicine, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Japan
| | - Chawalit Pairojkul
- Department of Pathology, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand
| | - Bin Tean Teh
- Program in Cancer and Stem Cell Biology, Duke-NUS Medical School, Singapore
- Laboratory of Cancer Epigenome, Division of Medical Science, National Cancer Centre Singapore, Singapore
- Cancer Science Institute of Singapore, National University of Singapore, Singapore
- SingHealth/Duke-NUS Institute of Precision Medicine, National Heart Centre, Singapore
- Institute of Molecular and Cell Biology, Singapore
| | - Patrick Tan
- Program in Cancer and Stem Cell Biology, Duke-NUS Medical School, Singapore
- Cancer Science Institute of Singapore, National University of Singapore, Singapore
- SingHealth/Duke-NUS Institute of Precision Medicine, National Heart Centre, Singapore
- Genome Institute of Singapore, Singapore
|
120
|
|
121
|
Zhang Y, Li S, Abyzov A, Gerstein MB. Landscape and variation of novel retroduplications in 26 human populations. PLoS Comput Biol 2017; 13:e1005567. [PMID: 28662076 PMCID: PMC5510864 DOI: 10.1371/journal.pcbi.1005567] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 10/26/2016] [Revised: 07/14/2017] [Accepted: 05/12/2017] [Indexed: 01/10/2023] Open
Abstract
Retroduplications come from reverse transcription of mRNAs and their insertion back into the genome. Here, we performed comprehensive discovery and analysis of retroduplications in a large cohort of 2,535 individuals from 26 human populations, as part of 1000 Genomes Phase 3. We developed an integrated approach to discover novel retroduplications combining high-coverage exome and low-coverage whole-genome sequencing data, utilizing information from both exon-exon junctions and discordant paired-end reads. We found 503 parent genes having novel retroduplications absent from the reference genome. Based solely on retroduplication variation, we built phylogenetic trees of human populations; these represent superpopulation structure well and indicate that variable retroduplications are effective population markers. We further identified 43 retroduplication parent genes differentiating superpopulations. This group contains several interesting insertion events, including a SLMO2 retroduplication and insertion into CAV3, which has a potential disease association. We also found retroduplications to be associated with a variety of genomic features: (1) Insertion sites were correlated with regular nucleosome positioning. (2) They, predictably, tend to avoid conserved functional regions, such as exons, but, somewhat surprisingly, also avoid introns. (3) Retroduplications tend to be co-inserted with young L1 elements, indicating recent retrotranspositional activity, and (4) they have a weak tendency to originate from highly expressed parent genes. Our investigation provides insight into the functional impact and association with genomic elements of retroduplications. We anticipate our approach and analytical methodology to have application in a more clinical context, where exome sequencing data is abundant and the discovery of retroduplications can potentially improve the accuracy of SNP calling.
Affiliation(s)
- Yan Zhang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, Ohio, United States of America
| | - Shantao Li
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
| | - Alexej Abyzov
- Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota, United States of America
| | - Mark B. Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Department of Computer Science, Yale University, New Haven, Connecticut, United States of America
|
122
|
Feigin ME, Garvin T, Bailey P, Waddell N, Chang DK, Kelley DR, Shuai S, Gallinger S, McPherson JD, Grimmond SM, Khurana E, Stein LD, Biankin AV, Schatz MC, Tuveson DA. Recurrent noncoding regulatory mutations in pancreatic ductal adenocarcinoma. Nat Genet 2017; 49:825-833. [PMID: 28481342 PMCID: PMC5659388 DOI: 10.1038/ng.3861] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Received: 07/28/2016] [Accepted: 04/10/2017] [Indexed: 12/15/2022]
Abstract
The contributions of coding mutations to tumorigenesis are relatively well known; however, little is known about somatic alterations in noncoding DNA. Here we describe GECCO (Genomic Enrichment Computational Clustering Operation) to analyze somatic noncoding alterations in 308 pancreatic ductal adenocarcinomas (PDAs) and identify commonly mutated regulatory regions. We find recurrent noncoding mutations to be enriched in PDA pathways, including axon guidance and cell adhesion, and newly identified processes, including transcription and homeobox genes. We identified mutations in protein binding sites correlating with differential expression of proximal genes and experimentally validated effects of mutations on expression. We developed an expression modulation score that quantifies the strength of gene regulation imposed by each class of regulatory elements, and found the strongest elements were most frequently mutated, suggesting a selective advantage. Our detailed single-cancer analysis of noncoding alterations identifies regulatory mutations as candidates for diagnostic and prognostic markers, and suggests new mechanisms for tumor evolution.
Affiliation(s)
- Michael E Feigin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
- Lustgarten Foundation Pancreatic Cancer Research Laboratory, Cold Spring Harbor, New York, USA
| | - Tyler Garvin
- Watson School of Biological Sciences, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
| | - Peter Bailey
- Wolfson Wohl Cancer Research Centre, Institute of Cancer Sciences, University of Glasgow, Glasgow, Scotland, UK
| | - Nicola Waddell
- QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
| | - David K Chang
- Wolfson Wohl Cancer Research Centre, Institute of Cancer Sciences, University of Glasgow, Glasgow, Scotland, UK
- The Kinghorn Cancer Centre, Cancer Research Program, Garvan Institute of Medical Research, Darlinghurst, Sydney, New South Wales, Australia
- Department of Surgery, Bankstown Hospital, Bankstown, Sydney, New South Wales, Australia
- South Western Sydney Clinical School, Faculty of Medicine, University of New South Wales, Liverpool, New South Wales, Australia
| | - David R Kelley
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts, USA
| | - Shimin Shuai
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
| | - Steven Gallinger
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario, Canada
- Division of General Surgery, Toronto General Hospital, Toronto, Ontario, Canada
| | - John D McPherson
- Genome Technologies Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | - Sean M Grimmond
- Wolfson Wohl Cancer Research Centre, Institute of Cancer Sciences, University of Glasgow, Glasgow, Scotland, UK
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
| | - Ekta Khurana
- Sandra and Edward Meyer Cancer Center, Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Medical College of Cornell University, New York, New York, USA
| | - Lincoln D Stein
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Informatics and Biocomputing, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | - Andrew V Biankin
- Wolfson Wohl Cancer Research Centre, Institute of Cancer Sciences, University of Glasgow, Glasgow, Scotland, UK
- South Western Sydney Clinical School, Faculty of Medicine, University of New South Wales, Liverpool, New South Wales, Australia
- West of Scotland Pancreatic Unit, Glasgow Royal Infirmary, Glasgow, Scotland, UK
| | - Michael C Schatz
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
| | - David A Tuveson
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
- Lustgarten Foundation Pancreatic Cancer Research Laboratory, Cold Spring Harbor, New York, USA
- Rubenstein Center for Pancreatic Cancer Research, Memorial Sloan Kettering Cancer Center, New York, New York, USA
|
123
|
Gallone G, Haerty W, Disanto G, Ramagopalan SV, Ponting CP, Berlanga-Taylor AJ. Identification of genetic variants affecting vitamin D receptor binding and associations with autoimmune disease. Hum Mol Genet 2017; 26:2164-2176. [PMID: 28335003 PMCID: PMC5886188 DOI: 10.1093/hmg/ddx092] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 10/16/2016] [Revised: 02/28/2017] [Accepted: 03/07/2017] [Indexed: 01/24/2023] Open
Abstract
Large numbers of statistically significant associations between sentinel SNPs and case-control status have been replicated by genome-wide association studies. Nevertheless, few underlying molecular mechanisms of complex disease are currently known. We investigated whether variation in binding of a transcription factor, the vitamin D receptor (VDR), whose activating ligand vitamin D has been proposed as a modifiable factor in multiple disorders, could explain any of these associations. VDR modifies gene expression by binding DNA as a heterodimer with the Retinoid X receptor (RXR). We identified 43,332 genetic variants significantly associated with altered VDR binding affinity (VDR-BVs) using a high-resolution (ChIP-exo) genome-wide analysis of 27 HapMap lymphoblastoid cell lines. VDR-BVs are enriched in consensus RXR::VDR binding motifs, yet most fell outside of these motifs, implying that genetic variation often affects the binding affinity only indirectly. Finally, we compared 341 VDR-BVs replicating by position in multiple individuals against background sets of variants lying within VDR-binding regions that had been matched in allele frequency and were independent with respect to linkage disequilibrium. In this stringent test, these replicated VDR-BVs were significantly (q < 0.1) and substantially (>2-fold) enriched in genomic intervals associated with autoimmune and other diseases, including inflammatory bowel disease, Crohn's disease and rheumatoid arthritis. The approach's validity is underscored by RXR::VDR motif sequence being predictive of binding strength and being evolutionarily constrained. Our findings are consistent with altered RXR::VDR binding contributing to immunity-related diseases. Replicated VDR-BVs associated with these disorders could represent causal disease risk alleles whose effect may be modifiable by vitamin D levels.
Affiliation(s)
- Giuseppe Gallone
- MRC Functional Genomics Unit
- Department of Physiology, Anatomy and Genetics, University of Oxford, South Parks Road, Oxford OX1 3PT, UK
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
| | - Wilfried Haerty
- MRC Functional Genomics Unit
- Department of Physiology, Anatomy and Genetics, University of Oxford, South Parks Road, Oxford OX1 3PT, UK
| | - Giulio Disanto
- Department of Physiology, Anatomy and Genetics, University of Oxford, South Parks Road, Oxford OX1 3PT, UK
| | | | - Chris P. Ponting
- MRC Functional Genomics Unit
- Department of Physiology, Anatomy and Genetics, University of Oxford, South Parks Road, Oxford OX1 3PT, UK
- MRC Human Genetics Unit, The Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
| | - Antonio J. Berlanga-Taylor
- Wellcome Trust Centre for Human Genetics, Nuffield Department of Clinical Medicine, University of Oxford, Oxford OX3 7BN, UK
- CGAT, MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3PT, UK
- MRC-PHE Centre for Environment and Health, Department of Epidemiology & Biostatistics, School of Public Health, Faculty of Medicine, Imperial College London, St Mary’s Campus, Norfolk Place, London W2 1PG, UK
|
124
|
Steward CA, Parker APJ, Minassian BA, Sisodiya SM, Frankish A, Harrow J. Genome annotation for clinical genomic diagnostics: strengths and weaknesses. Genome Med 2017; 9:49. [PMID: 28558813 PMCID: PMC5448149 DOI: 10.1186/s13073-017-0441-1] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Indexed: 02/08/2023] Open
Abstract
The Human Genome Project and advances in DNA sequencing technologies have revolutionized the identification of genetic disorders through the use of clinical exome sequencing. However, in a considerable number of patients, the genetic basis remains unclear. As clinicians begin to consider whole-genome sequencing, an understanding of the processes and tools involved and the factors to consider in the annotation of the structure and function of genomic elements that might influence variant identification is crucial. Here, we discuss and illustrate the strengths and weaknesses of approaches for the annotation and classification of important elements of protein-coding genes, other genomic elements such as pseudogenes and the non-coding genome, comparative-genomic approaches for inferring gene function, and new technologies for aiding genome annotation, as a practical guide for clinicians when considering pathogenic sequence variation. Complete and accurate annotation of structure and function of genome features has the potential to reduce both false-negative (from missing annotation) and false-positive (from incorrect annotation) errors in causal variant identification in exome and genome sequences. Re-analysis of unsolved cases will be necessary as newer technology improves genome annotation, potentially improving the rate of diagnosis.
Affiliation(s)
- Charles A Steward
- Congenica Ltd, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1DR, UK
- The Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | | | - Berge A Minassian
- Department of Pediatrics (Neurology), University of Texas Southwestern, Dallas, TX, USA
- Program in Genetics and Genome Biology and Department of Paediatrics (Neurology), The Hospital for Sick Children and University of Toronto, Toronto, Canada
| | - Sanjay M Sisodiya
- Department of Clinical and Experimental Epilepsy, UCL Institute of Neurology, London, WC1N 3BG, UK
- Chalfont Centre for Epilepsy, Chesham Lane, Chalfont St Peter, Buckinghamshire, SL9 0RJ, UK
| | - Adam Frankish
- The Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Jennifer Harrow
- The Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Illumina Inc, Great Chesterford, Essex, CB10 1XL, UK
|
125
|
Zahir FR, Mwenifumbo JC, Chun HJE, Lim EL, Van Karnebeek CDM, Couse M, Mungall KL, Lee L, Makela N, Armstrong L, Boerkoel CF, Langlois SL, McGillivray BM, Jones SJM, Friedman JM, Marra MA. Comprehensive whole genome sequence analyses yields novel genetic and structural insights for Intellectual Disability. BMC Genomics 2017; 18:403. [PMID: 28539120 PMCID: PMC5442678 DOI: 10.1186/s12864-017-3671-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 11/04/2016] [Accepted: 03/29/2017] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Intellectual Disability (ID) is among the most common global disorders, yet etiology is unknown in ~30% of patients despite clinical assessment. Whole genome sequencing (WGS) is able to interrogate the entire genome, providing potential to diagnose idiopathic patients. METHODS: We conducted WGS on eight children with idiopathic ID and brain structural defects, and their normal parents, and carried out extensive data analyses using standard and discovery approaches. RESULTS: We verified de novo pathogenic single nucleotide variants (SNVs) in ARID1B c.1595delG and PHF6 c.820C>T, potentially causative de novo two-base indels in SQSTM1 c.115_116delinsTA and UPF1 c.1576_1577delinsA, and de novo SNVs of uncertain significance in CACNB3 c.1289G>A and SPRY4 c.508T>A. We report results from a large secondary control study of 2081 exomes probing the pathogenicity of the above genes. We analyzed structural variation by four different algorithms, including de novo genome assembly. We confirmed a likely contributory 165 kb de novo heterozygous 1q43 microdeletion missed by clinical microarray. The de novo assembly unmasked hidden genome instability that was missed by standard re-alignment based algorithms. We also interrogated regulatory sequence variation for known and hypothesized ID genes and present useful strategies for WGS data analyses for non-coding variation. CONCLUSION: This study provides an extensive analysis of WGS in the context of ID, providing genetic and structural insights into ID and yielding diagnoses.
Affiliation(s)
- Farah R Zahir
- Canada's Michael Smith Genome Sciences Center, Vancouver, BC, V5Z 4S6, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Qatar Biomedical Research Institute, Hamad Bin Khalifa University, P.O. Box 34110, Doha, Qatar
| | - Jill C Mwenifumbo
- Canada's Michael Smith Genome Sciences Center, Vancouver, BC, V5Z 4S6, Canada
| | - Hye-Jung E Chun
- Canada's Michael Smith Genome Sciences Center, Vancouver, BC, V5Z 4S6, Canada
| | - Emilia L Lim
- Canada's Michael Smith Genome Sciences Center, Vancouver, BC, V5Z 4S6, Canada
| | - Clara D M Van Karnebeek
- Department of Pediatrics, Centre for Molecular Medicine & Therapeutics Child & Family Research Institute, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
| | - Madeline Couse
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
| | - Karen L Mungall
- Canada's Michael Smith Genome Sciences Center, Vancouver, BC, V5Z 4S6, Canada
| | - Leora Lee
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
| | - Nancy Makela
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
| | - Linlea Armstrong
- Provincial Medical Genetics Programme, Children's & Women's Health Centre of British Columbia, Vancouver, BC, V6H 3N1, Canada
| | - Cornelius F Boerkoel
- Provincial Medical Genetics Programme, Children's & Women's Health Centre of British Columbia, Vancouver, BC, V6H 3N1, Canada
| | - Sylvie L Langlois
- Provincial Medical Genetics Programme, Children's & Women's Health Centre of British Columbia, Vancouver, BC, V6H 3N1, Canada
| | - Barbara M McGillivray
- Provincial Medical Genetics Programme, Children's & Women's Health Centre of British Columbia, Vancouver, BC, V6H 3N1, Canada
| | - Steven J M Jones
- Canada's Michael Smith Genome Sciences Center, Vancouver, BC, V5Z 4S6, Canada
| | - Jan M Friedman
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
| | - Marco A Marra
- Canada's Michael Smith Genome Sciences Center, Vancouver, BC, V5Z 4S6, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
|
126
|
Dhingra P, Fu Y, Gerstein M, Khurana E. Using FunSeq2 for Coding and Non-Coding Variant Annotation and Prioritization. Curr Protoc Bioinformatics 2017; 57:15.11.1-15.11.17. [DOI: 10.1002/cpbi.23] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 01/17/2023]
Affiliation(s)
- Priyanka Dhingra
- Institute for Computational Biomedicine, Weill Cornell Medical College New York New York
- Department of Physiology and Biophysics, Weill Cornell Medical College New York New York 10021
| | - Yao Fu
- Bina Technologies, Roche Sequencing Redwood City California
| | - Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University New Haven Connecticut
- Department of Molecular Biophysics and Biochemistry, Yale University New Haven Connecticut
- Department of Computer Science, Yale University New Haven Connecticut
| | - Ekta Khurana
- Institute for Computational Biomedicine, Weill Cornell Medical College New York New York
- Department of Physiology and Biophysics, Weill Cornell Medical College New York New York 10021
- Meyer Cancer Center, Weill Cornell Medical College New York New York
- Englander Institute for Precision Medicine, Weill Cornell Medical College New York New York
|
127
|
Cisneros L, Bussey KJ, Orr AJ, Miočević M, Lineweaver CH, Davies P. Ancient genes establish stress-induced mutation as a hallmark of cancer. PLoS One 2017; 12:e0176258. [PMID: 28441401 PMCID: PMC5404761 DOI: 10.1371/journal.pone.0176258] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 09/16/2016] [Accepted: 04/08/2017] [Indexed: 02/07/2023] Open
Abstract
Cancer is sometimes depicted as a reversion to single cell behavior in cells adapted to live in a multicellular assembly. If this is the case, one would expect that mutation in cancer disrupts functional mechanisms that suppress cell-level traits detrimental to multicellularity. Such mechanisms should have evolved with or after the emergence of multicellularity. This leads to two related, but distinct hypotheses: 1) Somatic mutations in cancer will occur in genes that are younger than the emergence of multicellularity (1000 million years [MY]); and 2) genes that are frequently mutated in cancer and whose mutations are functionally important for the emergence of the cancer phenotype evolved within the past 1000 million years, and thus would exhibit an age distribution that is skewed to younger genes. In order to investigate these hypotheses we estimated the evolutionary ages of all human genes and then studied the probability of mutation and their biological function in relation to their age and genomic location for both normal germline and cancer contexts. We observed that under a model of uniform random mutation across the genome, controlled for gene size, genes less than 500 MY were more frequently mutated in both cases. Paradoxically, causal genes, defined in the COSMIC Cancer Gene Census, were depleted in this age group. When we used functional enrichment analysis to explain this unexpected result we discovered that COSMIC genes with recessive disease phenotypes were enriched for DNA repair and cell cycle control. The non-mutated genes in these pathways are orthologous to those underlying stress-induced mutation in bacteria, which results in the clustering of single nucleotide variations. COSMIC genes were less common in regions where the probability of observing mutational clusters is high, although they are approximately 2-fold more likely to harbor mutational clusters compared to other human genes. 
Our results suggest this ancient mutational response to stress that evolved among prokaryotes was co-opted to maintain diversity in the germline and immune system, while the original phenotype is restored in cancer. Reversion to a stress-induced mutational response is a hallmark of cancer that allows for effectively searching “protected” genome space where genes causally implicated in cancer are located and underlies the high adaptive potential and concomitant therapeutic resistance that is characteristic of cancer.
Affiliation(s)
- Luis Cisneros
- NantOmics, Tempe, Arizona, United States of America
- BEYOND Center for Fundamental Concepts in Science, Arizona State University, Tempe, Arizona, United States of America
| | - Kimberly J. Bussey
- NantOmics, Tempe, Arizona, United States of America
- Department of Biomedical Informatics, Arizona State University, Tempe, Arizona, United States of America
| | - Adam J. Orr
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
| | - Milica Miočević
- Department of Psychology, Arizona State University, Tempe, Arizona, United States of America
| | - Charles H. Lineweaver
- Planetary Science Institute, Research School of Astronomy and Astrophysics and Research School of Earth Sciences, Australian National University, Canberra, Australian Capital Territory, Australia
| | - Paul Davies
- BEYOND Center for Fundamental Concepts in Science, Arizona State University, Tempe, Arizona, United States of America
|
128
|
Juul M, Bertl J, Guo Q, Nielsen MM, Świtnicki M, Hornshøj H, Madsen T, Hobolth A, Pedersen JS. Non-coding cancer driver candidates identified with a sample- and position-specific model of the somatic mutation rate. eLife 2017; 6. [PMID: 28362259 PMCID: PMC5440169 DOI: 10.7554/elife.21778] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 09/24/2016] [Accepted: 03/14/2017] [Indexed: 02/06/2023] Open
Abstract
Non-coding mutations may drive cancer development. Statistical detection of non-coding driver regions is challenged by a varying mutation rate and uncertainty of functional impact. Here, we develop a statistically founded non-coding driver-detection method, ncdDetect, which includes sample-specific mutational signatures, long-range mutation rate variation, and position-specific impact measures. Using ncdDetect, we screened non-coding regulatory regions of protein-coding genes across a pan-cancer set of whole-genomes (n = 505), which top-ranked known drivers and identified new candidates. For individual candidates, presence of non-coding mutations associates with altered expression or decreased patient survival across an independent pan-cancer sample set (n = 5454). This includes an antigen-presenting gene (CD1A), where 5'UTR mutations correlate significantly with decreased survival in melanoma. Additionally, mutations in a base-excision-repair gene (SMUG1) correlate with a C-to-T mutational-signature. Overall, we find that a rich model of mutational heterogeneity facilitates non-coding driver identification and integrative analysis points to candidates of potential clinical relevance.
Affiliation(s)
- Malene Juul
- Department of Molecular Medicine (MOMA), Aarhus University Hospital, Aarhus, Denmark
| | - Johanna Bertl
- Department of Molecular Medicine (MOMA), Aarhus University Hospital, Aarhus, Denmark
| | - Qianyun Guo
- Bioinformatics Research Centre (BiRC), Aarhus University, Aarhus, Denmark
| | - Morten Muhlig Nielsen
- Department of Molecular Medicine (MOMA), Aarhus University Hospital, Aarhus, Denmark
| | - Michał Świtnicki
- Department of Molecular Medicine (MOMA), Aarhus University Hospital, Aarhus, Denmark
| | - Henrik Hornshøj
- Department of Molecular Medicine (MOMA), Aarhus University Hospital, Aarhus, Denmark
| | - Tobias Madsen
- Department of Molecular Medicine (MOMA), Aarhus University Hospital, Aarhus, Denmark
| | - Asger Hobolth
- Bioinformatics Research Centre (BiRC), Aarhus University, Aarhus, Denmark
| | - Jakob Skou Pedersen
- Department of Molecular Medicine (MOMA), Aarhus University Hospital, Aarhus, Denmark.,Bioinformatics Research Centre (BiRC), Aarhus University, Aarhus, Denmark
| |
Collapse
|
129
|
Lowdon RF, Wang T. Epigenomic annotation of noncoding mutations identifies mutated pathways in primary liver cancer. PLoS One 2017; 12:e0174032. [PMID: 28333948 PMCID: PMC5363827 DOI: 10.1371/journal.pone.0174032] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 11/18/2016] [Accepted: 03/02/2017] [Indexed: 11/19/2022] Open
Abstract
Evidence that noncoding mutation can result in cancer driver events is mounting. However, it is more difficult to assign molecular biological consequences to noncoding mutations than to coding mutations, and a typical cancer genome contains many more noncoding mutations than protein-coding mutations. Accordingly, parsing functional noncoding mutation signal from noise remains an important challenge. Here we use an empirical approach to identify putatively functional noncoding somatic single nucleotide variants (SNVs) from liver cancer genomes. Annotation of candidate variants by publicly available epigenome datasets finds that 40.5% of SNVs fall in regulatory elements. When assigned to specific regulatory elements, we find that the distribution of regulatory element mutation mirrors that of nonsynonymous coding mutation, where few regulatory elements are recurrently mutated in a patient population but many are singly mutated. We find potential gain-of-binding site events among candidate SNVs, suggesting a mechanism of action for these variants. When aggregating noncoding somatic mutation in promoters, we find that genes in the ERBB signaling and MAPK signaling pathways are significantly enriched for promoter mutations. Altogether, our results suggest that functional somatic SNVs in cancer are sporadic, but occasionally occur in regulatory elements and may affect phenotype by creating binding sites for transcriptional regulators. Accordingly, we propose that noncoding mutation should be formally accounted for when determining gene- and pathway-mutation burden in cancer.
Affiliation(s)
- Rebecca F. Lowdon
- Center for Genome Sciences and Systems Biology, Department of Genetics, Washington University in St. Louis, Saint Louis, Missouri, United States of America
- Ting Wang
- Center for Genome Sciences and Systems Biology, Department of Genetics, Washington University in St. Louis, Saint Louis, Missouri, United States of America
|
130
|
Jang K, Kim K, Cho A, Lee I, Choi JK. Network perturbation by recurrent regulatory variants in cancer. PLoS Comput Biol 2017; 13:e1005449. [PMID: 28333928 PMCID: PMC5383347 DOI: 10.1371/journal.pcbi.1005449] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/17/2016] [Revised: 04/06/2017] [Accepted: 03/10/2017] [Indexed: 12/12/2022] Open
Abstract
Cancer-driving genes have been identified as recurrently affected by variants that alter protein-coding sequences. However, a majority of cancer variants arise in noncoding regions, and some of them are thought to play a critical role through transcriptional perturbation. Here we identified putative transcriptional driver genes based on combinatorial variant recurrence in cis-regulatory regions. The identified genes showed high connectivity in the cancer type-specific transcription regulatory network, with high outdegree and many downstream genes, highlighting their causative role during tumorigenesis. In the protein interactome, the identified transcriptional drivers were not as highly connected as coding driver genes but appeared to form a network module centered on the coding drivers. The coding and regulatory variants associated via these interactions between the coding and transcriptional drivers showed exclusive and complementary occurrence patterns across tumor samples. Transcriptional cancer drivers may act through an extensive perturbation of the regulatory network and by altering protein network modules through interactions with coding driver genes.
Identifying driver variants is a current challenge facing cancer genomics. A well-established and robust method for this is to find recurrence in large cohorts of samples. Recurrence patterns of amino acid-changing variants can reveal oncogenes and tumor suppressor genes. However, such single-gene approaches have limitations because of rare variants. Therefore, recurrently affected protein complexes, network modules, or signaling pathways have been identified based on network-level recurrence. Here we dissect the chromatin interactome to identify cis-regulatory variants that show high gene-level recurrence. We then employ the gene regulatory network and protein interactome to characterize putative cancer genes with cis-regulatory variant recurrence. These genes were located at critical positions in the regulatory network. By contrast, they are at the circumference in the protein interactome; instead, they form a network module with coding cancer genes located at hub positions. Furthermore, the coding and regulatory variants associated via these interactions showed exclusive and complementary occurrence patterns across tumor samples. Therefore, we suggest that transcriptional cancer drivers may act through an extensive perturbation of the regulatory network and by altering protein network modules through interactions with coding driver genes.
Affiliation(s)
- Kiwon Jang
- Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
- Kwoneel Kim
- Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
- Ara Cho
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea
- Jung Kyoon Choi
- Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
|
131
|
Abstract
The vast majority of somatic variants in cancer genomes occur in non-coding regions. However, progress in cancer genomics in the past decade has been mostly focused on coding regions, largely due to the prohibitive cost of whole genome sequencing (WGS). Recent technological advances have decreased sequencing costs, leading to the acquisition of thousands of tumor whole genome sequences and, in turn, to a hunt for non-coding drivers. The most well characterized regulatory drivers are in the TERT promoter and have been identified in many cancer types. Despite the larger fraction of somatic variants occurring in non-coding regions, the number of non-coding drivers identified so far is much smaller than the number of coding region drivers. Here we discuss reasons that may hinder the detection of non-coding drivers. We also examine the relationship between non-coding genetic variation and epigenetic state in tumor cells and assert the need for additional epigenetic data sets as a prerequisite for understanding the rewiring of regulatory networks in cancer.
|
132
|
Li S, Shuch BM, Gerstein MB. Whole-genome analysis of papillary kidney cancer finds significant noncoding alterations. PLoS Genet 2017; 13:e1006685. [PMID: 28358873 PMCID: PMC5391127 DOI: 10.1371/journal.pgen.1006685] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 09/26/2016] [Revised: 04/13/2017] [Accepted: 03/13/2017] [Indexed: 01/30/2023] Open
Abstract
To date, studies on papillary renal-cell carcinoma (pRCC) have largely focused on coding alterations in traditional drivers, particularly the tyrosine-kinase, Met. However, for a significant fraction of tumors, researchers have been unable to determine a clear molecular etiology. To address this, we perform the first whole-genome analysis of pRCC. Elaborating on previous results on MET, we find a germline SNP (rs11762213) in this gene predicting prognosis. Surprisingly, we detect no enrichment for small structural variants disrupting MET. Next, we scrutinize noncoding mutations, discovering potentially impactful ones associated with MET. Many of these are in an intron connected to a known, oncogenic alternative-splicing event; moreover, we find methylation dysregulation nearby, leading to a cryptic promoter activation. We also notice an elevation of mutations in the long noncoding RNA NEAT1, and these mutations are associated with increased expression and unfavorable outcome. Finally, to address the origin of pRCC heterogeneity, we carry out whole-genome analyses of mutational processes. First, we investigate genome-wide mutational patterns, finding they are governed mostly by methylation-associated C-to-T transitions. We also observe significantly more mutations in open chromatin and early-replicating regions in tumors with chromatin-modifier alterations. Finally, we reconstruct cancer-evolutionary trees, which have markedly different topologies and suggested evolutionary trajectories for the different subtypes of pRCC.
Affiliation(s)
- Shantao Li
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Brian M. Shuch
- Department of Urology, Yale School of Medicine, New Haven, Connecticut, United States of America
- Mark B. Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Department of Computer Science, Yale University, New Haven, Connecticut, United States of America
|
133
|
Long C, Jian J, Li X, Wang G, Wang J. A comprehensive analysis of cancer-driving mutations and genes in kidney cancer. Oncol Lett 2017; 13:2151-2160. [PMID: 28454375 PMCID: PMC5403472 DOI: 10.3892/ol.2017.5689] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 07/22/2015] [Accepted: 12/09/2016] [Indexed: 02/06/2023] Open
Abstract
An accumulation of driver mutations is important for cancer formation and progression, and leads to the disruption of genes and signaling pathways. The identification of driver mutations and genes has been the subject of numerous previous studies. The present study was performed to identify cancer-driving mutations and genes in renal cell carcinoma (RCC), prioritizing noncoding variants with a high functional impact, in order to analyze the most informative features. Sorting Intolerant From Tolerant (SIFT), Polymorphism Phenotyping version 2 (Polyphen2) and MutationAssessor were applied to predict deleterious mutations in the coding genome. OncodriveFM and OncodriveCLUST were used to detect potential driver genes and signaling pathways. The functional impact of noncoding variants was evaluated using Combined Annotation Dependent Depletion, FunSeq2 and Genome-Wide Annotation of Variants. Noncoding features were analyzed with respect to their enrichment of high-scoring variants. A total of 1,327 coding mutations in clear cell RCC, 258 in chromophobe RCC and 1,186 in papillary RCC were predicted to be deleterious by all three of MutationAssessor, Polyphen2 and SIFT. In total, 77 genes were positively selected by OncodriveFM and 1 by OncodriveCLUST, 45 of which were recurrently mutated genes. In addition, 10 signaling pathways were recurrently mutated and had a high functional impact bias (FM bias), and 31 novel signaling pathways with high FM bias were identified. Furthermore, noncoding regulatory features and conserved regions contained numerous high-scoring variants, and expression, replication time, GC content and recombination rate were positively correlated with the densities of high-scoring variants. In conclusion, the present study identified a list of cancer-driving genes and signaling pathways; features such as regulatory elements, conserved regions, replication time, expression, GC content and recombination rate are major factors that affect the distribution of high-scoring non-coding mutations in kidney cancer.
Affiliation(s)
- Chengmei Long
- Department of Organ Transplantation, Jiangxi Provincial People's Hospital, School of Medicine, Nanchang University, Nanchang, Jiangxi 330006, P.R. China
- Jinbo Jian
- Department of Oncology, Binzhou Medical University Hospital, Binzhou, Shandong 256603, P.R. China
- Xinchang Li
- Department of Organ Transplantation, Jiangxi Provincial People's Hospital, School of Medicine, Nanchang University, Nanchang, Jiangxi 330006, P.R. China
- Gongxian Wang
- Department of Urology, The First Affiliated Hospital of Nanchang University, Nanchang, Jiangxi 330006, P.R. China
- Jingen Wang
- Department of Urology, Jiangxi Provincial People's Hospital, School of Medicine, Nanchang University, Nanchang, Jiangxi 330006, P.R. China
|
134
|
Morrison AC, Huang Z, Yu B, Metcalf G, Liu X, Ballantyne C, Coresh J, Yu F, Muzny D, Feofanova E, Rustagi N, Gibbs R, Boerwinkle E. Practical Approaches for Whole-Genome Sequence Analysis of Heart- and Blood-Related Traits. Am J Hum Genet 2017; 100:205-215. [PMID: 28089252 DOI: 10.1016/j.ajhg.2016.12.009] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 08/24/2016] [Accepted: 12/14/2016] [Indexed: 01/11/2023] Open
Abstract
Whole-genome sequencing (WGS) allows for a comprehensive view of the sequence of the human genome. We present and apply integrated methodologic steps for interrogating WGS data to characterize the genetic architecture of 10 heart- and blood-related traits in a sample of 1,860 African Americans. In order to evaluate the contribution of regulatory and non-protein coding regions of the genome, we conducted aggregate tests of rare variation across the entire genomic landscape using a sliding window, complemented by an annotation-based assessment of the genome using predefined regulatory elements and within the first intron of all genes. These tests were performed treating all variants equally as well as with individual variants weighted by a measure of predicted functional consequence. Significant findings were assessed in 1,705 individuals of European ancestry. After these steps, we identified and replicated components of the genomic landscape significantly associated with heart- and blood-related traits. For two traits, lipoprotein(a) levels and neutrophil count, aggregate tests of low-frequency and rare variation were significantly associated across multiple motifs. For a third trait, cardiac troponin T, investigation of regulatory domains identified a locus on chromosome 9. These practical approaches for WGS analysis led to the identification of informative genomic regions and also showed that defined non-coding regions, such as first introns of genes and regulatory domains, are associated with important risk factor phenotypes. This study illustrates the tractable nature of WGS data and outlines an approach for characterizing the genetic architecture of complex traits.
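The sliding-window aggregation of rare variation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name, the window and step sizes, the allele-frequency cutoff, and the use of a plain unweighted allele count as the burden score are all assumptions made for illustration.

```python
def sliding_window_burden(positions, genotypes, window, step, max_af=0.01):
    """Sum rare alternate-allele counts per sample in windows advanced by `step`.

    positions: variant coordinates (ints), one per variant.
    genotypes: one list per variant with alt-allele counts (0, 1 or 2) per sample.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    n_samples = len(genotypes[0])
    # Keep only variants whose alternate-allele frequency is rare but non-zero.
    rare = []
    for pos, calls in zip(positions, genotypes):
        allele_freq = sum(calls) / (2 * n_samples)
        if 0 < allele_freq <= max_af:
            rare.append((pos, calls))
    if not rare:
        return []
    first = min(pos for pos, _ in rare)
    last = max(pos for pos, _ in rare)
    results = []
    start = first
    while start <= last:
        burden = [0] * n_samples
        n_variants = 0
        for pos, calls in rare:
            if start <= pos < start + window:
                n_variants += 1
                burden = [b + c for b, c in zip(burden, calls)]
        if n_variants:  # skip windows that contain no rare variant
            results.append((start, start + window, burden))
        start += step
    return results


positions = [100, 150, 900, 2000]
genotypes = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],  # allele frequency 0.25, treated as common
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
]
windows = sliding_window_burden(positions, genotypes, window=500, step=500, max_af=0.1)
for start, end, burden in windows:
    print(start, end, burden)
```

In a real analysis the per-sample burden vector for each window would then be tested for association with the trait, optionally with variants weighted by predicted functional consequence as the abstract describes.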
Affiliation(s)
- Alanna C Morrison
- Human Genetics Center, University of Texas School of Public Health, Houston, TX 77030, USA
- Zhuoyi Huang
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Bing Yu
- Human Genetics Center, University of Texas School of Public Health, Houston, TX 77030, USA
- Ginger Metcalf
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Xiaoming Liu
- Human Genetics Center, University of Texas School of Public Health, Houston, TX 77030, USA
- Christie Ballantyne
- Section of Cardiovascular Research, Baylor College of Medicine, Houston, TX 77030, USA; Houston Methodist Debakey Heart and Vascular Center, Houston, TX 77030, USA
- Josef Coresh
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21287, USA
- Fuli Yu
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Donna Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Elena Feofanova
- Human Genetics Center, University of Texas School of Public Health, Houston, TX 77030, USA
- Navin Rustagi
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Richard Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Eric Boerwinkle
- Human Genetics Center, University of Texas School of Public Health, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
|
135
|
Doane AS, Elemento O. Regulatory elements in molecular networks. Wiley Interdiscip Rev Syst Biol Med 2017; 9. [PMID: 28093886 DOI: 10.1002/wsbm.1374] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 07/30/2016] [Revised: 11/06/2016] [Accepted: 11/17/2016] [Indexed: 12/20/2022]
Abstract
Regulatory elements determine the connectivity of molecular networks and mediate a variety of regulatory processes ranging from DNA looping to transcriptional, posttranscriptional, and posttranslational regulation. This review highlights our current understanding of the different types of regulatory elements found in molecular networks with a focus on DNA regulatory elements. We highlight technical advances and current challenges for the mapping of regulatory elements at the genome-wide scale, and describe new computational methods to uncover these elements via reconstructing regulatory networks from large genomic datasets. WIREs Syst Biol Med 2017, 9:e1374. doi: 10.1002/wsbm.1374
Affiliation(s)
- Ashley S Doane
- HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- Olivier Elemento
- HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
|
136
|
Li T, Wernersson R, Hansen RB, Horn H, Mercer J, Slodkowicz G, Workman CT, Rigina O, Rapacki K, Stærfeldt HH, Brunak S, Jensen TS, Lage K. A scored human protein-protein interaction network to catalyze genomic interpretation. Nat Methods 2017; 14:61-64. [PMID: 27892958 PMCID: PMC5839635 DOI: 10.1038/nmeth.4083] [Citation(s) in RCA: 414] [Impact Index Per Article: 51.8] [Received: 03/04/2016] [Accepted: 10/20/2016] [Indexed: 02/07/2023]
Abstract
Genome-scale human protein-protein interaction networks are critical to understanding cell biology and interpreting genomic data, but challenging to produce experimentally. Through data integration and quality control, we provide a scored human protein-protein interaction network (InWeb_InBioMap, or InWeb_IM) with severalfold more interactions (>500,000) and better functional biological relevance than comparable resources. We illustrate that InWeb_InBioMap enables functional interpretation of >4,700 cancer genomes and genes involved in autism.
Affiliation(s)
- Taibo Li
- Department of Surgery, Massachusetts General Hospital, Boston, Massachusetts, USA
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Rasmus Wernersson
- Intomics A/S, Lyngby, Denmark
- Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
- Heiko Horn
- Department of Surgery, Massachusetts General Hospital, Boston, Massachusetts, USA
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Harvard Medical School, Boston, Massachusetts, USA
- Johnathan Mercer
- Department of Surgery, Massachusetts General Hospital, Boston, Massachusetts, USA
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Greg Slodkowicz
- Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
- Christopher T Workman
- Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
- Olga Rigina
- Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
- Kristoffer Rapacki
- Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
- Hans H Stærfeldt
- Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
- Søren Brunak
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
- Kasper Lage
- Department of Surgery, Massachusetts General Hospital, Boston, Massachusetts, USA
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Harvard Medical School, Boston, Massachusetts, USA
- Institute for Biological Psychiatry, Mental Health Center Sct. Hans, University of Copenhagen, Roskilde, Denmark
|
137
|
Wu M, Chen T, Jiang R. Global inference of disease-causing single nucleotide variants from exome sequencing data. BMC Bioinformatics 2016; 17:468. [PMID: 28155632 PMCID: PMC5260102 DOI: 10.1186/s12859-016-1325-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 02/08/2023] Open
Abstract
Background: Whole exome sequencing (WES) has recently emerged as an effective approach for identifying genetic variants underlying human diseases. However, considerable time and labour are needed for careful investigation of candidate variants. Although filtration based on population frequencies and functional prediction scores could effectively remove common and neutral variants, hundreds or even thousands of rare deleterious variants still remain. In addition, current WES platforms also provide variant information in flanking noncoding regions, such as promoters, introns and splice sites. Despite being recognized to harbour causal variants, these regions are usually ignored by current analysis pipelines. Results: We present a novel computational method, called Glints, to overcome the above limitations. Glints is capable of identifying disease-causing SNVs in both coding and flanking noncoding regions from exome sequencing data. The principle behind Glints is that disease-causing variants should manifest their effect at both variant and gene levels. Specifically, Glints integrates 14 types of functional scores, including predictions for both coding and noncoding variants, and 9 types of association scores, which help identify disease-relevant genes. We conducted large-scale simulation studies based on 1000 Genomes Project data and demonstrated the effectiveness of our method in both coding and flanking noncoding regions. We also applied Glints in two real exome sequencing studies and demonstrated its effectiveness for uncovering disease-causing SNVs. Both standalone software and web server are available at our website http://bioinfo.au.tsinghua.edu.cn/jianglab/glints. Conclusions: Glints is effective for uncovering disease-causing SNVs in coding and flanking noncoding regions, which is supported by both simulation and real case studies. Glints is expected to be a useful tool for human genetics research based on exome sequencing data.
Affiliation(s)
- Mengmeng Wu
- MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic & Systems Biology, Tsinghua National Laboratory for Information Science and Technology, Beijing, China; Department of Computer Science, Tsinghua University, Beijing, China
- Ting Chen
- MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic & Systems Biology, Tsinghua National Laboratory for Information Science and Technology, Beijing, China; Department of Computer Science, Tsinghua University, Beijing, China
- Rui Jiang
- MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic & Systems Biology, Tsinghua National Laboratory for Information Science and Technology, Beijing, China; Department of Automation, Tsinghua University, Beijing, China
|
138
|
Chen J, Wang B, Regan L, Gerstein M. Intensification: A Resource for Amplifying Population-Genetic Signals with Protein Repeats. J Mol Biol 2016; 429:435-445. [PMID: 27939289 DOI: 10.1016/j.jmb.2016.12.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/29/2016] [Revised: 11/16/2016] [Accepted: 12/03/2016] [Indexed: 11/16/2022]
Abstract
Large-scale genome sequencing holds great promise for the interpretation of protein structures through the discovery of many rare functional variants in the human population. However, because protein-coding regions are under high selective constraints, these variants occur at low frequencies, such that there are often insufficient statistics for downstream calculations. To address this problem, we develop the Intensification approach, which uses the modular structure of repeat protein domains to amplify signals of selection from population genetics and traditional interspecies conservation. In particular, we are able to aggregate variants at the codon level to identify important positions in repeat domains that show strong conservation signals. This allows us to compare conservation over different evolutionary timescales. It also enables us to visualize population-genetic measures on protein structures. We make available the Intensification results as an online resource (http://intensification.gersteinlab.org) and illustrate the approach through a case study on the tetratricopeptide repeat.
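The idea of pooling sparse variant counts across aligned repeat units can be sketched in a few lines. This is an illustrative assumption about the aggregation step only, not the Intensification implementation: the function names and the toy counts are hypothetical, and the repeat units are assumed to be already aligned to a common length.

```python
def aggregate_repeat_variants(repeat_counts):
    """Sum variant counts position by position across aligned repeat units.

    repeat_counts: one list per repeat unit, each holding the number of
    variants observed at every aligned codon position (hypothetical input).
    """
    length = len(repeat_counts[0])
    if any(len(unit) != length for unit in repeat_counts):
        raise ValueError("all repeat units must be aligned to the same length")
    return [sum(unit[i] for unit in repeat_counts) for i in range(length)]


def least_variable_positions(totals, k):
    """Return the k aligned positions with the fewest pooled variants."""
    ranked = sorted(range(len(totals)), key=lambda i: (totals[i], i))
    return ranked[:k]


# Toy data: three repeat units, four aligned codon positions each.
repeat_counts = [
    [0, 2, 1, 3],
    [1, 3, 0, 2],
    [0, 1, 0, 4],
]
totals = aggregate_repeat_variants(repeat_counts)
candidates = least_variable_positions(totals, 2)
print(totals, candidates)
```

Pooling raises the per-position counts enough to compare positions, which is the sense in which repeats amplify an otherwise weak population-genetic signal; positions with few pooled variants are candidates for strong constraint.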
Affiliation(s)
- Jieming Chen
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Bo Wang
- Department of Chemistry, Yale University, New Haven, CT 06520, USA
- Lynne Regan
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA; Department of Chemistry, Yale University, New Haven, CT 06520, USA
- Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA; Department of Computer Science, Yale University, New Haven, CT 06520, USA
|
139
|
Hahn MM, de Voer RM, Hoogerbrugge N, Ligtenberg MJL, Kuiper RP, van Kessel AG. The genetic heterogeneity of colorectal cancer predisposition - guidelines for gene discovery. Cell Oncol (Dordr) 2016; 39:491-510. [PMID: 27279102 PMCID: PMC5121185 DOI: 10.1007/s13402-016-0284-6] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Accepted: 05/27/2016] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND Colorectal cancer (CRC) is a cumulative term applied to a clinically and genetically heterogeneous group of neoplasms that occur in the bowel. Based on twin studies, up to 45% of the CRC cases may involve a heritable component. Yet, in only 5-10% of these cases are high-penetrant germline mutations found (e.g. mutations in APC and DNA mismatch repair genes) that result in a familial aggregation and/or an early onset of the disease. Genome-wide association studies have revealed that another ~5% of the CRC cases may be explained by a cumulative effect of low-penetrant risk factors. Recent attempts to identify novel genetic factors using whole exome and whole genome sequencing have proven to be difficult since the remaining, yet to be discovered, high-penetrant CRC predisposing genes appear to be rare. In addition, most of the moderately penetrant candidate genes identified so far have not been confirmed in independent cohorts. Based on literature examples, we here discuss how careful patient and cohort selection, candidate gene and variant selection, and corroborative evidence may be employed to facilitate the discovery of novel CRC predisposing genes. CONCLUSIONS The picture emerges that the genetic predisposition to CRC is heterogeneous, involving complex interplays between common and rare (inter)genic variants with different penetrances. It is anticipated, however, that the use of large clinically well-defined patient and control datasets, together with improved functional and technical possibilities, will yield enough power to unravel this complex interplay and to generate accurate individualized estimates for the risk to develop CRC.
Affiliation(s)
- M M Hahn
- Department of Human Genetics, Radboud Institute of Molecular Life Sciences, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
- R M de Voer
- Department of Human Genetics, Radboud Institute of Molecular Life Sciences, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
- N Hoogerbrugge
- Department of Human Genetics, Radboud Institute of Molecular Life Sciences, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
- M J L Ligtenberg
- Department of Human Genetics, Radboud Institute of Molecular Life Sciences, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
- R P Kuiper
- Department of Human Genetics, Radboud Institute of Molecular Life Sciences, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
- A Geurts van Kessel
- Department of Human Genetics, Radboud Institute of Molecular Life Sciences, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
|
140
|
Simakov O, Kawashima T. Independent evolution of genomic characters during major metazoan transitions. Dev Biol 2016; 427:179-192. [PMID: 27890449 DOI: 10.1016/j.ydbio.2016.11.012] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 08/01/2016] [Revised: 11/08/2016] [Accepted: 11/14/2016] [Indexed: 02/03/2023]
Abstract
Metazoan evolution encompasses a vast evolutionary time scale spanning over 600 million years. Our ability to infer ancestral metazoan characters, both morphological and functional, is limited by our understanding of the nature and evolutionary dynamics of the underlying regulatory networks. Increasing coverage of metazoan genomes enables us to identify the evolutionary changes of the relevant genomic characters such as the loss or gain of coding sequences, gene duplications, micro- and macro-synteny, and non-coding element evolution in different lineages. In this review we describe recent advances in our understanding of ancestral metazoan coding and non-coding features, as deduced from genomic comparisons. Some genomic changes such as innovations in gene and linkage content occur at different rates across metazoan clades, suggesting some level of independence among genomic characters. While their contribution to biological innovation remains largely unclear, we review recent literature about certain genomic changes that do correlate with changes to specific developmental pathways and metazoan innovations. In particular, we discuss the origins of the recently described pharyngeal cluster which is conserved across deuterostome genomes, and highlight different genomic features that have contributed to the evolution of this group. We also assess our current capacity to infer ancestral metazoan states from gene models and comparative genomics tools and elaborate on the future directions of metazoan comparative genomics relevant to evo-devo studies.
Affiliation(s)
- Oleg Simakov
- Okinawa Institute of Science and Technology, Okinawa, Japan.
|
141
|
Zhou S, Treloar AE, Lupien M. Emergence of the Noncoding Cancer Genome: A Target of Genetic and Epigenetic Alterations. Cancer Discov 2016; 6:1215-1229. [PMID: 27807102 DOI: 10.1158/2159-8290.cd-16-0745] [Citation(s) in RCA: 57] [Impact Index Per Article: 6.3] [Received: 07/13/2016] [Accepted: 08/17/2016] [Indexed: 12/14/2022]
Abstract
The emergence of whole-genome annotation approaches is paving the way for the comprehensive annotation of the human genome across diverse cell and tissue types exposed to various environmental conditions. This has already unmasked the positions of thousands of functional cis-regulatory elements integral to transcriptional regulation, such as enhancers, promoters, and anchors of chromatin interactions that populate the noncoding genome. Recent studies have shown that cis-regulatory elements are commonly the targets of genetic and epigenetic alterations associated with aberrant gene expression in cancer. Here, we review these findings to showcase the contribution of the noncoding genome and its alteration in the development and progression of cancer. We also highlight the opportunities to translate the biological characterization of genetic and epigenetic alterations in the noncoding cancer genome into novel approaches to treat or monitor disease. SIGNIFICANCE The majority of genetic and epigenetic alterations accumulate in the noncoding genome throughout oncogenesis. Discriminating driver from passenger events is a challenge that holds great promise to improve our understanding of the etiology of different cancer types. Advancing our understanding of the noncoding cancer genome may thus identify new therapeutic opportunities and accelerate our capacity to find improved biomarkers to monitor various stages of cancer development. Cancer Discov; 6(11); 1215-29. ©2016 AACR.
Affiliation(s)
- Stanley Zhou
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada; Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Aislinn E Treloar
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada; Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Mathieu Lupien
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada; Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada; Ontario Institute for Cancer Research, Toronto, Ontario, Canada
|
142
|
Xue C, Raveendran M, Harris RA, Fawcett GL, Liu X, White S, Dahdouli M, Rio Deiros D, Below JE, Salerno W, Cox L, Fan G, Ferguson B, Horvath J, Johnson Z, Kanthaswamy S, Kubisch HM, Liu D, Platt M, Smith DG, Sun B, Vallender EJ, Wang F, Wiseman RW, Chen R, Muzny DM, Gibbs RA, Yu F, Rogers J. The population genomics of rhesus macaques (Macaca mulatta) based on whole-genome sequences. Genome Res 2016; 26:1651-1662. [PMID: 27934697 PMCID: PMC5131817 DOI: 10.1101/gr.204255.116] [Citation(s) in RCA: 85] [Impact Index Per Article: 9.4] [Received: 01/11/2016] [Accepted: 10/12/2016] [Indexed: 12/30/2022]
Abstract
Rhesus macaques (Macaca mulatta) are the most widely used nonhuman primate in biomedical research, have the largest natural geographic distribution of any nonhuman primate, and have been the focus of much evolutionary and behavioral investigation. Consequently, rhesus macaques are one of the most thoroughly studied nonhuman primate species. However, little is known about genome-wide genetic variation in this species. A detailed understanding of extant genomic variation among rhesus macaques has implications for the use of this species as a model for studies of human health and disease, as well as for evolutionary population genomics. Whole-genome sequencing analysis of 133 rhesus macaques revealed more than 43.7 million single-nucleotide variants, including thousands predicted to alter protein sequences, transcript splicing, and transcription factor binding sites. Rhesus macaques exhibit 2.5-fold higher overall nucleotide diversity and slightly elevated putative functional variation compared with humans. This functional variation in macaques provides opportunities for analyses of coding and noncoding variation, and its cellular consequences. Despite modestly higher levels of nonsynonymous variation in the macaques, the estimated distribution of fitness effects and the ratio of nonsynonymous to synonymous variants suggest that purifying selection has had stronger effects in rhesus macaques than in humans. Demographic reconstructions indicate this species has experienced a consistently large but fluctuating population size. Overall, the results presented here provide new insights into the population genomics of nonhuman primates and expand genomic information directly relevant to primate models of human disease.
Affiliation(s)
- Cheng Xue
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Muthuswamy Raveendran
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- R Alan Harris
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Gloria L Fawcett
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Xiaoming Liu
- University of Texas Health Science Center, Houston, Texas 77030, USA
- Simon White
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Mahmoud Dahdouli
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- David Rio Deiros
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Jennifer E Below
- University of Texas Health Science Center, Houston, Texas 77030, USA
- William Salerno
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Laura Cox
- Southwest National Primate Research Center, San Antonio, Texas 78227, USA
- Guoping Fan
- Department of Human Genetics, University of California, Los Angeles, California 90095, USA
- Betsy Ferguson
- Oregon National Primate Research Center, Beaverton, Oregon 97006, USA
- Julie Horvath
- North Carolina Museum of Natural Sciences, Raleigh, North Carolina 27601, USA; Biological and Biomedical Sciences, North Carolina Central University, Durham, North Carolina 27707, USA; Department of Evolutionary Anthropology, Duke University, Durham, North Carolina 27708, USA
- Zach Johnson
- Yerkes National Primate Research Center, Atlanta, Georgia 30322, USA
- Sree Kanthaswamy
- California National Primate Research Center, Davis, California 95616, USA; School of Mathematical and Natural Sciences, Arizona State University, Phoenix, Arizona 85004, USA
- H Michael Kubisch
- Tulane National Primate Research Center, Covington, Louisiana 70433, USA
- Dahai Liu
- Center for Stem Cell and Translational Medicine, Anhui University, Anhui, China 230601
- Michael Platt
- Department of Neurobiology, Duke University, Durham, North Carolina 27708, USA; Department of Neuroscience, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- David G Smith
- California National Primate Research Center, Davis, California 95616, USA
- Binghua Sun
- Center for Stem Cell and Translational Medicine, Anhui University, Anhui, China 230601
- Eric J Vallender
- Tulane National Primate Research Center, Covington, Louisiana 70433, USA; New England National Primate Research Center, Southborough, Massachusetts 01772, USA
- Feng Wang
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Roger W Wiseman
- Wisconsin National Primate Research Center, Madison, Wisconsin 53711, USA
- Rui Chen
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Donna M Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Fuli Yu
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Jeffrey Rogers
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
|
143
|
Chen J, Tian W. Explaining the disease phenotype of intergenic SNP through predicted long range regulation. Nucleic Acids Res 2016; 44:8641-8654. [PMID: 27280978 PMCID: PMC5062962 DOI: 10.1093/nar/gkw519] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 01/08/2016] [Revised: 05/05/2016] [Accepted: 05/29/2016] [Indexed: 12/27/2022] Open
Abstract
Thousands of disease-associated SNPs (daSNPs) are located in intergenic regions (IGR), making it difficult to understand their association with disease phenotypes. Recent analysis found that non-coding daSNPs were frequently located in or close to regulatory elements, inspiring us to try to explain the disease phenotypes of IGR daSNPs through nearby regulatory sequences. Hence, after locating the nearest distal regulatory element (DRE) to a given IGR daSNP, we applied a computational method named INTREPID to predict the target genes regulated by the DRE, and then investigated their functional relevance to the IGR daSNP's disease phenotypes. Of all IGR daSNP-disease phenotype associations investigated, 36.8% were possibly explainable through the predicted target genes, which were enriched with, were functionally relevant to, or consisted of the corresponding disease genes. This proportion could be further increased to 60.5% if the LD SNPs of daSNPs were also considered. Furthermore, the predicted SNP-target gene pairs were enriched with known eQTL/mQTL SNP-gene relationships. Overall, it is likely that IGR daSNPs contribute to disease phenotypes by interfering with the regulatory function of their nearby DREs and causing abnormal expression of disease genes.
Affiliation(s)
- Jingqi Chen
- State Key Laboratory of Genetic Engineering and Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai 200436, P.R. China; Department of Biostatistics and Computational Biology, School of Life Sciences, Fudan University, Shanghai 200436, P.R. China
- Weidong Tian
- State Key Laboratory of Genetic Engineering and Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai 200436, P.R. China; Department of Biostatistics and Computational Biology, School of Life Sciences, Fudan University, Shanghai 200436, P.R. China
|
144
|
Chromatin structure-based prediction of recurrent noncoding mutations in cancer. Nat Genet 2016; 48:1321-1326. [PMID: 27723759 DOI: 10.1038/ng.3682] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 03/21/2016] [Accepted: 08/29/2016] [Indexed: 12/15/2022]
Abstract
Recurrence is a hallmark of cancer-driving mutations. Recurrent mutations can arise at the same site or affect the same gene at different sites. Here we identified a set of mutations arising in individual samples and altering different cis-regulatory elements that converge on a common gene via chromatin interactions. The mutations and genes identified in this fashion showed strong relevance to cancer, in contrast to noncoding mutations with site-specific recurrence only. We developed a prediction method that identifies potentially recurrent mutations on the basis of the features shared by mutations whose recurrence is observed in a given cohort. Our method was capable of accurately predicting recurrent mutations at the level of target genes but not mutations recurring at the same site. We experimentally validated predicted mutations in distal regulatory regions of the TERT gene. In conclusion, we propose a novel approach to discovering potential cancer-driving mutations in noncoding regions.
|
145
|
Dong X, Wang X, Zhang F, Tian W. Genome-Wide Identification of Regulatory Sequences Undergoing Accelerated Evolution in the Human Genome. Mol Biol Evol 2016; 33:2565-75. [PMID: 27401230 PMCID: PMC5026254 DOI: 10.1093/molbev/msw128] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Indexed: 01/08/2023] Open
Abstract
Accelerated evolution of regulatory sequences can alter the expression pattern of target genes and cause phenotypic changes. In this study, we used DNase I hypersensitive sites (DHSs) to annotate putative regulatory sequences in the human genome, and conducted a genome-wide analysis of the effects of accelerated evolution on regulatory sequences. Working under the assumption that local ancient repeat elements of DHSs are under neutral evolution, we discovered that ∼0.44% of DHSs are under accelerated evolution (ace-DHSs). We found that ace-DHSs tend to be more active than background DHSs, and are strongly associated with epigenetic marks of active transcription. The target genes of ace-DHSs are significantly enriched in neuron-related functions, and their expression levels are positively selected in the human brain. Thus, these lines of evidence strongly suggest that accelerated evolution of regulatory sequences plays an important role in the evolution of human-specific phenotypes.
Affiliation(s)
- Xinran Dong
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, Department of Biostatistics and Computational Biology, School of Life Sciences, Fudan University, Shanghai, P.R. China
- Xiao Wang
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, Department of Biostatistics and Computational Biology, School of Life Sciences, Fudan University, Shanghai, P.R. China
- Feng Zhang
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, Department of Biostatistics and Computational Biology, School of Life Sciences, Fudan University, Shanghai, P.R. China
- Weidong Tian
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, Department of Biostatistics and Computational Biology, School of Life Sciences, Fudan University, Shanghai, P.R. China; Children's Hospital of Fudan University, Shanghai, P.R. China
|
146
|
Li H, He Z, Gu Y, Fang L, Lv X. Prioritization of non-coding disease-causing variants and long non-coding RNAs in liver cancer. Oncol Lett 2016; 12:3987-3994. [PMID: 27895760 DOI: 10.3892/ol.2016.5135] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 03/20/2015] [Accepted: 06/16/2016] [Indexed: 01/10/2023] Open
Abstract
There are multiple bioinformatics tools available for the detection of coding driver mutations in cancers. However, the prioritization of pathogenic non-coding variants remains a challenging and demanding task. The present study was performed to discriminate non-coding disease-causing mutations and prioritize potential cancer-implicated long non-coding RNAs (lncRNAs) in liver cancer using a logistic regression model. A logistic regression model was constructed by combining 19,153 disease-associated ClinVar and Human Gene Mutation Database pathogenic variants as the response variable and non-coding features as the predictor variables. Genome-wide association study (GWAS) disease or trait-associated variants and recurrent somatic mutations were used to validate the model. Non-coding gene features with the highest fractions of load were characterized and potential cancer-associated lncRNA candidates were prioritized by combining the fraction of high-scoring regions and average score predicted by the logistic regression model. H3K9me3 and conserved regions were the most negatively and positively informative for the model, respectively. The area under the receiver operating characteristic curve of the model was 0.92. The average score of GWAS disease-associated variants was significantly increased compared with neutral single nucleotide polymorphisms (5.8642 vs. 5.4707; P<0.001), and the average score of recurrent somatic mutations of liver cancer was significantly increased compared with non-recurrent somatic mutations (5.4101 vs. 5.2768; P=0.0125). The present study found regions in lncRNAs and introns/untranslated regions of protein coding genes where mutations are most likely to be damaging. In total, 847 lncRNAs were filtered out from the background. Characterization of this subset of lncRNAs showed that these lncRNAs are more conserved, less mutated and more highly expressed compared with other control lncRNAs. In addition, 23 of these lncRNAs were differentially expressed between 12 pairs of liver cancer and adjacent normal specimens. The logistic regression model is a useful tool to prioritize non-coding pathogenic variants and lncRNAs, and paves the way for the detection of non-coding driver lncRNAs in liver cancer.
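The two quantities this abstract reports, signed feature weights and an area under the ROC curve, can be illustrated with a minimal sketch. This is not the authors' code or data: the feature names (`conservation`, `h3k9me3`), the eight toy variants, and the helper functions are hypothetical and chosen only so that the conserved feature ends up with a positive weight and the H3K9me3 feature with a negative weight, mirroring the direction described above.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, b, x):
    # Predicted probability that a variant is pathogenic.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    # Plain batch gradient descent on the mean log-loss.
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def auc(scores, labels):
    # AUC as the fraction of (positive, negative) pairs ranked correctly;
    # ties count as half.
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy variants: [conservation, h3k9me3]; 1 = pathogenic.
X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1], [0.6, 0.4],
     [0.2, 0.9], [0.3, 0.7], [0.1, 0.8], [0.4, 0.6]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = fit_logistic(X, y)
scores = [predict(w, b, x) for x in X]
print("weights:", w, "AUC:", auc(scores, y))
```

On real data the AUC would be estimated on held-out variants rather than on the training set, as the validation with GWAS variants and recurrent somatic mutations in the abstract suggests.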
Affiliation(s)
- Hua Li
- Department of Anesthesiology, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai 200433, P.R. China
- Zekun He
- Department of Clinical Medicine, Fuzhou Medical College of Nanchang University, Fuzhou, Jiangxi 344000, P.R. China
- Yang Gu
- Department of Anesthesiology, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai 200433, P.R. China
- Lin Fang
- Department of Thyroid and Breast Surgery, Shanghai Tenth People's Hospital, Tongji University, School of Medicine, Shanghai 200072, P.R. China
- Xin Lv
- Department of Anesthesiology, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai 200433, P.R. China
|
147
|
Abstract
Mutations in enhancer-associated chromatin-modifying components and genomic alterations in non-coding regions of the genome occur frequently in cancer and other diseases, pointing to the importance of enhancer fidelity to ensure proper tissue homeostasis. In this review, I will use specific examples to discuss how mutations in chromatin-modifying factors might affect enhancer activity of disease-relevant genes. I will then consider direct evidence from single nucleotide polymorphisms and small insertions or deletions, but also from larger genomic rearrangements such as duplications, deletions, translocations, and inversions of specific enhancers, to demonstrate how they can impact enhancer activity of disease genes including oncogenes and tumor suppressor genes. Considering that the scientific community has only fairly recently begun to focus its attention on "enhancer malfunction" in disease, I propose that multiple new enhancer-regulated and disease-relevant processes will be uncovered in the near future that will constitute the mechanistic basis for novel therapeutic avenues.
Affiliation(s)
- Hans-Martin Herz
- Department of Cell & Molecular Biology, St. Jude Children's Research Hospital, Memphis, TN, USA.
|
148
|
Poulos RC, Sloane MA, Hesson LB, Wong JWH. The search for cis-regulatory driver mutations in cancer genomes. Oncotarget 2016; 6:32509-25. [PMID: 26356674 PMCID: PMC4741709 DOI: 10.18632/oncotarget.5085] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 06/11/2015] [Accepted: 08/06/2015] [Indexed: 12/16/2022] Open
Abstract
With the advent of high-throughput and relatively inexpensive whole-genome sequencing technology, the focus of cancer research has begun to shift toward analyses of somatic mutations in non-coding cis-regulatory elements of the cancer genome. Cis-regulatory elements play an important role in gene regulation, with mutations in these elements potentially resulting in changes to the expression of linked genes. The recent discoveries of recurrent TERT promoter mutations in melanoma, and recurrent mutations that create a super-enhancer regulating TAL1 expression in T-cell acute lymphoblastic leukaemia (T-ALL), have sparked significant interest in the search for other somatic cis-regulatory mutations driving cancer development. In this review, we look more closely at the TERT promoter and TAL1 enhancer alterations and use these examples to ask whether other cis-regulatory mutations may play a role in cancer susceptibility. In doing so, we make observations from the data emerging from recent research in this field, and describe the experimental and analytical approaches which could be adopted in the hope of better uncovering the true functional significance of somatic cis-regulatory mutations in cancer.
Affiliation(s)
- Rebecca C Poulos
- Prince of Wales Clinical School and Lowy Cancer Research Centre, UNSW Australia, Sydney, Australia
- Mathew A Sloane
- Prince of Wales Clinical School and Lowy Cancer Research Centre, UNSW Australia, Sydney, Australia
- Luke B Hesson
- Prince of Wales Clinical School and Lowy Cancer Research Centre, UNSW Australia, Sydney, Australia
- Jason W H Wong
- Prince of Wales Clinical School and Lowy Cancer Research Centre, UNSW Australia, Sydney, Australia
|
149
|
Shi W, Fornes O, Mathelier A, Wasserman WW. Evaluating the impact of single nucleotide variants on transcription factor binding. Nucleic Acids Res 2016; 44:10106-10116. [PMID: 27492288 PMCID: PMC5137422 DOI: 10.1093/nar/gkw691] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 01/25/2016] [Revised: 07/25/2016] [Accepted: 07/26/2016] [Indexed: 12/21/2022] Open
Abstract
Diseases and phenotypes caused by disrupted transcription factor (TF) binding are being identified, but progress is hampered by our limited capacity to predict such functional alterations. Improving predictions may be dependent on expanding the set of bona fide TF binding alterations. Allele-specific binding (ASB) events, where TFs preferentially bind to one of the two alleles at heterozygous sites, reveal the impact of sequence variations in altered TF binding. Here, we present the largest ASB compilation to our knowledge, 10 765 ASB events retrieved from 45 ENCODE ChIP-Seq data sets. Our analysis showed that ASB events were frequently associated with motif alterations of the ChIP'ed TF and potential partner TFs, allelic difference of DNase I hypersensitivity and allelic difference of histone modifications. For TF dimers bound symmetrically to DNA, ASB data revealed that central positions of the TF binding motifs were disproportionately important for binding. Lastly, the impact of variation on TF binding was predicted by a classification model incorporating all the investigated features of ASB events. Classification models using only DNase I hypersensitivity and sequence data exhibited predictive accuracy approaching the models with substantially more features. Taken together, the combination of ASB data and the classification model represents an important step toward elucidating regulatory variants across the human genome.
Affiliation(s)
- Wenqiang Shi
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, Child & Family Research Institute, University of British Columbia, 950 28th Ave W, Vancouver, BC V5Z 4H4, Canada; Bioinformatics Graduate Program, University of British Columbia, 2329 W Mall, Vancouver, BC V6T 1Z4, Canada
- Oriol Fornes
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, Child & Family Research Institute, University of British Columbia, 950 28th Ave W, Vancouver, BC V5Z 4H4, Canada
- Anthony Mathelier
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, Child & Family Research Institute, University of British Columbia, 950 28th Ave W, Vancouver, BC V5Z 4H4, Canada; Centre for Molecular Medicine Norway (NCMM), Nordic EMBL partnership, University of Oslo and Oslo University Hospital, Norway
- Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, Child & Family Research Institute, University of British Columbia, 950 28th Ave W, Vancouver, BC V5Z 4H4, Canada
|
150
|
Yuen RKC, Merico D, Cao H, Pellecchia G, Alipanahi B, Thiruvahindrapuram B, Tong X, Sun Y, Cao D, Zhang T, Wu X, Jin X, Zhou Z, Liu X, Nalpathamkalam T, Walker S, Howe JL, Wang Z, MacDonald JR, Chan A, D'Abate L, Deneault E, Siu MT, Tammimies K, Uddin M, Zarrei M, Wang M, Li Y, Wang J, Wang J, Yang H, Bookman M, Bingham J, Gross SS, Loy D, Pletcher M, Marshall CR, Anagnostou E, Zwaigenbaum L, Weksberg R, Fernandez BA, Roberts W, Szatmari P, Glazer D, Frey BJ, Ring RH, Xu X, Scherer SW. Genome-wide characteristics of de novo mutations in autism. NPJ Genom Med 2016; 1:160271-1602710. [PMID: 27525107 PMCID: PMC4980121 DOI: 10.1038/npjgenmed.2016.27] [Citation(s) in RCA: 144] [Impact Index Per Article: 16.0] [Indexed: 12/22/2022] Open
Abstract
De novo mutations (DNMs) are important in Autism Spectrum Disorder (ASD), but so far analyses have mainly been on the ~1.5% of the genome encoding genes. Here, we performed whole genome sequencing (WGS) of 200 ASD parent-child trios and characterized germline and somatic DNMs. We confirmed that the majority of germline DNMs (75.6%) originated from the father, and these increased significantly with paternal age only (p=4.2×10⁻¹⁰). However, when clustered DNMs (those within 20kb) were found in ASD, not only did they mostly originate from the mother (p=7.7×10⁻¹³), but they could also be found adjacent to de novo copy number variations (CNVs) where the mutation rate was significantly elevated (p=2.4×10⁻²⁴). By comparing DNMs detected in controls, we found a significant enrichment of predicted damaging DNMs in ASD cases (p=8.0×10⁻⁹; OR=1.84), of which 15.6% (p=4.3×10⁻³) and 22.5% (p=7.0×10⁻⁵) were in the non-coding or genic non-coding, respectively. The non-coding elements most enriched for DNM were untranslated regions of genes, boundaries involved in exon-skipping and DNase I hypersensitive regions. Using microarrays and a novel outlier detection test, we also found aberrant methylation profiles in 2/185 (1.1%) of ASD cases. These same individuals carried independently identified DNMs in the ASD-risk and epigenetic genes DNMT3A and ADNP. Our data begin to characterize different genome-wide DNMs and highlight the contribution of non-coding variants to the etiology of ASD.
Collapse
Affiliation(s)
- Ryan K C Yuen
- The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
| | - Daniele Merico
- The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Giovanna Pellecchia
- The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Babak Alipanahi
- Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada
- Bhooma Thiruvahindrapuram
- The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Xin Tong
- BGI-Shenzhen, Yantian, Shenzhen, China
- Yuhui Sun
- BGI-Shenzhen, Yantian, Shenzhen, China
- Tao Zhang
- BGI-Shenzhen, Yantian, Shenzhen, China
- Xueli Wu
- BGI-Shenzhen, Yantian, Shenzhen, China
- Xin Jin
- BGI-Shenzhen, Yantian, Shenzhen, China
- Ze Zhou
- BGI-Shenzhen, Yantian, Shenzhen, China
- Thomas Nalpathamkalam
- The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Susan Walker
- The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Jennifer L Howe
- The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Zhuozhi Wang
- The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Jeffrey R MacDonald
- The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Ada Chan
- The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Lia D'Abate
- The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Eric Deneault
- The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Michelle T Siu
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Kristiina Tammimies
- Center of Neurodevelopmental Disorders (KIND), Pediatric Neuropsychiatry Unit, Karolinska Institutet, Stockholm, Sweden
- Mohammed Uddin
- The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Mehdi Zarrei
- The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Jun Wang
- BGI-Shenzhen, Yantian, Shenzhen, China
- Jian Wang
- BGI-Shenzhen, Yantian, Shenzhen, China
- Dion Loy
- Google, Mountain View, California, USA
- Christian R Marshall
- The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada; Department of Molecular Genetics, Paediatric Laboratory Medicine, The Hospital for Sick Children, Toronto, Ontario, Canada
- Evdokia Anagnostou
- Bloorview Research Institute, University of Toronto, Toronto, Ontario, Canada
- Lonnie Zwaigenbaum
- Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
- Rosanna Weksberg
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Bridget A Fernandez
- Disciplines of Genetics and Medicine, Memorial University of Newfoundland, St. John's, Newfoundland, Canada; Provincial Medical Genetic Program, Eastern Health, St. John's, Newfoundland, Canada
- Wendy Roberts
- Autism Research Unit, The Hospital for Sick Children, Toronto, Ontario, Canada
- Peter Szatmari
- Autism Research Unit, The Hospital for Sick Children, Toronto, Ontario, Canada; Child Youth and Family Services, Centre for Addiction and Mental Health, Toronto, Ontario, Canada; Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada
- David Glazer
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Brendan J Frey
- Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada; Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Xun Xu
- BGI-Shenzhen, Yantian, Shenzhen, China
- Stephen W Scherer
- The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada; McLaughlin Centre, University of Toronto, Toronto, Ontario, Canada
|