201
Lyon KF, Strong CL, Schooler SG, Young RJ, Roy N, Ozar B, Bachmeier M, Rajasekaran S, Schiller MR. Natural variability of minimotifs in 1092 people indicates that minimotifs are targets of evolution. Nucleic Acids Res 2015; 43:6399-412. [PMID: 26068475 PMCID: PMC4513861 DOI: 10.1093/nar/gkv580] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 04/10/2014] [Revised: 04/17/2015] [Accepted: 05/21/2015] [Indexed: 01/05/2023]
Abstract
Since the function of a short contiguous peptide minimotif can be introduced or eliminated by a single point mutation, these functional elements may be a source of human variation and a target of selection. We analyzed the variability of ∼300 000 minimotifs in 1092 human genomes from the 1000 Genomes Project. Most minimotifs have been purified by selection, with a 94% invariance, which supports important functional roles for minimotifs. Minimotifs are generally under negative selection, possessing high genomic evolutionary rate profiling (GERP) and sitewise likelihood-ratio (SLR) scores. Some are subject to neutral drift or positive selection, similar to coding regions. Most SNPs in minimotifs were common variants, but with minor allele frequencies generally <10%. This was supported by low substitution rates and few newly derived minimotifs. Several minimotif alleles showed different intercontinental and regional geographic distributions, strongly suggesting a role for minimotifs in adaptive evolution. We also note that 4% of PTM minimotif sites in histone tails were common variants, which has the potential to differentially affect DNA packaging among individuals. In conclusion, minimotifs are a source of functional genetic variation in the human population; thus, they are likely to be an important target of selection and evolution.
Affiliation(s)
- Kenneth F Lyon
  - Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV 89154-4004, USA
- Christy L Strong
  - Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV 89154-4004, USA
- Steve G Schooler
  - Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV 89154-4004, USA
- Richard J Young
  - Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV 89154-4004, USA; Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-2155, USA
- Nervik Roy
  - Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV 89154-4004, USA
- Brittany Ozar
  - Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV 89154-4004, USA
- Mark Bachmeier
  - Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV 89154-4004, USA
- Sanguthevar Rajasekaran
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-2155, USA
- Martin R Schiller
  - Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV 89154-4004, USA
202
203
Chung JH, Cai J, Suskin BG, Zhang Z, Coleman K, Morrow BE. Whole-Genome Sequencing and Integrative Genomic Analysis Approach on Two 22q11.2 Deletion Syndrome Family Trios for Genotype to Phenotype Correlations. Hum Mutat 2015; 36:797-807. [PMID: 25981510 DOI: 10.1002/humu.22814] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 01/15/2015] [Accepted: 05/01/2015] [Indexed: 12/20/2022]
Abstract
The 22q11.2 deletion syndrome (22q11DS) affects 1:4,000 live births and presents with highly variable phenotype expressivity. In this study, we developed an analytical approach utilizing whole-genome sequencing (WGS) and integrative analysis to discover genetic modifiers. Our pipeline combined available tools in order to prioritize rare, predicted deleterious, coding and noncoding single-nucleotide variants (SNVs), and insertions/deletions from WGS. We sequenced two unrelated probands with 22q11DS, with contrasting clinical findings, and their unaffected parents. Proband P1 had cognitive impairment, psychotic episodes, anxiety, and tetralogy of Fallot (TOF), whereas proband P2 had juvenile rheumatoid arthritis but no other major clinical findings. In P1, we identified common variants in COMT and PRODH on 22q11.2 as well as rare potentially deleterious DNA variants in other behavioral/neurocognitive genes. We also identified a de novo SNV in ADNP2 (NM_014913.3:c.2243G>C), encoding a neuroprotective protein that may be involved in behavioral disorders. In P2, we identified a novel nonsynonymous SNV in ZFPM2 (NM_012082.3:c.1576C>T), a known causative gene for TOF, which may act as a protective variant downstream of TBX1, haploinsufficiency of which is responsible for congenital heart disease in individuals with 22q11DS.
Affiliation(s)
- Jonathan H Chung
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, New York
- Jinlu Cai
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York
- Barrie G Suskin
  - Department of Obstetrics & Gynecology and Women's Health, Montefiore Medical Center, Bronx, New York
- Zhengdong Zhang
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, New York
- Karlene Coleman
  - Children's Healthcare of Atlanta at Egleston, Atlanta, Georgia
- Bernice E Morrow
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, New York
204
Trynka G, Westra HJ, Slowikowski K, Hu X, Xu H, Stranger BE, Klein RJ, Han B, Raychaudhuri S. Disentangling the Effects of Colocalizing Genomic Annotations to Functionally Prioritize Non-coding Variants within Complex-Trait Loci. Am J Hum Genet 2015; 97:139-52. [PMID: 26140449 PMCID: PMC4572568 DOI: 10.1016/j.ajhg.2015.05.016] [Citation(s) in RCA: 88] [Impact Index Per Article: 8.8] [Received: 03/09/2015] [Accepted: 05/26/2015] [Indexed: 01/12/2023]
Abstract
Identifying genomic annotations that differentiate causal from trait-associated variants is essential to fine mapping disease loci. Although many studies have identified non-coding functional annotations that overlap disease-associated variants, these annotations often colocalize, complicating the ability to use these annotations for fine mapping causal variation. We developed a statistical approach (Genomic Annotation Shifter [GoShifter]) to assess whether enriched annotations are able to prioritize causal variation. GoShifter defines the null distribution of an annotation overlapping an allele by locally shifting annotations; this approach is less sensitive to biases arising from local genomic structure than commonly used enrichment methods that depend on SNP matching. Local shifting also allows GoShifter to identify independent causal effects from colocalizing annotations. Using GoShifter, we confirmed that variants in expression quantitative trait loci drive gene-expression changes through DNase-I hypersensitive sites (DHSs) near transcription start sites and independently through 3' UTR regulation. We also showed that (1) 15%-36% of trait-associated loci map to DHSs independently of other annotations; (2) loci associated with breast cancer and rheumatoid arthritis harbor potentially causal variants near the summits of histone marks rather than full peak bodies; (3) variants associated with height are highly enriched in embryonic stem cell DHSs; and (4) we can effectively prioritize causal variation at specific loci.
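The local-shifting null described in this abstract can be illustrated with a short Python sketch. It is not the GoShifter code: the helper names `overlaps` and `local_shift_pvalue`, the single-locus window, and the circular shift of all annotation intervals by one random offset are simplifying assumptions made here for illustration; a full analysis would pool many trait-associated loci.

```python
import random


def overlaps(snps, intervals):
    """Count SNP positions that fall inside any half-open interval [start, end)."""
    return sum(any(s <= p < e for s, e in intervals) for p in snps)


def local_shift_pvalue(snps, intervals, window_start, window_end, n_iter=1000, seed=0):
    """Hypothetical empirical p-value from shifting annotations within one locus.

    Every annotation interval is moved by the same random offset and wrapped
    around the locus window, so annotation sizes and spacing are preserved
    while their position relative to the SNPs is randomized.
    """
    rng = random.Random(seed)
    length = window_end - window_start
    observed = overlaps(snps, intervals)
    hits = 0
    for _ in range(n_iter):
        offset = rng.randrange(length)
        shifted = []
        for s, e in intervals:
            # Shift the start relative to the window, wrapping at the window end.
            new_s = window_start + (s - window_start + offset) % length
            new_e = new_s + (e - s)
            if new_e <= window_end:
                shifted.append((new_s, new_e))
            else:
                # Split an interval that wraps past the end of the window.
                shifted.append((new_s, window_end))
                shifted.append((window_start, window_start + new_e - window_end))
        if overlaps(snps, shifted) >= observed:
            hits += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return (hits + 1) / (n_iter + 1)
```

For example, `local_shift_pvalue([150, 152, 155], [(140, 160)], 0, 1000)` gives a small p-value because only a narrow range of offsets puts all three positions back inside the 20-bp annotation, whereas an annotation that overlaps no SNP always yields 1.0. Shifting annotations, rather than re-sampling matched SNPs, keeps the local annotation density and spacing fixed, which is the property the abstract credits for reduced sensitivity to local genomic structure.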
Affiliation(s)
- Gosia Trynka
  - Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02446, USA; Partners Center for Personalized Genetic Medicine, Boston, MA 02446, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, UK
- Harm-Jan Westra
  - Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02446, USA; Partners Center for Personalized Genetic Medicine, Boston, MA 02446, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Kamil Slowikowski
  - Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02446, USA; Partners Center for Personalized Genetic Medicine, Boston, MA 02446, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Bioinformatics and Integrative Genomics, Harvard University, Cambridge, MA 02138, USA
- Xinli Hu
  - Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02446, USA; Partners Center for Personalized Genetic Medicine, Boston, MA 02446, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard-MIT Division of Health Sciences and Technology, Boston, MA 02115, USA
- Han Xu
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, Boston, MA 02215, USA
- Barbara E Stranger
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA; Institute for Genomics and Systems Biology, University of Chicago, Chicago, IL 60637, USA
- Robert J Klein
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Buhm Han
  - Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02446, USA; Partners Center for Personalized Genetic Medicine, Boston, MA 02446, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Asan Institute for Life Sciences, Asan Medical Center, Seoul 138-736, Republic of Korea; Department of Medicine, University of Ulsan College of Medicine, Seoul 138-736, Republic of Korea
- Soumya Raychaudhuri
  - Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02446, USA; Partners Center for Personalized Genetic Medicine, Boston, MA 02446, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Institute of Inflammation and Repair, University of Manchester, Manchester M13 9PT, UK
205
Li H, Chen H, Liu F, Ren C, Wang S, Bo X, Shu W. Functional annotation of HOT regions in the human genome: implications for human disease and cancer. Sci Rep 2015; 5:11633. [PMID: 26113264 PMCID: PMC4481521 DOI: 10.1038/srep11633] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 02/02/2015] [Accepted: 06/01/2015] [Indexed: 12/17/2022]
Abstract
Advances in genome-wide association studies (GWAS) and large-scale sequencing studies have resulted in an impressive and growing list of disease- and trait-associated genetic variants. Most studies have emphasised the discovery of genetic variation in coding sequences; however, the noncoding regulatory effects responsible for human disease and cancer biology have been substantially understudied. To better characterise the cis-regulatory effects of noncoding variation, we performed a comprehensive analysis of the genetic variants in HOT (high-occupancy target) regions, which are considered to be one of the most intriguing findings of recent large-scale sequencing studies. We observed that GWAS variants that map to HOT regions undergo a substantial net decrease and illustrate development-specific localisation during haematopoiesis. Additionally, genetic risk variants are disproportionately enriched in HOT regions compared with LOT (low-occupancy target) regions in both disease-relevant and cancer cells. Importantly, this enrichment is biased toward disease- or cancer-specific cell types. Furthermore, we observed that cancer cells generally acquire cancer-specific HOT regions at oncogenes through diverse mechanisms of cancer pathogenesis. Collectively, our findings demonstrate the key roles of HOT regions in human disease and cancer and represent a critical step toward further understanding disease biology, diagnosis, and therapy.
Affiliation(s)
- Hao Li
  - Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing 100850, China
- Hebing Chen
  - Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing 100850, China
- Feng Liu
  - Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing 100850, China
- Chao Ren
  - Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing 100850, China
- Shengqi Wang
  - Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing 100850, China
- Xiaochen Bo
  - Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing 100850, China
- Wenjie Shu
  - Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing 100850, China
206
Recurrent somatic mutations in regulatory regions of human cancer genomes. Nat Genet 2015; 47:710-6. [PMID: 26053494 PMCID: PMC4485503 DOI: 10.1038/ng.3332] [Citation(s) in RCA: 182] [Impact Index Per Article: 18.2] [Received: 03/05/2015] [Accepted: 05/11/2015] [Indexed: 12/13/2022]
Abstract
Aberrant regulation of gene expression in cancer can promote survival and proliferation of cancer cells. Here we integrate TCGA whole genome sequencing data of 436 patients from eight cancer subtypes with ENCODE and other regulatory annotations to identify point mutations in regulatory regions. We find evidence for positive selection of mutations in transcription factor binding sites, consistent with these sites regulating important cancer cell functions. Using a novel method that adjusts for sample- and genomic locus-specific mutation rate, we identify recurrently mutated sites across cancer patients. Mutated regulatory sites include known sites in the TERT promoter and many novel sites, including a subset in proximity to cancer genes. In reporter assays, two novel sites display decreased enhancer activity upon mutation. These data demonstrate that many regulatory regions contain mutations under selective pressure and suggest a larger role for regulatory mutations in cancer than previously appreciated.
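One simple way to express "adjusting for sample- and genomic locus-specific mutation rate" is a Poisson-binomial test: each patient gets their own probability of carrying a mutation at the site, and the score is the chance of seeing at least the observed number of mutated patients. The Python sketch below is a hedged illustration under that assumption, not the authors' method; `site_recurrence_pvalue`, the per-patient rates, and the multiplicative `local_factor` are hypothetical.

```python
def poisson_binomial_tail(probs, k):
    """P(at least k successes) for independent Bernoulli trials with the given probabilities.

    Uses the standard O(n^2) dynamic programme over the number of successes.
    """
    dist = [1.0]  # dist[j] = probability of j successes among the trials seen so far
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            new[j] += q * (1 - p)  # this trial fails
            new[j + 1] += q * p  # this trial succeeds
        dist = new
    return sum(dist[k:])


def site_recurrence_pvalue(sample_rates, local_factor, n_mutated):
    """Hypothetical recurrence score for a single genomic site.

    sample_rates: per-patient background mutation probability per base.
    local_factor: multiplicative adjustment for the site's regional mutation rate.
    n_mutated: number of patients observed with a mutation at the site.
    """
    probs = [min(1.0, r * local_factor) for r in sample_rates]
    return poisson_binomial_tail(probs, n_mutated)
```

With 436 patients at a background rate of 1e-6 per base and a tenfold local rate factor, three mutated patients at one site give a p-value on the order of 1e-8, which would still need correction for the number of sites tested genome-wide.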
207
Webber DM, MacLeod SL, Bamshad MJ, Shaw GM, Finnell RH, Shete SS, Witte JS, Erickson SW, Murphy LD, Hobbs C. Developments in our understanding of the genetic basis of birth defects. Birth Defects Res A Clin Mol Teratol 2015; 103:680-91. [PMID: 26033863 DOI: 10.1002/bdra.23385] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 12/14/2022]
Abstract
Birth defects are a major cause of morbidity and mortality worldwide. There has been much progress in understanding the genetic basis of familial and syndromic forms of birth defects. However, the etiology of nonsyndromic birth defects is not well understood. Although there is still much work to be done, we have many of the tools needed to accomplish the task. Advances in next-generation sequencing have introduced a sea of possibilities, from disease-gene discovery to clinical screening and diagnosis. These advances have been fruitful in identifying a host of candidate disease genes, spanning the spectrum of birth defects. With the advent of CRISPR-Cas9 gene editing, researchers now have a precise tool for characterizing this genetic variation in model systems. Work in model organisms has also illustrated the importance of epigenetics in human development and birth defects etiology. Here we review past and current knowledge in birth defects genetics. We describe genotyping and sequencing methods for the detection and analysis of rare and common variants. We remark on the utility of model organisms and explore epigenetics in the context of structural malformation. We conclude by highlighting approaches that may provide insight into the complex genetics of birth defects.
Affiliation(s)
- Daniel M Webber
  - Division of Birth Defects Research, Department of Pediatrics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, Arkansas
- Stewart L MacLeod
  - Division of Birth Defects Research, Department of Pediatrics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, Arkansas
- Michael J Bamshad
  - Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington
- Gary M Shaw
  - Stanford University School of Medicine, Stanford, California
- Richard H Finnell
  - Dell Pediatric Research Institute, Department of Nutritional Sciences, The University of Texas at Austin, Austin, Texas
- Sanjay S Shete
  - Department of Epidemiology, MD Anderson Cancer Center, Houston, Texas
- John S Witte
  - Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, California
- Stephen W Erickson
  - Department of Biostatistics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, Arkansas
- Linda D Murphy
  - Division of Birth Defects Research, Department of Pediatrics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, Arkansas
- Charlotte Hobbs
  - Division of Birth Defects Research, Department of Pediatrics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, Arkansas
208
A statistical framework to predict functional non-coding regions in the human genome through integrated analysis of annotation data. Sci Rep 2015; 5:10576. [PMID: 26015273 PMCID: PMC4444969 DOI: 10.1038/srep10576] [Citation(s) in RCA: 127] [Impact Index Per Article: 12.7] [Received: 12/16/2014] [Accepted: 04/20/2015] [Indexed: 12/16/2022]
Abstract
Identifying functional regions in the human genome is a major goal in human genetics. Great efforts have been made to functionally annotate the human genome either through computational predictions, such as genomic conservation, or high-throughput experiments, such as the ENCODE project. These efforts have resulted in a rich collection of functional annotation data of diverse types that need to be jointly analyzed for integrated interpretation and annotation. Here we present GenoCanyon, a whole-genome annotation method that performs unsupervised statistical learning using 22 computational and experimental annotations, thereby inferring the functional potential of each position in the human genome. With GenoCanyon, we are able to predict many of the known functional regions. Its ability to predict functional regions, together with its generalizable statistical framework, makes GenoCanyon a unique and powerful tool for whole-genome annotation. The GenoCanyon web server is available at http://genocanyon.med.yale.edu
209
Cao H, Wu H, Luo R, Huang S, Sun Y, Tong X, Xie Y, Liu B, Yang H, Zheng H, Li J, Li B, Wang Y, Yang F, Sun P, Liu S, Gao P, Huang H, Sun J, Chen D, He G, Huang W, Huang Z, Li Y, Tellier LCAM, Liu X, Feng Q, Xu X, Zhang X, Bolund L, Krogh A, Kristiansen K, Drmanac R, Drmanac S, Nielsen R, Li S, Wang J, Yang H, Li Y, Wong GKS, Wang J. De novo assembly of a haplotype-resolved human genome. Nat Biotechnol 2015; 33:617-22. [PMID: 26006006 DOI: 10.1038/nbt.3200] [Citation(s) in RCA: 64] [Impact Index Per Article: 6.4] [Received: 10/14/2014] [Accepted: 03/16/2015] [Indexed: 12/27/2022]
Abstract
The human genome is diploid, and knowledge of the variants on each chromosome is important for the interpretation of genomic information. Here we report the assembly of a haplotype-resolved diploid genome without using a reference genome. Our pipeline relies on fosmid pooling together with whole-genome shotgun strategies, based solely on next-generation sequencing and hierarchical assembly methods. We applied our sequencing method to the genome of an Asian individual and generated a 5.15-Gb assembled genome with a haplotype N50 of 484 kb. Our analysis identified previously undetected indels and 7.49 Mb of novel coding sequences that could not be aligned to the human reference genome, which include at least six predicted genes. This haplotype-resolved genome represents the most complete de novo human genome assembly to date. Application of our approach to identify individual haplotype differences should aid in translating genotypes to phenotypes for the development of personalized medicine.
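The haplotype N50 of 484 kb quoted above is a standard contiguity statistic: sort the blocks by length, and N50 is the length of the block at which the running total first reaches half of the total assembled length. A minimal Python sketch (the function name `n50` and the toy lengths are illustrative; this is not part of the authors' pipeline):

```python
def n50(lengths):
    """Return the N50 of a collection of sequence (or haplotype block) lengths.

    Lengths are sorted in decreasing order and accumulated; N50 is the length
    at which the running total first reaches half of the overall total.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8, because 10 + 9 + 8 = 27 is exactly half of the total of 54.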
Affiliation(s)
- Hongzhi Cao
  - [1] BGI-Shenzhen, Shenzhen, China. [2] BGI-Tianjin, Tianjin, China. [3] Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Honglong Wu
  - [1] BGI-Shenzhen, Shenzhen, China. [2] BGI-Tianjin, Tianjin, China
- Ruibang Luo
  - [1] BGI-Shenzhen, Shenzhen, China. [2] HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory, Hong Kong, China
- Shujia Huang
  - [1] BGI-Shenzhen, Shenzhen, China. [2] School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, China
- Yuhui Sun
  - [1] BGI-Shenzhen, Shenzhen, China. [2] School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, China
- Yinlong Xie
  - [1] BGI-Shenzhen, Shenzhen, China. [2] HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory, Hong Kong, China. [3] School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, China
- Binghang Liu
  - [1] BGI-Shenzhen, Shenzhen, China. [2] HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory, Hong Kong, China
- Hancheng Zheng
  - [1] BGI-Shenzhen, Shenzhen, China. [2] Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jian Li
  - [1] BGI-Shenzhen, Shenzhen, China. [2] Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Bo Li
  - BGI-Shenzhen, Shenzhen, China
- Yu Wang
  - [1] BGI-Shenzhen, Shenzhen, China. [2] School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, China
- Siyang Liu
  - [1] BGI-Shenzhen, Shenzhen, China. [2] Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Haodong Huang
  - [1] BGI-Shenzhen, Shenzhen, China. [2] School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, China
- Yue Li
  - BGI-Shenzhen, Shenzhen, China
- Laurent C A M Tellier
  - [1] BGI-Shenzhen, Shenzhen, China. [2] Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Xiao Liu
  - [1] BGI-Shenzhen, Shenzhen, China. [2] Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Qiang Feng
  - [1] BGI-Shenzhen, Shenzhen, China. [2] Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Xun Xu
  - BGI-Shenzhen, Shenzhen, China
- Lars Bolund
  - [1] BGI-Shenzhen, Shenzhen, China. [2] Institute of Biomedicine, University of Aarhus, Aarhus, Denmark. [3] Danish Center for Translational Breast Cancer Research, Copenhagen, Denmark
- Anders Krogh
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Karsten Kristiansen
  - [1] BGI-Shenzhen, Shenzhen, China. [2] Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Rasmus Nielsen
  - [1] BGI-Shenzhen, Shenzhen, China. [2] Department of Integrative Biology, University of California, Berkeley, California, USA. [3] Department of Statistics, University of California, Berkeley, California, USA
- Jian Wang
  - [1] BGI-Shenzhen, Shenzhen, China. [2] James D. Watson Institute of Genome Sciences, Hangzhou, China
- Huanming Yang
  - [1] BGI-Shenzhen, Shenzhen, China. [2] James D. Watson Institute of Genome Sciences, Hangzhou, China. [3] Princess Al Jawhara Albrahim Center of Excellence in the Research of Hereditary Disorders, King Abdulaziz University, Jeddah, Saudi Arabia
- Yingrui Li
  - [1] BGI-Shenzhen, Shenzhen, China. [2] Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Gane Ka-Shu Wong
  - [1] BGI-Shenzhen, Shenzhen, China. [2] Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada. [3] Department of Medicine, University of Alberta, Edmonton, Alberta, Canada
- Jun Wang
  - [1] BGI-Shenzhen, Shenzhen, China. [2] Department of Biology, University of Copenhagen, Copenhagen, Denmark. [3] Princess Al Jawhara Albrahim Center of Excellence in the Research of Hereditary Disorders, King Abdulaziz University, Jeddah, Saudi Arabia. [4] Macau University of Science and Technology, Taipa, Macau, China. [5] Department of Medicine and State Key Laboratory of Pharmaceutical Biotechnology, University of Hong Kong, Hong Kong, China
210
Kim K, Yang W, Lee KS, Bang H, Jang K, Kim SC, Yang JO, Park S, Park K, Choi JK. Global transcription network incorporating distal regulator binding reveals selective cooperation of cancer drivers and risk genes. Nucleic Acids Res 2015; 43:5716-29. [PMID: 26001967 PMCID: PMC4499150 DOI: 10.1093/nar/gkv532] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 02/02/2015] [Accepted: 05/09/2015] [Indexed: 01/04/2023]
Abstract
Global network modeling of distal regulatory interactions is essential in understanding the overall architecture of gene expression programs. Here, we developed a Bayesian probabilistic model and computational method for global causal network construction with breast cancer as a model. Whereas physical regulator binding was well supported by gene expression causality in general, distal elements in intragenic regions or loci distant from the target gene exhibited particularly strong functional effects. Modeling the action of long-range enhancers was critical in recovering true biological interactions with increased coverage and specificity overall and unraveling regulatory complexity underlying tumor subclasses and drug responses in particular. Transcriptional cancer drivers and risk genes were discovered based on the network analysis of somatic and genetic cancer-related DNA variants. Notably, we observed that the risk genes were functionally downstream of the cancer drivers and were selectively susceptible to network perturbation by tumorigenic changes in their upstream drivers. Furthermore, cancer risk alleles tended to increase the susceptibility of the transcription of their associated genes. These findings suggest that transcriptional cancer drivers selectively induce a combinatorial misregulation of downstream risk genes, and that genetic risk factors, mostly residing in distal regulatory regions, increase transcriptional susceptibility to upstream cancer-driving somatic changes.
Affiliation(s)
- Kwoneel Kim
  - Department of Bio and Brain Engineering, KAIST, Daejeon 305-701, Republic of Korea
- Woojin Yang
  - Department of Bio and Brain Engineering, KAIST, Daejeon 305-701, Republic of Korea
- Kang Seon Lee
  - Department of Bio and Brain Engineering, KAIST, Daejeon 305-701, Republic of Korea
- Hyoeun Bang
  - Department of Bio and Brain Engineering, KAIST, Daejeon 305-701, Republic of Korea
- Kiwon Jang
  - Department of Bio and Brain Engineering, KAIST, Daejeon 305-701, Republic of Korea
- Sang Cheol Kim
  - Samsung Genome Institute, Samsung Medical Center, Seoul 135-710, Republic of Korea
- Jin Ok Yang
  - Korean Bioinformation Center, KRIBB, Daejeon 305-806, Republic of Korea
- Seongjin Park
  - Korean Bioinformation Center, KRIBB, Daejeon 305-806, Republic of Korea
- Kiejung Park
  - Korean Bioinformation Center, KRIBB, Daejeon 305-806, Republic of Korea
- Jung Kyoon Choi
  - Department of Bio and Brain Engineering, KAIST, Daejeon 305-701, Republic of Korea
211
GTEx Consortium. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 2015; 348:648-60. [PMID: 25954001 PMCID: PMC4547484 DOI: 10.1126/science.1262110] [Citation(s) in RCA: 3778] [Impact Index Per Article: 377.8] [Received: 10/03/2014] [Accepted: 04/03/2015] [Indexed: 12/11/2022]
Abstract
Understanding the functional consequences of genetic variation, and how it affects complex human disease and quantitative traits, remains a critical challenge for biomedicine. We present an analysis of RNA sequencing data from 1641 samples across 43 tissues from 175 individuals, generated as part of the pilot phase of the Genotype-Tissue Expression (GTEx) project. We describe the landscape of gene expression across tissues, catalog thousands of tissue-specific and shared regulatory expression quantitative trait loci (eQTL) variants, describe complex network relationships, and identify signals from genome-wide association studies explained by eQTLs. These findings provide a systematic understanding of the cellular and biological consequences of human genetic variation and of the heterogeneity of such effects among a diverse set of human tissues.
Affiliation(s)
- GTEx Consortium
  - Corresponding author: Kristin G. Ardlie or Emmanouil T. Dermitzakis
212
Smyth C, Špakulová I, Cotton-Barratt O, Rafiq S, Tapper W, Upstill-Goddard R, Hopper JL, Makalic E, Schmidt DF, Kapuscinski M, Fliege J, Collins A, Brodzki J, Eccles DM, MacArthur BD. Quantifying the cumulative effect of low-penetrance genetic variants on breast cancer risk. Mol Genet Genomic Med 2015; 3:182-8. [PMID: 26029704 PMCID: PMC4444159 DOI: 10.1002/mgg3.129] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/27/2014] [Revised: 11/28/2014] [Accepted: 12/04/2014] [Indexed: 11/24/2022]
Abstract
Many common diseases have a complex genetic basis in which large numbers of genetic variations combine with environmental factors to determine risk. However, quantifying such polygenic effects has been challenging. To address these difficulties, we developed a global measure of the information content of an individual's genome relative to a reference population, which may be used to assess differences in global genome structure between cases and appropriate controls. Informally, this measure, which we call relative genome information (RGI), quantifies the relative "disorder" of an individual's genome. To test its ability to predict disease risk, we used RGI to compare single-nucleotide polymorphism genotypes from two independent samples of women with early-onset breast cancer with three independent sets of controls. We found that RGI was significantly elevated in both sets of breast cancer cases in comparison with all three sets of controls, with disease risk rising sharply with RGI. Furthermore, these differences are not due to associations with common variants at a small number of disease-associated loci, but rather are due to the combined associations of thousands of markers distributed throughout the genome. Our results indicate that the information content of an individual's genome may be used to measure the risk of a complex disease, and suggest that early-onset breast cancer has a strongly polygenic component.
Affiliation(s)
- Conor Smyth
- Mathematical Sciences, University of Southampton, Southampton, SO17 1BJ, United Kingdom
- Iva Špakulová
- Mathematical Sciences, University of Southampton, Southampton, SO17 1BJ, United Kingdom
- Owen Cotton-Barratt
- Mathematical Sciences, University of Southampton, Southampton, SO17 1BJ, United Kingdom
- Sajjad Rafiq
- Cancer Sciences Academic Unit and University of Southampton Clinical Trials Unit, Faculty of Medicine, University of Southampton and University Hospital Southampton Foundation Trust, Tremona Road, Southampton, SO16 6YA, United Kingdom
- William Tapper
- Human Genetics, Faculty of Medicine, University of Southampton, Tremona Road, Southampton, SO16 6YA, United Kingdom
- Rosanna Upstill-Goddard
- Human Genetics, Faculty of Medicine, University of Southampton, Tremona Road, Southampton, SO16 6YA, United Kingdom
- John L Hopper
- Centre for Molecular, Environmental, Genetic and Analytic Epidemiology, School of Population and Global Health, The University of Melbourne, Carlton, Victoria, Australia
- Enes Makalic
- Centre for Molecular, Environmental, Genetic and Analytic Epidemiology, School of Population and Global Health, The University of Melbourne, Carlton, Victoria, Australia
- Daniel F Schmidt
- Centre for Molecular, Environmental, Genetic and Analytic Epidemiology, School of Population and Global Health, The University of Melbourne, Carlton, Victoria, Australia
- Miroslav Kapuscinski
- Centre for Molecular, Environmental, Genetic and Analytic Epidemiology, School of Population and Global Health, The University of Melbourne, Carlton, Victoria, Australia
- Jörg Fliege
- Mathematical Sciences, University of Southampton, Southampton, SO17 1BJ, United Kingdom
- Andrew Collins
- Human Genetics, Faculty of Medicine, University of Southampton, Tremona Road, Southampton, SO16 6YA, United Kingdom
- Jacek Brodzki
- Mathematical Sciences, University of Southampton, Southampton, SO17 1BJ, United Kingdom
- Diana M Eccles
- Cancer Sciences Academic Unit and University of Southampton Clinical Trials Unit, Faculty of Medicine, University of Southampton and University Hospital Southampton Foundation Trust, Tremona Road, Southampton, SO16 6YA, United Kingdom
- Ben D MacArthur
- Mathematical Sciences, University of Southampton, Southampton, SO17 1BJ, United Kingdom; Human Development and Health, Faculty of Medicine, University of Southampton, Tremona Road, Southampton, SO16 6YA, United Kingdom; Institute for Life Sciences, University of Southampton, Southampton, SO17 1BJ, United Kingdom
213
Lill CM, Rengmark A, Pihlstrøm L, Fogh I, Shatunov A, Sleiman PM, Wang LS, Liu T, Lassen CF, Meissner E, Alexopoulos P, Calvo A, Chio A, Dizdar N, Faltraco F, Forsgren L, Kirchheiner J, Kurz A, Larsen JP, Liebsch M, Linder J, Morrison KE, Nissbrandt H, Otto M, Pahnke J, Partch A, Restagno G, Rujescu D, Schnack C, Shaw CE, Shaw PJ, Tumani H, Tysnes OB, Valladares O, Silani V, van den Berg LH, van Rheenen W, Veldink JH, Lindenberger U, Steinhagen-Thiessen E, Teipel S, Perneczky R, Hakonarson H, Hampel H, von Arnim CAF, Olsen JH, Van Deerlin VM, Al-Chalabi A, Toft M, Ritz B, Bertram L. The role of TREM2 R47H as a risk factor for Alzheimer's disease, frontotemporal lobar degeneration, amyotrophic lateral sclerosis, and Parkinson's disease. Alzheimers Dement 2015; 11:1407-1416. [PMID: 25936935 DOI: 10.1016/j.jalz.2014.12.009] [Received: 06/11/2014] [Revised: 10/22/2014] [Accepted: 12/02/2014]
Abstract
A rare variant in TREM2 (p.R47H, rs75932628) was recently reported to increase the risk of Alzheimer's disease (AD) and, subsequently, of other neurodegenerative diseases, i.e. frontotemporal lobar degeneration (FTLD), amyotrophic lateral sclerosis (ALS), and Parkinson's disease (PD). Here we comprehensively assessed TREM2 rs75932628 for association with these diseases in a total of 19,940 previously untyped subjects of European descent. These data were combined with those from 28 published data sets by meta-analysis. Furthermore, we tested whether rs75932628 shows association with amyloid beta (Aβ42) and total-tau protein levels in the cerebrospinal fluid (CSF) of 828 individuals with AD or mild cognitive impairment. Our data show that rs75932628 is highly significantly associated with the risk of AD across 24,086 AD cases and 148,993 controls of European descent (odds ratio (OR) = 2.71, P = 4.67 × 10⁻²⁵). No consistent evidence for association was found between this marker and the risk of FTLD (OR = 2.24, P = .0113 across 2673 cases/9283 controls), PD (OR = 1.36, P = .0767 across 8311 cases/79,938 controls) or ALS (OR = 1.41, P = .198 across 5544 cases/7072 controls). Furthermore, carriers of the rs75932628 risk allele showed significantly increased levels of CSF total tau (P = .0110) but not Aβ42, suggesting that TREM2's role in AD may involve tau dysfunction.
Affiliation(s)
- Christina M Lill
- Platform for Genome Analytics, Institutes of Neurogenetics & Integrative and Experimental Genomics, University of Lübeck, Lübeck, Germany; Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Aina Rengmark
- Department of Neurology, Oslo University Hospital, Oslo, Norway
- Lasse Pihlstrøm
- Department of Neurology, Oslo University Hospital, Oslo, Norway
- Isabella Fogh
- Department of Clinical Neuroscience, Institute of Psychiatry, King's College London, London, UK
- Aleksey Shatunov
- Department of Clinical Neuroscience, Institute of Psychiatry, King's College London, London, UK
- Patrick M Sleiman
- Center for Applied Genomics, Abramson Research Center, The Children's Hospital of Philadelphia, Philadelphia, PA, USA; Division of Human Genetics, Abramson Research Center, The Children's Hospital of Philadelphia, Philadelphia, PA, USA; Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Li-San Wang
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
- Tian Liu
- Max Planck Institute for Human Development, Berlin, Germany
- Christina F Lassen
- Institute of Cancer Epidemiology, Danish Cancer Society, Copenhagen, Denmark
- Esther Meissner
- Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Panos Alexopoulos
- Department of Psychiatry and Psychotherapy, Technische Universität München, Munich, Germany
- Andrea Calvo
- Rita Levi Montalcini Department of Neuroscience, ALS Center, University of Torino, Torino, Italy
- Adriano Chio
- Rita Levi Montalcini Department of Neuroscience, ALS Center, University of Torino, Torino, Italy; Neuroscience Institute of Turin, Turin, Italy
- Nil Dizdar
- Department of Neurology, Linköping University, Linköping, Sweden
- Frank Faltraco
- Department of Psychiatry, Psychosomatic Medicine and Psychotherapy, Goethe University of Frankfurt, Frankfurt, Germany
- Lars Forsgren
- Department of Pharmacology and Clinical Neuroscience, Umeå University, Umeå, Sweden
- Alexander Kurz
- Department of Psychiatry and Psychotherapy, Technische Universität München, Munich, Germany
- Jan P Larsen
- The Norwegian Centre for Movement Disorders, Stavanger University Hospital, Stavanger, Norway
- Maria Liebsch
- Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Jan Linder
- Department of Pharmacology and Clinical Neuroscience, Umeå University, Umeå, Sweden
- Karen E Morrison
- School of Clinical and Experimental Medicine, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK; Neurosciences Division, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Hans Nissbrandt
- Department of Pharmacology, The Sahlgrenska Academy at University of Gothenburg, Gothenburg, Sweden
- Markus Otto
- Department of Neurology, University of Ulm, Ulm, Germany
- Jens Pahnke
- Department of Neuro-/Pathology, University of Oslo and Oslo University Hospital, Oslo, Norway; Lübeck Institute of Experimental Dermatology, University of Lübeck, Lübeck, Germany
- Amanda Partch
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
- Gabriella Restagno
- Department of Clinical Pathology, Molecular Genetics Unit, Azienda Ospedaliera Città della Salute e della Scienza, Torino, Italy
- Dan Rujescu
- Department of Psychiatry, University of Halle-Wittenberg, Halle, Germany
- Christopher E Shaw
- Department of Clinical Neuroscience, Institute of Psychiatry, King's College London, London, UK
- Pamela J Shaw
- Department of Neuroscience, Sheffield Institute for Translational Neuroscience (SITraN), Faculty of Medicine, Dentistry and Health, University of Sheffield, Sheffield, UK
- Ole-Bjørn Tysnes
- Department of Neurology, Haukeland University Hospital, Bergen, Norway; Department of Clinical Medicine, University of Bergen, Bergen, Norway
- Otto Valladares
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
- Vincenzo Silani
- Department of Neurology and Laboratory of Neuroscience, IRCCS Istituto Auxologico Italiano, Milano, Italy; Department of Pathophysiology and Transplantation, "Dino Ferrari" Center, Università degli Studi di Milano, Milano, Italy
- Leonard H van den Berg
- Department of Neurology, Neuromuscular Diseases Brain Center Rudolf Magnus, Netherlands ALS Center, University Medical Center Utrecht, Utrecht, The Netherlands
- Wouter van Rheenen
- Department of Neurology, Neuromuscular Diseases Brain Center Rudolf Magnus, Netherlands ALS Center, University Medical Center Utrecht, Utrecht, The Netherlands
- Jan H Veldink
- Department of Neurology, Neuromuscular Diseases Brain Center Rudolf Magnus, Netherlands ALS Center, University Medical Center Utrecht, Utrecht, The Netherlands
- Stefan Teipel
- German Center for Neurodegenerative Diseases (DZNE), Rostock, Germany; Department of Psychosomatic Medicine, University of Rostock, Rostock, Germany
- Robert Perneczky
- Department of Psychiatry and Psychotherapy, Technische Universität München, Munich, Germany; Neuroepidemiology and Ageing Research Unit, School of Public Health, Faculty of Medicine, The Imperial College of Science, Technology, and Medicine, London, UK; West London Cognitive Disorders Treatment and Research Unit, West London Mental Health Trust, London, UK
- Hakon Hakonarson
- Center for Applied Genomics, Abramson Research Center, The Children's Hospital of Philadelphia, Philadelphia, PA, USA; Division of Human Genetics, Abramson Research Center, The Children's Hospital of Philadelphia, Philadelphia, PA, USA; Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Harald Hampel
- AXA Research Fund & UPMC Chair, Paris, France; Département de Neurologie, Sorbonne Universités, Université Pierre et Marie Curie, Institut de la Mémoire et de la Maladie d'Alzheimer & Institut du Cerveau et de la Moelle épinière (ICM), Hôpital de la Pitié-Salpêtrière, Paris, France
- Jørgen H Olsen
- Institute of Cancer Epidemiology, Danish Cancer Society, Copenhagen, Denmark
- Vivianna M Van Deerlin
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
- Ammar Al-Chalabi
- Department of Clinical Neuroscience, Institute of Psychiatry, King's College London, London, UK
- Mathias Toft
- Department of Neurology, Oslo University Hospital, Oslo, Norway
- Beate Ritz
- Department of Epidemiology and Environmental Sciences, School of Public Health, University of California, Los Angeles, CA, USA
- Lars Bertram
- Platform for Genome Analytics, Institutes of Neurogenetics & Integrative and Experimental Genomics, University of Lübeck, Lübeck, Germany; Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Berlin, Germany; Neuroepidemiology and Ageing Research Unit, School of Public Health, Faculty of Medicine, The Imperial College of Science, Technology, and Medicine, London, UK
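As an illustration only, the pooling of per-data-set odds ratios described in entry 213 can be sketched as a fixed-effect inverse-variance meta-analysis on the log-OR scale. The abstract does not state which meta-analysis model was used, so the fixed-effect model, the function names (fixed_effect_meta, log_se_from_ci) and the example inputs are all assumptions; the inputs are made up and do not reproduce the study's results.

```python
import math
from typing import Iterable, Tuple

Z_95 = 1.959964  # two-sided 95% standard normal quantile

def log_se_from_ci(lower: float, upper: float) -> float:
    """Standard error of log(OR) recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)

def fixed_effect_meta(studies: Iterable[Tuple[float, float, float]]):
    """Inverse-variance fixed-effect pooling of odds ratios (a sketch).

    studies: (odds_ratio, ci_lower, ci_upper) for each data set.
    Returns (pooled_or, ci_lower, ci_upper, z, two_sided_p).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for odds_ratio, lower, upper in studies:
        se = log_se_from_ci(lower, upper)
        weight = 1.0 / se ** 2  # inverse variance of log(OR)
        weighted_sum += weight * math.log(odds_ratio)
        total_weight += weight
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_log / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - Z_95 * pooled_se),
        math.exp(pooled_log + Z_95 * pooled_se),
        z,
        p,
    )

# Hypothetical per-data-set estimates, not values from the study.
example = [(2.5, 1.6, 3.9), (3.1, 1.8, 5.3), (2.2, 1.2, 4.0)]
pooled_or, lo, hi, z, p = fixed_effect_meta(example)
print(f"pooled OR = {pooled_or:.2f} (95% CI {lo:.2f}-{hi:.2f}), P = {p:.2e}")
```

A random-effects model would widen the interval when data sets disagree; which model the authors actually used should be checked against the paper itself.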
214
Magi A, D'Aurizio R, Palombo F, Cifola I, Tattini L, Semeraro R, Pippucci T, Giusti B, Romeo G, Abbate R, Gensini GF. Characterization and identification of hidden rare variants in the human genome. BMC Genomics 2015; 16:340. [PMID: 25903059 PMCID: PMC4416239 DOI: 10.1186/s12864-015-1481-9] [Received: 02/25/2015] [Accepted: 03/23/2015]
Abstract
Background By examining the genotype calls generated by the 1000 Genomes Project, we discovered that the human reference genome GRCh37 contains almost 20,000 loci at which the reference allele has never been observed in healthy individuals and around 70,000 loci at which it has been observed only in the heterozygous state. Results We show that a large fraction of these rare reference allele (RRA) loci belong to coding, functional and regulatory elements of the genome and could be linked to rare Mendelian disorders as well as cancer. We also demonstrate that classical germline and somatic variant calling tools are not capable of recognizing the rare allele when it is present at these loci. To overcome these limitations, we developed a novel tool, named RAREVATOR, that is able to identify and call the rare allele at these genomic positions. Using a small cancer dataset, we compared our tool with two state-of-the-art callers and found that RAREVATOR identified more than 1,500 germline and 22 somatic RRA variants that were missed by the two methods and that belong to significantly mutated pathways. Conclusions These results show that, to date, around 100,000 loci of the human genome have been missed by re-sequencing experiments based on the GRCh37 assembly and that our tool can fill the gap left by other methods. Moreover, investigation of the latest version of the human reference genome, GRCh38, showed that although the GRC corrected almost all insertions and a small part of SNVs and deletions, a large number of functionally relevant RRAs remain unchanged. For this reason, future resequencing experiments based on GRCh38 will also benefit from RAREVATOR analysis results. RAREVATOR is freely available at http://sourceforge.net/projects/rarevator. Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-1481-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Alberto Magi
- Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy.
- Romina D'Aurizio
- Laboratory of Integrative Systems Medicine (LISM), Institute of Informatics and Telematics and Institute of Clinical Physiology, National Research Council, Pisa, Italy.
- Flavia Palombo
- Medical Genetics Unit, Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy.
- Ingrid Cifola
- Institute for Biomedical Technologies, National Research Council, Milan, Italy.
- Lorenzo Tattini
- Department of Neuroscience, Pharmacology and Child Health, University of Florence, Florence, Italy.
- Roberto Semeraro
- Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy.
- Tommaso Pippucci
- Medical Genetics Unit, Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy.
- Betti Giusti
- Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy.
- Giovanni Romeo
- Medical Genetics Unit, Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy.
- Rosanna Abbate
- Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy.
- Gian Franco Gensini
- Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy.
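The two classes of rare reference allele (RRA) loci described in entry 214 can be sketched as a simple rule over per-sample genotype calls. This is a hypothetical illustration, not RAREVATOR's implementation: it assumes genotypes are coded as the number of non-reference alleles (0, 1 or 2), that missing calls are None, and that the samples are the healthy individuals the abstract refers to. The function name classify_reference_allele and the locus names are made up.

```python
from collections import Counter
from typing import Iterable, Optional

def classify_reference_allele(genotypes: Iterable[Optional[int]]) -> str:
    """Classify a locus by how the reference allele appears across samples.

    Each genotype is the number of non-reference alleles a sample carries
    (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate);
    None marks a missing call and is ignored.
    """
    counts = Counter(g for g in genotypes if g is not None)
    if not counts:
        return "no_calls"
    if counts[0] == 0 and counts[1] == 0:
        # Every called sample is homozygous for the alternate allele.
        return "reference_never_observed"
    if counts[0] == 0:
        # The reference allele appears, but never in a homozygous sample.
        return "reference_heterozygous_only"
    return "reference_homozygous_observed"

# Hypothetical loci and genotype calls, for illustration only.
loci = {
    "locus_a": [2, 2, 2, None, 2],
    "locus_b": [2, 1, 2, 2, 1],
    "locus_c": [0, 1, 2, 2, 2],
}
for name, calls in loci.items():
    print(name, classify_reference_allele(calls))
```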
215
Mathelier A, Lefebvre C, Zhang AW, Arenillas DJ, Ding J, Wasserman WW, Shah SP. Cis-regulatory somatic mutations and gene-expression alteration in B-cell lymphomas. Genome Biol 2015; 16:84. [PMID: 25903198 PMCID: PMC4467049 DOI: 10.1186/s13059-015-0648-7] [Received: 12/01/2014] [Accepted: 04/07/2015]
Abstract
BACKGROUND With the rapid increase in whole-genome sequencing of human cancers, an important opportunity to analyze and characterize somatic mutations lying within cis-regulatory regions has emerged. A focus on protein-coding regions to identify nonsense or missense mutations disruptive to protein structure and/or function has led to important insights; however, the impact on gene expression of mutations lying within cis-regulatory regions remains under-explored. We analyzed somatic mutations from 84 matched tumor-normal whole genomes from B-cell lymphomas with accompanying gene expression measurements to elucidate the extent to which these cancers are disrupted by cis-regulatory mutations. RESULTS We characterize mutations overlapping a high-quality set of well-annotated transcription factor binding sites (TFBSs), covering a similar portion of the genome as protein-coding exons. Our results indicate that cis-regulatory mutations overlapping predicted TFBSs are enriched in promoter regions of genes involved in apoptosis or growth/proliferation. By integrating gene expression data with mutation data, our computational approach culminates in the identification of cis-regulatory mutations most likely to participate in dysregulation of the gene expression program. The impact can be measured along with protein-coding mutations to highlight key mutations disrupting gene expression and pathways in cancer. CONCLUSIONS Our study yields specific genes with disrupted expression triggered by genomic mutations in either the coding or the regulatory space. It implies that mutated regulatory components of the genome contribute substantially to cancer pathways. Our analyses demonstrate that identifying genomically altered cis-regulatory elements coupled with analysis of gene expression data will augment biological interpretation of mutational landscapes of cancers.
Affiliation(s)
- Anthony Mathelier
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, University of British Columbia, 950 West 28th Avenue, V5Z 4H4, Vancouver, BC, Canada.
- Calvin Lefebvre
- Department of Molecular Oncology, British Columbia Cancer Agency, Vancouver, V5Z 1L3, BC, Canada; Bioinformatics Graduate Program, University of British Columbia, Vancouver, V5Z 1L3, BC, Canada.
- Allen W Zhang
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, University of British Columbia, 950 West 28th Avenue, V5Z 4H4, Vancouver, BC, Canada; Bioinformatics Graduate Program, University of British Columbia, Vancouver, V5Z 1L3, BC, Canada.
- David J Arenillas
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, University of British Columbia, 950 West 28th Avenue, V5Z 4H4, Vancouver, BC, Canada.
- Jiarui Ding
- Department of Molecular Oncology, British Columbia Cancer Agency, Vancouver, V5Z 1L3, BC, Canada; Department of Computer Science, University of British Columbia, Vancouver, V6T 1Z4, BC, Canada.
- Wyeth W Wasserman
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, University of British Columbia, 950 West 28th Avenue, V5Z 4H4, Vancouver, BC, Canada.
- Sohrab P Shah
- Department of Molecular Oncology, British Columbia Cancer Agency, Vancouver, V5Z 1L3, BC, Canada; Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, G227-2211, BC, Canada.
216
Zhang Y, Liu ZL, Song M. ChiNet uncovers rewired transcription subnetworks in tolerant yeast for advanced biofuels conversion. Nucleic Acids Res 2015; 43:4393-407. [PMID: 25897127 PMCID: PMC4482087 DOI: 10.1093/nar/gkv358] [Received: 09/25/2014] [Accepted: 04/06/2015]
Abstract
Analysis of rewired upstream subnetworks impacting downstream differential gene expression aids the delineation of evolving molecular mechanisms. Cumulative statistics based on conventional differential correlation are limited for subnetwork rewiring analysis, since rewiring is not necessarily equivalent to a change in correlation coefficients. Here we present a computational method, ChiNet, that quantifies subnetwork rewiring by statistical heterogeneity and thereby enables detection of potential genotype changes causing altered transcription regulation in evolving organisms. Given a differentially expressed downstream gene set, ChiNet backtracks a rewired upstream subnetwork from a super-network including gene interactions known to occur under various molecular contexts. We benchmarked ChiNet against two differential-correlation-based subnetwork rewiring approaches and found it highly accurate in distinguishing rewired artificial subnetworks, in silico yeast transcription-metabolic subnetworks, and rewired transcription subnetworks for Candida albicans versus Saccharomyces cerevisiae. Then, using transcriptome data from the tolerant S. cerevisiae strain NRRL Y-50049 and a wild-type intolerant strain, ChiNet identified 44 metabolic pathways affected by rewired transcription subnetworks anchored to major adaptively activated transcription factor genes YAP1, RPN4, SFP1 and ROX1, in response to toxic chemical challenges involved in lignocellulose-to-biofuels conversion. These findings support the use of ChiNet in rewiring analysis of subnetworks where differential interaction patterns resulting from divergent nonlinear dynamics abound.
Affiliation(s)
- Yang Zhang
- Department of Computer Science, New Mexico State University, Las Cruces, NM 88003, USA
- Z Lewis Liu
- National Center for Agricultural Utilization Research, Agricultural Research Service, U.S. Department of Agriculture, Peoria, IL 61604, USA
- Mingzhou Song
- Department of Computer Science, New Mexico State University, Las Cruces, NM 88003, USA
217
Wang Q, Lu Q, Zhao H. A review of study designs and statistical methods for genomic epidemiology studies using next generation sequencing. Front Genet 2015; 6:149. [PMID: 25941534 PMCID: PMC4403555 DOI: 10.3389/fgene.2015.00149] [Received: 01/25/2015] [Accepted: 03/30/2015]
Abstract
Results from numerous linkage and association studies have greatly deepened scientists’ understanding of the genetic basis of many human diseases, yet some important questions remain unanswered. For example, although a large number of disease-associated loci have been identified from genome-wide association studies in the past 10 years, it is challenging to interpret these results, as most disease-associated markers have no clear functional roles in disease etiology, and all the identified genomic factors explain only a small portion of disease heritability. With the help of next-generation sequencing (NGS), diverse types of genomic and epigenetic variations can be detected with high accuracy. More importantly, instead of relying on linkage disequilibrium to detect association signals based on a set of preselected probes, NGS allows researchers to directly study all the variants in each individual and therefore promises opportunities for identifying functional variants and for a more comprehensive dissection of disease heritability. Although the current scale of NGS studies is still limited by the high cost, the success of several recent studies suggests the great potential for applying NGS in genomic epidemiology, especially as the cost of sequencing continues to drop. In this review, we discuss several pioneering applications of NGS, summarize scientific discoveries for rare and complex diseases, and compare various study designs, including targeted sequencing and whole-genome sequencing using population-based and family-based cohorts. Finally, we highlight recent advancements in statistical methods proposed for sequencing analysis, including group-based association tests, meta-analysis techniques, and annotation tools for variant prioritization.
Affiliation(s)
- Qian Wang
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Qiongshi Lu
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Hongyu Zhao
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA; Veterans Affairs Cooperative Studies Program Coordinating Center, West Haven, CT, USA
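Among the group-based association tests covered by the review in entry 217, one of the simplest is a collapsing test in the style of CAST: each individual is scored as carrying or not carrying any rare variant in a region, and carrier counts are then compared between cases and controls. The sketch below rests on stated assumptions (a Pearson chi-square test on the 2x2 carrier table, no covariates, no small-sample correction) and is not a method taken from the review; the function names and genotypes are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def is_carrier(genotypes: Sequence[int]) -> bool:
    """True if the individual carries at least one minor allele in the region."""
    return any(g > 0 for g in genotypes)

def collapsing_test(cases: List[Sequence[int]],
                    controls: List[Sequence[int]]) -> Tuple[float, float]:
    """CAST-style collapsing test on a 2x2 carrier table (a sketch).

    Each individual is a vector of minor-allele counts at the rare variants
    of one region. Returns (chi2, p) from a Pearson chi-square test, 1 df.
    """
    a = sum(is_carrier(g) for g in cases)     # carrier cases
    b = len(cases) - a                        # non-carrier cases
    c = sum(is_carrier(g) for g in controls)  # carrier controls
    d = len(controls) - c                     # non-carrier controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # an empty row or column: the test is undefined
    chi2 = n * (a * d - b * c) ** 2 / denom
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p

# Hypothetical genotypes at three rare variants in one region.
cases = [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 0]]
controls = [[0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 0, 0]]
print(collapsing_test(cases, controls))
```

With very rare variants and small samples the chi-square approximation can be poor, so an exact test or permutation p-values are common alternatives.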
218
Fung JN, Rogers PA, Montgomery GW. Identifying the Biological Basis of GWAS Hits for Endometriosis. Biol Reprod 2015; 92:87. [DOI: 10.1095/biolreprod.114.126458] [Received: 11/03/2014] [Accepted: 02/05/2015]
219
Re-annotation of presumed noncoding disease/trait-associated genetic variants by integrative analyses. Sci Rep 2015; 5:9453. [PMID: 25819875 PMCID: PMC4377585 DOI: 10.1038/srep09453] [Received: 08/22/2014] [Accepted: 03/02/2015]
Abstract
Based on RefSeq annotations, most disease/trait-associated genetic variants identified by genome-wide association studies (GWAS) appear to be located within intronic or intergenic regions, which makes it difficult to interpret their functions. We reassessed GWAS-associated single-nucleotide polymorphisms (herein termed GASs) for their potential functionalities using integrative approaches. Of 9184 RefSeq “noncoding” GASs, 8834 were reassessed as having potential regulatory functionalities. As examples, 3 variants (rs3130320, rs3806932 and rs6890853) were shown to have regulatory properties in HepG2, A549 and 293T cells. Unlike rs3130320, a known expression quantitative trait locus (eQTL), rs3806932 and rs6890853 had not previously been reported as eQTLs. Of the 9184 “noncoding” GASs, 1999 were re-annotated to promoters or intragenic regions using Ensembl, UCSC and AceView gene annotations, although they were not annotated to the corresponding regions in the RefSeq database. Moreover, these GAS-harboring genes were broadly expressed across different tissues, and a portion of them were expressed in a tissue-specific manner, suggesting that they could be functional. Collectively, our study demonstrates the benefits of using integrative analyses to interpret genetic variants and may help to predict or explain disease susceptibility more accurately and comprehensively.
220
Identification of a large set of rare complete human knockouts. Nat Genet 2015; 47:448-52. [PMID: 25807282 DOI: 10.1038/ng.3243] [Received: 04/17/2014] [Accepted: 02/13/2015]
Abstract
Loss-of-function mutations cause many mendelian diseases. Here we aimed to create a catalog of autosomal genes that are completely knocked out in humans by rare loss-of-function mutations. We sequenced the whole genomes of 2,636 Icelanders and imputed the sequence variants identified in this set into 101,584 additional chip-genotyped and phased Icelanders. We found a total of 6,795 autosomal loss-of-function SNPs and indels in 4,924 genes. Of the genotyped Icelanders, 7.7% are homozygotes or compound heterozygotes for loss-of-function mutations with a minor allele frequency (MAF) below 2% in 1,171 genes (complete knockouts). Genes that are highly expressed in the brain are less often completely knocked out than other genes. Homozygous loss-of-function offspring of two heterozygous parents occurred less frequently than expected (deficit of 136 per 10,000 transmissions for variants with MAF <2%, 95% confidence interval (CI) = 10-261).
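The "complete knockout" definition in entry 220 (homozygous or compound heterozygous for loss-of-function variants with MAF below 2%) can be sketched for a single gene as follows, assuming phased haplotypes are available. The function name is_complete_knockout, the input layout, the variant IDs and the choice to treat variants with unknown MAF as non-qualifying are all assumptions made for illustration.

```python
from typing import Dict, Set

MAF_THRESHOLD = 0.02  # "MAF below 2%", as in the abstract

def is_complete_knockout(haplotype_a: Set[str], haplotype_b: Set[str],
                         lof_maf: Dict[str, float],
                         threshold: float = MAF_THRESHOLD) -> bool:
    """Return True if both phased haplotypes of one gene carry a rare LoF variant.

    haplotype_a / haplotype_b: IDs of the LoF variants found on each haplotype.
    lof_maf: minor allele frequency per variant ID; a variant with no known
    MAF is treated as not qualifying (an assumption of this sketch).
    Covers homozygotes (same variant on both haplotypes) and compound
    heterozygotes (different variants on each haplotype).
    """
    def has_rare_lof(haplotype: Set[str]) -> bool:
        return any(lof_maf.get(v, 1.0) < threshold for v in haplotype)

    return has_rare_lof(haplotype_a) and has_rare_lof(haplotype_b)

# Hypothetical variants and frequencies.
maf = {"lof1": 0.001, "lof2": 0.005, "lof3": 0.10}
print(is_complete_knockout({"lof1"}, {"lof1"}, maf))       # homozygote
print(is_complete_knockout({"lof1"}, {"lof2"}, maf))       # compound heterozygote
print(is_complete_knockout({"lof1", "lof2"}, set(), maf))  # both in cis
```

Phase matters here: two rare LoF variants on the same haplotype leave the other copy of the gene intact, so that individual is not counted as a knockout.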
221
Large-scale whole-genome sequencing of the Icelandic population. Nat Genet 2015; 47:435-44. [PMID: 25807286 DOI: 10.1038/ng.3247] [Received: 02/17/2014] [Accepted: 02/13/2015]
Abstract
Here we describe the insights gained from sequencing the whole genomes of 2,636 Icelanders to a median depth of 20×. We found 20 million SNPs and 1.5 million insertions-deletions (indels). We describe the density and frequency spectra of sequence variants in relation to their functional annotation, gene position, pathway and conservation score. We demonstrate an excess of homozygosity and rare protein-coding variants in Iceland. We imputed these variants into 104,220 individuals down to a minor allele frequency of 0.1% and found a recessive frameshift mutation in MYL4 that causes early-onset atrial fibrillation, several mutations in ABCB4 that increase risk of liver diseases and an intronic variant in GNAS associated with increased thyroid-stimulating hormone levels when maternally inherited. These data provide a study design that can be used to determine how variation in the sequence of the human genome gives rise to human diversity.
222
Wu L, Snyder M. Impact of allele-specific peptides in proteome quantification. Proteomics Clin Appl 2015; 9:432-6. [PMID: 25676416 DOI: 10.1002/prca.201400126] [Received: 09/01/2014] [Revised: 01/03/2015] [Accepted: 02/05/2015]
Abstract
MS-based proteome technologies have greatly improved our ability to detect and quantify proteomes across various biological samples. High-throughput bottom-up proteome profiling in combination with targeted MS methods, e.g. SRM assays, is emerging as a powerful approach in the field of biomarker discovery. In the past few years, an increasing number of studies have attempted to integrate genomic and proteomic data for biomarker discovery. Here, we describe how allele-specific peptides can be applied in biomarker discovery and their impact on protein quantification.
Affiliation(s)
- Linfeng Wu
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA; Caprion Proteomics US LLC, Menlo Park, CA, USA
223
Garritano S, Romanel A, Ciribilli Y, Bisio A, Gavoci A, Inga A, Demichelis F. In-silico identification and functional validation of allele-dependent AR enhancers. Oncotarget 2015; 6:4816-28. [PMID: 25693204 PMCID: PMC4467117 DOI: 10.18632/oncotarget.3019] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 11/19/2014] [Accepted: 12/30/2014] [Indexed: 12/13/2022]
Abstract
Androgen Receptor (AR) and Estrogen Receptors (ERs) are key nuclear receptors that can cooperate in orchestrating gene expression programs in multiple tissues and diseases, targeting binding elements in promoters and distant enhancers. We report the unbiased identification of enhancer elements bound by AR and ER-α whose activity can be allele-specific depending on the status of nearby Single Nucleotide Polymorphisms (SNPs). ENCODE data were computationally mined to nominate genomic loci with: (i) a chromatin signature of enhancer activity from activation histone marks, (ii) binding evidence for AR and ER-α, and (iii) the presence of a SNP. Forty-one loci were identified, and two, on 1q21.3 and 13q34, were selected for characterization by gene reporter, chromatin immunoprecipitation (ChIP) and RT-qPCR assays in breast (MCF7) and prostate (PC-3) cancer-derived cell lines. We observed allele-specific enhancer activity, responsiveness to ligand-bound AR, and a potential influence on the transcription of closely located genes (RAB20, ING1, ARHGEF7, ADAM15). The 1q21.3 variant, rs2242193, showed an impact on AR binding in MCF7 cells, which are heterozygous for the SNP. Our unbiased genome-wide search proved to be an efficient methodology for discovering new functional polymorphic regulatory regions (PRRs) potentially acting as risk modifiers in hormone-driven cancers and, overall, nominated SNPs in PRRs across 136 transcription factors.
MESH Headings
- Alleles
- Blotting, Western
- Breast Neoplasms/genetics
- Breast Neoplasms/metabolism
- Breast Neoplasms/pathology
- Chromatin Immunoprecipitation
- Computer Simulation
- Enhancer Elements, Genetic/genetics
- Estrogen Receptor alpha/genetics
- Estrogen Receptor alpha/metabolism
- Female
- Gene Expression Regulation, Neoplastic
- Genome, Human
- Humans
- Male
- Polymorphism, Single Nucleotide/genetics
- Promoter Regions, Genetic/genetics
- Prostatic Neoplasms/genetics
- Prostatic Neoplasms/metabolism
- Prostatic Neoplasms/pathology
- RNA, Messenger/genetics
- Real-Time Polymerase Chain Reaction
- Receptors, Androgen/genetics
- Receptors, Androgen/metabolism
- Reverse Transcriptase Polymerase Chain Reaction
- Tumor Cells, Cultured
Affiliation(s)
- Sonia Garritano
- Laboratory of Computational Oncology, CIBIO, Centre for Integrative Biology, University of Trento, Italy
- Alessandro Romanel
- Laboratory of Computational Oncology, CIBIO, Centre for Integrative Biology, University of Trento, Italy
- Yari Ciribilli
- Laboratory of Transcriptional Networks, CIBIO, Centre for Integrative Biology, University of Trento, Italy
- Alessandra Bisio
- Laboratory of Transcriptional Networks, CIBIO, Centre for Integrative Biology, University of Trento, Italy
- Antoneta Gavoci
- Laboratory of Computational Oncology, CIBIO, Centre for Integrative Biology, University of Trento, Italy
- Alberto Inga
- Laboratory of Transcriptional Networks, CIBIO, Centre for Integrative Biology, University of Trento, Italy
- Francesca Demichelis
- Laboratory of Computational Oncology, CIBIO, Centre for Integrative Biology, University of Trento, Italy
- HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College of Cornell University, New York, NY, USA
- Institute for Precision Medicine, Weill Medical College of Cornell University and New York Presbyterian Hospital, New York, NY, USA
224
Hussin JG, Hodgkinson A, Idaghdour Y, Grenier JC, Goulet JP, Gbeha E, Hip-Ki E, Awadalla P. Recombination affects accumulation of damaging and disease-associated mutations in human populations. Nat Genet 2015; 47:400-4. [DOI: 10.1038/ng.3216] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.1] [Received: 05/09/2014] [Accepted: 01/14/2015] [Indexed: 01/17/2023]
225
Aure MR, Jernström S, Krohn M, Vollan HKM, Due EU, Rødland E, Kåresen R, Ram P, Lu Y, Mills GB, Sahlberg KK, Børresen-Dale AL, Lingjærde OC, Kristensen VN. Integrated analysis reveals microRNA networks coordinately expressed with key proteins in breast cancer. Genome Med 2015; 7:21. [PMID: 25873999 PMCID: PMC4396592 DOI: 10.1186/s13073-015-0135-5] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 09/08/2014] [Accepted: 01/19/2015] [Indexed: 01/20/2023]
Abstract
Background: The role played by microRNAs in the deregulation of protein expression in breast cancer is only partly understood. To gain insight, the combined effect of microRNA and mRNA expression on protein expression was investigated in three independent data sets. Methods: Protein expression was modeled as a multilinear function of powers of mRNA and microRNA expression. The model was first applied to mRNA and protein expression for 105 selected cancer-associated genes and to genome-wide microRNA expression from 283 breast tumors. The model considered both the effect of one microRNA at a time and that of all microRNAs combined. In the latter case, the Lasso penalized regression method was applied to detect the simultaneous effect of multiple microRNAs. Results: An interactome map for breast cancer representing all direct and indirect associations between the expression of microRNAs and proteins was derived. A pattern of extensive coordination between microRNA and protein expression in breast cancer emerges, with multiple clusters of microRNAs being associated with multiple clusters of proteins. Results were subsequently validated in two independent breast cancer data sets. A number of the microRNA-protein associations were functionally validated in a breast cancer cell line. Conclusions: A comprehensive map is derived for the co-expression in breast cancer of microRNAs and 105 proteins with known roles in cancer, after filtering out the in-cis effect of mRNA expression. The analysis suggests that group action by several microRNAs to deregulate the expression of proteins is a common modus operandi in breast cancer. Electronic supplementary material: The online version of this article (doi:10.1186/s13073-015-0135-5) contains supplementary material, which is available to authorized users.
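The Lasso step mentioned in the Methods can be illustrated with a minimal, generic sketch. It fits a linear model with an L1 penalty by cyclic coordinate descent, so that coefficients of uninformative predictors (here, hypothetical microRNA expression columns) are shrunk to exactly zero. This is not the authors' model, which is a multilinear function of powers of mRNA and microRNA expression; the function names, the toy data, and the assumption of centred predictors with no intercept are all illustrative.

```python
def soft_threshold(rho: float, lam: float) -> float:
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    X is a list of rows (samples x features) and y a list of responses.
    Assumption: columns of X and y are already centred, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    # Precompute (1/n) * sum_i x_ij^2 for each feature j.
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant-zero column: leave its coefficient at 0
            # Correlation of feature j with the partial residual that
            # excludes feature j's own current contribution.
            rho = 0.0
            for row, yi in zip(X, y):
                pred_without_j = sum(row[k] * beta[k] for k in range(p) if k != j)
                rho += row[j] * (yi - pred_without_j)
            rho /= n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


# Toy example (hypothetical data): the response depends only on feature 0.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
print(lasso_coordinate_descent(X, y, lam=0.5))
```

With two orthogonal predictors and a response that depends only on the first, the fit returns `[1.5, 0.0]`: the relevant coefficient is shrunk from 2.0 by the penalty and the irrelevant one is set exactly to zero, which is the selection behaviour that makes the Lasso useful when many candidate microRNAs are tested at once.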
Affiliation(s)
- Miriam Ragle Aure
- Department of Genetics, Institute for Cancer Research, Oslo University Hospital, The Norwegian Radium Hospital, Oslo, 0310 Norway; K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, University of Oslo, Oslo, 0316 Norway
- Sandra Jernström
- Department of Genetics, Institute for Cancer Research, Oslo University Hospital, The Norwegian Radium Hospital, Oslo, 0310 Norway; K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, University of Oslo, Oslo, 0316 Norway
- Marit Krohn
- Department of Genetics, Institute for Cancer Research, Oslo University Hospital, The Norwegian Radium Hospital, Oslo, 0310 Norway; K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, University of Oslo, Oslo, 0316 Norway
- Hans Kristian Moen Vollan
- Department of Genetics, Institute for Cancer Research, Oslo University Hospital, The Norwegian Radium Hospital, Oslo, 0310 Norway; K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, University of Oslo, Oslo, 0316 Norway; Department of Oncology, Division of Surgery, Cancer and Transplantation, Oslo University Hospital, The Norwegian Radium Hospital, Oslo, 0310 Norway
- Eldri U Due
- Department of Genetics, Institute for Cancer Research, Oslo University Hospital, The Norwegian Radium Hospital, Oslo, 0310 Norway; K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, University of Oslo, Oslo, 0316 Norway
- Einar Rødland
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, The Norwegian Radium Hospital, Oslo, 0310 Norway; Centre for Cancer Biomedicine, University of Oslo, Oslo, 0316 Norway; Department of Computer Science, University of Oslo, Oslo, 0316 Norway
- Rolf Kåresen
- Institute of Clinical Medicine, University of Oslo, Oslo, 0316 Norway
- Prahlad Ram
- Department of Systems Biology, The University of Texas M.D. Anderson Cancer Center, Houston, TX 77030 USA
- Yiling Lu
- Department of Systems Biology, The University of Texas M.D. Anderson Cancer Center, Houston, TX 77030 USA
- Gordon B Mills
- Department of Systems Biology, The University of Texas M.D. Anderson Cancer Center, Houston, TX 77030 USA
- Kristine Kleivi Sahlberg
- K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, University of Oslo, Oslo, 0316 Norway; Department of Research, Vestre Viken Hospital Trust, Drammen, 3004 Norway
- Anne-Lise Børresen-Dale
- Department of Genetics, Institute for Cancer Research, Oslo University Hospital, The Norwegian Radium Hospital, Oslo, 0310 Norway; K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, University of Oslo, Oslo, 0316 Norway
- Ole Christian Lingjærde
- K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, University of Oslo, Oslo, 0316 Norway; Centre for Cancer Biomedicine, University of Oslo, Oslo, 0316 Norway; Department of Computer Science, University of Oslo, Oslo, 0316 Norway
- Vessela N Kristensen
- Department of Genetics, Institute for Cancer Research, Oslo University Hospital, The Norwegian Radium Hospital, Oslo, 0310 Norway; K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, University of Oslo, Oslo, 0316 Norway; Department of Clinical Molecular Biology and Laboratory Science (EpiGen), Division of Medicine, Akershus University Hospital, Lørenskog, 1478 Norway
226
Mathelier A, Shi W, Wasserman WW. Identification of altered cis-regulatory elements in human disease. Trends Genet 2015; 31:67-76. [DOI: 10.1016/j.tig.2014.12.003] [Citation(s) in RCA: 82] [Impact Index Per Article: 8.2] [Received: 11/07/2014] [Revised: 12/19/2014] [Accepted: 12/19/2014] [Indexed: 02/01/2023]
227
Narzisi G, Schatz MC. The challenge of small-scale repeats for indel discovery. Front Bioeng Biotechnol 2015; 3:8. [PMID: 25674564 PMCID: PMC4306302 DOI: 10.3389/fbioe.2015.00008] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 10/28/2014] [Accepted: 01/07/2015] [Indexed: 12/30/2022]
Abstract
Repetitive sequences are abundant in the human genome. Different classes of repetitive DNA sequences, including simple repeats, tandem repeats, segmental duplications, interspersed repeats, and other elements, collectively span more than 50% of the genome. Because repeat sequences occur in the genome at different scales, they can cause various types of sequence analysis errors, including errors in alignment, de novo assembly, and annotation. This mini-review highlights the challenges introduced by small-scale repeat sequences, especially near-identical tandem or closely located repeats and short tandem repeats, for discovering DNA insertion and deletion (indel) mutations from next-generation sequencing data. We also discuss the de Bruijn graph sequence assembly paradigm, which is emerging as the most popular and promising approach for detecting indels. The human exome is taken as an example to highlight how these repetitive elements can obscure or introduce errors when detecting these types of mutations.
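To make the de Bruijn graph point concrete, the sketch below builds a graph whose nodes are (k-1)-mers and whose edges are the k-mers of the reads, then checks for a directed cycle. When k is short relative to a tandem repeat, the repeated (k-1)-mers collapse onto the same nodes and form a cycle, so the number of repeat copies (and hence an indel inside the repeat) cannot be read directly off the graph; a larger k that spans the repeat removes the cycle. The function names and the toy read are hypothetical, and real indel callers built on this paradigm add error correction, graph cleaning, and path scoring that are omitted here.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, one edge per k-mer occurrence.

    Returns a dict mapping each (k-1)-mer prefix to the list of (k-1)-mer suffixes.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def has_cycle(graph):
    """Detect a directed cycle using iterative DFS with white/grey/black colouring."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)  # unseen nodes default to WHITE (0)
    for start in list(graph):
        if colour[start] != WHITE:
            continue
        colour[start] = GREY
        stack = [(start, iter(graph.get(start, [])))]
        while stack:
            node, children = stack[-1]
            nxt = next(children, None)
            if nxt is None:
                # All successors explored: node is finished.
                colour[node] = BLACK
                stack.pop()
            elif colour[nxt] == GREY:
                # Edge back to a node on the current path: a cycle.
                return True
            elif colour[nxt] == WHITE:
                colour[nxt] = GREY
                stack.append((nxt, iter(graph.get(nxt, []))))
    return False


read = "GATCACACACAGT"  # hypothetical read containing a CA tandem repeat
print(has_cycle(build_de_bruijn([read], 4)))   # short k: repeat collapses
print(has_cycle(build_de_bruijn([read], 12)))  # long k: repeat is spanned
```

With k = 4 the CA repeat produces the cycle CAC → ACA → CAC, whereas with k = 12 the read yields a simple two-edge path. This trade-off (short k tolerates sequencing errors and low coverage, long k resolves repeats) is one reason small-scale repeats remain difficult for assembly-based indel detection.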
Affiliation(s)
- Michael C Schatz
- Cold Spring Harbor Laboratory, Simons Center for Quantitative Biology, Cold Spring Harbor, New York, NY, USA
228
Pon JR, Marra MA. Driver and passenger mutations in cancer. Annu Rev Pathol 2015; 10:25-50. [DOI: 10.1146/annurev-pathol-012414-040312] [Citation(s) in RCA: 216] [Impact Index Per Article: 21.6] [Indexed: 12/28/2022]
Affiliation(s)
- Julia R. Pon
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, Canada V5Z 1L3
- Marco A. Marra
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, Canada V5Z 1L3
- Department of Medical Genetics, University of British Columbia, Vancouver, Canada V6T 1Z4
229
Reimand J, Wagih O, Bader GD. Evolutionary constraint and disease associations of post-translational modification sites in human genomes. PLoS Genet 2015; 11:e1004919. [PMID: 25611800 PMCID: PMC4303425 DOI: 10.1371/journal.pgen.1004919] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.7] [Received: 07/04/2014] [Accepted: 11/24/2014] [Indexed: 12/14/2022]
Abstract
Interpreting the impact of human genome variation on phenotype is challenging. The functional effect of protein-coding variants is often predicted using sequence conservation and population frequency data; however, other factors are likely relevant. We hypothesized that variants in protein post-translational modification (PTM) sites contribute to phenotype variation and disease. We analyzed the fraction of rare variants and the non-synonymous to synonymous variant ratio (Ka/Ks) in 7,500 human genomes and found a significant negative selection signal in PTM regions that is independent of six factors, including conservation, codon usage, and GC-content, and that is widely distributed across tissue-specific genes and function classes. PTM regions are also enriched in known disease mutations, suggesting that PTM variation is more likely to be deleterious. PTM constraint also affects the flanking sequence around modified residues and increases around clustered sites, indicating the presence of functionally important short linear motifs. Using target site motifs of 124 kinases, we predict at least ∼180,000 motif-breaker amino acid residues that disrupt PTM sites when substituted, and highlight kinase motifs that show specific negative selection and enrichment of disease mutations. We provide this dataset with corresponding hypothesized mechanisms as a community resource. As an example of our integrative approach, we propose that PTPN11 variants in Noonan syndrome aberrantly activate the protein by disrupting an uncharacterized cluster of phosphorylation sites. Further, as PTMs are molecular switches that are modulated by drugs, we study mutated binding sites of PTM enzymes in disease genes and define a drug-disease network containing 413 novel predicted disease-gene links.
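The two selection statistics named in this abstract can be computed as follows in their simplest form. The sketch derives the minor allele frequency (MAF) from allele counts, reports the fraction of variant sites below a rare-variant cutoff, and forms a Ka/Ks-style ratio by normalising non-synonymous and synonymous variant counts by the number of corresponding sites. The 1% cutoff, the input layout, and the function names are assumptions for illustration; the study additionally controlled for six confounding factors, which this sketch does not attempt.

```python
def minor_allele_frequency(alt_count: int, total_alleles: int) -> float:
    """MAF at a biallelic site: frequency of the less common allele."""
    freq = alt_count / total_alleles
    return min(freq, 1.0 - freq)


def rare_fraction(sites, threshold=0.01):
    """Fraction of variant sites whose MAF is below `threshold`.

    `sites` is a list of (alt_count, total_alleles) tuples.
    The 1% default cutoff is an illustrative assumption.
    """
    if not sites:
        raise ValueError("need at least one variant site")
    mafs = [minor_allele_frequency(a, n) for a, n in sites]
    return sum(m < threshold for m in mafs) / len(mafs)


def ka_ks(nonsyn_variants, nonsyn_sites, syn_variants, syn_sites):
    """Non-synonymous vs synonymous variant rate, each normalised per site."""
    return (nonsyn_variants / nonsyn_sites) / (syn_variants / syn_sites)


# Hypothetical allele counts out of 15,000 alleles per site.
sites = [(1, 15000), (3000, 15000), (50, 15000), (14990, 15000)]
print(rare_fraction(sites))
print(ka_ks(10, 200, 20, 100))
```

A site where the alternative allele is almost fixed (for example 14990 of 15000 alleles) still counts as rare, because the MAF folds the frequency onto the less common allele. A region under stronger negative selection would be expected to show a higher rare fraction and a lower Ka/Ks than a matched background.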
Affiliation(s)
- Jüri Reimand
- The Donnelly Centre, University of Toronto, Canada
- Omar Wagih
- The Donnelly Centre, University of Toronto, Canada
- Gary D. Bader
- The Donnelly Centre, University of Toronto, Canada
230
Koues OI, Kowalewski RA, Chang LW, Pyfrom SC, Schmidt JA, Luo H, Sandoval LE, Hughes TB, Bednarski JJ, Cashen AF, Payton JE, Oltz EM. Enhancer sequence variants and transcription-factor deregulation synergize to construct pathogenic regulatory circuits in B-cell lymphoma. Immunity 2015; 42:186-98. [PMID: 25607463 PMCID: PMC4302272 DOI: 10.1016/j.immuni.2014.12.021] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.9] [Received: 07/14/2014] [Revised: 10/11/2014] [Accepted: 11/17/2014] [Indexed: 01/06/2023]
Abstract
Most B-cell lymphomas arise in the germinal center (GC), where humoral immune responses evolve from potentially oncogenic cycles of mutation, proliferation, and clonal selection. Although lymphoma gene expression diverges significantly from GC B cells, underlying mechanisms that alter the activities of corresponding regulatory elements (REs) remain elusive. Here we define the complete pathogenic circuitry of human follicular lymphoma (FL), which activates or decommissions REs from normal GC B cells and commandeers enhancers from other lineages. Moreover, independent sets of transcription factors, whose expression was deregulated in FL, targeted commandeered versus decommissioned REs. Our approach revealed two distinct subtypes of low-grade FL, whose pathogenic circuitries resembled GC B or activated B cells. FL-altered enhancers also were enriched for sequence variants, including somatic mutations, which disrupt transcription-factor binding and expression of circuit-linked genes. Thus, the pathogenic regulatory circuitry of FL reveals distinct genetic and epigenetic etiologies for GC B-cell transformation.
Affiliation(s)
- Olivia I Koues
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Rodney A Kowalewski
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Li-Wei Chang
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Sarah C Pyfrom
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Jennifer A Schmidt
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Hong Luo
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Luis E Sandoval
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Tyler B Hughes
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Jeffrey J Bednarski
- Department of Pediatrics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Amanda F Cashen
- Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Jacqueline E Payton
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Eugene M Oltz
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110, USA
231
Gulko B, Hubisz MJ, Gronau I, Siepel A. A method for calculating probabilities of fitness consequences for point mutations across the human genome. Nat Genet 2015; 47:276-83. [PMID: 25599402 PMCID: PMC4342276 DOI: 10.1038/ng.3196] [Citation(s) in RCA: 182] [Impact Index Per Article: 18.2] [Received: 08/13/2014] [Accepted: 12/19/2014] [Indexed: 12/17/2022]
Abstract
We describe a novel computational method for estimating the probability that a point mutation at each position in a genome will influence fitness. These fitness consequence (fitCons) scores serve as evolution-based measures of potential genomic function. Our approach is to cluster genomic positions into groups exhibiting distinct “fingerprints” based on high-throughput functional genomic data, then to estimate a probability of fitness consequences for each group from associated patterns of genetic polymorphism and divergence. We have generated fitCons scores for three human cell types based on public data from ENCODE. Compared with conventional conservation scores, fitCons scores show considerably improved prediction power for cis-regulatory elements. In addition, fitCons scores indicate that 4.2–7.5% of nucleotides in the human genome have influenced fitness since the human-chimpanzee divergence, and they suggest that recent evolutionary turnover has had limited impact on the functional content of the genome.
Affiliation(s)
- Brad Gulko
- Graduate Field of Computer Science, Cornell University, Ithaca, New York, USA
- Melissa J Hubisz
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
- Ilan Gronau
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
- Adam Siepel
- Graduate Field of Computer Science, Cornell University, Ithaca, New York, USA
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
232
Li J, Jiang Y, Wang T, Chen H, Xie Q, Shao Q, Ran X, Xia K, Sun ZS, Wu J. mirTrios: an integrated pipeline for detection of de novo and rare inherited mutations from trios-based next-generation sequencing. J Med Genet 2015; 52:275-81. [PMID: 25596308 DOI: 10.1136/jmedgenet-2014-102656] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Indexed: 12/11/2022]
Abstract
OBJECTIVES: Recently, several studies have documented that de novo mutations (DNMs) play important roles in the aetiology of sporadic diseases. Next-generation sequencing (NGS) enables variant calling at single-base resolution on a genome-wide scale. However, accurate identification of DNMs from NGS data remains a major challenge. We developed mirTrios, a web server, to accurately detect DNMs and rare inherited mutations from NGS data in sporadic diseases. METHODS: An expectation-maximisation (EM) model was adopted to accurately identify DNMs from the variant call files of a trio generated by GATK (Genome Analysis Toolkit). The GATK results, which contain certain basic properties (such as PL, PRT and PART), are iteratively integrated into the EM model to set a threshold for DNM detection. Training sets of true- and false-positive DNMs for the EM model were built from whole-genome sequencing data of 64 trios. RESULTS: With our in-house whole-exome sequencing datasets from 20 trios, mirTrios identified a total of 27 DNMs in the coding region, 25 of which (92.6%) were validated as true positives. In addition, to facilitate the interpretation of diverse mutations, mirTrios can also be employed to identify rare inherited mutations. Embedded with abundant annotation of DNMs and rare inherited mutations, mirTrios also supports the identification of known diagnostic variants and causative genes, as well as the prioritisation of novel and promising candidate genes. CONCLUSIONS: mirTrios provides an intuitive interface for the general geneticist and clinician, and can be widely used for the detection and annotation of DNMs and rare inherited mutations in sporadic diseases. mirTrios is freely available at http://centre.bioinformatics.zj.cn/mirTrios/.
Affiliation(s)
- Jinchen Li
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China; Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China; State Key Laboratory of Medical Genetics, Central South University, Changsha, China
- Yi Jiang
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Tao Wang
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Huiqian Chen
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Qing Xie
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Qianzhi Shao
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Xia Ran
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Kun Xia
- State Key Laboratory of Medical Genetics, Central South University, Changsha, China
- Zhong Sheng Sun
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China; Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Jinyu Wu
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China; Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
233
Tian R, Basu MK, Capriotti E. ContrastRank: a new method for ranking putative cancer driver genes and classification of tumor samples. Bioinformatics 2015; 30:i572-8. [PMID: 25161249 PMCID: PMC4147919 DOI: 10.1093/bioinformatics/btu466] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 01/15/2023]
Abstract
Motivation: Recent advances in high-throughput sequencing technologies are generating a huge amount of data that are becoming an important resource for deciphering the genotype underlying a given phenotype. Genome sequencing has been extensively applied to the study of cancer genomes. Although a few methods have already been proposed for the detection of cancer-related genes, their automatic identification is still a challenging task. Using the genomic data made available by The Cancer Genome Atlas (TCGA) Consortium, we propose a new prioritization approach based on the analysis of the distribution of putative deleterious variants in a large cohort of cancer samples. Results: In this paper, we present ContrastRank, a new method for the prioritization of putative impaired genes in cancer. The method is based on the comparison of the putative defective rate of each gene in tumor samples versus normal and 1000 Genomes samples. We show that the method is able to provide a ranked list of putative impaired genes for colon, lung and prostate adenocarcinomas. The list significantly overlaps with previously published lists of known cancer driver genes. More importantly, by using our scoring approach, we can successfully discriminate between TCGA normal and tumor samples. A binary classifier based on the ContrastRank score reaches an overall accuracy >90% and an area under the receiver operating characteristic (ROC) curve (AUC) >0.95 for all three types of adenocarcinoma analyzed in this paper. In addition, using the ContrastRank score, we are able to discriminate the three tumor types with a minimum overall accuracy of 77% and an AUC of 0.83. Conclusions: We describe ContrastRank, a method for prioritizing putative impaired genes in cancer. The method is based on the comparison of exome sequencing data from different cohorts and can detect putative cancer driver genes. ContrastRank can also be used to estimate a global score for an individual genome about the risk of adenocarcinoma based on the genetic variant information from a whole-exome VCF (Variant Call Format) file. We believe that the application of ContrastRank can be an important step in genomic medicine to enable genome-based diagnosis. Availability and implementation: The lists of ContrastRank scores of all genes in each tumor type are available as supplementary materials. A webserver for evaluating the risk of the three studied adenocarcinomas starting from a whole-exome VCF file is under development. Contact: emidio@uab.edu Supplementary information: Supplementary data are available at Bioinformatics online.
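The accuracy and AUC figures quoted above are standard classifier metrics. As a hedged illustration (not the ContrastRank scoring itself, whose per-gene scores are not reproduced here), the sketch computes AUC as the probability that a randomly chosen tumour sample receives a higher score than a randomly chosen normal sample, which equals the normalised Mann-Whitney U statistic, together with accuracy at a fixed threshold. The scores, labels (1 = tumour, 0 = normal), threshold, and function names are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Equivalent to the normalised Mann-Whitney U statistic; ties count 0.5.
    Pairwise O(P*N) version, adequate for small illustrative inputs.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold):
    """Fraction classified correctly when score >= threshold is called tumour."""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)


# Hypothetical per-sample scores; 1 = tumour, 0 = normal.
scores = [0.9, 0.5, 0.5, 0.2]
labels = [1, 1, 0, 0]
print(roc_auc(scores, labels))
print(accuracy(scores, labels, threshold=0.5))
```

Here the tie between a tumour and a normal sample at 0.5 contributes half a win, giving an AUC of 0.875, while accuracy at threshold 0.5 is 0.75. AUC summarises ranking quality across all thresholds, which is why it is usually reported alongside a single-threshold accuracy. For large cohorts, a rank-based formulation gives the same AUC in O(n log n).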
Affiliation(s)
- Rui Tian
- Division of Informatics, Department of Pathology, Department of Clinical and Diagnostic Sciences and Department of Biomedical Engineering, University of Alabama at Birmingham, Birmingham, AL 35249, USA
- Malay K Basu
- Division of Informatics, Department of Pathology, Department of Clinical and Diagnostic Sciences and Department of Biomedical Engineering, University of Alabama at Birmingham, Birmingham, AL 35249, USA
- Emidio Capriotti
- Division of Informatics, Department of Pathology, Department of Clinical and Diagnostic Sciences and Department of Biomedical Engineering, University of Alabama at Birmingham, Birmingham, AL 35249, USA
234
Xiong HY, Alipanahi B, Lee LJ, Bretschneider H, Merico D, Yuen RKC, Hua Y, Gueroussov S, Najafabadi HS, Hughes TR, Morris Q, Barash Y, Krainer AR, Jojic N, Scherer SW, Blencowe BJ, Frey BJ. RNA splicing. The human splicing code reveals new insights into the genetic determinants of disease. Science 2015; 347:1254806. [PMID: 25525159 PMCID: PMC4362528 DOI: 10.1126/science.1254806] [Citation(s) in RCA: 809] [Impact Index Per Article: 80.9] [Indexed: 12/12/2022]
Abstract
To facilitate precision medicine and whole-genome annotation, we developed a machine-learning technique that scores how strongly genetic variants affect RNA splicing, whose alteration contributes to many diseases. Analysis of more than 650,000 intronic and exonic variants revealed widespread patterns of mutation-driven aberrant splicing. Intronic disease mutations that are more than 30 nucleotides from any splice site alter splicing nine times as often as common variants, and missense exonic disease mutations that have the least impact on protein function are five times as likely as others to alter splicing. We detected tens of thousands of disease-causing mutations, including those involved in cancers and spinal muscular atrophy. Examination of intronic and exonic variants found using whole-genome sequencing of individuals with autism revealed misspliced genes with neurodevelopmental phenotypes. Our approach provides evidence for causal variants and should enable new discoveries in precision medicine.
Affiliation(s)
- Hui Y Xiong
- Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario M5S 3G4, Canada. Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada. Program on Genetic Networks and Program on Neural Computation & Adaptive Perception, Canadian Institute for Advanced Research, Toronto, Ontario M5G 1Z8, Canada
- Babak Alipanahi
- Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario M5S 3G4, Canada. Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada. Program on Genetic Networks and Program on Neural Computation & Adaptive Perception, Canadian Institute for Advanced Research, Toronto, Ontario M5G 1Z8, Canada
- Leo J Lee
- Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario M5S 3G4, Canada. Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada. Program on Genetic Networks and Program on Neural Computation & Adaptive Perception, Canadian Institute for Advanced Research, Toronto, Ontario M5G 1Z8, Canada
| | - Hannes Bretschneider
- Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario M5S 3G4, Canada. Program on Genetic Networks and Program on Neural Computation & Adaptive Perception, Canadian Institute for Advanced Research, Toronto, Ontario M5G 1Z8, Canada. Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3G4, Canada
| | - Daniele Merico
- McLaughlin Centre, University of Toronto, Toronto, Ontario M5G 0A4, Canada. Centre for Applied Genomics, Hospital for Sick Children, Toronto, Ontario M5G 1X8, Canada. Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
| | - Ryan K C Yuen
- McLaughlin Centre, University of Toronto, Toronto, Ontario M5G 0A4, Canada. Centre for Applied Genomics, Hospital for Sick Children, Toronto, Ontario M5G 1X8, Canada. Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
| | - Yimin Hua
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Serge Gueroussov
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada. Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
| | - Hamed S Najafabadi
- Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario M5S 3G4, Canada. Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada. Program on Genetic Networks and Program on Neural Computation & Adaptive Perception, Canadian Institute for Advanced Research, Toronto, Ontario M5G 1Z8, Canada
| | - Timothy R Hughes
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada. Program on Genetic Networks and Program on Neural Computation & Adaptive Perception, Canadian Institute for Advanced Research, Toronto, Ontario M5G 1Z8, Canada. Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
| | - Quaid Morris
- Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario M5S 3G4, Canada. Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada. Program on Genetic Networks and Program on Neural Computation & Adaptive Perception, Canadian Institute for Advanced Research, Toronto, Ontario M5G 1Z8, Canada. Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
| | - Yoseph Barash
- Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario M5S 3G4, Canada. Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada. School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Adrian R Krainer
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Nebojsa Jojic
- eScience Group, Microsoft Research, Redmond, WA 98052, USA
| | - Stephen W Scherer
- Program on Genetic Networks and Program on Neural Computation & Adaptive Perception, Canadian Institute for Advanced Research, Toronto, Ontario M5G 1Z8, Canada. McLaughlin Centre, University of Toronto, Toronto, Ontario M5G 0A4, Canada. Centre for Applied Genomics, Hospital for Sick Children, Toronto, Ontario M5G 1X8, Canada. Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
| | - Benjamin J Blencowe
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada. McLaughlin Centre, University of Toronto, Toronto, Ontario M5G 0A4, Canada. Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
| | - Brendan J Frey
- Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario M5S 3G4, Canada. Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada. Program on Genetic Networks and Program on Neural Computation & Adaptive Perception, Canadian Institute for Advanced Research, Toronto, Ontario M5G 1Z8, Canada. Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3G4, Canada. McLaughlin Centre, University of Toronto, Toronto, Ontario M5G 0A4, Canada. Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada. eScience Group, Microsoft Research, Redmond, WA 98052, USA.
235
Tan AS, Baty JW, Dong LF, Bezawork-Geleta A, Endaya B, Goodwin J, Bajzikova M, Kovarova J, Peterka M, Yan B, Pesdar EA, Sobol M, Filimonenko A, Stuart S, Vondrusova M, Kluckova K, Sachaphibulkij K, Rohlena J, Hozak P, Truksa J, Eccles D, Haupt LM, Griffiths LR, Neuzil J, Berridge MV. Mitochondrial genome acquisition restores respiratory function and tumorigenic potential of cancer cells without mitochondrial DNA. Cell Metab 2015; 21:81-94. [PMID: 25565207 DOI: 10.1016/j.cmet.2014.12.003] [Citation(s) in RCA: 564] [Impact Index Per Article: 56.4] [Received: 12/17/2013] [Revised: 07/10/2014] [Accepted: 12/09/2014] [Indexed: 10/24/2022]
Abstract
We report that tumor cells without mitochondrial DNA (mtDNA) show delayed tumor growth, and that tumor formation is associated with acquisition of mtDNA from host cells. This leads to partial recovery of mitochondrial function in cells derived from primary tumors grown from cells without mtDNA and a shorter lag in tumor growth. Cell lines from circulating tumor cells showed further recovery of mitochondrial respiration and an intermediate lag to tumor growth, while cells from lung metastases exhibited full restoration of respiratory function and no lag in tumor growth. Stepwise assembly of mitochondrial respiratory (super)complexes was correlated with acquisition of respiratory function. Our findings indicate horizontal transfer of mtDNA from host cells in the tumor microenvironment to tumor cells with compromised respiratory function to re-establish respiration and tumor-initiating efficacy. These results suggest pathophysiological processes for overcoming mtDNA damage and support the notion of high plasticity of malignant cells.
Affiliation(s)
- An S Tan
- Malaghan Institute of Medical Research, P.O. Box 7060, Wellington 6242, New Zealand
- James W Baty
- Malaghan Institute of Medical Research, P.O. Box 7060, Wellington 6242, New Zealand
- Lan-Feng Dong
- School of Medical Science, Griffith University, Southport, QLD 4222, Australia
- Berwini Endaya
- School of Medical Science, Griffith University, Southport, QLD 4222, Australia
- Jacob Goodwin
- School of Medical Science, Griffith University, Southport, QLD 4222, Australia
- Martina Bajzikova
- Institute of Biotechnology, Academy of Sciences of the Czech Republic, Prague 142 20, Czech Republic
- Jaromira Kovarova
- Institute of Biotechnology, Academy of Sciences of the Czech Republic, Prague 142 20, Czech Republic
- Martin Peterka
- Institute of Biotechnology, Academy of Sciences of the Czech Republic, Prague 142 20, Czech Republic
- Bing Yan
- School of Medical Science, Griffith University, Southport, QLD 4222, Australia
- Margarita Sobol
- Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, Prague 142 20, Czech Republic
- Anatolyj Filimonenko
- Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, Prague 142 20, Czech Republic
- Shani Stuart
- Genomics Research Centre, Institute of Health and Biomedical Innovation, Queensland University of Technology, Kelvin Grove, QLD 4059, Australia
- Magdalena Vondrusova
- Institute of Biotechnology, Academy of Sciences of the Czech Republic, Prague 142 20, Czech Republic
- Katarina Kluckova
- Institute of Biotechnology, Academy of Sciences of the Czech Republic, Prague 142 20, Czech Republic
- Jakub Rohlena
- Institute of Biotechnology, Academy of Sciences of the Czech Republic, Prague 142 20, Czech Republic
- Pavel Hozak
- Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, Prague 142 20, Czech Republic
- Jaroslav Truksa
- Institute of Biotechnology, Academy of Sciences of the Czech Republic, Prague 142 20, Czech Republic
- David Eccles
- Malaghan Institute of Medical Research, P.O. Box 7060, Wellington 6242, New Zealand
- Larisa M Haupt
- Genomics Research Centre, Institute of Health and Biomedical Innovation, Queensland University of Technology, Kelvin Grove, QLD 4059, Australia
- Lyn R Griffiths
- Genomics Research Centre, Institute of Health and Biomedical Innovation, Queensland University of Technology, Kelvin Grove, QLD 4059, Australia
- Jiri Neuzil
- School of Medical Science, Griffith University, Southport, QLD 4222, Australia; Institute of Biotechnology, Academy of Sciences of the Czech Republic, Prague 142 20, Czech Republic.
- Michael V Berridge
- Malaghan Institute of Medical Research, P.O. Box 7060, Wellington 6242, New Zealand.
236
Stead LF, Thygesen H, Westhead DR, Rabbitts P. Using common variants to indicate cancer genes. Int J Cancer 2015; 136:241-5. [PMID: 24798945 PMCID: PMC4277321 DOI: 10.1002/ijc.28951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2013] [Accepted: 04/02/2014] [Indexed: 12/03/2022]
Abstract
The catalogue of tumour-specific somatic mutations (SMs) is growing rapidly owing to the advent of next-generation sequencing. Identifying those mutations responsible for the development and progression of the disease, so-called driver mutations, will increase our understanding of carcinogenesis and provide candidates for targeted therapeutics. The phenotypic consequence(s) of driver mutations cause them to be selected for within the tumour environment, such that many approaches aimed at distinguishing drivers are based on finding significantly somatically mutated genes. Currently, these methods are designed to analyse, or be specifically applied to, nonsynonymous mutations: those that alter an encoded protein. However, growing evidence suggests the involvement of noncoding transcripts in carcinogenesis, mutations in which may also be disease-driving. We wished to test the hypothesis that common DNA variation rates within humans can be used as a baseline from which to score the rate of SMs, irrespective of coding capacity. We preliminarily tested this by applying it to a dataset of 159,498 SMs and using the results to rank genes. This resulted in significant enrichment of known cancer genes, indicating that the approach has merit. As additional data from cancer sequencing studies are made publicly available, this approach can be refined and applied to specific cancer subtypes. We named this preliminary version of our approach PRISMAD (polymorphism rates indicate somatic mutations as drivers) and have made it publicly accessible, with scripts, via a link at www.precancer.leeds.ac.uk/software-and-datasets.
Affiliation(s)
- Lucy F Stead
- Leeds Institute of Cancer and Pathology, University of Leeds, St James's University Hospital, Leeds, United Kingdom
237
Dolan PT, Roth AP, Xue B, Sun R, Dunker AK, Uversky VN, LaCount DJ. Intrinsic disorder mediates hepatitis C virus core-host cell protein interactions. Protein Sci 2014; 24:221-35. [PMID: 25424537 DOI: 10.1002/pro.2608] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Received: 10/04/2014] [Accepted: 11/19/2014] [Indexed: 12/18/2022]
Abstract
Viral proteins bind to numerous cellular and viral proteins throughout the infection cycle. However, the mechanisms by which viral proteins interact with such large numbers of factors remain unknown. Cellular proteins that interact with multiple, distinct partners often do so through short sequences known as molecular recognition features (MoRFs) embedded within intrinsically disordered regions (IDRs). In this study, we report the first evidence that MoRFs in viral proteins play a similar role in targeting the host cell. Using a combination of evolutionary modeling, protein-protein interaction analyses and forward genetic screening, we systematically investigated two computationally predicted MoRFs within the N-terminal IDR of the hepatitis C virus (HCV) Core protein. Sequence analysis of the MoRFs showed their conservation across all HCV genotypes and the canine and equine Hepaciviruses. Phylogenetic modeling indicated that the Core MoRFs are under stronger purifying selection than the surrounding sequence, suggesting that these modules have a biological function. Using the yeast two-hybrid assay, we identified three cellular binding partners for each HCV Core MoRF, including two previously characterized cellular targets of HCV Core (DDX3X and NPM1). Random and site-directed mutagenesis demonstrated that the predicted MoRF regions were required for binding to the cellular proteins, but that different residues within each MoRF were critical for binding to different partners. This study demonstrated that viruses may use intrinsic disorder to target multiple cellular proteins with the same amino acid sequence and provides a framework for characterizing the binding partners of other disordered regions in viral and cellular proteomes.
Affiliation(s)
- Patrick T Dolan
- Department of Medicinal Chemistry and Molecular Pharmacology, Purdue University, West Lafayette, Indiana, 47907
238
Krishna A, Biryukov M, Trefois C, Antony PMA, Hussong R, Lin J, Heinäniemi M, Glusman G, Köglsberger S, Boyd O, van den Berg BHJ, Linke D, Huang D, Wang K, Hood L, Tholey A, Schneider R, Galas DJ, Balling R, May P. Systems genomics evaluation of the SH-SY5Y neuroblastoma cell line as a model for Parkinson's disease. BMC Genomics 2014; 15:1154. [PMID: 25528190 PMCID: PMC4367834 DOI: 10.1186/1471-2164-15-1154] [Citation(s) in RCA: 111] [Impact Index Per Article: 10.1] [Received: 04/03/2014] [Accepted: 12/12/2014] [Indexed: 12/20/2022] Open
Abstract
Background: The human neuroblastoma cell line, SH-SY5Y, is a commonly used cell line in studies related to neurotoxicity, oxidative stress, and neurodegenerative diseases. Although this cell line is often used as a cellular model for Parkinson’s disease, the relevance of this cellular model in the context of Parkinson’s disease (PD) and other neurodegenerative diseases has not yet been systematically evaluated.
Results: We have used a systems genomics approach to characterize the SH-SY5Y cell line using whole-genome sequencing to determine the genetic content of the cell line and used transcriptomics and proteomics data to determine molecular correlations. Further, we integrated genomic variants using a network analysis approach to evaluate the suitability of the SH-SY5Y cell line for perturbation experiments in the context of neurodegenerative diseases, including PD.
Conclusions: The systems genomics approach showed consistency across different biological levels (DNA, RNA and protein concentrations). Most of the genes belonging to the major Parkinson’s disease pathways and modules were intact in the SH-SY5Y genome. Specifically, each analysed gene related to PD has at least one intact copy in SH-SY5Y. The disease-specific network analysis approach ranked the genetic integrity of SH-SY5Y as higher for PD than for Alzheimer’s disease but lower than for Huntington’s disease and Amyotrophic Lateral Sclerosis for loss of function perturbation experiments.
Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-15-1154) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Abhimanyu Krishna
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Campus Belval, 7, avenue des Hauts-Fourneaux, L-4362 Esch-sur-Alzette, Luxembourg.
239
A massively parallel pipeline to clone DNA variants and examine molecular phenotypes of human disease mutations. PLoS Genet 2014; 10:e1004819. [PMID: 25502805 PMCID: PMC4263371 DOI: 10.1371/journal.pgen.1004819] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 03/28/2014] [Accepted: 10/14/2014] [Indexed: 12/13/2022] Open
Abstract
Understanding the functional relevance of DNA variants is essential for all exome and genome sequencing projects. However, current mutagenesis cloning protocols require Sanger sequencing, and thus are prohibitively costly and labor-intensive. We describe a massively-parallel site-directed mutagenesis approach, "Clone-seq", leveraging next-generation sequencing to rapidly and cost-effectively generate a large number of mutant alleles. Using Clone-seq, we further develop a comparative interactome-scanning pipeline integrating high-throughput GFP, yeast two-hybrid (Y2H), and mass spectrometry assays to systematically evaluate the functional impact of mutations on protein stability and interactions. We use this pipeline to show that disease mutations on protein-protein interaction interfaces are significantly more likely than those away from interfaces to disrupt corresponding interactions. We also find that mutation pairs with similar molecular phenotypes in terms of both protein stability and interactions are significantly more likely to cause the same disease than those with different molecular phenotypes, validating the in vivo biological relevance of our high-throughput GFP and Y2H assays, and indicating that both assays can be used to determine candidate disease mutations in the future. The general scheme of our experimental pipeline can be readily expanded to other types of interactome-mapping methods to comprehensively evaluate the functional relevance of all DNA variants, including those in non-coding regions.
240
Griffon A, Barbier Q, Dalino J, van Helden J, Spicuglia S, Ballester B. Integrative analysis of public ChIP-seq experiments reveals a complex multi-cell regulatory landscape. Nucleic Acids Res 2014; 43:e27. [PMID: 25477382 PMCID: PMC4344487 DOI: 10.1093/nar/gku1280] [Citation(s) in RCA: 90] [Impact Index Per Article: 8.2] [Indexed: 01/21/2023] Open
Abstract
The large collections of ChIP-seq data rapidly accumulating in public data warehouses provide genome-wide binding site maps for hundreds of transcription factors (TFs). However, the extent of the regulatory occupancy space in the human genome has not yet been fully characterized by integrating public ChIP-seq data sets and combining them with the ENCODE TF map. To enable genome-wide identification of regulatory elements, we collected, analysed and retained 395 available ChIP-seq data sets, merged with ENCODE peaks, covering a total of 237 TFs. This enhanced repertoire complements and refines current genome-wide occupancy maps, increasing the human genome regulatory search space by 14% compared with ENCODE alone, and also increases the complexity of the regulatory dictionary. As a direct application, we used this unified binding repertoire to annotate variant enhancer loci (VELs) from the H3K4me1 mark in two cancer cell lines (MCF-7, CRC) and observed enrichment of specific TFs involved in key biological functions for cancer development and proliferation. These TF enrichments within VELs provide a direct annotation of non-coding regions detected in cancer genomes. Finally, the full catalogue is available online together with the TF enrichment analysis tool (http://tagc.univ-mrs.fr/remap/).
Affiliation(s)
- Aurélien Griffon
- INSERM, UMR1090 TAGC, Marseille, F-13288, France; Aix-Marseille Université, UMR1090 TAGC, Marseille, F-13288, France
- Quentin Barbier
- INSERM, UMR1090 TAGC, Marseille, F-13288, France; Aix-Marseille Université, UMR1090 TAGC, Marseille, F-13288, France
- Jordi Dalino
- INSERM, UMR1090 TAGC, Marseille, F-13288, France; Aix-Marseille Université, UMR1090 TAGC, Marseille, F-13288, France
- Jacques van Helden
- INSERM, UMR1090 TAGC, Marseille, F-13288, France; Aix-Marseille Université, UMR1090 TAGC, Marseille, F-13288, France
- Salvatore Spicuglia
- INSERM, UMR1090 TAGC, Marseille, F-13288, France; Aix-Marseille Université, UMR1090 TAGC, Marseille, F-13288, France
- Benoit Ballester
- INSERM, UMR1090 TAGC, Marseille, F-13288, France; Aix-Marseille Université, UMR1090 TAGC, Marseille, F-13288, France
241
Siepel A, Arbiza L. Cis-regulatory elements and human evolution. Curr Opin Genet Dev 2014; 29:81-9. [PMID: 25218861 PMCID: PMC4258466 DOI: 10.1016/j.gde.2014.08.011] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 05/28/2014] [Revised: 08/17/2014] [Accepted: 08/23/2014] [Indexed: 11/20/2022]
Abstract
Modification of gene regulation has long been considered an important force in human evolution, particularly through changes to cis-regulatory elements (CREs) that function in transcriptional regulation. For decades, however, the study of cis-regulatory evolution was severely limited by the available data. New data sets describing the locations of CREs and genetic variation within and between species have now made it possible to study CRE evolution much more directly on a genome-wide scale. Here, we review recent research on the evolution of CREs in humans based on large-scale genomic data sets. We consider inferences based on primate divergence, human polymorphism, and combinations of divergence and polymorphism. We then consider 'new frontiers' in this field stemming from recent research on transcriptional regulation.
Affiliation(s)
- Adam Siepel
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA.
- Leonardo Arbiza
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA
242
Wu L, Schaid DJ, Sicotte H, Wieben ED, Li H, Petersen GM. Case-only exome sequencing and complex disease susceptibility gene discovery: study design considerations. J Med Genet 2014; 52:10-6. [PMID: 25371537 DOI: 10.1136/jmedgenet-2014-102697] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 12/13/2022]
Abstract
Whole exome sequencing (WES) provides an unprecedented opportunity to identify the potential aetiological role of rare functional variants in human complex diseases. Large-scale collaborations have generated germline WES data on patients with a number of diseases, especially cancer, but less often on healthy controls under the same sequencing procedures. These data can be a valuable resource for identifying new disease susceptibility loci if study designs are appropriately applied. This review describes suggested strategies and technical considerations when focusing on case-only study designs that use WES data in complex disease scenarios. These include variant filtering based on frequency and functionality, gene prioritisation, interrogation of different data types and targeted sequencing validation. We propose that if case-only WES designs are applied appropriately, new susceptibility genes containing rare variants for human complex diseases can be detected.
Affiliation(s)
- Lang Wu
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA; Center for Clinical and Translational Science, Mayo Clinic, Rochester, Minnesota, USA
- Daniel J Schaid
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Hugues Sicotte
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Eric D Wieben
- Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, Minnesota, USA
- Hu Li
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, Minnesota, USA
- Gloria M Petersen
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
243
Abstract
Gene enhancer elements are noncoding segments of DNA that play a central role in regulating transcriptional programs that control development, cell identity, and evolutionary processes. Recent studies have shown that noncoding single nucleotide polymorphisms (SNPs) that have been associated with risk for numerous common diseases through genome-wide association studies frequently lie in cell-type-specific enhancer elements. These enhancer variants probably influence transcriptional output, thereby offering a mechanistic basis to explain their association with risk for many common diseases. This review focuses on the identification and interpretation of disease-susceptibility variants that influence enhancer function. We discuss strategies for prioritizing the study of functional enhancer SNPs over those likely to be benign, review experimental and computational approaches to identifying the gene targets of enhancer variants, and highlight efforts to quantify the impact of enhancer variants on target transcript levels and cellular phenotypes. These studies are beginning to provide insights into the mechanistic basis of many common diseases, as well as into how we might translate this knowledge for improved disease diagnosis, prevention and treatments. Finally, we highlight five major challenges often associated with interpreting enhancer variants, and discuss recent technical advances that may help to surmount these challenges.
Affiliation(s)
- Olivia Corradin
- Department of Genetics and Genome Sciences, Case Western Reserve University, Cleveland, OH 44122 USA
- Peter C Scacheri
- Department of Genetics and Genome Sciences, Case Western Reserve University, Cleveland, OH 44122 USA
- Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, OH 44106 USA
244
Ryan NM, Morris SW, Porteous DJ, Taylor MS, Evans KL. SuRFing the genomics wave: an R package for prioritising SNPs by functionality. Genome Med 2014; 6:79. [PMID: 25400697 PMCID: PMC4224693 DOI: 10.1186/s13073-014-0079-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 06/20/2014] [Accepted: 09/26/2014] [Indexed: 12/16/2022] Open
Abstract
Identifying functional non-coding variants is one of the greatest unmet challenges in genetics. To help address this, we introduce an R package, SuRFR, which integrates functional annotation and prior biological knowledge to prioritise candidate functional variants. SuRFR is publicly available, modular, flexible, fast, and simple to use. We demonstrate that SuRFR performs with high sensitivity and specificity and provide a widely applicable and scalable benchmarking dataset for model training and validation. Website: http://www.cgem.ed.ac.uk/resources/
Affiliation(s)
- Niamh M Ryan
- Centre for Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine, The University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU UK
- Stewart W Morris
- Centre for Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine, The University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU UK
- David J Porteous
- Centre for Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine, The University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU UK; Centre for Cognitive Ageing and Cognitive Epidemiology, The University of Edinburgh, 7 George Square, Edinburgh, EH8 9JZ UK
- Martin S Taylor
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, The University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU UK
- Kathryn L Evans
- Centre for Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine, The University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU UK; Centre for Cognitive Ageing and Cognitive Epidemiology, The University of Edinburgh, 7 George Square, Edinburgh, EH8 9JZ UK
245
Li MJ, Wang J. Current trend of annotating single nucleotide variation in humans--A case study on SNVrap. Methods 2014; 79-80:32-40. [PMID: 25308971 DOI: 10.1016/j.ymeth.2014.10.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 04/21/2014] [Revised: 09/25/2014] [Accepted: 10/02/2014] [Indexed: 12/16/2022] Open
Abstract
As high-throughput methods, such as whole genome genotyping arrays, whole exome sequencing (WES) and whole genome sequencing (WGS), have detected huge numbers of genetic variants associated with human diseases, functional annotation of these variants is an indispensable step in understanding disease etiology. Large-scale functional genomics projects, such as the ENCODE Project and the Roadmap Epigenomics Project, provide genome-wide profiling of functional elements across different human cell types and tissues. Given the urgent demand for identifying disease-causal variants, a comprehensive and easy-to-use annotation tool is much needed. Here we review and discuss current progress and trends in the variant annotation field. Furthermore, we introduce a comprehensive web portal for annotating human genetic variants. We use gene-based features and the latest functional genomics datasets to annotate single nucleotide variants (SNVs) in humans at whole-genome scale. We further apply several function prediction algorithms to annotate SNVs that might affect different biological processes, including transcriptional gene regulation, alternative splicing, post-transcriptional regulation, translation and post-translational modifications. The SNVrap web portal is freely available at http://jjwanglab.org/snvrap.
Affiliation(s)
- Mulin Jun Li
- Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong Special Administrative Region, China; Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong Special Administrative Region, China; Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, China
- Junwen Wang
- Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong Special Administrative Region, China; Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong Special Administrative Region, China; Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, China.
246
Ho ED, Cao Q, Lee SD, Yip KY. VAS: a convenient web portal for efficient integration of genomic features with millions of genetic variants. BMC Genomics 2014; 15:886. [PMID: 25306238 PMCID: PMC4210471 DOI: 10.1186/1471-2164-15-886] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/15/2014] [Accepted: 10/03/2014] [Indexed: 12/29/2022] Open
Abstract
Background: High-throughput experimental methods have fostered the systematic detection of millions of genetic variants from any human genome. To help explore the potential biological implications of these genetic variants, software tools have been previously developed for integrating various types of information about these genomic regions from multiple data sources. Most of these tools were designed either for studying a small number of variants at a time, or for local execution on powerful machines.
Results: To make exploration of whole lists of genetic variants simple and accessible, we have developed a new Web-based system called VAS (Variant Annotation System, available at https://yiplab.cse.cuhk.edu.hk/vas/). It provides a large variety of information useful for studying both coding and non-coding variants, including whole-genome transcription factor binding, open chromatin and transcription data from the ENCODE consortium. By means of data compression, millions of variants can be uploaded from a client machine to the server in less than 50 megabytes of data. On the server side, our customized data integration algorithms can efficiently link millions of variants with tens of whole-genome datasets. These two enabling technologies make VAS a practical tool for annotating genetic variants from large genomic studies. We demonstrate the use of VAS in annotating genetic variants obtained from a migraine meta-analysis study and multiple data sets from the Personal Genomes Project. We also compare the running time of annotating 6.4 million SNPs of the CEU trio by VAS and another tool, showing that VAS is efficient in handling new variant lists without requiring any pre-computations.
Conclusions: VAS is specifically designed to handle annotation tasks with long lists of genetic variants and large numbers of annotating features efficiently. It is complementary to other existing tools with more specific aims such as evaluating the potential impacts of genetic variants in terms of disease risk. We recommend using VAS for a quick first-pass identification of potentially interesting genetic variants, to minimize the time required for other more in-depth downstream analyses.
Affiliation(s)
- Kevin Y Yip
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong.
247
Sadee W, Hartmann K, Seweryn M, Pietrzak M, Handelman SK, Rempala GA. Missing heritability of common diseases and treatments outside the protein-coding exome. Hum Genet 2014; 133:1199-1215. [PMID: 25107510 PMCID: PMC4169001 DOI: 10.1007/s00439-014-1476-7] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Received: 03/27/2014] [Accepted: 07/23/2014] [Indexed: 02/07/2023]
Abstract
Genetic factors strongly influence the risk of common human diseases and treatment outcomes, but the causative variants remain largely unknown; this gap has been called the 'missing heritability'. We propose several hypotheses that in combination have the potential to narrow the gap. First, given a multi-stage path from wellness to disease, we propose that common variants under positive evolutionary selection represent normal variation and gate the transition between wellness and an 'off-well' state, revealing adaptations to changing environmental conditions. In contrast, genome-wide association studies (GWAS) focus on deleterious variants conveying disease risk, accelerating the path from off-well to illness and finally to specific diseases, while common 'normal' variants remain hidden in the noise. Second, epistasis (dynamic gene-gene interactions) likely assumes a central role in adaptation and evolution; yet current GWAS analyses are poorly designed to reveal epistasis. As gene regulation is germane to adaptation, we propose that epistasis among common normal regulatory variants, or between common variants and less frequent deleterious variants, can have strong protective or deleterious phenotypic effects. These gene-gene interactions can be highly sensitive to environmental stimuli and could account for large differences in drug response between individuals. Residing largely outside the protein-coding exome, common regulatory variants affect either transcription of coding and non-coding RNAs (regulatory SNPs, or rSNPs) or RNA functions and processing (structural RNA SNPs, or srSNPs). Third, with the vast majority of causative variants yet to be discovered, GWAS rely on surrogate markers, a confounding factor aggravated by the presence of more than one causative variant per gene and by epistasis. We propose that the confluence of these factors may be responsible to a large extent for the observed heritability gap.
Affiliation(s)
- Wolfgang Sadee
- Department of Pharmacology, Center for Pharmacogenomics, College of Medicine, The Ohio State University Wexner Medical Center, 5184A Graves Hall, 333 West 10th Avenue, Columbus, OH, 43210, USA
248
Genome-wide analysis of noncoding regulatory mutations in cancer. Nat Genet 2014; 46:1160-5. [PMID: 25261935 PMCID: PMC4217527 DOI: 10.1038/ng.3101] [Citation(s) in RCA: 384] [Impact Index Per Article: 34.9] [Received: 06/11/2014] [Accepted: 09/03/2014] [Indexed: 01/05/2023]
Abstract
Cancer primarily develops due to somatic alterations in the genome. Advances in sequencing have enabled large-scale sequencing studies across many tumor types, emphasizing discovery of alterations in protein-coding genes. However, the protein-coding exome comprises less than 2% of the human genome. Here, we analyze complete genome sequences of 863 human tumors from The Cancer Genome Atlas and other sources to systematically identify non-coding regions that are recurrently mutated in cancer. We utilize novel frequency and sequence-based approaches to comprehensively scan the genome for non-coding mutations with potential regulatory impact. We identified recurrent mutations in regulatory elements upstream of PLEKHS1, WDR74, and SDHD, as well as previously identified mutations in the TERT promoter. SDHD promoter mutations are frequent in melanoma and associated with reduced gene expression and poor patient prognosis. The non-protein-coding cancer genome remains widely unexplored and our findings represent a step towards targeting the entire genome for clinical purposes.
249
Sequencing pools of individuals — mining genome-wide polymorphism data without big funding. Nat Rev Genet 2014; 15:749-63. [DOI: 10.1038/nrg3803] [Citation(s) in RCA: 512] [Impact Index Per Article: 46.5] [Indexed: 12/16/2022]
250
Blumberg A, Sri Sailaja B, Kundaje A, Levin L, Dadon S, Shmorak S, Shaulian E, Meshorer E, Mishmar D. Transcription factors bind negatively selected sites within human mtDNA genes. Genome Biol Evol 2014; 6:2634-46. [PMID: 25245407 PMCID: PMC4224337 DOI: 10.1093/gbe/evu210] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Indexed: 12/29/2022]
Abstract
Transcription of mitochondrial DNA (mtDNA)-encoded genes is thought to be regulated by a handful of dedicated transcription factors (TFs), suggesting that mtDNA genes are regulated separately from the nucleus. However, several TFs with known nuclear activities were found to bind mtDNA and regulate mitochondrial transcription. Additionally, mtDNA transcriptional regulatory elements that had been shown to be important in vitro were harbored by a deletion that segregates normally among healthy individuals. Hence, mtDNA transcriptional regulation is more complex than once thought. Here, by analyzing ENCODE chromatin immunoprecipitation sequencing (ChIP-seq) data, we identified strong binding sites of three bona fide nuclear TFs (c-Jun, Jun-D, and CEBPb) within human mtDNA protein-coding genes. We validated the binding of two TFs (c-Jun and Jun-D) by ChIP-quantitative polymerase chain reaction and showed their mitochondrial localization by electron microscopy and subcellular fractionation. As a step toward investigating the functionality of these TF-binding sites (TFBS), we assessed signatures of selection. By analyzing 9,868 human mtDNA sequences encompassing all major global populations, we recorded genetic variants within the TFBS at the tips and nodes of the mtDNA phylogeny. We next calculated the effects of these variants on binding motif prediction scores. Finally, the mtDNA variation pattern in predicted TFBS occurring within ChIP-seq-negative sites was compared with that in ChIP-seq-positive TFBS (CPRs). Motifs within the CPRs of c-Jun, Jun-D, and CEBPb either harbored only tip variants or, where nodal variants occurred, those variants retained high motif prediction scores. This reflects negative selection within mtDNA CPRs, thus supporting their functionality. Hence, human mtDNA protein-coding sequences may have dual roles, coding for genes while possibly also possessing regulatory potential.
Affiliation(s)
- Amit Blumberg
- Department of Life Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Badi Sri Sailaja
- Department of Genetics, The Institute of Life Sciences, and The Edmond Lily Center for Brain Sciences (ELSC), The Hebrew University of Jerusalem, Edmond J. Safra Campus, Givat Ram, Israel
- Anshul Kundaje
- Department of Genetics, Stanford University; Department of Computer Science, Stanford University
- Liron Levin
- Department of Life Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Sara Dadon
- Department of Life Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Shimrit Shmorak
- Department of Biochemistry and Molecular Biology, IMRIC, The Hebrew University Medical School, Ein Karem, Jerusalem, Israel
- Eitan Shaulian
- Department of Biochemistry and Molecular Biology, IMRIC, The Hebrew University Medical School, Ein Karem, Jerusalem, Israel
- Eran Meshorer
- Department of Genetics, The Institute of Life Sciences, and The Edmond Lily Center for Brain Sciences (ELSC), The Hebrew University of Jerusalem, Edmond J. Safra Campus, Givat Ram, Israel
- Dan Mishmar
- Department of Life Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel