1
Ray-Jones H, Sung CK, Chan LT, Haglund A, Artemov P, Della Rosa M, Ruje L, Burden F, Kreuzhuber R, Litovskikh A, Weyenbergh E, Brusselaers Z, Tan VXH, Frontini M, Wallace C, Malysheva V, Bottolo L, Vigorito E, Spivakov M. Genetic coupling of enhancer activity and connectivity in gene expression control. Nat Commun 2025; 16:970. [PMID: 39870618 PMCID: PMC11772589 DOI: 10.1038/s41467-025-55900-3] [Received: 08/21/2024] [Accepted: 01/03/2025] [Indexed: 01/29/2025]
Abstract
Gene enhancers often form long-range contacts with promoters, but it remains unclear if the activity of enhancers and their chromosomal contacts are mediated by the same DNA sequences and recruited factors. Here, we study the effects of expression quantitative trait loci (eQTLs) on enhancer activity and promoter contacts in primary monocytes isolated from 34 male individuals. Using eQTL-Capture Hi-C and a Bayesian approach considering both intra- and inter-individual variation, we initially detect 19 eQTLs associated with enhancer-eGene promoter contacts, most of which also associate with enhancer accessibility and activity. Capitalising on these shared effects, we devise a multi-modality Bayesian strategy, identifying 629 "trimodal QTLs" jointly associated with enhancer accessibility, eGene promoter contact, and gene expression. Causal mediation analysis and CRISPR interference reveal causal relationships between these three modalities. Many detected QTLs overlap disease susceptibility loci and influence the predicted binding of myeloid transcription factors, including SPI1, GABPB and STAT3. Additionally, a variant associated with PCK2 promoter contact directly disrupts a CTCF binding motif and impacts promoter insulation from downstream enhancers. Jointly, our findings suggest an inherent genetic coupling of enhancer activity and connectivity in gene expression control relevant to human disease and highlight the regulatory role of genetically determined chromatin boundaries.
Affiliation(s)
- Helen Ray-Jones
- MRC Laboratory of Medical Sciences, London, UK
- Institute of Clinical Sciences, Imperial College Faculty of Medicine, London, UK
- Computational Neurobiology, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium
- Computational Neurobiology, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Department of Internal Medicine, Erasmus MC, Rotterdam, The Netherlands
- Chak Kei Sung
- MRC Laboratory of Medical Sciences, London, UK
- Institute of Clinical Sciences, Imperial College Faculty of Medicine, London, UK
- LKS Faculty of Medicine, the University of Hong Kong, Hong Kong, Hong Kong
- Lai Ting Chan
- Computational Neurobiology, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium
- Computational Neurobiology, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Alexander Haglund
- Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, UK
- Pavel Artemov
- MRC Laboratory of Medical Sciences, London, UK
- Institute of Clinical Sciences, Imperial College Faculty of Medicine, London, UK
- Monica Della Rosa
- MRC Laboratory of Medical Sciences, London, UK
- Institute of Clinical Sciences, Imperial College Faculty of Medicine, London, UK
- Cyted, Cambridge, UK
- Luminita Ruje
- MRC Laboratory of Medical Sciences, London, UK
- Institute of Clinical Sciences, Imperial College Faculty of Medicine, London, UK
- Frances Burden
- Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- National Health Service (NHS) Blood and Transplant, Cambridge Biomedical Campus, Cambridge, UK
- University of Kent, Canterbury, UK
- Roman Kreuzhuber
- Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- National Health Service (NHS) Blood and Transplant, Cambridge Biomedical Campus, Cambridge, UK
- EMBL-EBI, Wellcome Genome Campus, Cambridge, UK
- Swiss Federal Administration, Bern, Switzerland
- Anna Litovskikh
- MRC Laboratory of Medical Sciences, London, UK
- Institute of Clinical Sciences, Imperial College Faculty of Medicine, London, UK
- Institute of Computational Biology, Helmholtz Zentrum München and Ludwig Maximilians University Munich, Faculty of Medicine, Munich, Germany
- Eline Weyenbergh
- Computational Neurobiology, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium
- Computational Neurobiology, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- University Hospital Antwerp (UZA), Antwerp, Belgium
- Zoï Brusselaers
- Computational Neurobiology, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium
- Computational Neurobiology, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- University of Antwerp, Antwerp, Belgium
- Vanessa Xue Hui Tan
- MRC Laboratory of Medical Sciences, London, UK
- Institute of Clinical Sciences, Imperial College Faculty of Medicine, London, UK
- Hummingbird Bioscience, Singapore, Singapore
- Mattia Frontini
- Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- National Health Service (NHS) Blood and Transplant, Cambridge Biomedical Campus, Cambridge, UK
- Department of Clinical and Biomedical Sciences, Faculty of Health and Life Sciences, University of Exeter Medical School, Exeter, UK
- Chris Wallace
- Cambridge Institute of Therapeutic Immunology & Infectious Disease (CITIID), Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, UK
- MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Valeriya Malysheva
- MRC Laboratory of Medical Sciences, London, UK
- Institute of Clinical Sciences, Imperial College Faculty of Medicine, London, UK
- Computational Neurobiology, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium
- Computational Neurobiology, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Leonardo Bottolo
- MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Department of Medical Genetics, School of Clinical Medicine, University of Cambridge, Cambridge, UK
- The Alan Turing Institute, London, UK
- Elena Vigorito
- MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Mikhail Spivakov
- MRC Laboratory of Medical Sciences, London, UK
- Institute of Clinical Sciences, Imperial College Faculty of Medicine, London, UK
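The Ray-Jones et al. abstract reports QTLs that "influence the predicted binding of myeloid transcription factors, including SPI1, GABPB and STAT3", but the listing does not say how binding was predicted. One common approach, assumed here for illustration only, is to score the reference and alternative alleles against a position weight matrix (PWM) and compare the best site scores on both strands. The count matrix below is a hypothetical four-position toy that loosely mimics the GGAA core recognized by ETS-family factors such as SPI1; it is not a real SPI1 matrix, and all function names are illustrative.

```python
from __future__ import annotations

import math

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_pwm(counts: list[dict[str, float]], pseudocount: float = 0.5,
                 background: float = 0.25) -> list[dict[str, float]]:
    """Convert per-position base counts into log2-odds scores vs. a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0.0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column.get(b, 0.0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def best_site_score(seq: str, pwm: list[dict[str, float]]) -> float:
    """Highest PWM score over every window on both strands (seq must be uppercase ACGT)."""
    width = len(pwm)
    best = -math.inf
    for strand in (seq, reverse_complement(seq)):
        for start in range(len(strand) - width + 1):
            score = sum(pwm[i][strand[start + i]] for i in range(width))
            best = max(best, score)
    return best


def allelic_delta(left: str, ref: str, alt: str, right: str,
                  pwm: list[dict[str, float]]) -> float:
    """Alt-minus-ref best score; negative values suggest the alt allele weakens the site."""
    return best_site_score(left + alt + right, pwm) - best_site_score(left + ref + right, pwm)


if __name__ == "__main__":
    # Hypothetical toy counts favouring G, G, A, A (not a curated matrix).
    toy = log_odds_pwm([{"G": 10}, {"G": 10}, {"A": 10}, {"A": 10}])
    print(allelic_delta("TTG", "G", "C", "AATT", toy))
```

Under these assumptions the G>C change in `TTGGAATT` breaks the GGAA match, so the delta is negative. A fuller analysis would use curated matrices and a genome-derived, nonuniform background.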
2
Baniulyte G, Hicks SM, Sammons MA. p53motifDB: integration of genomic information and tumor suppressor p53 binding motifs. bioRxiv [Preprint] 2024:2024.09.24.614594. [PMID: 39386591 PMCID: PMC11463528 DOI: 10.1101/2024.09.24.614594] [Indexed: 10/12/2024]
Abstract
The tumor suppressor gene TP53 encodes the DNA-binding transcription factor p53 and is one of the most commonly mutated genes in human cancer. Tumor suppressor activity requires binding of p53 to its DNA response elements and subsequent transcriptional activation of a diverse set of target genes. Despite decades of close study, the logic underlying p53 interactions with its numerous potential genomic binding sites and target genes is not yet fully understood. Here, we present a database of DNA and chromatin-based information focused on putative p53 binding sites in the human genome to allow users to generate and test new hypotheses related to p53 activity in the genome. Users can query genomic locations based on experimentally observed p53 binding, regulatory element activity, genetic variation, evolutionary conservation, chromatin modification state, and chromatin structure. We present multiple use cases demonstrating the utility of this database for generating novel biological hypotheses, such as chromatin-based determinants of p53 binding and potential cell type-specific p53 activity. All database information is also available as a precompiled SQLite database for local analysis and as a Shiny web application.
Affiliation(s)
- Gabriele Baniulyte
- Department of Biological Sciences and The RNA Institute, University at Albany, State University of New York, Albany, NY 12222
- Sawyer M Hicks
- Department of Biological Sciences and The RNA Institute, University at Albany, State University of New York, Albany, NY 12222
- Morgan A Sammons
- Department of Biological Sciences and The RNA Institute, University at Albany, State University of New York, Albany, NY 12222
3
Dang TC, Fields L, Li L. MotifQuest: An Automated Pipeline for Motif Database Creation to Improve Peptidomics Database Searching Programs. J Am Soc Mass Spectrom 2024; 35:1902-1912. [PMID: 39058243 PMCID: PMC11550313 DOI: 10.1021/jasms.4c00192] [Indexed: 07/28/2024]
Abstract
Endogenous peptides are an abundant and versatile class of biomolecules with vital roles in the nervous, endocrine, and immune systems, among others. Mass spectrometry stands as a premier technique for identifying endogenous peptides, yet the field still faces challenges due to the lack of optimized computational resources for reliable analysis and interpretation of raw mass spectra. Current database searching programs can exhibit discrepancies due to the unique properties of endogenous peptides, which typically require specialized search considerations. Herein, we present a novel, high-throughput scoring algorithm for the extraction and ranking of conserved amino acid sequence motifs within any endogenous peptide database. Motifs are patterns conserved across organisms, representing sequence moieties crucial for biological functions, including maintenance of homeostasis. MotifQuest, our novel motif database generation algorithm, is designed to work in partnership with EndoGenius, a program that is optimized for database searching of endogenous peptides and powered by a motif database to capitalize on biological context when producing identifications. MotifQuest aims to quickly develop motif databases without any prior knowledge, a laborious task not possible with traditional sequence alignment resources. In this work, we illustrate the utility of MotifQuest in expanding EndoGenius' identification capability to other endogenous peptides by showcasing its ability to identify antimicrobial peptides. Additionally, we discuss the potential utility of MotifQuest to parse out motifs from a FASTA database file that can be further validated as new peptide drug candidates.
Affiliation(s)
- Tina C. Dang
- School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, WI 53705
- Lauren Fields
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706
- Lingjun Li
- School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, WI 53705
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706
4
Du A, Guo Z, Chen A, Xu L, Sun D, Han B. PC Gene Affects Milk Production Traits in Dairy Cattle. Genes (Basel) 2024; 15:708. [PMID: 38927644 PMCID: PMC11202589 DOI: 10.3390/genes15060708] [Received: 04/12/2024] [Revised: 05/20/2024] [Accepted: 05/27/2024] [Indexed: 06/28/2024]
Abstract
In previous work, we found that PC was differentially expressed in cows at different lactation stages. We therefore hypothesized that PC may be a candidate gene affecting milk production traits in dairy cattle. In this study, we identified polymorphisms in PC by resequencing and verified their genetic associations with milk production traits using an animal model in a cattle population. In total, we detected six single-nucleotide polymorphisms (SNPs) in PC. Single-marker association analysis showed that all SNPs were significantly associated with the five milk production traits (p < 0.05). Additionally, we predicted that allele G of 29:g.44965658 in the 5' regulatory region creates binding sites for the transcription factor (TF) GATA1, and a dual-luciferase reporter assay verified that this allele inhibits the transcriptional activity of PC. In conclusion, we demonstrated that PC has a prominent genetic effect on milk production traits, and the six SNPs with prominent genetic effects could be used as markers for genomic selection (GS) in dairy cattle, helping to accelerate improvement in milk yield and quality in Chinese Holstein cows.
Affiliation(s)
- Bo Han
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Key Laboratory of Animal Genetics, National Engineering Laboratory for Animal Breeding, State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, Beijing 100193, China; (A.D.); (Z.G.); (A.C.); (L.X.); (D.S.)
5
Liu Z, Samee M. Structural underpinnings of mutation rate variations in the human genome. Nucleic Acids Res 2023; 51:7184-7197. [PMID: 37395403 PMCID: PMC10415140 DOI: 10.1093/nar/gkad551] [Received: 04/19/2022] [Revised: 06/06/2023] [Accepted: 06/15/2023] [Indexed: 07/04/2023]
Abstract
Single nucleotide mutation rates have critical implications for human evolution and genetic diseases. Importantly, the rates vary substantially across the genome, and the principles underlying such variations remain poorly understood. A recent model explained much of this variation by considering higher-order nucleotide interactions in the 7-mer sequence context around mutated nucleotides. This model's success suggests a connection between DNA shape and mutation rates. DNA shape, i.e. structural properties like helical twist and tilt, is known to capture interactions between nucleotides within a local context. Thus, we hypothesized that changes in DNA shape features at and around mutated positions can explain mutation rate variations in the human genome. Indeed, DNA shape-based models of mutation rates showed similar or improved performance over current nucleotide sequence-based models. These models accurately characterized mutation hotspots in the human genome and revealed the shape features whose interactions underlie mutation rate variations. DNA shape also impacts mutation rates within putative functional regions like transcription factor binding sites, where we find a strong association between DNA shape and position-specific mutation rates. This work demonstrates the structural underpinnings of nucleotide mutations in the human genome and lays the groundwork for future models of genetic variations to incorporate DNA shape.
Affiliation(s)
- Zian Liu
- Department of Integrative Physiology, Baylor College of Medicine, Houston, TX 77030, USA
- Md Abul Hassan Samee
- Department of Integrative Physiology, Baylor College of Medicine, Houston, TX 77030, USA
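The Liu and Samee abstract builds on a model that explains mutation rate variation through the 7-mer sequence context around each mutated nucleotide. As a minimal sketch of that baseline idea only (not the authors' DNA shape-based models), the hypothetical helper below estimates an empirical mutation rate for each centred k-mer context as the fraction of that context's occurrences whose central base is mutated.

```python
from __future__ import annotations

from collections import Counter


def context_mutation_rates(sequence: str, mutated_positions: set[int],
                           k: int = 7) -> dict[str, float]:
    """Hypothetical helper: empirical mutation rate for each centred k-mer context.

    rate(context) = (# occurrences whose centre base is mutated) / (# occurrences)
    Positions are 0-based indices into `sequence`.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so the focal base sits at the centre")
    flank = k // 2
    occurrences = Counter()
    mutations = Counter()
    # Only bases with `flank` neighbours on both sides have a complete context.
    for centre in range(flank, len(sequence) - flank):
        context = sequence[centre - flank : centre + flank + 1].upper()
        if set(context) - set("ACGT"):
            continue  # skip windows containing N or other ambiguity codes
        occurrences[context] += 1
        if centre in mutated_positions:
            mutations[context] += 1
    return {ctx: mutations[ctx] / n for ctx, n in occurrences.items()}


if __name__ == "__main__":
    print(context_mutation_rates("AAAAAAAAAA", {3, 5}))  # {'AAAAAAA': 0.5}
```

In a run of ten A's only positions 3 to 6 have a complete 7-mer context, so marking positions 3 and 5 as mutated gives a rate of 0.5 for `AAAAAAA`. Published context models typically also collapse reverse-complement contexts and distinguish substitution types, which this sketch omits.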
6
Einarsson H, Salvatore M, Vaagensø C, Alcaraz N, Bornholdt J, Rennie S, Andersson R. Promoter sequence and architecture determine expression variability and confer robustness to genetic variants. eLife 2022; 11:e80943. [PMID: 36377861 PMCID: PMC9844987 DOI: 10.7554/elife.80943] [Received: 06/10/2022] [Accepted: 11/14/2022] [Indexed: 11/16/2022]
Abstract
Genetic and environmental exposures cause variability in gene expression. Although most genes are affected in a population, their effect sizes vary greatly, indicating the existence of regulatory mechanisms that could amplify or attenuate expression variability. Here, we investigate the relationship between the sequence and transcription start site architectures of promoters and their expression variability across human individuals. We find that expression variability can be largely explained by a promoter's DNA sequence and its binding sites for specific transcription factors. We show that promoter expression variability reflects the biological process of a gene, demonstrating a selective trade-off between stability for metabolic genes and plasticity for responsive genes and those involved in signaling. Promoters with a rigid transcription start site architecture are more prone to have variable expression and to be associated with genetic variants with large effect sizes, while a flexible usage of transcription start sites within a promoter attenuates expression variability and limits genotypic effects. Our work provides insights into the variable nature of responsive genes and reveals a novel mechanism for supplying transcriptional and mutational robustness to essential genes through multiple transcription start site regions within a promoter.
Affiliation(s)
- Marco Salvatore
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Nicolas Alcaraz
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jette Bornholdt
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Sarah Rennie
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Robin Andersson
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
7
Du A, Zhao F, Liu Y, Xu L, Chen K, Sun D, Han B. Genetic polymorphisms of PKLR gene and their associations with milk production traits in Chinese Holstein cows. Front Genet 2022; 13:1002706. [PMID: 36118870 PMCID: PMC9479125 DOI: 10.3389/fgene.2022.1002706] [Received: 07/25/2022] [Accepted: 08/12/2022] [Indexed: 11/13/2022]
Abstract
Our previous work confirmed that the pyruvate kinase L/R (PKLR) gene is differentially expressed across lactation periods in dairy cattle and participates in lipid metabolism through the insulin, PI3K-Akt, MAPK, AMPK, mTOR, and PPAR signaling pathways, suggesting that PKLR is a candidate gene affecting milk production traits in dairy cattle. Here, we tested whether this gene has a significant genetic association with milk yield and composition traits in a Chinese Holstein cow population. In total, we identified 21 single nucleotide polymorphisms (SNPs) by resequencing the entire coding region and part of the flanking region of PKLR, of which two SNPs were located in the 5′ promoter region, two in the 5′ untranslated region (UTR), three in introns, five in exons, six in the 3′ UTR, and three in the 3′ flanking region. Single-marker association analysis showed that all SNPs were significantly associated with milk yield, fat and protein yields, or protein percentage (p ≤ 0.0497). The haplotype block containing all the SNPs, predicted by Haploview, was significantly associated with fat yield and protein percentage (p ≤ 0.0145). Further, four SNPs in the 5′ regulatory region were predicted to change transcription factor binding sites (TFBSs), and eight SNPs in the UTR and exon regions were predicted to change mRNA secondary structure, thus affecting the expression of PKLR and leading to changes in milk production phenotypes; this suggests that these SNPs might be functional mutations for milk production traits in dairy cattle. In conclusion, we demonstrated that PKLR had significant genetic effects on milk production traits, and the SNPs with significant genetic effects could be used as candidate genetic markers for genomic selection (GS) in dairy cattle.
Affiliation(s)
- Aixia Du
- National Engineering Laboratory of Animal Breeding, Key Laboratory of Animal Genetics, Department of Animal Genetics and Breeding, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Yanan Liu
- National Engineering Laboratory of Animal Breeding, Key Laboratory of Animal Genetics, Department of Animal Genetics and Breeding, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Lingna Xu
- National Engineering Laboratory of Animal Breeding, Key Laboratory of Animal Genetics, Department of Animal Genetics and Breeding, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Kewei Chen
- Yantai Institute, China Agricultural University, Yantai, China
- Dongxiao Sun
- National Engineering Laboratory of Animal Breeding, Key Laboratory of Animal Genetics, Department of Animal Genetics and Breeding, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Bo Han
- National Engineering Laboratory of Animal Breeding, Key Laboratory of Animal Genetics, Department of Animal Genetics and Breeding, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
- *Correspondence: Bo Han
8
Krieger G, Lupo O, Wittkopp P, Barkai N. Evolution of transcription factor binding through sequence variations and turnover of binding sites. Genome Res 2022; 32:1099-1111. [PMID: 35618416 PMCID: PMC9248875 DOI: 10.1101/gr.276715.122] [Received: 02/28/2022] [Accepted: 05/20/2022] [Indexed: 01/08/2023]
Abstract
Variations in noncoding regulatory sequences play a central role in evolution. Interpreting such variations, however, remains difficult even in the context of defined attributes such as transcription factor (TF) binding sites. Here, we systematically link variations in cis-regulatory sequences to TF binding by profiling the allele-specific binding of 27 TFs expressed in a yeast hybrid, in which two related genomes are present within the same nucleus. TFs localize preferentially to sites containing their known consensus motifs but occupy only a small fraction of the motif-containing sites available within the genomes. Differential binding of TFs to the orthologous alleles was well explained by variations that alter motif sequence, whereas differences in chromatin accessibility between alleles were of little apparent effect. Motif variations that abolished binding when present in only one allele were still bound when present in both alleles, suggesting evolutionary compensation, with a potential role for sequence conservation at the motif's vicinity. At the level of the full promoter, we identify cases of binding-site turnover, in which binding sites are reciprocally gained and lost, yet most interspecific differences remained uncompensated. Our results show the flexibility of TFs to bind imprecise motifs and the fast evolution of TF binding sites between related species.
Affiliation(s)
- Gat Krieger
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Offir Lupo
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Patricia Wittkopp
- Department of Ecology and Evolutionary Biology, Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, Michigan 48109, USA
- Naama Barkai
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
9
Decker KT, Gao Y, Rychel K, Al Bulushi T, Chauhan S, Kim D, Cho BK, Palsson B. proChIPdb: a chromatin immunoprecipitation database for prokaryotic organisms. Nucleic Acids Res 2022; 50:D1077-D1084. [PMID: 34791440 PMCID: PMC8728212 DOI: 10.1093/nar/gkab1043] [Received: 08/14/2021] [Revised: 10/05/2021] [Accepted: 10/14/2021] [Indexed: 12/03/2022]
Abstract
The transcriptional regulatory network in prokaryotes controls global gene expression mostly through transcription factors (TFs), which are DNA-binding proteins. Chromatin immunoprecipitation (ChIP) with DNA sequencing methods can identify TF binding sites across the genome, providing a bottom-up, mechanistic understanding of how gene expression is regulated. ChIP provides indispensable evidence toward the goal of acquiring a comprehensive understanding of cellular adaptation and regulation, including condition-specificity. Because ChIP-derived data are both important and labor-intensive to generate, they merit broad dissemination and reuse, which is currently an unmet need in the prokaryotic domain. To fill this gap, we present proChIPdb (prochipdb.org), an information-rich, interactive web database. This website collects public ChIP-seq/-exo data across several prokaryotes and presents them in dashboards that include curated binding sites, nucleotide-resolution genome viewers, and summary plots such as motif enrichment sequence logos. Users can search for TFs of interest or their target genes, download all data, dashboards, and visuals, and follow external links to understand regulons through biological databases and the literature. This initial release of proChIPdb covers diverse organisms, including most major TFs of Escherichia coli, and can be expanded to support regulon discovery across the prokaryotic domain.
Affiliation(s)
- Katherine T Decker
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Ye Gao
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Kevin Rychel
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Tahani Al Bulushi
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Siddharth M Chauhan
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Donghyuk Kim
- School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, Korea
- Byung-Kwan Cho
- Department of Biological Sciences and KI for the BioCentury, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
- Bernhard O Palsson
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby, Denmark
10
Ray-Jones H, Spivakov M. Transcriptional enhancers and their communication with gene promoters. Cell Mol Life Sci 2021; 78:6453-6485. [PMID: 34414474 PMCID: PMC8558291 DOI: 10.1007/s00018-021-03903-w] [Received: 05/12/2021] [Revised: 07/08/2021] [Accepted: 07/19/2021] [Indexed: 12/13/2022]
Abstract
Transcriptional enhancers play a key role in the initiation and maintenance of gene expression programmes, particularly in metazoa. How these elements control their target genes in the right place and time is one of the most pertinent questions in functional genomics, with wide implications for most areas of biology. Here, we synthesise classic and recent evidence on the regulatory logic of enhancers, including the principles of enhancer organisation, factors that facilitate and delimit enhancer-promoter communication, and the joint effects of multiple enhancers. We show how modern approaches building on classic insights have begun to unravel the complexity of enhancer-promoter relationships, paving the way towards a quantitative understanding of gene control.
Affiliation(s)
- Helen Ray-Jones
- MRC London Institute of Medical Sciences, London, W12 0NN, UK
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College, London, W12 0NN, UK
- Mikhail Spivakov
- MRC London Institute of Medical Sciences, London, W12 0NN, UK
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College, London, W12 0NN, UK
11
Findley AS, Zhang X, Boye C, Lin YL, Kalita CA, Barreiro L, Lohmueller KE, Pique-Regi R, Luca F. A signature of Neanderthal introgression on molecular mechanisms of environmental responses. PLoS Genet 2021; 17:e1009493. [PMID: 34570765 PMCID: PMC8509894 DOI: 10.1371/journal.pgen.1009493] [Received: 03/18/2021] [Revised: 10/12/2021] [Accepted: 08/18/2021] [Indexed: 12/17/2022]
Abstract
Ancient human migrations led to the settlement of population groups in varied environmental contexts worldwide. The extent to which adaptation to local environments has shaped human genetic diversity is a longstanding question in human evolution. Recent studies have suggested that introgression of archaic alleles in the genome of modern humans may have contributed to adaptation to environmental pressures such as pathogen exposure. Functional genomic studies have demonstrated that variation in gene expression across individuals and in response to environmental perturbations is a main mechanism underlying complex trait variation. We considered gene expression response to in vitro treatments as a molecular phenotype to identify genes and regulatory variants that may have played an important role in adaptations to local environments. We investigated if Neanderthal introgression in the human genome may contribute to the transcriptional response to environmental perturbations. To this end we used eQTLs for genes differentially expressed in a panel of 52 cellular environments, resulting from 5 cell types and 26 treatments, including hormones, vitamins, drugs, and environmental contaminants. We found that SNPs with introgressed Neanderthal alleles (N-SNPs) disrupt binding of transcription factors important for environmental responses, including ionizing radiation and hypoxia, and for glucose metabolism. We identified an enrichment for N-SNPs among eQTLs for genes differentially expressed in response to 8 treatments, including glucocorticoids, caffeine, and vitamin D. Using Massively Parallel Reporter Assays (MPRA) data, we validated the regulatory function of 21 introgressed Neanderthal variants in the human genome, corresponding to 8 eQTLs regulating 15 genes that respond to environmental perturbations. 
These findings expand the set of environments where archaic introgression may have contributed to adaptations to local environments in modern humans and provide experimental validation for the regulatory function of introgressed variants.
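The abstract reports an enrichment of N-SNPs among eQTLs but does not name the test used. The sketch below shows one common way to assess an overlap of this kind, a one-sided hypergeometric (Fisher exact) test. It is an illustration only and is not the study's method. The function name and the counts in the usage line are hypothetical and do not come from the paper.

```python
from math import comb


def overlap_enrichment_pvalue(k: int, n_eqtl: int, n_nsnp: int, n_total: int) -> float:
    """One-sided hypergeometric p-value P(X >= k) for the overlap of two SNP sets.

    n_total: SNPs tested; n_nsnp: SNPs carrying an introgressed allele (N-SNPs);
    n_eqtl: SNPs that are eQTLs; k: SNPs that are both eQTLs and N-SNPs.
    """
    upper = min(n_nsnp, n_eqtl)
    # Exact integer arithmetic: number of draws with at least k N-SNPs among the eQTLs
    hits = sum(
        comb(n_nsnp, x) * comb(n_total - n_nsnp, n_eqtl - x)
        for x in range(k, upper + 1)
    )
    return hits / comb(n_total, n_eqtl)


# Hypothetical counts: 12 of 100 eQTLs are N-SNPs, versus 50 N-SNPs among 1000 SNPs (5 expected).
p = overlap_enrichment_pvalue(k=12, n_eqtl=100, n_nsnp=50, n_total=1000)
print(f"enrichment p = {p:.3g}")
```

A real analysis would also need to match background SNPs for allele frequency and LD, which this toy test ignores.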
Affiliation(s)
- Anthony S. Findley
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, United States of America
- Xinjun Zhang
  - Department of Ecology and Evolutionary Biology, UCLA, Los Angeles, California, United States of America
- Carly Boye
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, United States of America
- Yen Lung Lin
  - Genetics Section, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Cynthia A. Kalita
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, United States of America
- Luis Barreiro
  - Genetics Section, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Kirk E. Lohmueller
  - Department of Ecology and Evolutionary Biology, UCLA, Los Angeles, California, United States of America
  - Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, California, United States of America
- Roger Pique-Regi
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, United States of America
  - Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan, United States of America
- Francesca Luca
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, United States of America
  - Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan, United States of America
12
Shen Z, Hoeksema MA, Ouyang Z, Benner C, Glass CK. MAGGIE: leveraging genetic variation to identify DNA sequence motifs mediating transcription factor binding and function. Bioinformatics 2021; 36:i84-i92. [PMID: 32657363 PMCID: PMC7355228 DOI: 10.1093/bioinformatics/btaa476] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/31/2022]
Abstract
MOTIVATION Genetic variation in regulatory elements can alter transcription factor (TF) binding by mutating a TF binding motif, which in turn may affect the activity of the regulatory elements. However, it is unclear which motifs are prone to impact transcriptional regulation if mutated. Current motif analysis tools either prioritize TFs based on motif enrichment without linking to a function or are limited in their applications due to the assumption of linearity between motifs and their functional effects. RESULTS We present MAGGIE (Motif Alteration Genome-wide to Globally Investigate Elements), a novel method for identifying motifs mediating TF binding and function. By leveraging measurements from diverse genotypes, MAGGIE uses a statistical approach to link mutations of a motif to changes in an epigenomic feature without assuming a linear relationship. We benchmark MAGGIE across various applications using both simulated and biological datasets and demonstrate its improvement in sensitivity and specificity compared with state-of-the-art motif analysis approaches. We use MAGGIE to gain novel insights into the divergent functions of distinct NF-κB factors in pro-inflammatory macrophages, revealing the association of p65-p50 co-binding with transcriptional activation and the association of p50 binding lacking p65 with transcriptional repression. AVAILABILITY AND IMPLEMENTATION The Python package for MAGGIE is freely available at https://github.com/zeyang-shen/maggie. The accession number for the NF-κB ChIP-seq data generated for this study is Gene Expression Omnibus: GSE144070. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
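MAGGIE's own statistics live in the package linked above and are not reproduced here. The sketch below illustrates only the general idea of testing motif-score changes without a linear model. It applies a two-sided Wilcoxon signed-rank test (normal approximation) to paired score differences. As an assumption, each difference is taken to be the motif score in the allele with the higher epigenomic signal minus the score in the other allele. The function name and the input convention are hypothetical.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank(diffs: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon signed-rank test using the normal approximation.

    Zero differences (motif unaffected by the variant) are discarded.
    Tied absolute values get averaged ranks; the variance is not tie-corrected.
    Returns (z, p).
    """
    nz = [d for d in diffs if d != 0]
    n = len(nz)
    if n == 0:
        return 0.0, 1.0
    # Rank absolute differences (1-based), averaging ranks within tied groups
    order = sorted(range(n), key=lambda i: abs(nz[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(nz[order[j + 1]]) == abs(nz[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    # Sum of ranks of positive differences, compared with its null mean and SD
    w_plus = sum(r for r, d in zip(ranks, nz) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


z, p = wilcoxon_signed_rank([0.8, 1.2, 0.5, 2.0, 1.1, 0.9, -0.3, 1.5, 0.7, 1.3])
print(z, p)
```

A positive z means the motif tends to score higher in the allele with the stronger epigenomic signal. The test uses only ranks, so it makes no linearity assumption.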
Affiliation(s)
- Zeyang Shen
  - Department of Cellular and Molecular Medicine, School of Medicine
  - Department of Bioengineering, Jacobs School of Engineering
- Zhengyu Ouyang
  - Department of Cellular and Molecular Medicine, School of Medicine
- Christopher Benner
  - Department of Medicine, School of Medicine, University of California, San Diego, CA 92093, USA
- Christopher K Glass
  - Department of Cellular and Molecular Medicine, School of Medicine
  - Department of Medicine, School of Medicine, University of California, San Diego, CA 92093, USA
13
Floc'hlay S, Wong ES, Zhao B, Viales RR, Thomas-Chollier M, Thieffry D, Garfield DA, Furlong EEM. Cis-acting variation is common across regulatory layers but is often buffered during embryonic development. Genome Res 2021; 31:211-224. [PMID: 33310749 PMCID: PMC7849415 DOI: 10.1101/gr.266338.120] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 05/21/2020] [Accepted: 12/09/2020] [Indexed: 12/14/2022]
Abstract
Precise patterns of gene expression are driven by interactions between transcription factors, regulatory DNA sequences, and chromatin. How DNA mutations affecting any one of these regulatory "layers" are buffered or propagated to gene expression remains unclear. To address this, we quantified allele-specific changes in chromatin accessibility, histone modifications, and gene expression in F1 embryos generated from eight Drosophila crosses at three embryonic stages, yielding a comprehensive data set of 240 samples spanning multiple regulatory layers. Genetic variation (allelic imbalance) impacts gene expression more frequently than chromatin features, with metabolic and environmental response genes being most often affected. Allelic imbalance in cis-regulatory elements (enhancers) is common and highly heritable, yet its functional impact does not generally propagate to gene expression. When it does, genetic variation impacts RNA levels through two alternative mechanisms involving either H3K4me3 or chromatin accessibility and H3K27ac. Changes in RNA are more predictive of variation in H3K4me3 than vice versa, suggesting a role for H3K4me3 downstream from transcription. The impact of a substantial proportion of genetic variation is consistent across embryonic stages, with 50% of allelic imbalanced features at one stage being also imbalanced at subsequent developmental stages. Crucially, buffering, as well as the magnitude and evolutionary impact of genetic variants, is influenced by regulatory complexity (i.e., number of enhancers regulating a gene), with transcription factors being most robust to cis-acting, but most influenced by trans-acting, variation.
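The abstract does not state the statistical model used to call allelic imbalance. F1 studies often use overdispersed models such as the beta-binomial. A simpler illustration is an exact two-sided binomial test of the reads from each parental allele against a 50:50 expectation. The sketch below implements that test only. It ignores overdispersion and mapping bias, and the function name and read counts are hypothetical.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads: int, alt_reads: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test of allele-specific read counts.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one under the null allelic ratio p (0.5 = no imbalance).
    """
    n = ref_reads + alt_reads
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = probs[ref_reads]
    # Small relative tolerance so outcomes with equal probability are not lost to rounding
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


# Hypothetical feature with 42 reads from one parental allele and 18 from the other
print(allelic_imbalance_pvalue(42, 18))
```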
Affiliation(s)
- Swann Floc'hlay
  - Institut de Biologie de l'ENS (IBENS), École Normale Supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- Emily S Wong
  - Molecular, Structural and Computational Biology Division, Victor Chang Cardiac Research Institute, Darlinghurst, New South Wales 2010, Australia
  - School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, New South Wales 2052, Australia
- Bingqing Zhao
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, D-69117 Heidelberg, Germany
- Rebecca R Viales
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, D-69117 Heidelberg, Germany
- Morgane Thomas-Chollier
  - Institut de Biologie de l'ENS (IBENS), École Normale Supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
  - Institut Universitaire de France (IUF), 75005 Paris, France
- Denis Thieffry
  - Institut de Biologie de l'ENS (IBENS), École Normale Supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- David A Garfield
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, D-69117 Heidelberg, Germany
- Eileen E M Furlong
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, D-69117 Heidelberg, Germany
14
Mitchelmore J, Grinberg NF, Wallace C, Spivakov M. Functional effects of variation in transcription factor binding highlight long-range gene regulation by epromoters. Nucleic Acids Res 2020; 48:2866-2879. [PMID: 32112106 PMCID: PMC7102942 DOI: 10.1093/nar/gkaa123] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/28/2019] [Revised: 02/14/2020] [Accepted: 02/17/2020] [Indexed: 02/06/2023]
Abstract
Identifying DNA cis-regulatory modules (CRMs) that control the expression of specific genes is crucial for deciphering the logic of transcriptional control. Natural genetic variation can point to the possible gene regulatory function of specific sequences through their allelic associations with gene expression. However, comprehensive identification of causal regulatory sequences in brute-force association testing without incorporating prior knowledge is challenging due to limited statistical power and effects of linkage disequilibrium. Sequence variants affecting transcription factor (TF) binding at CRMs have a strong potential to influence gene regulatory function, which provides a motivation for prioritizing such variants in association testing. Here, we generate an atlas of CRMs showing predicted allelic variation in TF binding affinity in human lymphoblastoid cell lines and test their association with the expression of their putative target genes inferred from Promoter Capture Hi-C and immediate linear proximity. We reveal >1300 CRM TF-binding variants associated with target gene expression, the majority of them undetected with standard association testing. A large proportion of CRMs showing associations with the expression of genes they contact in 3D localize to the promoter regions of other genes, supporting the notion of 'epromoters': dual-action CRMs with promoter and distal enhancer activity.
Affiliation(s)
- Joanna Mitchelmore
  - Nuclear Dynamics Programme, Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK
- Nastasiya F Grinberg
  - Cambridge Institute of Therapeutic Immunology & Infectious Disease (CITIID), University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0AW, UK
- Chris Wallace
  - Cambridge Institute of Therapeutic Immunology & Infectious Disease (CITIID), University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0AW, UK
  - MRC Biostatistics Unit, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0SR, UK
- Mikhail Spivakov
  - Nuclear Dynamics Programme, Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK
  - MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
  - Institute of Clinical Sciences, Faculty of Medicine, Imperial College, Du Cane Road, London W12 0NN, UK
15
Peña-Chilet M, Esteban-Medina M, Falco MM, Rian K, Hidalgo MR, Loucera C, Dopazo J. Using mechanistic models for the clinical interpretation of complex genomic variation. Sci Rep 2019; 9:18937. [PMID: 31831811 PMCID: PMC6908734 DOI: 10.1038/s41598-019-55454-7] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 07/12/2019] [Accepted: 11/28/2019] [Indexed: 02/07/2023]
Abstract
The sustained generation of genomic data in the last decade has increased knowledge of the causal mutations of a large number of diseases, especially highly penetrant Mendelian diseases, which are typically caused by a single gene or a few genes. However, the discovery of causal genes in complex diseases has been far less successful. Many complex diseases are actually a consequence of the failure of complex biological modules composed of interrelated proteins. Such modules can fail in many different ways, which gives the condition a multigenic nature that can hardly be attributed to one or a few genes. We present a mechanistic model, Hipathia, implemented in a web server that allows estimating the effect that mutations, or changes in the expression of genes, have on the whole system of human signaling and the corresponding functional consequences. We show several use cases where we demonstrate how different the ultimate impact of mutations with similar loss-of-function potential can be and how the potential pathological role of a damaged gene can be inferred within the context of a signaling network. The use of systems biology-based approaches, such as mechanistic models, allows estimating the potential impact of loss-of-function mutations occurring in proteins that are part of complex biological interaction networks, such as signaling pathways. This holistic approach provides an elegant alternative to gene-centric approaches that can open new avenues in the interpretation of the genomic variability in complex diseases.
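The Hipathia web server and its software implement the actual model. The sketch below uses the Hipathia propagation rule as we understand it from the method's publications; check it against the Hipathia documentation. Under that rule, the signal leaving a node is its value times an activation term and an inhibition term: S = v · (1 − ∏(1 − s_a)) · ∏(1 − s_i), where s_a and s_i are the incoming activating and inhibiting signals. The sketch applies the rule to a hypothetical three-node linear chain. As an assumption made purely for illustration, it models a loss-of-function mutation by setting that node's value to 0.

```python
from math import prod
from typing import Sequence


def node_signal(value: float, activations: Sequence[float], inhibitions: Sequence[float] = ()) -> float:
    """Signal leaving a node; all inputs are assumed to lie in [0, 1].

    value: node value (e.g. normalized expression of the gene(s) in the node).
    activations / inhibitions: signals arriving on activating / inhibiting edges.
    A node with no activating inputs (a pathway entry node) gets activation term 1.
    """
    activation = 1 - prod(1 - s for s in activations) if activations else 1.0
    inhibition = prod(1 - s for s in inhibitions)
    return value * activation * inhibition


def chain_output(values: Sequence[float]) -> float:
    """Propagate the signal along a hypothetical linear chain A -> B -> C -> ..."""
    signal = node_signal(values[0], [])
    for v in values[1:]:
        signal = node_signal(v, [signal])
    return signal


print(chain_output([1.0, 0.8, 0.9]))  # intact chain
print(chain_output([1.0, 0.0, 0.9]))  # hypothetical loss of function in node B
```

The second call shows the point made in the abstract: a damaged node silences everything downstream of it, regardless of how highly the downstream genes are expressed.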
Affiliation(s)
- María Peña-Chilet
  - Clinical Bioinformatics Area. Fundación Progreso y Salud (FPS). CDCA, Hospital Virgen del Rocío, 41013, Sevilla, Spain
  - Bioinformatics in Rare Diseases (BiER). Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), FPS, Hospital Virgen del Rocío, 41013, Sevilla, Spain
- Marina Esteban-Medina
  - Clinical Bioinformatics Area. Fundación Progreso y Salud (FPS). CDCA, Hospital Virgen del Rocío, 41013, Sevilla, Spain
- Matias M Falco
  - Clinical Bioinformatics Area. Fundación Progreso y Salud (FPS). CDCA, Hospital Virgen del Rocío, 41013, Sevilla, Spain
  - Bioinformatics in Rare Diseases (BiER). Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), FPS, Hospital Virgen del Rocío, 41013, Sevilla, Spain
- Kinza Rian
  - Clinical Bioinformatics Area. Fundación Progreso y Salud (FPS). CDCA, Hospital Virgen del Rocío, 41013, Sevilla, Spain
- Marta R Hidalgo
  - Bioinformatics and Biostatistics Unit, Centro de Investigación Príncipe Felipe (CIPF), 46012, Valencia, Spain
- Carlos Loucera
  - Clinical Bioinformatics Area. Fundación Progreso y Salud (FPS). CDCA, Hospital Virgen del Rocío, 41013, Sevilla, Spain
- Joaquín Dopazo
  - Clinical Bioinformatics Area. Fundación Progreso y Salud (FPS). CDCA, Hospital Virgen del Rocío, 41013, Sevilla, Spain
  - Bioinformatics in Rare Diseases (BiER). Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), FPS, Hospital Virgen del Rocío, 41013, Sevilla, Spain
  - INB-ELIXIR-es, FPS, Hospital Virgen del Rocío, Sevilla, 42013, Spain
16
Serpent/dGATAb regulates Laminin B1 and Laminin B2 expression during Drosophila embryogenesis. Sci Rep 2019; 9:15910. [PMID: 31685844 PMCID: PMC6828711 DOI: 10.1038/s41598-019-52210-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/13/2019] [Accepted: 10/15/2019] [Indexed: 12/15/2022]
Abstract
Transcriptional regulation of Laminin expression during embryogenesis is a key step required for proper ECM assembly. We show that in Drosophila the Laminin B1 and Laminin B2 genes share expression patterns in mesodermal cells as well as in endodermal and ectodermal gut primordia, yolk and amnioserosa. In the absence of the GATA transcription factor Serpent, the spatial extent of Laminin reporter gene expression was strongly limited, indicating that Laminin expression in many tissues depends on Serpent activity. We demonstrate direct binding of Serpent to the intronic enhancers of Laminin B1 and Laminin B2. In addition, ectopically expressed Serpent activated enhancer elements of Laminin B1 and Laminin B2. Our results reveal Serpent as an important regulator of Laminin expression across tissues.
17
Gong Y, Cheng X, Tian J, Li J, Zhu Y, Yang Y, Zou D, Peng X, Luo J, Zhao L, Mei S, Wang X, Yang N, Ke J, Gong J, Chang J, Wang Y, Zhong R. Integrative analysis identifies genetic variant modulating MICA expression and altering susceptibility to persistent HBV infection. Liver Int 2019; 39:1927-1936. [PMID: 31033131 DOI: 10.1111/liv.14127] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/25/2019] [Revised: 04/08/2019] [Accepted: 04/18/2019] [Indexed: 02/06/2023]
Abstract
BACKGROUND & AIMS Genome-wide association studies have identified multiple genetic signals associated with the risk of persistent hepatitis B virus (HBV) infection and HBV-related hepatocellular carcinoma. However, the majority of the associated variants may only be markers of functional variants, and the underlying biological mechanisms remain elusive. We hypothesized that functional variants modulating transcription factor (TF) binding affinity in loci identified by genome-wide association studies may influence the risk of persistent HBV infection in Chinese people. METHODS A systematic bioinformatics approach was implemented to prioritize potential functional variants that may influence TF binding. A two-stage case-control study, including 1595 HBV-persistent carriers and 1590 subjects with HBV natural clearance, was conducted to examine the associations between candidate variants and susceptibility to persistent HBV infection. Biological assays were carried out to elucidate the underlying mechanism of the associated genetic variants. RESULTS Twelve candidate variants were identified, and rs2523454 G > A increased the risk of persistent HBV infection (dominant model: OR_combined = 1.37, 95% CI = 1.19-1.58, P = 1.610 × 10⁻⁵). Functional assays indicated that the rs2523454 A allele significantly decreased transcriptional activity compared to the G allele by influencing TF-binding affinity. In addition, expression quantitative trait loci analyses revealed that the A allele was associated with reduced expression of MICA (P < 0.01). CONCLUSIONS Our findings suggest that the germline G > A variation at rs2523454 may influence TF-DNA interaction, downregulate the expression of MICA and play an important role in the development of persistent HBV infection in the Chinese population.
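The abstract reports a dominant-model odds ratio with a 95% CI. The published figure combines two stages and may be covariate-adjusted (for example, by logistic regression), which this sketch does not attempt. The sketch only shows how a crude odds ratio and a Woolf (log-scale) confidence interval are computed from a single 2×2 table. Under a dominant model the table compares carriers of the A allele (GA + AA) with GG homozygotes. The function name is our own, and the cell counts below are hypothetical; only their row totals match the study's case and control numbers.

```python
import math


def odds_ratio_woolf(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    a: cases carrying the risk allele (dominant model: GA + AA)
    b: cases without it (GG)
    c: controls carrying the risk allele
    d: controls without it
    All four cells must be > 0. Returns (OR, lower, upper).
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical split of 1595 persistent carriers (cases) and 1590 natural-clearance subjects (controls)
print(odds_ratio_woolf(a=600, b=995, c=500, d=1090))
```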
Affiliation(s)
- Yajie Gong
  - Department of Epidemiology and Biostatistics, MOE Key Laboratory of Environment & Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Xiang Cheng
  - Department of Hepatobiliary Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Jianbo Tian
  - Department of Epidemiology and Biostatistics, MOE Key Laboratory of Environment & Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Jiaoyuan Li
  - Department of Epidemiology and Biostatistics, MOE Key Laboratory of Environment & Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Ying Zhu
  - Department of Epidemiology and Biostatistics, MOE Key Laboratory of Environment & Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Yang Yang
  - Department of Epidemiology and Biostatistics, MOE Key Laboratory of Environment & Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Danyi Zou
  - Department of Epidemiology and Biostatistics, MOE Key Laboratory of Environment & Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Xiating Peng
  - Department of Epidemiology and Biostatistics, MOE Key Laboratory of Environment & Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Jinzhuo Luo
  - Department of Infectious Disease, Union Hospital of Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Lei Zhao
  - Department of Infectious Disease, Union Hospital of Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Shufang Mei
  - Department of Epidemiology and Biostatistics, MOE Key Laboratory of Environment & Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Xiaoyang Wang
  - Department of Epidemiology and Biostatistics, MOE Key Laboratory of Environment & Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Nan Yang
  - Department of Epidemiology and Biostatistics, MOE Key Laboratory of Environment & Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Juntao Ke
  - Department of Epidemiology and Biostatistics, MOE Key Laboratory of Environment & Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Jing Gong
  - Department of Epidemiology and Biostatistics, MOE Key Laboratory of Environment & Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Jiang Chang
  - Department of Epidemiology and Biostatistics, MOE Key Laboratory of Environment & Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Ying Wang
  - Department of Virology, Wuhan Centers for Disease Prevention and Control, Wuhan, China
- Rong Zhong
  - Department of Epidemiology and Biostatistics, MOE Key Laboratory of Environment & Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
18
Li S, Kvon EZ, Visel A, Pennacchio LA, Ovcharenko I. Stable enhancers are active in development, and fragile enhancers are associated with evolutionary adaptation. Genome Biol 2019; 20:140. [PMID: 31307522 PMCID: PMC6631995 DOI: 10.1186/s13059-019-1750-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 01/25/2019] [Accepted: 06/28/2019] [Indexed: 12/13/2022]
Abstract
Background Despite continual progress in the identification and characterization of trait- and disease-associated variants that disrupt transcription factor (TF)-DNA binding, little is known about the distribution of TF binding deactivating mutations (deMs) in enhancer sequences. Here, we focus on elucidating the mechanism underlying the different densities of deMs in human enhancers. Results We identify two classes of enhancers based on the density of nucleotides prone to deMs. Firstly, fragile enhancers with abundant deM nucleotides are associated with the immune system and regular cellular maintenance. Secondly, stable enhancers with only a few deM nucleotides are associated with development and the regulation of TFs and are evolutionarily conserved. These two classes of enhancers feature different regulatory programs: the binding sites of pioneer TFs of the FOX family are specifically enriched in stable enhancers, while tissue-specific TFs are enriched in fragile enhancers. Moreover, stable enhancers are more tolerant of deMs due to their predominant use of homotypic TF binding site (TFBS) clusters, as opposed to the more extensive use of heterotypic TFBS clusters in fragile enhancers. Notably, the sequence environment and chromatin context of the cognate motif contribute more than the motif itself to the susceptibility of TF binding to deMs. Conclusions This dichotomy of enhancer activity is conserved across different tissues, has a specific footprint in epigenetic profiles, and argues for a bimodal evolution of gene regulatory programs in vertebrates. Specifically encoded stable enhancers are evolutionarily conserved and associated with development, while differently encoded fragile enhancers are associated with the adaptation of species. Electronic supplementary material The online version of this article (10.1186/s13059-019-1750-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Shan Li
  - Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20892, USA
- Evgeny Z Kvon
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Axel Visel
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
  - United States Department of Energy Joint Genome Institute, Walnut Creek, CA, 94598, USA
  - School of Natural Sciences, University of California, Merced, CA, 95343, USA
- Len A Pennacchio
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
  - United States Department of Energy Joint Genome Institute, Walnut Creek, CA, 94598, USA
  - Comparative Biochemistry Program, University of California, Berkeley, CA, 94720, USA
- Ivan Ovcharenko
  - Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20892, USA
19
Wagih O, Galardini M, Busby BP, Memon D, Typas A, Beltrao P. A resource of variant effect predictions of single nucleotide variants in model organisms. Mol Syst Biol 2018; 14:e8430. [PMID: 30573687 PMCID: PMC6301329 DOI: 10.15252/msb.20188430] [Citation(s) in RCA: 56] [Impact Index Per Article: 8.0] [Received: 05/02/2018] [Revised: 11/19/2018] [Accepted: 11/21/2018] [Indexed: 12/18/2022]
Abstract
The effect of single nucleotide variants (SNVs) in coding and noncoding regions is of great interest in genetics. Although many computational methods aim to elucidate the effects of SNVs on cellular mechanisms, it is not straightforward to comprehensively cover different molecular effects. To address this, we compiled and benchmarked sequence- and structure-based variant effect predictors and computed the impact of nearly all possible amino acid and nucleotide variants in the reference genomes of Homo sapiens, Saccharomyces cerevisiae and Escherichia coli. Studied mechanisms include protein stability, interaction interfaces, post-translational modifications and transcription factor binding sites. We apply this resource to the study of natural and disease coding variants. We also show how variant effects can be aggregated to generate protein complex burden scores that uncover protein complex to phenotype associations based on a set of newly generated growth profiles of 93 sequenced S. cerevisiae strains in 43 conditions. This resource is available through mutfunc (www.mutfunc.com), a tool by which users can query precomputed predictions by providing amino acid or nucleotide-level variants.
Affiliation(s)
- Omar Wagih
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
- Marco Galardini
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
- Bede P Busby
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
  - European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Danish Memon
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
- Athanasios Typas
  - European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Pedro Beltrao
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
20
Devailly G, Joshi A. Insights into mammalian transcription control by systematic analysis of ChIP sequencing data. BMC Bioinformatics 2018; 19:409. [PMID: 30453943 PMCID: PMC6245581 DOI: 10.1186/s12859-018-2377-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 12/13/2022]
Abstract
Background Transcription regulation is a major controller of gene expression dynamics during development and disease, where transcription factors (TFs) modulate expression of genes through direct or indirect DNA interaction. ChIP sequencing has become the most widely used technique to obtain a genome-wide view of TF occupancy in a cell type of interest, mainly due to established standard protocols and a rapid decrease in the cost of sequencing. The number of ChIP sequencing data sets available in the public domain is therefore ever increasing, including data generated by individual labs together with consortia such as the ENCODE project. Results A total of 1735 ChIP-sequencing datasets in mouse and human cell types and tissues were used to perform bioinformatic analyses to unravel diverse features of transcription control. (1) We used the Heat*seq webtool to investigate global relations across the ChIP-seq samples. (2) We demonstrated that factors have specific genomic location preferences that are, for most factors, conserved across species. (3) Promoter-proximal binding of factors was more conserved across cell types, while distal binding sites were more cell type-specific. (4) We identified combinations of factors preferentially acting together in a cellular context. (5) Finally, by integrating the data with disease-associated gene loci from GWAS studies, we highlight the value of these data for associating novel regulators with disease. Conclusion In summary, we demonstrate how integrating and analysing ChIP sequencing data can yield new insights into mammalian transcription control, and we demonstrate the utility of various bioinformatic tools for generating novel testable hypotheses from this public resource.
Affiliation(s)
- Guillaume Devailly
  - Division of Developmental Biology, the Roslin Institute, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Anagha Joshi
  - Division of Developmental Biology, the Roslin Institute, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
21
An information theoretic treatment of sequence-to-expression modeling. PLoS Comput Biol 2018; 14:e1006459. [PMID: 30256780 PMCID: PMC6175532 DOI: 10.1371/journal.pcbi.1006459] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 02/01/2018] [Revised: 10/08/2018] [Accepted: 08/24/2018] [Indexed: 11/23/2022]
Abstract
Studying a gene’s regulatory mechanisms is a tedious process that involves identification of candidate regulators by transcription factor (TF) knockout or over-expression experiments, delineation of enhancers by reporter assays, and demonstration of direct TF influence by site mutagenesis, among other approaches. Such experiments are often chosen from several testable hypotheses based on the biologist’s intuition. We pursue the goal of making this process systematic by using ideas from information theory to reason about experiments in gene regulation, in the hope of ultimately enabling rigorous experiment design strategies. For this, we make use of a state-of-the-art mathematical model of gene expression, which provides a way to formalize our current knowledge of cis- as well as trans- regulatory mechanisms of a gene. Ambiguities in such knowledge can be expressed as uncertainties in the model, which we capture formally by building an ensemble of plausible models that fit the existing data and defining a probability distribution over the ensemble. We then characterize the impact of a new experiment on our understanding of the gene’s regulation based on how the ensemble of plausible models and its probability distribution change when challenged with results from that experiment. This allows us to assess the ‘value’ of the experiment retroactively as the reduction in entropy of the distribution (information gain) resulting from the experiment’s results. We fully formalize this novel approach to reasoning about gene regulation experiments and use it to evaluate a variety of perturbation experiments on two developmental genes of D. melanogaster. We also provide objective and ‘biologist-friendly’ descriptions of the information gained from each such experiment. The rigorously defined information theoretic approaches presented here can be used in the future to formulate systematic strategies for experiment design pertaining to studies of gene regulatory mechanisms.
In-depth studies of gene regulatory mechanisms employ a variety of experimental approaches, such as identifying a gene’s enhancer(s) and testing its variants through reporter assays, followed by transcription factor mis-expression or knockouts, site mutagenesis, etc. The biologist is often faced with the challenging problem of selecting the ideal next experiment to perform so that its results provide novel mechanistic insights, and has to rely on their intuition about what is currently known on the topic and which experiments may add to that knowledge. We seek to make this intuition-based process more systematic by borrowing ideas from the mature statistical field of experiment design. Towards this goal, we use the language of mathematical models to formally describe what is known about a gene’s regulatory mechanisms, and how an experiment’s results enhance that knowledge. We use information-theoretic ideas to assign a ‘value’ to an experiment as well as to explain objectively what is learned from that experiment. We demonstrate the use of this novel approach on two extensively studied developmental genes in the fruit fly. We expect our work to lead to systematic strategies for selecting the most informative experiments in a study of gene regulation.
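The entropy-based notion of an experiment’s ‘value’ described in this abstract can be illustrated with a small sketch. It assumes, purely for illustration, a discrete ensemble of plausible models whose weights are updated by Bayes’ rule using the likelihood of the observed experimental result under each model; the paper’s own ensemble construction and weighting may differ, and all function names and numbers below are hypothetical.

```python
import math


def entropy_bits(weights):
    """Shannon entropy (bits) of a discrete distribution; zero-weight models contribute nothing."""
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)


def reweight(prior, likelihoods):
    """Bayesian update (an assumed weighting scheme): posterior weight is proportional to prior * likelihood."""
    unnorm = [p * l for p, l in zip(prior, likelihoods)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def information_gain(prior, likelihoods):
    """Retrospective value of one experiment: entropy before minus entropy after seeing its result."""
    return entropy_bits(prior) - entropy_bits(reweight(prior, likelihoods))


# Hypothetical ensemble of four plausible models, initially equally likely (2 bits of uncertainty).
prior = [0.25, 0.25, 0.25, 0.25]
# Hypothetical likelihoods of the observed outcome under each model:
# models 0 and 1 explain it well, models 2 and 3 are ruled out.
likelihoods = [0.9, 0.9, 0.0, 0.0]
print(round(information_gain(prior, likelihoods), 3))  # 1.0
```

Because this value is computed after the result is known, a single surprising outcome can in principle spread the weights out and give a negative gain; averaging over the possible outcomes before running the experiment instead yields the expected information gain, which is never negative and is the usual quantity in prospective experiment design.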
22
Mitani T, Yabuta Y, Ohta H, Nakamura T, Yamashiro C, Yamamoto T, Saitou M, Kurimoto K. Principles for the regulation of multiple developmental pathways by a versatile transcriptional factor, BLIMP1. Nucleic Acids Res 2017; 45:12152-12169. [PMID: 28981894 PMCID: PMC5716175 DOI: 10.1093/nar/gkx798] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 04/13/2017] [Accepted: 08/30/2017] [Indexed: 11/14/2022]
Abstract
Single transcription factors (TFs) regulate multiple developmental pathways, but the underlying mechanisms remain unclear. Here, we quantitatively characterized the genome-wide occupancy profiles of BLIMP1, a key transcriptional regulator for diverse developmental processes, during the development of three germ-layer derivatives (photoreceptor precursors, embryonic intestinal epithelium and plasmablasts) and the germ cell lineage (primordial germ cells). We identified BLIMP1-binding sites shared among multiple developmental processes; such sites were highly occupied by BLIMP1, carried a stringent recognition motif and were located predominantly in promoter proximities. A subset of the binding sites common to all the lineages exhibited a new, strong recognition sequence, a GGGAAA repeat. Paradoxically, however, the shared/common binding sites had only a slight impact on the associated gene expression. In contrast, BLIMP1 occupied more distal sites in a cell type-specific manner; despite lower occupancy and flexible sequence recognition, such binding events contributed effectively to the repression of the associated genes. Recognition motifs of other key TFs in BLIMP1-binding sites had little impact on the expression-level changes. These findings suggest that the shared/common sites might serve as potential reservoirs of BLIMP1 that functions at the specific sites, providing the foundation for a unified understanding of genome regulation by BLIMP1 and, possibly, by TFs in general.
Affiliation(s)
- Tadahiro Mitani
- Department of Anatomy and Cell Biology, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto 606-8501, Japan.,JST, ERATO, Yoshida-Konoe-cho, Sakyo-ku, Kyoto 606-8501, Japan
- Yukihiro Yabuta
- Department of Anatomy and Cell Biology, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto 606-8501, Japan.,JST, ERATO, Yoshida-Konoe-cho, Sakyo-ku, Kyoto 606-8501, Japan
- Hiroshi Ohta
- Department of Anatomy and Cell Biology, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto 606-8501, Japan.,JST, ERATO, Yoshida-Konoe-cho, Sakyo-ku, Kyoto 606-8501, Japan
- Tomonori Nakamura
- Department of Anatomy and Cell Biology, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto 606-8501, Japan.,JST, ERATO, Yoshida-Konoe-cho, Sakyo-ku, Kyoto 606-8501, Japan
- Chika Yamashiro
- Department of Anatomy and Cell Biology, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto 606-8501, Japan.,JST, ERATO, Yoshida-Konoe-cho, Sakyo-ku, Kyoto 606-8501, Japan
- Takuya Yamamoto
- Center for iPS Cell Research and Application, Kyoto University, 53 Kawahara-cho, Shogoin, Sakyo-ku, Kyoto 606-8507, Japan.,AMED-CREST, AMED 1-7-1 Otemachi, Chiyoda-ku, Tokyo 100-0004, Japan
- Mitinori Saitou
- Department of Anatomy and Cell Biology, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto 606-8501, Japan.,JST, ERATO, Yoshida-Konoe-cho, Sakyo-ku, Kyoto 606-8501, Japan.,Center for iPS Cell Research and Application, Kyoto University, 53 Kawahara-cho, Shogoin, Sakyo-ku, Kyoto 606-8507, Japan.,Institute for Integrated Cell-Material Sciences, Kyoto University, Yoshida-Ushinomiya-cho, Sakyo-ku, Kyoto 606-8501, Japan
- Kazuki Kurimoto
- Department of Anatomy and Cell Biology, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto 606-8501, Japan.,JST, ERATO, Yoshida-Konoe-cho, Sakyo-ku, Kyoto 606-8501, Japan
23
Huang YF, Gulko B, Siepel A. Fast, scalable prediction of deleterious noncoding variants from functional and population genomic data. Nat Genet 2017; 49:618-624. [PMID: 28288115 PMCID: PMC5395419 DOI: 10.1038/ng.3810] [Citation(s) in RCA: 240] [Impact Index Per Article: 30.0] [Received: 08/15/2016] [Accepted: 02/13/2017] [Indexed: 12/17/2022]
Abstract
Many genetic variants that influence phenotypes of interest are located outside of protein-coding genes, yet existing methods for identifying such variants have poor predictive power. Here we introduce a new computational method, called LINSIGHT, that substantially improves the prediction of noncoding nucleotide sites at which mutations are likely to have deleterious fitness consequences, and which, therefore, are likely to be phenotypically important. LINSIGHT combines a generalized linear model for functional genomic data with a probabilistic model of molecular evolution. The method is fast and highly scalable, enabling it to exploit the 'big data' available in modern genomics. We show that LINSIGHT outperforms the best available methods in identifying human noncoding variants associated with inherited diseases. In addition, we apply LINSIGHT to an atlas of human enhancers and show that the fitness consequences at enhancers depend on cell type, tissue specificity, and constraints at associated promoters.
Affiliation(s)
- Yi-Fei Huang
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
- Brad Gulko
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA.,Graduate Field of Computer Science, Cornell University, Ithaca, New York, USA
- Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
24
Prosvirov KA, Mironov AA, Soldatov RA. Ten percent of conserved miRNA-binding sites in vertebrates are misaligned. Biophysics (Nagoya-shi) 2017. [DOI: 10.1134/s000635091701016x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
25
Deplancke B, Alpern D, Gardeux V. The Genetics of Transcription Factor DNA Binding Variation. Cell 2016; 166:538-554. [PMID: 27471964 DOI: 10.1016/j.cell.2016.07.012] [Citation(s) in RCA: 267] [Impact Index Per Article: 29.7] [Received: 02/05/2016] [Indexed: 12/23/2022]
Abstract
Most complex trait-associated variants are located in non-coding regulatory regions of the genome, where they have been shown to disrupt transcription factor (TF)-DNA binding motifs. Variable TF-DNA interactions are therefore increasingly considered as key drivers of phenotypic variation. However, recent genome-wide studies revealed that the majority of variable TF-DNA binding events are not driven by sequence alterations in the motif of the studied TF. This observation implies that the molecular mechanisms underlying TF-DNA binding variation and, by extrapolation, inter-individual phenotypic variation are more complex than originally anticipated. Here, we summarize the findings that led to this important paradigm shift and review proposed mechanisms for local, proximal, or distal genetic variation-driven variable TF-DNA binding. In addition, we discuss the biomedical implications of these findings for our ability to dissect the molecular role(s) of non-coding genetic variants in complex traits, including disease susceptibility.
Affiliation(s)
- Bart Deplancke
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne and Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland.
- Daniel Alpern
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne and Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Vincent Gardeux
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne and Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
26
Bahreini A, Levine K, Santana-Santos L, Benos PV, Wang P, Andersen C, Oesterreich S, Lee AV. Non-coding single nucleotide variants affecting estrogen receptor binding and activity. Genome Med 2016; 8:128. [PMID: 27964748 PMCID: PMC5154163 DOI: 10.1186/s13073-016-0382-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/13/2016] [Accepted: 11/23/2016] [Indexed: 11/26/2022]
Abstract
Background Estrogen receptor (ER) activity is critical for the development and progression of the majority of breast cancers. It is known that ER is differentially bound to DNA, leading to transcriptomic and phenotypic changes in different breast cancer models. We investigated whether single nucleotide variants (SNVs) in ER binding sites (regSNVs) contribute to ER action through changes in the ER cistrome, thereby affecting disease progression. Here we developed a computational pipeline to identify SNVs in ER binding sites using chromatin immunoprecipitation sequencing (ChIP-seq) data from ER+ breast cancer models. Methods ER ChIP-seq data were downloaded from the Gene Expression Omnibus (GEO). The GATK pipeline was used to identify SNVs, and the MACS algorithm was employed to call DNA-binding sites. The potential effect of a given SNV in a binding site was inferred using a reimplementation of the is-rSNP algorithm. The Cancer Genome Atlas (TCGA) data were integrated to correlate the regSNVs with gene expression in breast tumors. ChIP and luciferase assays were used to assess allele-specific binding. Results Analysis of ER ChIP-seq data from MCF7 cells identified an intronic SNV in the IGF1R gene, rs62022087, predicted to increase ER binding. Functional studies confirmed that ER binds preferentially to the rs62022087 variant allele versus the wild-type allele. By integrating 43 ER ChIP-seq datasets, multi-omics, and clinical data, we identified 17 regSNVs associated with altered expression of adjacent genes in ER+ disease. Of these, the top candidate was in the promoter of the GSTM1 gene and was associated with higher expression of GSTM1 in breast tumors. Survival analysis of patients with ER+ tumors revealed that higher expression of GSTM1, which is responsible for detoxifying carcinogens, was correlated with better outcome. Conclusions We have developed a computational approach that is capable of identifying putative regSNVs in ER ChIP-binding sites.
These non-coding variants could potentially regulate target genes and may contribute to clinical prognosis in breast cancer.
Affiliation(s)
- Amir Bahreini
- Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, USA.,Department of Pharmacology and Chemical Biology, University of Pittsburgh Cancer Institute, Pittsburgh, PA, USA.,Women's Cancer Research Center, Magee-Womens Research Institute, Pittsburgh, PA, USA
- Kevin Levine
- Women's Cancer Research Center, Magee-Womens Research Institute, Pittsburgh, PA, USA.,Department of Pathology, University of Pittsburgh, Pittsburgh, PA, USA
- Lucas Santana-Santos
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA.,Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Panayiotis V Benos
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Peilu Wang
- Women's Cancer Research Center, Magee-Womens Research Institute, Pittsburgh, PA, USA.,School of Medicine, Tsinghua University, Beijing, 100084, People's Republic of China
- Courtney Andersen
- Women's Cancer Research Center, Magee-Womens Research Institute, Pittsburgh, PA, USA.,AstraZeneca, Oncology iMED, 35 Gatehouse Drive, Waltham, MA, USA
- Steffi Oesterreich
- Department of Pharmacology and Chemical Biology, University of Pittsburgh Cancer Institute, Pittsburgh, PA, USA.,Women's Cancer Research Center, Magee-Womens Research Institute, Pittsburgh, PA, USA.
- Adrian V Lee
- Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, USA.,Department of Pharmacology and Chemical Biology, University of Pittsburgh Cancer Institute, Pittsburgh, PA, USA.,Women's Cancer Research Center, Magee-Womens Research Institute, Pittsburgh, PA, USA.
27
Tatarinova TV, Chekalin E, Nikolsky Y, Bruskin S, Chebotarov D, McNally KL, Alexandrov N. Nucleotide diversity analysis highlights functionally important genomic regions. Sci Rep 2016; 6:35730. [PMID: 27774999 PMCID: PMC5075931 DOI: 10.1038/srep35730] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 05/12/2016] [Accepted: 09/30/2016] [Indexed: 12/15/2022]
Abstract
We analyzed the functionality and relative distribution of genetic variants across the complete Oryza sativa genome, using the 40-million single nucleotide polymorphism (SNP) dataset from the 3,000 Rice Genomes Project (http://snp-seek.irri.org), the largest and highest-density SNP collection for any higher plant. We have shown that DNA-binding transcription factors (TFs) are the most conserved group of genes, whereas kinases and membrane-localized transporters are the most variable ones. TFs may be conserved because they belong to some of the most connected regulatory hubs that modulate transcription of vast downstream gene networks, whereas signaling kinases and transporters need to adapt rapidly to changing environmental conditions. In general, the observed profound patterns of nucleotide variability reveal functionally important genomic regions. As expected, nucleotide diversity is much higher in intergenic regions than within gene bodies (regions spanning gene models), and protein-coding sequences are more conserved than untranslated gene regions. We have observed a sharp decline in nucleotide diversity that begins about 250 nucleotides upstream of the transcription start and reaches its minimum exactly at the transcription start. We found the transcription termination sites to have remarkably symmetrical patterns of SNP density, implying the presence of functional sites near transcription termination. Also, nucleotide diversity was significantly lower near 3′ UTRs, an area rich in regulatory regions.
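The positional diversity profile described in this abstract (a decline starting roughly 250 nucleotides upstream of the transcription start) can be sketched as binned per-site diversity around a TSS. The abstract does not state which estimator was used; this sketch assumes the common unbiased per-site heterozygosity computed from allele counts, a plus-strand gene, and fixed-width bins. Function names, bin sizes and toy data are hypothetical.

```python
from typing import Dict, List


def site_diversity(allele_counts: List[int]) -> float:
    """Per-site diversity as unbiased heterozygosity n/(n-1) * (1 - sum p_i^2),
    where n is the number of sampled chromosomes and p_i the frequency of allele i."""
    n = sum(allele_counts)
    if n < 2:
        return 0.0
    return n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in allele_counts))


def diversity_profile(sites: Dict[int, List[int]], tss: int,
                      flank: int = 300, bin_size: int = 50) -> Dict[int, float]:
    """Mean per-site diversity in fixed-width bins relative to a transcription start site.

    `sites` maps a genomic position to its allele counts; positions missing from the
    dict are treated as monomorphic (diversity 0). Result keys are bin starts relative
    to the TSS (negative = upstream, assuming a plus-strand gene).
    """
    profile = {}
    for offset in range(-flank, flank, bin_size):
        window = range(tss + offset, tss + offset + bin_size)
        profile[offset] = sum(site_diversity(sites[p]) for p in window if p in sites) / bin_size
    return profile


# Hypothetical toy data: one common SNP 200 bp upstream of a TSS at 10,000 and one rare SNP at the TSS.
toy_sites = {9_800: [50, 50], 10_000: [99, 1]}
for offset, pi in diversity_profile(toy_sites, tss=10_000).items():
    print(f"{offset:+5d} bp  pi={pi:.5f}")
```

In a real analysis the profile would be averaged over many genes, with minus-strand genes flipped, before looking for the upstream decline.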
Affiliation(s)
- Tatiana V Tatarinova
- Center for Personalized Medicine and Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA.,Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russian Federation
- Yuri Nikolsky
- Vavilov Institute of General Genetics, Moscow, Russia.,F1 Genomics, San Diego, CA, USA.,School of Systems Biology, George Mason University, VA, USA
- Dmitry Chebotarov
- International Rice Research Institute, Los Baños, Laguna 4031, Philippines
- Kenneth L McNally
- International Rice Research Institute, Los Baños, Laguna 4031, Philippines
28
Xue C, Raveendran M, Harris RA, Fawcett GL, Liu X, White S, Dahdouli M, Rio Deiros D, Below JE, Salerno W, Cox L, Fan G, Ferguson B, Horvath J, Johnson Z, Kanthaswamy S, Kubisch HM, Liu D, Platt M, Smith DG, Sun B, Vallender EJ, Wang F, Wiseman RW, Chen R, Muzny DM, Gibbs RA, Yu F, Rogers J. The population genomics of rhesus macaques (Macaca mulatta) based on whole-genome sequences. Genome Res 2016; 26:1651-1662. [PMID: 27934697 PMCID: PMC5131817 DOI: 10.1101/gr.204255.116] [Citation(s) in RCA: 84] [Impact Index Per Article: 9.3] [Received: 01/11/2016] [Accepted: 10/12/2016] [Indexed: 12/30/2022]
Abstract
Rhesus macaques (Macaca mulatta) are the most widely used nonhuman primate in biomedical research, have the largest natural geographic distribution of any nonhuman primate, and have been the focus of much evolutionary and behavioral investigation. Consequently, rhesus macaques are one of the most thoroughly studied nonhuman primate species. However, little is known about genome-wide genetic variation in this species. A detailed understanding of extant genomic variation among rhesus macaques has implications for the use of this species as a model for studies of human health and disease, as well as for evolutionary population genomics. Whole-genome sequencing analysis of 133 rhesus macaques revealed more than 43.7 million single-nucleotide variants, including thousands predicted to alter protein sequences, transcript splicing, and transcription factor binding sites. Rhesus macaques exhibit 2.5-fold higher overall nucleotide diversity and slightly elevated putative functional variation compared with humans. This functional variation in macaques provides opportunities for analyses of coding and noncoding variation, and its cellular consequences. Despite modestly higher levels of nonsynonymous variation in the macaques, the estimated distribution of fitness effects and the ratio of nonsynonymous to synonymous variants suggest that purifying selection has had stronger effects in rhesus macaques than in humans. Demographic reconstructions indicate this species has experienced a consistently large but fluctuating population size. Overall, the results presented here provide new insights into the population genomics of nonhuman primates and expand genomic information directly relevant to primate models of human disease.
Affiliation(s)
- Cheng Xue
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Muthuswamy Raveendran
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- R Alan Harris
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA.,Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Gloria L Fawcett
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Xiaoming Liu
- University of Texas Health Science Center, Houston, Texas 77030, USA
- Simon White
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Mahmoud Dahdouli
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- David Rio Deiros
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Jennifer E Below
- University of Texas Health Science Center, Houston, Texas 77030, USA
- William Salerno
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Laura Cox
- Southwest National Primate Research Center, San Antonio, Texas 78227, USA
- Guoping Fan
- Department of Human Genetics, University of California, Los Angeles, California 90095, USA
- Betsy Ferguson
- Oregon National Primate Research Center, Beaverton, Oregon 97006, USA
- Julie Horvath
- North Carolina Museum of Natural Sciences, Raleigh, North Carolina 27601, USA.,Biological and Biomedical Sciences, North Carolina Central University, Durham, North Carolina 27707, USA.,Department of Evolutionary Anthropology, Duke University, Durham, North Carolina 27708, USA
- Zach Johnson
- Yerkes National Primate Research Center, Atlanta, Georgia 30322, USA
- Sree Kanthaswamy
- California National Primate Research Center, Davis, California 95616, USA.,School of Mathematical and Natural Sciences, Arizona State University, Phoenix, Arizona 85004, USA
- H Michael Kubisch
- Tulane National Primate Research Center, Covington, Louisiana 70433, USA
- Dahai Liu
- Center for Stem Cell and Translational Medicine, Anhui University, Anhui, China 230601
- Michael Platt
- Department of Neurobiology, Duke University, Durham, North Carolina 27708, USA.,Department of Neuroscience, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- David G Smith
- California National Primate Research Center, Davis, California 95616, USA
- Binghua Sun
- Center for Stem Cell and Translational Medicine, Anhui University, Anhui, China 230601
- Eric J Vallender
- Tulane National Primate Research Center, Covington, Louisiana 70433, USA.,New England National Primate Research Center, Southborough, Massachusetts 01772, USA
- Feng Wang
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Roger W Wiseman
- Wisconsin National Primate Research Center, Madison, Wisconsin 53711, USA
- Rui Chen
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA.,Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Donna M Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA.,Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Fuli Yu
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA.,Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Jeffrey Rogers
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA.,Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
29
Hombach D, Schwarz JM, Robinson PN, Schuelke M, Seelow D. A systematic, large-scale comparison of transcription factor binding site models. BMC Genomics 2016; 17:388. [PMID: 27209209 PMCID: PMC4875604 DOI: 10.1186/s12864-016-2729-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 04/15/2016] [Accepted: 05/06/2016] [Indexed: 11/10/2022]
Abstract
Background The modelling of gene regulation is a major challenge in biomedical research. This process is dominated by transcription factors (TFs), and mutations in their binding sites (TFBSs) may cause the misregulation of genes, eventually leading to disease. The consequences of DNA variants on TF binding are modelled in silico using binding matrices, but it remains unclear whether these are capable of accurately representing in vivo binding. In this study, we present a systematic comparison of binding models for 82 human TFs from three freely available sources: JASPAR matrices, HT-SELEX-generated models and matrices derived from protein binding microarrays (PBMs). We determined their ability to detect experimentally verified “real” in vivo TFBSs derived from ENCODE ChIP-seq data. As negative controls we chose random downstream exonic sequences, which are unlikely to harbour TFBSs. All models were assessed by receiver operating characteristic (ROC) analysis. Results While the area under the curve (AUC) was low for most of the tested models, with only 47% reaching a score of 0.7 or higher, we noticed strong differences between the various position-specific scoring matrices, with JASPAR and HT-SELEX models showing higher success rates than PBM-derived models. In addition, we found that while TFBS sequences showed a higher degree of conservation than randomly chosen sequences, there was a high variability between individual TFBSs. Conclusions Our results show that only a few of the matrix-based models used to predict potential TFBSs are able to reliably detect experimentally confirmed TFBSs. We compiled our findings in a freely accessible web application called ePOSSUM (http://mutationtaster.charite.de/ePOSSUM/), which uses a Bayes classifier to assess the impact of genetic alterations on TF binding in user-defined sequences. Additionally, ePOSSUM provides information on the reliability of the prediction using our test set of experimentally confirmed binding sites.
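The ROC evaluation described in this abstract can be sketched as follows: score each positive (ChIP-seq-supported) and negative (background) sequence with a position-specific scoring matrix, then compute the AUC as the probability that a randomly chosen positive outscores a randomly chosen negative. This is a minimal illustration assuming a log2-odds matrix with a uniform background, a pseudocount of 1 and forward-strand scanning only; it is not the authors’ pipeline or ePOSSUM’s Bayes classifier, and the motif counts and sequences are hypothetical.

```python
import math
from itertools import product


def pwm_log_odds(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into a log2-odds position-specific scoring matrix."""
    pssm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pssm.append({b: math.log2((column.get(b, 0) + pseudocount) / total / background)
                     for b in "ACGT"})
    return pssm


def best_site_score(seq, pssm):
    """Score of the best-matching window (forward strand only).

    `seq` must be uppercase ACGT and at least as long as the motif."""
    w = len(pssm)
    return max(sum(pssm[i][seq[s + i]] for i in range(w)) for s in range(len(seq) - w + 1))


def roc_auc(pos_scores, neg_scores):
    """AUC via the Mann-Whitney formulation: fraction of positive/negative pairs
    ranked correctly, with ties counting 1/2."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos_scores, neg_scores))
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical GATA-like motif counts and toy sequences.
counts = [{"G": 10}, {"A": 10}, {"T": 10}, {"A": 10}]
pssm = pwm_log_odds(counts)
positives = ["TTGATAAC", "CGATAGGT", "AAGATTCA"]  # stand-ins for ChIP-seq-supported sites
negatives = ["CCCTGCAG", "TGGCACCT", "AGATCCGA"]  # stand-ins for random exonic background
auc = roc_auc([best_site_score(s, pssm) for s in positives],
              [best_site_score(s, pssm) for s in negatives])
print(f"AUC = {auc:.2f}")
```

An AUC of 0.5 corresponds to chance-level separation of bound and background sequences and 1.0 to perfect separation; the 0.7 cut-off mentioned in the abstract lies between the two.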
Affiliation(s)
- Daniela Hombach
- Department of Neuropaediatrics, Charité-Universitätsmedizin Berlin, Berlin, Germany.,NeuroCure Clinical Research Center, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Jana Marie Schwarz
- Department of Neuropaediatrics, Charité-Universitätsmedizin Berlin, Berlin, Germany.,NeuroCure Clinical Research Center, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Peter N Robinson
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Markus Schuelke
- Department of Neuropaediatrics, Charité-Universitätsmedizin Berlin, Berlin, Germany.,NeuroCure Clinical Research Center, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Dominik Seelow
- Department of Neuropaediatrics, Charité-Universitätsmedizin Berlin, Berlin, Germany.,NeuroCure Clinical Research Center, Charité - Universitätsmedizin Berlin, Berlin, Germany.,Berliner Institut für Gesundheitsforschung / Berlin Institute of Health, Berlin, Germany.
30
Dopazo J, Amadoz A, Bleda M, Garcia-Alonso L, Alemán A, García-García F, Rodriguez JA, Daub JT, Muntané G, Rueda A, Vela-Boza A, López-Domingo FJ, Florido JP, Arce P, Ruiz-Ferrer M, Méndez-Vidal C, Arnold TE, Spleiss O, Alvarez-Tejado M, Navarro A, Bhattacharya SS, Borrego S, Santoyo-López J, Antiñolo G. 267 Spanish Exomes Reveal Population-Specific Differences in Disease-Related Genetic Variation. Mol Biol Evol 2016; 33:1205-18. [PMID: 26764160 PMCID: PMC4839216 DOI: 10.1093/molbev/msw005] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Indexed: 12/17/2022]
Abstract
Recent results from large-scale genomic projects suggest that allele frequencies, which are highly relevant for medical purposes, differ considerably across populations. The need for a detailed catalog of local variability motivated the whole-exome sequencing of 267 unrelated individuals, representative of the healthy Spanish population. As in other studies, a considerable number of rare variants were found (almost one-third of the described variants). There were also relevant differences in allelic frequencies in polymorphic variants, including ∼10,000 polymorphisms private to the Spanish population. The allelic frequencies of variants conferring susceptibility to complex diseases (including cancer, schizophrenia, Alzheimer disease, type 2 diabetes, and other pathologies) were overall similar to those of other populations. However, the trend is the opposite for variants linked to Mendelian and rare diseases (including several retinal degenerative dystrophies and cardiomyopathies), which show marked frequency differences between populations. Interestingly, a correspondence between differences in allelic frequencies and disease prevalence was found, highlighting the relevance of frequency differences in disease risk. These differences are also observed in variants that disrupt known drug binding sites, suggesting an important role for local variability in population-specific drug resistances or adverse effects. We have made publicly available the Spanish population variant server web page (http://spv.babelomics.org/), which contains population frequency information for the complete list of 170,888 variant positions we found. We show that it is fundamental to determine population-specific variant frequencies to distinguish real disease associations from population-specific polymorphisms.
Affiliation(s)
- Joaquín Dopazo
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Sevilla, Spain
- Bioinformatics in Rare Diseases (BIER), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Valencia, Spain
- Functional Genomics Node, National Institute of Bioinformatics (INB), Valencia, Spain
- Alicia Amadoz
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Marta Bleda
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Bioinformatics in Rare Diseases (BIER), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Valencia, Spain
- Luz Garcia-Alonso
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Alejandro Alemán
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Bioinformatics in Rare Diseases (BIER), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Valencia, Spain
- Francisco García-García
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Juan A Rodriguez
- Institut de Biologia Evolutiva, Consejo Superior de Investigaciones Científicas - Universitat Pompeu Fabra, Barcelona, Spain
- Josephine T Daub
- Institut de Biologia Evolutiva, Consejo Superior de Investigaciones Científicas - Universitat Pompeu Fabra, Barcelona, Spain
- Gerard Muntané
- Institut de Biologia Evolutiva, Consejo Superior de Investigaciones Científicas - Universitat Pompeu Fabra, Barcelona, Spain
- Antonio Rueda
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Sevilla, Spain
- Alicia Vela-Boza
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Sevilla, Spain
- Javier P Florido
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Sevilla, Spain
- Pablo Arce
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Sevilla, Spain
- Macarena Ruiz-Ferrer
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Sevilla, Spain
- Department of Genetics, Reproduction and Fetal Medicine, Institute of Biomedicine of Seville, University Hospital Virgen del Rocío/Consejo Superior de Investigaciones Científicas/University of Seville, Sevilla, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Sevilla, Spain
- Cristina Méndez-Vidal
- Department of Genetics, Reproduction and Fetal Medicine, Institute of Biomedicine of Seville, University Hospital Virgen del Rocío/Consejo Superior de Investigaciones Científicas/University of Seville, Sevilla, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Sevilla, Spain
- Todd E Arnold
- Research and Development, 454 Life Sciences, a Roche Company, Branford, CT, USA
- Olivia Spleiss
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, Basel, Switzerland
- Arcadi Navarro
- Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona Biomedical Research Park (PRBB), Barcelona, Spain
- Center for Genomic Regulation (CRG), Barcelona Biomedical Research Park (PRBB), Barcelona, Spain
- Shomi S Bhattacharya
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Sevilla, Spain
- Andalusian Molecular Biology and Regenerative Medicine Centre (CABIMER), Sevilla, Spain
- Salud Borrego
- Department of Genetics, Reproduction and Fetal Medicine, Institute of Biomedicine of Seville, University Hospital Virgen del Rocío/Consejo Superior de Investigaciones Científicas/University of Seville, Sevilla, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Sevilla, Spain
- Javier Santoyo-López
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Sevilla, Spain
- Guillermo Antiñolo
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Sevilla, Spain
- Department of Genetics, Reproduction and Fetal Medicine, Institute of Biomedicine of Seville, University Hospital Virgen del Rocío/Consejo Superior de Investigaciones Científicas/University of Seville, Sevilla, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Sevilla, Spain
31
Zimmermann MT, Oberg AL, Grill DE, Ovsyannikova IG, Haralambieva IH, Kennedy RB, Poland GA. System-Wide Associations between DNA-Methylation, Gene Expression, and Humoral Immune Response to Influenza Vaccination. PLoS One 2016; 11:e0152034. [PMID: 27031986 PMCID: PMC4816338 DOI: 10.1371/journal.pone.0152034] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 11/05/2015] [Accepted: 03/07/2016] [Indexed: 01/11/2023]
Abstract
Failure to achieve a protected state after influenza vaccination is poorly understood but occurs commonly among aged populations experiencing greater immunosenescence. To better understand immune response in the elderly, we studied epigenetic and transcriptomic profiles and humoral immune response outcomes in healthy participants aged 50-74 years. Associations between DNA methylation and gene expression reveal a system-wide regulation of immune-relevant functions, likely playing a role in regulating a participant's propensity to respond to vaccination. Our findings show that sites of methylation regulation associated with humoral response to vaccination affect known cellular differentiation signaling and antigen presentation pathways. We performed our analysis using both per-site and regionally averaged methylation levels, and both continuous and dichotomized outcome measures. The genes and molecular functions implicated by each analysis were compared, highlighting different aspects of the biologic mechanisms of immune response affected by differential methylation. Both cis-acting (within the gene or promoter) and trans-acting (enhancers and transcription factor binding sites) sites show significant associations with measures of humoral immunity. Specifically, we identified a group of CpGs that are associated with lower humoral immune response when coordinately hypomethylated and with higher response when methylated. Additionally, CpGs that individually predict humoral immune responses are enriched for polycomb-group and FOXP2 transcription factor binding sites. The most robust associations implicate differential methylation affecting the expression levels of genes with known roles in immunity (e.g. HLA-B and HLA-DQB2) and immunosenescence. We believe our data and analysis strategy highlight new and interesting epigenetic trends affecting humoral response to vaccination against influenza, one of the most common and impactful viral pathogens.
Affiliation(s)
- Michael T. Zimmermann
- Department of Health Science Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, United States of America
- Mayo Clinic Vaccine Research Group, Mayo Clinic, Rochester, Minnesota, United States of America
- Ann L. Oberg
- Department of Health Science Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, United States of America
- Mayo Clinic Vaccine Research Group, Mayo Clinic, Rochester, Minnesota, United States of America
- Diane E. Grill
- Department of Health Science Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, United States of America
- Mayo Clinic Vaccine Research Group, Mayo Clinic, Rochester, Minnesota, United States of America
- Inna G. Ovsyannikova
- Mayo Clinic Vaccine Research Group, Mayo Clinic, Rochester, Minnesota, United States of America
- Iana H. Haralambieva
- Mayo Clinic Vaccine Research Group, Mayo Clinic, Rochester, Minnesota, United States of America
- Richard B. Kennedy
- Mayo Clinic Vaccine Research Group, Mayo Clinic, Rochester, Minnesota, United States of America
- Gregory A. Poland
- Mayo Clinic Vaccine Research Group, Mayo Clinic, Rochester, Minnesota, United States of America
32
Identifying genetic modulators of the connectivity between transcription factors and their transcriptional targets. Proc Natl Acad Sci U S A 2016; 113:E1835-43. [PMID: 26966232 DOI: 10.1073/pnas.1517140113] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 11/18/2022]
Abstract
Regulation of gene expression by transcription factors (TFs) is highly dependent on genetic background and interactions with cofactors. Identifying specific context factors is a major challenge that requires new approaches. Here we show that exploiting natural variation is a potent strategy for probing functional interactions within gene regulatory networks. We developed an algorithm to identify genetic polymorphisms that modulate the regulatory connectivity between specific transcription factors and their target genes in vivo. As a proof of principle, we mapped connectivity quantitative trait loci (cQTLs) using parallel genotype and gene expression data for segregants from a cross between two strains of the yeast Saccharomyces cerevisiae. We identified a nonsynonymous mutation in the DIG2 gene as a cQTL for the transcription factor Ste12p and confirmed this prediction empirically. We also identified three polymorphisms in TAF13 as putative modulators of regulation by Gcn4p. Our method has the potential to reveal how genetic differences among individuals influence gene regulatory networks in any organism for which gene expression and genotype data are available along with information on the binding preferences of transcription factors.
33
Kapopoulou A, Mathew L, Wong A, Trono D, Jensen JD. The evolution of gene expression and binding specificity of the largest transcription factor family in primates. Evolution 2015; 70:167-80. [PMID: 26593440 DOI: 10.1111/evo.12819] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 05/05/2015] [Revised: 11/09/2015] [Accepted: 11/11/2015] [Indexed: 01/08/2023]
Abstract
The KRAB-containing zinc finger (KRAB-ZF) proteins represent the largest family of transcription factors (TFs) in humans, yet for the great majority, their function and specific genomic targets remain unknown. However, it has been shown that a large fraction of these genes arose from segmental duplications and that they have expanded in gene and zinc finger number throughout vertebrate evolution. To determine whether this expansion is linked to selective pressures acting on different domains, we manually curated all KRAB-ZF genes present in the human genome together with their orthologous genes in three closely related species, and assessed the evolutionary forces acting at the sequence level as well as on their expression profiles. We provide evidence that KRAB-ZFs can be separated into two categories according to the polymorphism present in their DNA-contacting residues. Those carrying a nonsynonymous single nucleotide polymorphism (SNP) in their DNA-contacting amino acids exhibit significantly reduced expression in all tissues, have emerged in a recent lineage, and seem to be less strongly constrained evolutionarily than those without such a polymorphism. This work provides evidence linking the age of a TF and polymorphism in its DNA-contacting residues to its expression level, both of which may be jointly affected by selection.
Affiliation(s)
- Adamandia Kapopoulou
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), 1015 Lausanne, Switzerland
- Lisha Mathew
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), 1015 Lausanne, Switzerland
- Alex Wong
- Department of Biology, Carleton University, Ottawa, Canada, K1S 5B6
- Didier Trono
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Jeffrey D Jensen
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), 1015 Lausanne, Switzerland
34
Maurano MT, Haugen E, Sandstrom R, Vierstra J, Shafer A, Kaul R, Stamatoyannopoulos JA. Large-scale identification of sequence variants influencing human transcription factor occupancy in vivo. Nat Genet 2015; 47:1393-401. [PMID: 26502339 PMCID: PMC4666772 DOI: 10.1038/ng.3432] [Citation(s) in RCA: 152] [Impact Index Per Article: 15.2] [Received: 07/17/2015] [Accepted: 10/02/2015] [Indexed: 12/18/2022]
Abstract
The function of human regulatory regions depends exquisitely on their local genomic environment and on cellular context, complicating experimental analysis of common disease- and trait-associated variants that localize within regulatory DNA. We use allelically resolved genomic DNase I footprinting data encompassing 166 individuals and 114 cell types to identify >60,000 common variants that directly influence transcription factor occupancy and regulatory DNA accessibility in vivo. The unprecedented scale of these data enables systematic analysis of the impact of sequence variation on transcription factor occupancy in vivo. We leverage this analysis to develop accurate models of variation affecting the recognition sites for diverse transcription factors and apply these models to discriminate nearly 500,000 common regulatory variants likely to affect transcription factor occupancy across the human genome. The approach and results provide a new foundation for the analysis and interpretation of noncoding variation in complete human genomes and for systems-level investigation of disease-associated variants.
Affiliation(s)
- Matthew T Maurano
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Eric Haugen
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Richard Sandstrom
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Jeff Vierstra
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Anthony Shafer
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Rajinder Kaul
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington, USA
- John A Stamatoyannopoulos
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Division of Oncology, Department of Medicine, University of Washington, Seattle, Washington, USA
- Altius Institute for Biomedical Sciences, Seattle, Washington, USA
35
Payne JL, Wagner A. Mechanisms of mutational robustness in transcriptional regulation. Front Genet 2015; 6:322. [PMID: 26579194 PMCID: PMC4621482 DOI: 10.3389/fgene.2015.00322] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.6] [Received: 09/11/2015] [Accepted: 10/10/2015] [Indexed: 12/17/2022]
Abstract
Robustness is the invariance of a phenotype in the face of environmental or genetic change. The phenotypes produced by transcriptional regulatory circuits are gene expression patterns that are to some extent robust to mutations. Here we review several causes of this robustness. They include robustness of individual transcription factor binding sites, homotypic clusters of such sites, redundant enhancers, transcription factors, redundant transcription factors, and the wiring of transcriptional regulatory circuits. Such robustness can either be an adaptation by itself, a byproduct of other adaptations, or the result of biophysical principles and non-adaptive forces of genome evolution. The potential consequences of such robustness include complex regulatory network topologies that arise through neutral evolution, as well as cryptic variation, i.e., genotypic divergence without phenotypic divergence. On the longest evolutionary timescales, the robustness of transcriptional regulation has helped shape life as we know it, by facilitating evolutionary innovations that helped organisms such as flowering plants and vertebrates diversify.
Affiliation(s)
- Joshua L Payne
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Andreas Wagner
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- The Santa Fe Institute, Santa Fe, NM, USA
36
Hancock DB, Levy JL, Gaddis NC, Glasheen C, Saccone NL, Page GP, Hulse GK, Wildenauer D, Kelty EA, Schwab SG, Degenhardt L, Martin NG, Montgomery GW, Attia J, Holliday EG, McEvoy M, Scott RJ, Bierut LJ, Nelson EC, Kral AH, Johnson EO. Cis-Expression Quantitative Trait Loci Mapping Reveals Replicable Associations with Heroin Addiction in OPRM1. Biol Psychiatry 2015; 78:474-84. [PMID: 25744370 PMCID: PMC4519434 DOI: 10.1016/j.biopsych.2015.01.003] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Received: 10/13/2014] [Revised: 12/18/2014] [Accepted: 01/08/2015] [Indexed: 12/25/2022]
Abstract
BACKGROUND: No opioid receptor, mu 1 (OPRM1) gene polymorphisms, including the functional single nucleotide polymorphism (SNP) rs1799971, have been conclusively associated with heroin/other opioid addiction, despite their biological plausibility. We used evidence of polymorphisms altering OPRM1 expression in normal human brain tissue to nominate and then test associations with heroin addiction. METHODS: We tested 103 OPRM1 SNPs for association with OPRM1 messenger RNA expression in prefrontal cortex from 224 European Americans and African Americans of the BrainCloud cohort. We then tested the 16 putative cis-expression quantitative trait loci (cis-eQTL) SNPs for association with heroin addiction in the Urban Health Study and two replication cohorts, totaling 16,729 European Americans, African Americans, and Australians of European ancestry. RESULTS: Four putative cis-eQTL SNPs were significantly associated with heroin addiction in the Urban Health Study (smallest p = 8.9 × 10⁻⁵): rs9478495, rs3778150, rs9384169, and rs562859. One of these, rs3778150, located in OPRM1 intron 1, was significantly replicated (p = 6.3 × 10⁻⁵). Meta-analysis across all case-control cohorts yielded p = 4.3 × 10⁻⁸, with the rs3778150-C allele (frequency = 16%-19%) associated with increased heroin addiction risk. Importantly, the functional SNP allele rs1799971-A was associated with heroin addiction only in the presence of rs3778150-C (p = 1.48 × 10⁻⁶ for rs1799971-A/rs3778150-C and p = .79 for rs1799971-A/rs3778150-T haplotypes). Lastly, replication was observed for six other intron 1 SNPs that had prior suggestive associations with heroin addiction (smallest p = 2.7 × 10⁻⁸ for rs3823010). CONCLUSIONS: Our findings show that common OPRM1 intron 1 SNPs have replicable associations with heroin addiction. The haplotype structure of rs3778150 and nearby SNPs may underlie the inconsistent associations between rs1799971 and heroin addiction.
Affiliation(s)
- Dana B Hancock
- Behavioral Health Epidemiology Program, Behavioral Health and Criminal Justice Division, Research Triangle Institute (RTI) International, St. Louis, Missouri
- Joshua L Levy
- Research Computing Division, RTI International, Research Triangle Park, North Carolina, St. Louis, Missouri
- Nathan C Gaddis
- Research Computing Division, RTI International, Research Triangle Park, North Carolina, St. Louis, Missouri
- Cristie Glasheen
- Behavioral Health Epidemiology Program, Behavioral Health and Criminal Justice Division, Research Triangle Institute (RTI) International, St. Louis, Missouri
- Nancy L Saccone
- Department of Genetics, Washington University in St. Louis, St. Louis, Missouri
- Grier P Page
- Center for Public Health Genomics, RTI International, Atlanta, Georgia
- Gary K Hulse
- School of Psychiatry and Clinical Neurosciences, University of Western Australia, Crawley, Western Australia, Australia
- Dieter Wildenauer
- School of Psychiatry and Clinical Neurosciences, University of Western Australia, Crawley, Western Australia, Australia
- Erin A Kelty
- School of Psychiatry and Clinical Neurosciences, University of Western Australia, Crawley, Western Australia, Australia
- Sibylle G Schwab
- Department of Psychiatry and Psychotherapy, University of Erlangen-Nuremberg, Erlangen, Germany
- Faculty of Science, Medicine, and Health, University of Wollongong, Wollongong, New South Wales
- Louisa Degenhardt
- National Drug and Alcohol Research Centre, University of New South Wales, Sydney
- Nicholas G Martin
- Queensland Institute of Medical Research Berghofer Medical Research Institute, Brisbane, Queensland
- Grant W Montgomery
- Queensland Institute of Medical Research Berghofer Medical Research Institute, Brisbane, Queensland
- John Attia
- Centre for Clinical Epidemiology and Biostatistics, School of Medicine and Public Health, University of Newcastle, Newcastle, New South Wales
- Clinical Research Design, IT and Statistical Support Unit, Hunter Medical Research Institute, Newcastle, New South Wales
- Elizabeth G Holliday
- Centre for Clinical Epidemiology and Biostatistics, School of Medicine and Public Health, University of Newcastle, Newcastle, New South Wales
- Clinical Research Design, IT and Statistical Support Unit, Hunter Medical Research Institute, Newcastle, New South Wales
- Mark McEvoy
- Centre for Clinical Epidemiology and Biostatistics, School of Medicine and Public Health, University of Newcastle, Newcastle, New South Wales
- Public Health Research Program, Biomarker Discovery and Information-Based Medicine, Hunter Medical Research Institute, Newcastle, New South Wales
- Rodney J Scott
- Center for Bioinformatics, Biomarker Discovery and Information-Based Medicine, Hunter Medical Research Institute, Newcastle, New South Wales
- School of Biomedical Sciences and Pharmacy, University of Newcastle, Newcastle, New South Wales
- Division of Genetics, Hunter Area Pathology Service, Newcastle, New South Wales, Australia
- Laura J Bierut
- Department of Psychiatry, Washington University in St. Louis, St. Louis, Missouri
- Elliot C Nelson
- Department of Psychiatry, Washington University in St. Louis, St. Louis, Missouri
- Alex H Kral
- Urban Health Program, Behavioral Health and Criminal Justice Division, RTI International, San Francisco, California
- Eric O Johnson
- Fellow Program and Behavioral Health and Criminal Justice Division, RTI International, Research Triangle Park, North Carolina
37
Flores Saiffe Farías A, Jaime Herrera López E, Moreno Vázquez CJ, Li W, Prado Montes de Oca E. Predicting functional regulatory SNPs in the human antimicrobial peptide genes DEFB1 and CAMP in tuberculosis and HIV/AIDS. Comput Biol Chem 2015; 59 Pt A:117-25. [PMID: 26447748 DOI: 10.1016/j.compbiolchem.2015.09.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 09/24/2014] [Revised: 09/03/2015] [Accepted: 09/04/2015] [Indexed: 01/04/2023]
Abstract
Single nucleotide polymorphisms (SNPs) in transcription factor binding sites (TFBSs) within gene promoter regions or enhancers can modify the transcription rate of genes related to complex diseases. These SNPs can be called regulatory SNPs (rSNPs). Data compiled from recent projects, such as the 1000 Genomes Project and ENCODE, have provided essential information for in silico prediction of the molecular and biological repercussions of SNPs within TFBSs. However, most of these studies are very limited: they either analyze only SNPs in coding regions or, when applied to promoters, do not integrate essential biological data such as TFBSs, expression profiles, pathway analysis, homotypic redundancy (number of TFBSs for the same TF in a region), chromatin accessibility and others, which could lead to a more accurate prediction. Our aim was to integrate different data in a biologically coherent method to analyze the proximal promoter regions of two antimicrobial peptide genes, DEFB1 and CAMP, that are associated with tuberculosis (TB) and HIV/AIDS. We predicted SNPs within the promoter regions that are more likely to interact with transcription factors (TFs). We also assessed the impact of homotypic redundancy using a novel approach called the homotypic redundancy weight factor (HWF). Our results identified 10 SNPs that putatively modify the binding affinity of 24 TFs previously identified as related to TB and HIV/AIDS expression profiles (e.g. KLF5, CEBPA and NFKB1 for TB; FOXP2, BRCA1, CEBPB, CREB1, EBF1 and ZNF354C for HIV/AIDS; and RUNX2, HIF1A, JUN/AP-1, NR4A2 and EGR1 for both diseases). Validated against the OregAnno database and cell-specific functional/non-functional SNPs from an additional 13 genes, our algorithm achieved 53% sensitivity and 84.6% specificity in detecting functional rSNPs using the DNAseI-HUP database. We propose our algorithm as a novel in silico method to detect true functional rSNPs in antimicrobial peptide genes. With further improvement, this method could be applied to other promoters in order to design probes and to discover new drug targets for complex diseases.
Affiliation(s)
- Adolfo Flores Saiffe Farías
- Personalized Medicine Laboratory (LAMPER), Medical and Pharmaceutical Biotechnology, Guadalajara Unit, Research Center of Technology and Design Assistance of Jalisco State, National Council of Science and Technology (CIATEJ AC, CONACYT), Av. Normalistas 800, Col. Colinas de la Normal, CP 44270 Guadalajara, Jalisco, Mexico.
- Enrique Jaime Herrera López
- Industrial Biotechnology, CIATEJ AC, Zapopan Unit, CONACYT, Camino Arenero 1227, Col. El Bajío del Arenal, CP 45019 Zapopan, Jalisco, Mexico.
- Cristopher Jorge Moreno Vázquez
- Personalized Medicine Laboratory (LAMPER), Medical and Pharmaceutical Biotechnology, Guadalajara Unit, Research Center of Technology and Design Assistance of Jalisco State, National Council of Science and Technology (CIATEJ AC, CONACYT), Av. Normalistas 800, Col. Colinas de la Normal, CP 44270 Guadalajara, Jalisco, Mexico.
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, Feinstein Institute for Medical Research, 350 Community Dr., Manhasset, NY 11030, USA.
- Ernesto Prado Montes de Oca
- Personalized Medicine Laboratory (LAMPER), Medical and Pharmaceutical Biotechnology, Guadalajara Unit, Research Center of Technology and Design Assistance of Jalisco State, National Council of Science and Technology (CIATEJ AC, CONACYT), Av. Normalistas 800, Col. Colinas de la Normal, CP 44270 Guadalajara, Jalisco, Mexico.
- Molecular Biology Laboratory, Biosafety Area, Medical and Pharmaceutical Biotechnology, Guadalajara Unit, CIATEJ AC, CONACYT, Av. Normalistas 800, Col. Colinas de la Normal, CP 44270 Guadalajara, Jalisco, Mexico.
38
Fan Y, Zhang Y, Xu S, Kong N, Zhou Y, Ren Z, Deng Y, Lin L, Ren Y, Wang Q, Zi J, Wen B, Liu S. Insights from ENCODE on Missing Proteins: Why β-Defensin Expression Is Scarcely Detected. J Proteome Res 2015; 14:3635-44. [PMID: 26258396 DOI: 10.1021/acs.jproteome.5b00565] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 01/28/2023]
Abstract
β-Defensins (DEFBs) have a variety of functions. The majority of these proteins were not identified in a recent proteome survey. Neither protein detection nor analysis of RNA-seq transcriptomic data for three liver cancer cell lines identified any expression products. Extensive investigation into DEFB transcripts in over 70 cell lines offered similar results. This raises the question: why are DEFB genes scarcely expressed? After examining DEFB gene annotation and the physicochemical properties of their protein products, we postulated that regulatory elements could play a key role in the poor transcription of DEFB genes. Four regions containing DEFB genes and six adjacent regions on chromosomes 6, 8, and 20 were carefully investigated using information from The Encyclopedia of DNA Elements (ENCODE), such as DNase I hypersensitive sites (DHSs), transcription factors (TFs), and histone modifications. The results revealed that the intensities of these ENCODE features were globally weaker than those in the adjacent regions. Notably, DEFB-related regions on chromosomes 6 and 8 that contain several non-DEFB genes also had lower ENCODE feature intensities, indicating that the absence of DEFB mRNAs might depend not on the gene family but rather on gene location and chromatin structure.
Affiliation(s)
- Yang Fan
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, No 1, Beichen West Road, Beijing 100101, China
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Graduate University of the Chinese Academy of Sciences, 19A, Yuquan Road, Beijing 100049, China
- Yue Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, No 1, Beichen West Road, Beijing 100101, China
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Graduate University of the Chinese Academy of Sciences, 19A, Yuquan Road, Beijing 100049, China
- Shaohang Xu
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Nannan Kong
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, No 1, Beichen West Road, Beijing 100101, China
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Graduate University of the Chinese Academy of Sciences, 19A, Yuquan Road, Beijing 100049, China
- Yang Zhou
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, No 1, Beichen West Road, Beijing 100101, China
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Graduate University of the Chinese Academy of Sciences, 19A, Yuquan Road, Beijing 100049, China
- Zhe Ren
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Yamei Deng
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, No 1, Beichen West Road, Beijing 100101, China
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Graduate University of the Chinese Academy of Sciences, 19A, Yuquan Road, Beijing 100049, China
- Liang Lin
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Yan Ren
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Quanhui Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, No 1, Beichen West Road, Beijing 100101, China
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Graduate University of the Chinese Academy of Sciences, 19A, Yuquan Road, Beijing 100049, China
- Jin Zi
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Bo Wen
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Siqi Liu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, No 1, Beichen West Road, Beijing 100101, China
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Graduate University of the Chinese Academy of Sciences, 19A, Yuquan Road, Beijing 100049, China
39
Rockowitz S, Zheng D. Significant expansion of the REST/NRSF cistrome in human versus mouse embryonic stem cells: potential implications for neural development. Nucleic Acids Res 2015; 43:5730-43. [PMID: 25990720 PMCID: PMC4499139 DOI: 10.1093/nar/gkv514] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 02/25/2015] [Revised: 04/30/2015] [Accepted: 05/05/2015] [Indexed: 11/14/2022]
Abstract
Recent studies have employed cross-species comparisons of transcription factor binding, reporting significant regulatory network 'rewiring' between species. Here, we address how a transcriptional repressor targets and regulates neural genes differentially between human and mouse embryonic stem cells (ESCs). We find that the transcription factor, Repressor Element 1 Silencing Transcription factor (REST; also called neuron restrictive silencer factor) binds to a core group of ∼1200 syntenic genomic regions in both species, with these conserved sites highly enriched with co-factors, selective histone modifications and DNA hypomethylation. Genes with conserved REST binding are enriched with neural functions and more likely to be upregulated upon REST depletion. Interestingly, we identified twice as many REST peaks in human ESCs compared to mouse ESCs. Human REST cistrome expansion involves additional peaks in genes targeted by REST in both species and human-specific gene targets. Genes with expanded REST occupancy in humans are enriched for learning or memory functions. Analysis of neurological disorder associated genes reveals that Amyotrophic Lateral Sclerosis and oxidative stress genes are particularly enriched with human-specific REST binding. Overall, our results demonstrate that there is substantial rewiring of human and mouse REST cistromes, and that REST may have human-specific roles in brain development and functions.
Affiliation(s)
- Shira Rockowitz
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Deyou Zheng
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA; Department of Neurology, Albert Einstein College of Medicine, Bronx, NY 10461, USA; Department of Neuroscience, Albert Einstein College of Medicine, Bronx, NY 10461, USA
40
Mathelier A, Lefebvre C, Zhang AW, Arenillas DJ, Ding J, Wasserman WW, Shah SP. Cis-regulatory somatic mutations and gene-expression alteration in B-cell lymphomas. Genome Biol 2015; 16:84. [PMID: 25903198 PMCID: PMC4467049 DOI: 10.1186/s13059-015-0648-7] [Received: 12/01/2014] [Accepted: 04/07/2015] [Indexed: 02/08/2023]
Abstract
BACKGROUND With the rapid increase of whole-genome sequencing of human cancers, an important opportunity to analyze and characterize somatic mutations lying within cis-regulatory regions has emerged. A focus on protein-coding regions to identify nonsense or missense mutations disruptive to protein structure and/or function has led to important insights; however, the impact on gene expression of mutations lying within cis-regulatory regions remains under-explored. We analyzed somatic mutations from 84 matched tumor-normal whole genomes from B-cell lymphomas with accompanying gene expression measurements to elucidate the extent to which these cancers are disrupted by cis-regulatory mutations. RESULTS We characterize mutations overlapping a high quality set of well-annotated transcription factor binding sites (TFBSs), covering a similar portion of the genome as protein-coding exons. Our results indicate that cis-regulatory mutations overlapping predicted TFBSs are enriched in promoter regions of genes involved in apoptosis or growth/proliferation. By integrating gene expression data with mutation data, our computational approach culminates with identification of cis-regulatory mutations most likely to participate in dysregulation of the gene expression program. The impact can be measured along with protein-coding mutations to highlight key mutations disrupting gene expression and pathways in cancer. CONCLUSIONS Our study yields specific genes with disrupted expression triggered by genomic mutations in either the coding or the regulatory space. It implies that mutated regulatory components of the genome contribute substantially to cancer pathways. Our analyses demonstrate that identifying genomically altered cis-regulatory elements coupled with analysis of gene expression data will augment biological interpretation of mutational landscapes of cancers.
Affiliation(s)
- Anthony Mathelier
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, University of British Columbia, 950 West 28th Avenue, V5Z 4H4, Vancouver, BC, Canada.
- Calvin Lefebvre
- Department of Molecular Oncology, British Columbia Cancer Agency, Vancouver, V5Z 1L3, BC, Canada; Bioinformatics Graduate Program, University of British Columbia, Vancouver, V5Z 1L3, BC, Canada.
- Allen W Zhang
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, University of British Columbia, 950 West 28th Avenue, V5Z 4H4, Vancouver, BC, Canada; Bioinformatics Graduate Program, University of British Columbia, Vancouver, V5Z 1L3, BC, Canada.
- David J Arenillas
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, University of British Columbia, 950 West 28th Avenue, V5Z 4H4, Vancouver, BC, Canada.
- Jiarui Ding
- Department of Molecular Oncology, British Columbia Cancer Agency, Vancouver, V5Z 1L3, BC, Canada; Department of Computer Science, University of British Columbia, Vancouver, V6T 1Z4, BC, Canada.
- Wyeth W Wasserman
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, University of British Columbia, 950 West 28th Avenue, V5Z 4H4, Vancouver, BC, Canada.
- Sohrab P Shah
- Department of Molecular Oncology, British Columbia Cancer Agency, Vancouver, V5Z 1L3, BC, Canada; Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, G227-2211, BC, Canada.
41
Mathelier A, Shi W, Wasserman WW. Identification of altered cis-regulatory elements in human disease. Trends Genet 2015; 31:67-76. [DOI: 10.1016/j.tig.2014.12.003] [Received: 11/07/2014] [Revised: 12/19/2014] [Accepted: 12/19/2014] [Indexed: 02/01/2023]
42
Chadwick LH, Sawa A, Yang IV, Baccarelli A, Breakefield XO, Deng HW, Dolinoy DC, Fallin MD, Holland NT, Houseman EA, Lomvardas S, Rao M, Satterlee JS, Tyson FL, Vijayanand P, Greally JM. New insights and updated guidelines for epigenome-wide association studies. ACTA ACUST UNITED AC 2015. [DOI: 10.1016/j.nepig.2014.10.004] [Indexed: 01/22/2023]
43
Schoborg T, Labrador M. Expanding the roles of chromatin insulators in nuclear architecture, chromatin organization and genome function. Cell Mol Life Sci 2014; 71:4089-113. [PMID: 25012699 PMCID: PMC11113341 DOI: 10.1007/s00018-014-1672-6] [Received: 01/31/2014] [Revised: 05/31/2014] [Accepted: 06/23/2014] [Indexed: 01/08/2023]
Abstract
Of the numerous classes of elements involved in modulating eukaryotic chromosome structure and function, chromatin insulators arguably remain the most poorly understood in their contribution to these processes in vivo. Indeed, as a result of recent genome-wide, high-throughput methods better suited to probing the role of these elements in their native genomic contexts, our view of chromatin insulators has evolved dramatically since their chromatin boundary and enhancer-blocking properties were elucidated roughly a quarter of a century ago. The overall theme that has emerged from these studies is that chromatin insulators function as general facilitators of higher-order chromatin loop structures that exert both physical and functional constraints on the genome. In this review, we summarize the results of recent work that supports this idea as well as a number of other studies linking these elements to a diverse array of nuclear processes, suggesting that chromatin insulators exert master control over genome organization and behavior.
Affiliation(s)
- Todd Schoborg
- Department of Biochemistry, Cellular and Molecular Biology, The University of Tennessee, M407 Walters Life Sciences, 1414 Cumberland Avenue, Knoxville, TN 37996 USA
- Present Address: Laboratory of Molecular Machines and Tissue Architecture, Cell Biology and Physiology Center, National Heart, Lung and Blood Institute, National Institutes of Health, 50 South Dr Rm 2122, Bethesda, MD 20892 USA
- Mariano Labrador
- Department of Biochemistry, Cellular and Molecular Biology, The University of Tennessee, M407 Walters Life Sciences, 1414 Cumberland Avenue, Knoxville, TN 37996 USA
44
Macintyre G, Jimeno Yepes A, Ong CS, Verspoor K. Associating disease-related genetic variants in intergenic regions to the genes they impact. PeerJ 2014; 2:e639. [PMID: 25374782 PMCID: PMC4217187 DOI: 10.7717/peerj.639] [Received: 05/23/2014] [Accepted: 10/07/2014] [Indexed: 11/20/2022]
Abstract
We present a method to assist in interpretation of the functional impact of intergenic disease-associated SNPs that is not limited to search strategies proximal to the SNP. The method builds on two sources of external knowledge: the growing understanding of three-dimensional spatial relationships in the genome, and the substantial repository of information about relationships among genetic variants, genes, and diseases captured in the published biomedical literature. We integrate chromatin conformation capture data (HiC) with literature support to rank putative target genes of intergenic disease-associated SNPs. We demonstrate that this hybrid method outperforms a genomic distance baseline on a small test set of expression quantitative trait loci, as well as either method individually. In addition, we show the potential for this method to uncover relationships between intergenic SNPs and target genes across chromosomes. With more extensive chromatin conformation capture data becoming readily available, this method provides a way forward towards functional interpretation of SNPs in the context of the three dimensional structure of the genome in the nucleus.
Affiliation(s)
- Geoff Macintyre
- Department of Computing and Information Systems, The University of Melbourne, VIC, Australia
- Centre for Neural Engineering, The University of Melbourne, VIC, Australia
- Antonio Jimeno Yepes
- Department of Computing and Information Systems, The University of Melbourne, VIC, Australia
- Cheng Soon Ong
- Department of Electrical and Electronic Engineering, The University of Melbourne, VIC, Australia
- Machine Learning Group, NICTA Canberra Research Laboratory, Australia
- Research School of Computer Science, Australian National University, Australia
- Karin Verspoor
- Department of Computing and Information Systems, The University of Melbourne, VIC, Australia
- Health and Biomedical Informatics Centre, The University of Melbourne, VIC, Australia
45
Abstract
Motivation: The Expectation–Maximization (EM) algorithm has been successfully applied to the problem of transcription factor binding site (TFBS) motif discovery and underlies the most widely used motif discovery algorithms. In the wider field of probabilistic modelling, the stochastic EM (sEM) algorithm has been used to overcome some of the limitations of the EM algorithm; however, the application of sEM to motif discovery has not been fully explored. Results: We present MITSU (Motif discovery by ITerative Sampling and Updating), a novel algorithm for motif discovery, which combines sEM with an improved approximation to the likelihood function, which is unconstrained with regard to the distribution of motif occurrences within the input dataset. The algorithm is evaluated quantitatively on realistic synthetic data and several collections of characterized prokaryotic TFBS motifs and shown to outperform EM and an alternative sEM-based algorithm, particularly in terms of site-level positive predictive value. Availability and implementation: Java executable available for download at http://www.sourceforge.net/p/mitsu-motif/, supported on Linux/OS X. Contact: a.m.kilpatrick@sms.ed.ac.uk
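The abstract contrasts deterministic EM with stochastic EM (sEM). The Python sketch below illustrates only the general sEM idea and is not MITSU. It assumes the classical one-occurrence-per-sequence (OOPS) model, a uniform background and a pseudocount of 0.5, all chosen for illustration, whereas MITSU is described as unconstrained with respect to motif occurrences and uses an improved likelihood approximation. All function names are hypothetical. The key difference from deterministic EM is the S-step: each sequence contributes one sampled site drawn from the posterior, instead of a weighted average over every candidate position.

```python
import math
import random

ALPHABET = "ACGT"


def pwm_from_sites(sites, pseudocount=0.5):
    """Build a position probability matrix (list of {base: prob}) from aligned sites."""
    pwm = []
    for j in range(len(sites[0])):
        counts = {b: pseudocount for b in ALPHABET}
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        pwm.append({b: c / total for b, c in counts.items()})
    return pwm


def site_llr(seq, start, pwm, background):
    """Log-likelihood ratio of motif vs background for seq[start:start+W]."""
    return sum(math.log(pwm[j][seq[start + j]] / background[seq[start + j]])
               for j in range(len(pwm)))


def stochastic_em(seqs, width, n_iter=200, seed=0):
    """Toy sEM motif finder under an assumed OOPS model (uppercase ACGT input)."""
    rng = random.Random(seed)
    background = {b: 0.25 for b in ALPHABET}  # assumption: uniform background
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(n_iter):
        # M-step: re-estimate the motif from the currently sampled sites.
        pwm = pwm_from_sites([s[p:p + width] for s, p in zip(seqs, positions)])
        new_positions = []
        for s in seqs:
            # E-step: unnormalised posterior over start positions in this sequence.
            scores = [site_llr(s, i, pwm, background)
                      for i in range(len(s) - width + 1)]
            top = max(scores)
            weights = [math.exp(x - top) for x in scores]
            # S-step: sample a single site rather than averaging over all of them.
            new_positions.append(rng.choices(range(len(weights)), weights=weights)[0])
        positions = new_positions
    pwm = pwm_from_sites([s[p:p + width] for s, p in zip(seqs, positions)])
    return pwm, positions


if __name__ == "__main__":
    # Purely illustrative toy data: a planted motif in random background sequence.
    rng = random.Random(1)
    motif = "TGACGTCA"
    seqs = []
    for _ in range(20):
        bg = [rng.choice(ALPHABET) for _ in range(40)]
        k = rng.randrange(40 - len(motif) + 1)
        bg[k:k + len(motif)] = motif
        seqs.append("".join(bg))
    pwm, positions = stochastic_em(seqs, width=len(motif))
    print("consensus:", "".join(max(col, key=col.get) for col in pwm))
```

Sampling rather than averaging is what lets sEM escape some of the local optima that trap deterministic EM. The price is a noisy trajectory, so in practice one would typically run several seeds and keep the best-scoring motif.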
Affiliation(s)
- Alastair M Kilpatrick
- School of Informatics, University of Edinburgh, Informatics Forum, Edinburgh EH8 9AB; School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JR; and MRC Human Genetics Unit, IGMM, University of Edinburgh, Western General Hospital, Edinburgh EH4 2XU, UK
- Bruce Ward
- School of Informatics, University of Edinburgh, Informatics Forum, Edinburgh EH8 9AB; School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JR; and MRC Human Genetics Unit, IGMM, University of Edinburgh, Western General Hospital, Edinburgh EH4 2XU, UK
- Stuart Aitken
- School of Informatics, University of Edinburgh, Informatics Forum, Edinburgh EH8 9AB; School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JR; and MRC Human Genetics Unit, IGMM, University of Edinburgh, Western General Hospital, Edinburgh EH4 2XU, UK
46
Garcia-Alonso L, Jiménez-Almazán J, Carbonell-Caballero J, Vela-Boza A, Santoyo-López J, Antiñolo G, Dopazo J. The role of the interactome in the maintenance of deleterious variability in human populations. Mol Syst Biol 2014; 10:752. [PMID: 25261458 PMCID: PMC4299661 DOI: 10.15252/msb.20145222] [Received: 02/22/2014] [Revised: 08/23/2014] [Accepted: 08/28/2014] [Indexed: 12/25/2022]
Abstract
Recent genomic projects have revealed the existence of an unexpectedly large amount of deleterious variability in the human genome. Several hypotheses have been proposed to explain such an apparently high mutational load. However, the mechanisms by which deleterious mutations in some genes cause a pathological effect but are apparently innocuous in other genes remain largely unknown. This study searched for deleterious variants in the 1000 Genomes populations, as well as in a newly sequenced population of 252 healthy Spanish individuals. In addition, variants causative of monogenic diseases and somatic variants from 41 chronic lymphocytic leukaemia patients were analysed. The deleterious variants found were analysed in the context of the interactome to understand the role of network topology in the maintenance of the observed mutational load. Our results suggest that one of the mechanisms whereby the effect of these deleterious variants on the phenotype is suppressed could be related to the configuration of the protein interaction network. Most of the deleterious variants observed in healthy individuals are concentrated in peripheral regions of the interactome, in combinations that preserve their connectivity, and have a marginal effect on interactome integrity. On the contrary, likely pathogenic cancer somatic deleterious variants tend to occur in internal regions of the interactome, often with associated structural consequences. Finally, variants causative of monogenic diseases seem to occupy an intermediate position. Our observations suggest that the real pathological potential of a variant might be a systems property rather than an intrinsic property of individual proteins.
Affiliation(s)
- Luz Garcia-Alonso
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Jorge Jiménez-Almazán
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain; Bioinformatics of Rare Diseases (BIER), CIBER de Enfermedades Raras (CIBERER), Valencia, Spain
- Jose Carbonell-Caballero
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Alicia Vela-Boza
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Seville, Spain
- Javier Santoyo-López
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Seville, Spain
- Guillermo Antiñolo
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Seville, Spain; Department of Genetics, Reproduction and Fetal Medicine, Institute of Biomedicine of Seville, University Hospital Virgen del Rocio/Consejo Superior de Investigaciones Científicas/University of Seville, Seville, Spain; Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Seville, Spain
- Joaquin Dopazo
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain; Bioinformatics of Rare Diseases (BIER), CIBER de Enfermedades Raras (CIBERER), Valencia, Spain; Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Seville, Spain; Functional Genomics Node (INB) at CIPF, Valencia, Spain
47
Hurst LD, Sachenkova O, Daub C, Forrest ARR, Huminiecki L. A simple metric of promoter architecture robustly predicts expression breadth of human genes suggesting that most transcription factors are positive regulators. Genome Biol 2014; 15:413. [PMID: 25079787 PMCID: PMC4310617 DOI: 10.1186/s13059-014-0413-3] [Received: 12/10/2013] [Accepted: 07/15/2014] [Indexed: 12/29/2022]
Abstract
BACKGROUND Conventional wisdom holds that, owing to the dominance of features such as chromatin level control, the expression of a gene cannot be readily predicted from knowledge of promoter architecture. This is reflected, for example, in a weak or absent correlation between promoter divergence and expression divergence between paralogs. However, an inability to predict may reflect an inability to accurately measure or employment of the wrong parameters. Here we address this issue through integration of two exceptional resources: ENCODE data on transcription factor binding and the FANTOM5 high-resolution expression atlas. RESULTS Consistent with the notion that in eukaryotes most transcription factors are activating, the number of transcription factors binding a promoter is a strong predictor of expression breadth. In addition, evolutionarily young duplicates have fewer transcription factor binders and narrower expression. Nonetheless, we find several binders and cooperative sets that are disproportionately associated with broad expression, indicating that models more complex than simple correlations should hold more predictive power. Indeed, a machine learning approach improves fit to the data compared with a simple correlation. Machine learning could at best moderately predict tissue of expression of tissue specific genes. CONCLUSIONS We find robust evidence that some expression parameters and paralog expression divergence are strongly predictable with knowledge of transcription factor binding repertoire. While some cooperative complexes can be identified, consistent with the notion that most eukaryotic transcription factors are activating, a simple predictor, the number of binding transcription factors found on a promoter, is a robust predictor of expression breadth.
Affiliation(s)
- Laurence D Hurst
- Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY UK
- Oxana Sachenkova
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Science for Life Laboratory, SciLifeLab, Stockholm, Sweden
- Carsten Daub
- Science for Life Laboratory, SciLifeLab, Stockholm, Sweden
- Alistair RR Forrest
- RIKEN Omics Science Center, Yokohama, Japan
- Division of Genomic Technologies, RIKEN Center for Life Science Technologies, Yokohama, Kanagawa, Japan
- the FANTOM consortium
- Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY UK
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Science for Life Laboratory, SciLifeLab, Stockholm, Sweden
- RIKEN Omics Science Center, Yokohama, Japan
- Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
- BILS bioinformatics infrastructure for life sciences, Stockholm, Sweden
- Department of Immunology Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Division of Genomic Technologies, RIKEN Center for Life Science Technologies, Yokohama, Kanagawa, Japan
- Lukasz Huminiecki
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Science for Life Laboratory, SciLifeLab, Stockholm, Sweden
- Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
- BILS bioinformatics infrastructure for life sciences, Stockholm, Sweden
- Department of Immunology Genetics and Pathology, Uppsala University, Uppsala, Sweden
48
De Silva DR, Nichols R, Elgar G. Purifying selection in deeply conserved human enhancers is more consistent than in coding sequences. PLoS One 2014; 9:e103357. [PMID: 25062004 PMCID: PMC4111549 DOI: 10.1371/journal.pone.0103357] [Received: 08/01/2013] [Accepted: 07/01/2014] [Indexed: 12/30/2022]
Abstract
Comparison of polymorphism at synonymous and non-synonymous sites in protein-coding DNA can provide evidence for selective constraint. Non-coding DNA that forms part of the regulatory landscape presents more of a challenge since there is not such a clear-cut distinction between sites under stronger and weaker selective constraint. Here, we consider putative regulatory elements termed Conserved Non-coding Elements (CNEs) defined by their high level of sequence identity across all vertebrates. Some mutations in these regions have been implicated in developmental disorders; we analyse CNE polymorphism data to investigate whether such deleterious effects are widespread in humans. Single nucleotide variants from the HapMap and 1000 Genomes Projects were mapped across nearly 2000 CNEs. In the 1000 Genomes data we find a significant excess of rare derived alleles in CNEs relative to coding sequences; this pattern is absent in HapMap data, apparently obscured by ascertainment bias. The distribution of polymorphism within CNEs is not uniform; we could identify two categories of sites by exploiting deep vertebrate alignments: stretches that are non-variant, and those that have at least one substitution. The conserved category has fewer polymorphic sites and a greater excess of rare derived alleles, which can be explained by a large proportion of sites under strong purifying selection within humans, a proportion higher than that for non-synonymous sites in most protein-coding regions and comparable to that at the strongly conserved trans-dev genes. Conversely, the more evolutionarily labile CNE sites have an allele frequency distribution not significantly different from non-synonymous sites. Future studies should exploit genome-wide re-sequencing to obtain better coverage in selected non-coding regions, given the likelihood that mutations in evolutionarily conserved enhancer sequences are deleterious.
Discovery pipelines should validate non-coding variants to aid in identifying causal and risk-enhancing variants in complex disorders, in contrast to the current focus on exome sequencing.
Affiliation(s)
- Dilrini R. De Silva
- Systems Biology, MRC National Institute for Medical Research, Mill Hill, London, United Kingdom
- School of Biological and Chemical Sciences, Queen Mary University of London, London, United Kingdom
- Richard Nichols
- School of Biological and Chemical Sciences, Queen Mary University of London, London, United Kingdom
- Greg Elgar
- Systems Biology, MRC National Institute for Medical Research, Mill Hill, London, United Kingdom
49
Chen CY, Chang IS, Hsiung CA, Wasserman WW. On the identification of potential regulatory variants within genome wide association candidate SNP sets. BMC Med Genomics 2014; 7:34. [PMID: 24920305 PMCID: PMC4066296 DOI: 10.1186/1755-8794-7-34] [Received: 01/22/2014] [Accepted: 06/02/2014] [Indexed: 12/04/2022]
Abstract
BACKGROUND Genome wide association studies (GWAS) are a population-scale approach to the identification of segments of the genome in which genetic variations may contribute to disease risk. Current methods focus on the discovery of single nucleotide polymorphisms (SNPs) associated with disease traits. As there are many SNPs within identified risk loci, and the majority of these are situated within non-coding regions, a key challenge is to identify and prioritize variants affecting regulatory sequences that are likely to contribute to the phenotype assessed. METHODS We focused investigation on SNPs within lung and breast cancer GWAS loci that reached genome-wide significance for potential roles in gene regulation with a specific focus on SNPs likely to disrupt transcription factor binding sites. Within risk loci, the regulatory potential of sub-regions was classified using relevant open chromatin and epigenetic high throughput sequencing data sets from the ENCODE project in available cancer and normal cell lines. Furthermore, transcription factor affinity altering variants were predicted by comparison of position weight matrix scores between disease and reference alleles. Lastly, ChIP-seq data of transcription associated factors and topological domains were included as binding evidence and potential gene target inference. RESULTS The sets of SNPs, including both the disease-associated markers and those in high linkage disequilibrium with them, were significantly over-represented in regulatory sequences of cancer and/or normal cells; however, over-representation was generally not restricted to disease-relevant tissue specific regions. The calculated regulatory potential, allelic binding affinity scores and ChIP-seq binding evidence were the three criteria used to prioritize candidates. 
Fitting all three criteria, we highlighted breast cancer susceptibility SNPs and a borderline lung cancer relevant SNP located in cancer-specific enhancers overlapping multiple distinct transcription associated factor ChIP-seq binding sites. CONCLUSION Incorporating high throughput sequencing epigenetic and transcription factor data sets from both cancer and normal cells into cancer genetic studies reveals potential functional SNPs and informs subsequent characterization efforts.
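The abstract predicts affinity-altering variants by comparing position weight matrix (PWM) scores between the reference and disease alleles. The Python sketch below shows one common way such a comparison can be set up, under explicit assumptions that are not taken from the study: a log2-odds PWM built with a pseudocount of 0.25 and a uniform background, a single-nucleotide variant, and the best-scoring window on either strand as the summary score. All function names are hypothetical.

```python
import math

ALPHABET = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_pwm(counts, pseudocount=0.25, background=None):
    """Convert a count matrix (list of {base: count}) into log2-odds scores."""
    background = background or {b: 0.25 for b in ALPHABET}  # assumed uniform
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in ALPHABET) + 4 * pseudocount
        pwm.append({b: math.log2((col.get(b, 0) + pseudocount) / total / background[b])
                    for b in ALPHABET})
    return pwm


def best_score(seq, pwm):
    """Highest PWM score over all windows on both strands (seq must be >= motif width)."""
    w = len(pwm)
    rc = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    return max(sum(pwm[j][s[i + j]] for j in range(w))
               for s in (seq, rc) for i in range(len(s) - w + 1))


def allelic_delta(left, ref, alt, right, pwm):
    """Best-score change (alt minus ref) over windows that overlap a SNV."""
    w = len(pwm)
    # Keep only the flanking bases that can share a window with the variant.
    l = left[max(0, len(left) - (w - 1)):]
    r = right[:w - 1]
    return best_score(l + alt + r, pwm) - best_score(l + ref + r, pwm)


if __name__ == "__main__":
    # Toy count matrix favouring the word ACGT (illustrative only).
    counts = [{"A": 10}, {"C": 10}, {"G": 10}, {"T": 10}]
    pwm = log_odds_pwm(counts)
    print(allelic_delta("TTTAC", "G", "A", "TAAAA", pwm))  # negative: site weakened
```

Trimming the flanks to `w - 1` bases guarantees that every window scored overlaps the variant, so the delta reflects only the allele change. Other summaries, such as summing over windows or using a p-value-based threshold, are equally plausible choices.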
Affiliation(s)
- Chih-yu Chen
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, University of British Columbia, Vancouver, British Columbia, Canada
- Graduate Program in Bioinformatics, University of British Columbia, Vancouver, British Columbia, Canada
- I-Shou Chang
- National Institute of Cancer Research, National Health Research Institutes, Zhunan, Taiwan
- Chao A Hsiung
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan
- Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, University of British Columbia, Vancouver, British Columbia, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
50
Abstract
Transcription factor binding sites (TFBSs) on the DNA are generally accepted as the key nodes of gene control. However, the multitudes of TFBSs identified in genome-wide studies, some of them seemingly unconstrained in evolution, have prompted the view that in many cases TF binding may serve no biological function. Yet, insights from transcriptional biochemistry, population genetics and functional genomics suggest that rather than segregating into 'functional' or 'non-functional', TFBS inputs to their target genes may be generally cumulative, with varying degrees of potency and redundancy. As TFBS redundancy can be diminished by mutations and environmental stress, some of the apparently 'spurious' sites may turn out to be important for maintaining adequate transcriptional regulation under these conditions. This has significant implications for interpreting the phenotypic effects of TFBS mutations, particularly in the context of genome-wide association studies for complex traits.