151
Corona RI, Guo JT. Statistical analysis of structural determinants for protein-DNA-binding specificity. Proteins 2016; 84:1147-61. [PMID: 27147539 DOI: 10.1002/prot.25061] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 03/11/2016] [Revised: 04/21/2016] [Accepted: 04/28/2016] [Indexed: 12/27/2022]
Abstract
DNA-binding proteins play critical roles in biological processes including gene expression, DNA packaging and DNA repair. They bind to DNA target sequences with different degrees of binding specificity, ranging from highly specific (HS) to nonspecific (NS). Alterations of DNA-binding specificity, due to either genetic variation or somatic mutations, can lead to various diseases. In this study, a comparative analysis of protein-DNA complex structures was carried out to investigate the structural features that contribute to binding specificity. Protein-DNA complexes were grouped into three general classes based on degrees of binding specificity: HS, multispecific (MS), and NS. Our results show a clear trend of structural features among the three classes, including amino acid binding propensities, simple and complex hydrogen bonds, major/minor groove and base contacts, and DNA shape. We found that aspartate is enriched in HS DNA binding proteins and predominately binds to a cytosine through a single hydrogen bond or two consecutive cytosines through bidentate hydrogen bonds. Aromatic residues, histidine and tyrosine, are highly enriched in the HS and MS groups and may contribute to specific binding through different mechanisms. To further investigate the role of protein flexibility in specific protein-DNA recognition, we analyzed the conformational changes between the bound and unbound states of DNA-binding proteins and structural variations. The results indicate that HS and MS DNA-binding domains have larger conformational changes upon DNA-binding and larger degree of flexibility in both bound and unbound states. Proteins 2016; 84:1147-1161. © 2016 Wiley Periodicals, Inc.
Affiliation(s)
- Rosario I Corona
- Department of Bioinformatics and Genomics, College of Computing and Informatics, The University of North Carolina at Charlotte, Charlotte, North Carolina, 28223
- Jun-Tao Guo
- Department of Bioinformatics and Genomics, College of Computing and Informatics, The University of North Carolina at Charlotte, Charlotte, North Carolina, 28223
152
Siebert M, Söding J. Bayesian Markov models consistently outperform PWMs at predicting motifs in nucleotide sequences. Nucleic Acids Res 2016; 44:6055-69. [PMID: 27288444 PMCID: PMC5291271 DOI: 10.1093/nar/gkw521] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Received: 04/11/2016] [Accepted: 05/29/2016] [Indexed: 01/01/2023] Open
Abstract
Position weight matrices (PWMs) are the standard model for DNA and RNA regulatory motifs. In PWMs nucleotide probabilities are independent of nucleotides at other positions. Models that account for dependencies need many parameters and are prone to overfitting. We have developed a Bayesian approach for motif discovery using Markov models in which conditional probabilities of order k - 1 act as priors for those of order k. This Bayesian Markov model (BaMM) training automatically adapts model complexity to the amount of available data. We also derive an EM algorithm for de-novo discovery of enriched motifs. For transcription factor binding, BaMMs achieve significantly (P = 1/16) higher cross-validated partial AUC than PWMs in 97% of 446 ChIP-seq ENCODE datasets and improve performance by 36% on average. BaMMs also learn complex multipartite motifs, improving predictions of transcription start sites, polyadenylation sites, bacterial pause sites, and RNA binding sites by 26-101%. BaMMs never performed worse than PWMs. These robust improvements argue in favour of generally replacing PWMs by BaMMs.
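The idea that order k - 1 probabilities act as priors for order k can be illustrated with a small Python sketch. This is not the authors' implementation: it assumes a homogeneous (position-independent) model for brevity, a single pseudocount strength `alpha` chosen arbitrarily, and the hypothetical function name `train_interpolated_markov`.

```python
from collections import Counter

ALPHABET = "ACGT"

def train_interpolated_markov(seqs, order=1, alpha=2.0):
    """Return prob(context, base) for an interpolated Markov model.

    Order 0 uses a uniform prior; every higher order k uses the order k-1
    estimate as its prior, weighted by the assumed pseudocount alpha.
    """
    # counts[k] holds the occurrences of every (k+1)-mer in the training data
    counts = [Counter() for _ in range(order + 1)]
    for s in seqs:
        for k in range(order + 1):
            for i in range(len(s) - k):
                counts[k][s[i:i + k + 1]] += 1

    def prob(context, base):
        # Truncate contexts longer than the trained order
        if len(context) > order:
            context = context[len(context) - order:]
        k = len(context)
        if k == 0:
            total = sum(counts[0][b] for b in ALPHABET)
            return (counts[0][base] + alpha * 0.25) / (total + alpha)
        n_ctx = sum(counts[k][context + b] for b in ALPHABET)
        prior = prob(context[1:], base)  # the shorter context acts as the prior
        return (counts[k][context + base] + alpha * prior) / (n_ctx + alpha)

    return prob
```

When a context was never observed, the estimate falls back entirely to the lower-order probability; as counts grow, the data dominate the prior. This is one way to read "automatically adapts model complexity to the amount of available data".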
Affiliation(s)
- Matthias Siebert
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany; Gene Center, Ludwig-Maximilians-Universität München, Feodor-Lynen-Strasse 25, 81377 Munich, Germany
- Johannes Söding
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
153
Dror I, Rohs R, Mandel-Gutfreund Y. How motif environment influences transcription factor search dynamics: Finding a needle in a haystack. Bioessays 2016; 38:605-12. [PMID: 27192961 PMCID: PMC5023137 DOI: 10.1002/bies.201600005] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Indexed: 11/08/2022]
Abstract
Transcription factors (TFs) have to find their binding sites, which are distributed throughout the genome. Facilitated diffusion is currently the most widely accepted model for this search process. Based on this model the TF alternates between one-dimensional sliding along the DNA, and three-dimensional bulk diffusion. In this view, the non-specific associations between the proteins and the DNA play a major role in the search dynamics. However, little is known about how the DNA properties around the motif contribute to the search. Accumulating evidence showing that TF binding sites are embedded within a unique environment, specific to each TF, leads to the hypothesis that the search process is facilitated by favorable DNA features that help to improve the search efficiency. Here, we review the field and present the hypothesis that TF-DNA recognition is dictated not only by the motif, but is also influenced by the environment in which the motif resides.
Affiliation(s)
- Iris Dror
- Department of Biology, Technion - Israel Institute of Technology, Technion City, Haifa, Israel; Departments of Biological Sciences, Chemistry, Physics, and Computer Science, Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA, USA
- Remo Rohs
- Departments of Biological Sciences, Chemistry, Physics, and Computer Science, Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA, USA
- Yael Mandel-Gutfreund
- Department of Biology, Technion - Israel Institute of Technology, Technion City, Haifa, Israel
154
Dans PD, Walther J, Gómez H, Orozco M. Multiscale simulation of DNA. Curr Opin Struct Biol 2016; 37:29-45. [DOI: 10.1016/j.sbi.2015.11.011] [Citation(s) in RCA: 99] [Impact Index Per Article: 11.0] [Received: 10/01/2015] [Revised: 11/23/2015] [Accepted: 11/25/2015] [Indexed: 01/05/2023]
155
Estrada J, Ruiz-Herrero T, Scholes C, Wunderlich Z, DePace AH. SiteOut: An Online Tool to Design Binding Site-Free DNA Sequences. PLoS One 2016; 11:e0151740. [PMID: 26987123 PMCID: PMC4795680 DOI: 10.1371/journal.pone.0151740] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 01/12/2016] [Accepted: 03/03/2016] [Indexed: 11/18/2022] Open
Abstract
DNA-binding proteins control many fundamental biological processes such as transcription, recombination and replication. A major goal is to decipher the role that DNA sequence plays in orchestrating the binding and activity of such regulatory proteins. To address this goal, it is useful to rationally design DNA sequences with desired numbers, affinities and arrangements of protein binding sites. However, removing binding sites from DNA is computationally non-trivial since one risks creating new sites in the process of deleting or moving others. Here we present an online binding site removal tool, SiteOut, that enables users to design arbitrary DNA sequences that entirely lack binding sites for factors of interest. SiteOut can also be used to delete sites from a specific sequence, or to introduce site-free spacers between functional sequences without creating new sites at the junctions. In combination with commercial DNA synthesis services, SiteOut provides a powerful and flexible platform for synthetic projects that interrogate regulatory DNA. Here we describe the algorithm and illustrate the ways in which SiteOut can be used; it is publicly available at https://depace.med.harvard.edu/siteout/.
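The difficulty named in this abstract, that deleting one site can create another, can be illustrated with a short Python sketch. It is not the SiteOut algorithm: it assumes sites are exact string matches (a real tool would more plausibly score sequences against binding models), and the function names `find_sites` and `remove_sites`, the seed and the step limit are hypothetical choices for illustration.

```python
import random

def find_sites(seq, motifs):
    """Return (start, motif) for every exact occurrence of any motif in seq."""
    hits = []
    for m in motifs:
        start = seq.find(m)
        while start != -1:
            hits.append((start, m))
            start = seq.find(m, start + 1)  # allow overlapping occurrences
    return hits

def remove_sites(seq, motifs, seed=0, max_steps=10_000):
    """Mutate bases inside detected sites until a full re-scan finds none."""
    rng = random.Random(seed)
    seq = list(seq)
    for _ in range(max_steps):
        # Re-scan the whole sequence each round, because a mutation that
        # destroys one site may create a new one elsewhere.
        hits = find_sites("".join(seq), motifs)
        if not hits:
            return "".join(seq)
        start, motif = hits[0]
        pos = start + rng.randrange(len(motif))
        seq[pos] = rng.choice([b for b in "ACGT" if b != seq[pos]])
    raise RuntimeError("no site-free sequence found within max_steps")
```

The full re-scan after every mutation is the point of the sketch: checking only the site being edited would miss sites created at its edges.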
Affiliation(s)
- Javier Estrada
- Department of Systems Biology, Harvard Medical School, Boston, MA, United States of America
- Teresa Ruiz-Herrero
- John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, United States of America
- Clarissa Scholes
- Department of Systems Biology, Harvard Medical School, Boston, MA, United States of America
- Zeba Wunderlich
- Department of Systems Biology, Harvard Medical School, Boston, MA, United States of America
- Angela H. DePace
- Department of Systems Biology, Harvard Medical School, Boston, MA, United States of America
156
Al-Zyoud WA, Hynson RMG, Ganuelas LA, Coster ACF, Duff AP, Baker MAB, Stewart AG, Giannoulatou E, Ho JWK, Gaus K, Liu D, Lee LK, Böcking T. Binding of transcription factor GabR to DNA requires recognition of DNA shape at a location distinct from its cognate binding site. Nucleic Acids Res 2016; 44:1411-20. [PMID: 26681693 PMCID: PMC4756830 DOI: 10.1093/nar/gkv1466] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 10/29/2015] [Revised: 11/28/2015] [Accepted: 11/30/2015] [Indexed: 12/12/2022] Open
Abstract
Mechanisms for transcription factor recognition of specific DNA base sequences are well characterized and recent studies demonstrate that the shape of these cognate binding sites is also important. Here, we uncover a new mechanism where the transcription factor GabR simultaneously recognizes two cognate binding sites and the shape of a 29 bp DNA sequence that bridges these sites. Small-angle X-ray scattering and multi-angle laser light scattering are consistent with a model where the DNA undergoes a conformational change to bend around GabR during binding. In silico predictions suggest that the bridging DNA sequence is likely to be bendable in one direction, and kinetic analysis of mutant DNA sequences with biolayer interferometry allowed the independent quantification of the relative contribution of DNA base and shape recognition in the GabR-DNA interaction. These indicate that the two cognate binding sites as well as the bendability of the DNA sequence in between these sites are required to form a stable complex. The mechanism of GabR-DNA interaction provides an example where the correct shape of DNA, at a clearly distinct location from the cognate binding site, is required for transcription factor binding and has implications for bioinformatics searches for novel binding sites.
MESH Headings
- Bacillus subtilis/genetics
- Bacillus subtilis/metabolism
- Bacterial Proteins/chemistry
- Bacterial Proteins/metabolism
- Base Sequence
- Binding Sites/genetics
- Chromatography, Gel
- DNA, Bacterial/chemistry
- DNA, Bacterial/genetics
- DNA, Bacterial/metabolism
- Gene Expression Regulation, Bacterial
- Models, Molecular
- Molecular Sequence Data
- Nucleic Acid Conformation
- Operon/genetics
- Promoter Regions, Genetic/genetics
- Protein Binding
- Protein Multimerization
- Protein Structure, Tertiary
- Scattering, Small Angle
- Sequence Homology, Nucleic Acid
- Transcription Factors/chemistry
- Transcription Factors/genetics
- Transcription Factors/metabolism
- X-Ray Diffraction
Affiliation(s)
- Walid A Al-Zyoud
- School of Medical Sciences, The University of New South Wales, Sydney, NSW 2052, Australia
- Robert M G Hynson
- The Victor Chang Cardiac Research Institute, 405 Liverpool St Darlinghurst, Darlinghurst, NSW 2010, Australia
- Lorraine A Ganuelas
- The Victor Chang Cardiac Research Institute, 405 Liverpool St Darlinghurst, Darlinghurst, NSW 2010, Australia
- Adelle C F Coster
- School of Mathematics and Statistics, University of New South Wales, Sydney, NSW 2052, Australia
- Anthony P Duff
- Australian Nuclear Science and Technology Organisation, Lucas Heights, NSW 2234, Australia
- Matthew A B Baker
- The Victor Chang Cardiac Research Institute, 405 Liverpool St Darlinghurst, Darlinghurst, NSW 2010, Australia
- Alastair G Stewart
- The Victor Chang Cardiac Research Institute, 405 Liverpool St Darlinghurst, Darlinghurst, NSW 2010, Australia
- Eleni Giannoulatou
- The Victor Chang Cardiac Research Institute, 405 Liverpool St Darlinghurst, Darlinghurst, NSW 2010, Australia
- Joshua W K Ho
- The Victor Chang Cardiac Research Institute, 405 Liverpool St Darlinghurst, Darlinghurst, NSW 2010, Australia
- Katharina Gaus
- School of Medical Sciences, The University of New South Wales, Sydney, NSW 2052, Australia; EMBL Australia Node for Single Molecule Science, The University of New South Wales, Corner Botany and High Street, Kensington Campus 2052, NSW 2052, Australia
- Dali Liu
- Department of Chemistry and Biochemistry, Loyola University, Chicago, IL 60660, USA
- Lawrence K Lee
- School of Medical Sciences, The University of New South Wales, Sydney, NSW 2052, Australia; The Victor Chang Cardiac Research Institute, 405 Liverpool St Darlinghurst, Darlinghurst, NSW 2010, Australia; EMBL Australia Node for Single Molecule Science, The University of New South Wales, Corner Botany and High Street, Kensington Campus 2052, NSW 2052, Australia
- Till Böcking
- School of Medical Sciences, The University of New South Wales, Sydney, NSW 2052, Australia; EMBL Australia Node for Single Molecule Science, The University of New South Wales, Corner Botany and High Street, Kensington Campus 2052, NSW 2052, Australia
157
Kuosmanen SM, Viitala S, Laitinen T, Peräkylä M, Pölönen P, Kansanen E, Leinonen H, Raju S, Wienecke-Baldacchino A, Närvänen A, Poso A, Heinäniemi M, Heikkinen S, Levonen AL. The Effects of Sequence Variation on Genome-wide NRF2 Binding--New Target Genes and Regulatory SNPs. Nucleic Acids Res 2016; 44:1760-75. [PMID: 26826707 PMCID: PMC4770247 DOI: 10.1093/nar/gkw052] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 06/18/2015] [Accepted: 01/16/2016] [Indexed: 12/11/2022] Open
Abstract
Transcription factor binding specificity is crucial for proper target gene regulation. Motif discovery algorithms identify the main features of the binding patterns, but the accuracy on the lower affinity sites is often poor. Nuclear factor E2-related factor 2 (NRF2) is a ubiquitous redox-activated transcription factor having a key protective role against endogenous and exogenous oxidant and electrophile stress. Herein, we decipher the effects of sequence variation on the DNA binding sequence of NRF2, in order to identify both genome-wide binding sites for NRF2 and disease-associated regulatory SNPs (rSNPs) with drastic effects on NRF2 binding. Interactions between NRF2 and DNA were studied using molecular modelling, and NRF2 chromatin immunoprecipitation-sequence datasets together with protein binding microarray measurements were utilized to study binding sequence variation in detail. The binding model thus generated was used to identify genome-wide binding sites for NRF2, and genomic binding sites with rSNPs that have strong effects on NRF2 binding and reside on active regulatory elements in human cells. As a proof of concept, miR-126–3p and -5p were identified as NRF2 target microRNAs, and a rSNP (rs113067944) residing on NRF2 target gene (Ferritin, light polypeptide, FTL) promoter was experimentally verified to decrease NRF2 binding and result in decreased transcriptional activity.
Affiliation(s)
- Suvi M Kuosmanen
- Department of Biotechnology and Molecular Medicine, A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, FIN-70211 Kuopio, Finland
- Sari Viitala
- School of Pharmacy, University of Eastern Finland, FIN-70211 Kuopio, Finland
- Tuomo Laitinen
- School of Pharmacy, University of Eastern Finland, FIN-70211 Kuopio, Finland
- Mikael Peräkylä
- School of Pharmacy, University of Eastern Finland, FIN-70211 Kuopio, Finland
- Petri Pölönen
- Department of Biotechnology and Molecular Medicine, A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, FIN-70211 Kuopio, Finland; Institute of Biomedicine, School of Medicine, University of Eastern Finland, FIN-70211 Kuopio, Finland
- Emilia Kansanen
- Department of Biotechnology and Molecular Medicine, A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, FIN-70211 Kuopio, Finland
- Hanna Leinonen
- Department of Biotechnology and Molecular Medicine, A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, FIN-70211 Kuopio, Finland
- Suresh Raju
- Institute of Biomedicine, School of Medicine, University of Eastern Finland, FIN-70211 Kuopio, Finland
- Ale Närvänen
- School of Pharmacy, University of Eastern Finland, FIN-70211 Kuopio, Finland
- Antti Poso
- School of Pharmacy, University of Eastern Finland, FIN-70211 Kuopio, Finland
- Merja Heinäniemi
- Institute of Biomedicine, School of Medicine, University of Eastern Finland, FIN-70211 Kuopio, Finland
- Sami Heikkinen
- Institute of Biomedicine, School of Medicine, University of Eastern Finland, FIN-70211 Kuopio, Finland
- Anna-Liisa Levonen
- Department of Biotechnology and Molecular Medicine, A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, FIN-70211 Kuopio, Finland
158
Barr CL, Misener VL. Decoding the non-coding genome: elucidating genetic risk outside the coding genome. Genes Brain Behav 2016; 15:187-204. [PMID: 26515765 PMCID: PMC4833497 DOI: 10.1111/gbb.12269] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 08/07/2015] [Revised: 10/19/2015] [Accepted: 10/28/2015] [Indexed: 12/11/2022]
Abstract
Current evidence emerging from genome-wide association studies indicates that the genetic underpinnings of complex traits are likely attributable to genetic variation that changes gene expression, rather than (or in combination with) variation that changes protein-coding sequences. This is particularly compelling with respect to psychiatric disorders, as genetic changes in regulatory regions may result in differential transcriptional responses to developmental cues and environmental/psychosocial stressors. Until recently, however, the link between transcriptional regulation and psychiatric genetic risk has been understudied. Multiple obstacles have contributed to the paucity of research in this area, including challenges in identifying the positions of remote (distal from the promoter) regulatory elements (e.g. enhancers) and their target genes and the underrepresentation of neural cell types and brain tissues in epigenome projects - the availability of high-quality brain tissues for epigenetic and transcriptome profiling, particularly for the adolescent and developing brain, has been limited. Further challenges have arisen in the prediction and testing of the functional impact of DNA variation with respect to multiple aspects of transcriptional control, including regulatory-element interaction (e.g. between enhancers and promoters), transcription factor binding and DNA methylation. Further, the brain has uncommon DNA-methylation marks with unique genomic distributions not found in other tissues - current evidence suggests the involvement of non-CG methylation and 5-hydroxymethylation in neurodevelopmental processes but much remains unknown. We review here knowledge gaps as well as both technological and resource obstacles that will need to be overcome in order to elucidate the involvement of brain-relevant gene-regulatory variants in genetic risk for psychiatric disorders.
Affiliation(s)
- C. L. Barr
- Toronto Western Research Institute, University Health Network, Toronto, ON, Canada
- Program in Neurosciences and Mental Health, The Hospital for Sick Children, Toronto, ON, Canada
- V. L. Misener
- Toronto Western Research Institute, University Health Network, Toronto, ON, Canada
159
Riley TR, Lazarovici A, Mann RS, Bussemaker HJ. Building accurate sequence-to-affinity models from high-throughput in vitro protein-DNA binding data using FeatureREDUCE. eLife 2015; 4:e06397. [PMID: 26701911 PMCID: PMC4758951 DOI: 10.7554/elife.06397] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 01/08/2015] [Accepted: 12/20/2015] [Indexed: 01/26/2023] Open
Abstract
Transcription factors are crucial regulators of gene expression. Accurate quantitative definition of their intrinsic DNA binding preferences is critical to understanding their biological function. High-throughput in vitro technology has recently been used to deeply probe the DNA binding specificity of hundreds of eukaryotic transcription factors, yet algorithms for analyzing such data have not yet fully matured. Here, we present a general framework (FeatureREDUCE) for building sequence-to-affinity models based on a biophysically interpretable and extensible model of protein-DNA interaction that can account for dependencies between nucleotides within the binding interface or multiple modes of binding. When training on protein binding microarray (PBM) data, we use robust regression and modeling of technology-specific biases to infer specificity models of unprecedented accuracy and precision. We provide quantitative validation of our results by comparing to gold-standard data when available.
Affiliation(s)
- Todd R Riley
- Department of Biological Sciences, Columbia University, New York, United States
- Department of Systems Biology, Columbia University, New York, United States
- Department of Biology, University of Massachusetts Boston, Boston, United States
- Allan Lazarovici
- Department of Biological Sciences, Columbia University, New York, United States
- Department of Electrical Engineering, Columbia University, New York, United States
- Richard S Mann
- Department of Systems Biology, Columbia University, New York, United States
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, United States
- Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, New York, United States
- Department of Systems Biology, Columbia University, New York, United States
160
Hauser K, Essuman B, He Y, Coutsias E, Garcia-Diaz M, Simmerling C. A human transcription factor in search mode. Nucleic Acids Res 2015; 44:63-74. [PMID: 26673724 PMCID: PMC4705650 DOI: 10.1093/nar/gkv1091] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 08/25/2015] [Accepted: 10/07/2015] [Indexed: 12/14/2022] Open
Abstract
Transcription factors (TF) can change shape to bind and recognize DNA, shifting the energy landscape from a weak binding, rapid search mode to a higher affinity recognition mode. However, the mechanism(s) driving this conformational change remains unresolved and in most cases high-resolution structures of the non-specific complexes are unavailable. Here, we investigate the conformational switch of the human mitochondrial transcription termination factor MTERF1, which has a modular, superhelical topology complementary to DNA. Our goal was to characterize the details of the non-specific search mode to complement the crystal structure of the specific binding complex, providing a basis for understanding the recognition mechanism. In the specific complex, MTERF1 binds a significantly distorted and unwound DNA structure, exhibiting a protein conformation incompatible with binding to B-form DNA. In contrast, our simulations of apo MTERF1 revealed significant flexibility, sampling structures with superhelical pitch and radius complementary to the major groove of B-DNA. Docking these structures to B-DNA followed by unrestrained MD simulations led to a stable complex in which MTERF1 was observed to undergo spontaneous diffusion on the DNA. Overall, the data support an MTERF1-DNA binding and recognition mechanism driven by intrinsic dynamics of the MTERF1 superhelical topology.
Affiliation(s)
- Kevin Hauser
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794, USA; Department of Chemistry, Stony Brook University, Stony Brook, NY 11794, USA
- Yiqing He
- Great Neck South High School, Great Neck, NY 11023, USA
- Evangelos Coutsias
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794, USA; Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794, USA
- Miguel Garcia-Diaz
- Department of Pharmacological Sciences, Stony Brook University, Stony Brook, NY 11794, USA
- Carlos Simmerling
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794, USA; Department of Chemistry, Stony Brook University, Stony Brook, NY 11794, USA
161
Chiu TP, Comoglio F, Zhou T, Yang L, Paro R, Rohs R. DNAshapeR: an R/Bioconductor package for DNA shape prediction and feature encoding. Bioinformatics 2015; 32:1211-3. [PMID: 26668005 PMCID: PMC4824130 DOI: 10.1093/bioinformatics/btv735] [Citation(s) in RCA: 111] [Impact Index Per Article: 11.1] [Received: 11/01/2015] [Accepted: 12/09/2015] [Indexed: 11/25/2022] Open
Abstract
Summary: DNAshapeR predicts DNA shape features in an ultra-fast, high-throughput manner from genomic sequencing data. The package takes either nucleotide sequence or genomic coordinates as input and generates various graphical representations for visualization and further analysis. DNAshapeR further encodes DNA sequence and shape features as user-defined combinations of k-mer and DNA shape features. The resulting feature matrices can be readily used as input of various machine learning software packages for further modeling studies. Availability and implementation: The DNAshapeR software package was implemented in the statistical programming language R and is freely available through the Bioconductor project at https://www.bioconductor.org/packages/devel/bioc/html/DNAshapeR.html and at the GitHub developer site, http://tsupeichiu.github.io/DNAshapeR/. Contact: rohs@usc.edu Supplementary information: Supplementary data are available at Bioinformatics online.
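To make the k-mer part of the feature encoding concrete, here is a minimal Python sketch of a one-hot k-mer feature vector. It does not use or reproduce the DNAshapeR API (the package itself is written in R); the function name `kmer_one_hot` and the column ordering are assumptions made for illustration, and shape features are left out.

```python
from itertools import product

def kmer_one_hot(seq, k=1):
    """Encode seq as concatenated one-hot vectors, one per overlapping k-mer."""
    # All 4**k possible k-mers in lexicographic order over A, C, G, T
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    features = []
    for i in range(len(seq) - k + 1):
        block = [0] * len(kmers)
        block[index[seq[i:i + k]]] = 1
        features.extend(block)
    return features
```

Stacking one such vector per sequence gives a feature matrix of the kind the abstract describes as input for machine learning packages; numeric shape values could be appended as further columns.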
Affiliation(s)
- Tsu-Pei Chiu
- Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Federico Comoglio
- Department of Biosystems Science and Engineering, ETH Zürich, Mattenstrasse 26, 4058 Basel, Switzerland
- Tianyin Zhou
- Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Lin Yang
- Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Renato Paro
- Department of Biosystems Science and Engineering, ETH Zürich, Mattenstrasse 26, 4058 Basel, Switzerland; Faculty of Science, University of Basel, Klingelbergstrasse 50, 4056 Basel, Switzerland
- Remo Rohs
- Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
162
Maeso I, Tena JJ. Favorable genomic environments for cis-regulatory evolution: A novel theoretical framework. Semin Cell Dev Biol 2015; 57:2-10. [PMID: 26673387 DOI: 10.1016/j.semcdb.2015.12.003] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 09/21/2015] [Revised: 12/02/2015] [Accepted: 12/05/2015] [Indexed: 12/22/2022]
Abstract
Cis-regulatory changes are arguably the primary evolutionary source of animal morphological diversity. With the recent explosion of genome-wide comparisons of the cis-regulatory content in different animal species, it is now possible to infer general principles underlying enhancer evolution. However, these studies have also revealed numerous discrepancies and paradoxes, suggesting that the mechanistic causes and modes of cis-regulatory evolution are still not well understood and are probably much more complex than generally appreciated. Here, we argue that the mutational mechanisms and genomic regions generating new regulatory activities must comply with the constraints imposed by the molecular properties of cis-regulatory elements (CREs) and the organizational features of long-range chromatin interactions. Accordingly, we propose a new integrative evolutionary framework for cis-regulatory evolution based on two major premises for the origin of novel enhancer activity: (i) an accessible chromatin environment and (ii) compatibility with the 3D structure and interactions of pre-existing CREs. Mechanisms and DNA sequences not fulfilling these premises will be less likely to have a measurable impact on gene expression and, as such, will have a minor contribution to the evolution of gene regulation. Finally, we discuss current comparative cis-regulatory data in the light of this new evolutionary model, and propose that the two most prominent mechanisms for the evolution of cis-regulatory changes are the overprinting of ancestral CREs and the exaptation of transposable elements.
Affiliation(s)
- Ignacio Maeso
- Centro Andaluz de Biología del Desarrollo (CSIC/UPO/JA), Universidad Pablo de Olavide, 41013 Seville, Spain.
- Juan J Tena
- Centro Andaluz de Biología del Desarrollo (CSIC/UPO/JA), Universidad Pablo de Olavide, 41013 Seville, Spain.
163
Glick Y, Orenstein Y, Chen D, Avrahami D, Zor T, Shamir R, Gerber D. Integrated microfluidic approach for quantitative high-throughput measurements of transcription factor binding affinities. Nucleic Acids Res 2015; 44:e51. [PMID: 26635393 PMCID: PMC4824076 DOI: 10.1093/nar/gkv1327] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 03/12/2015] [Accepted: 11/14/2015] [Indexed: 01/16/2023] Open
Abstract
Protein binding to DNA is a fundamental process in gene regulation. Methodologies such as ChIP-Seq and mapping of DNase I hypersensitive sites provide global information on this regulation in vivo. In vitro methodologies provide valuable complementary information on protein-DNA specificities. However, current methods still do not measure absolute binding affinities. There is a real need for large-scale quantitative protein-DNA affinity measurements. We developed QPID, a microfluidic application for measuring protein-DNA affinities. A single run is equivalent to 4096 gel-shift experiments. Using QPID, we characterized the different affinities of ATF1, c-Jun, c-Fos and AP-1 to the CRE consensus motif and CRE half-site in two different genomic sequences on a single device. We discovered that binding of ATF1, but not of AP-1, to the CRE half-site is highly affected by its genomic context. This effect was highly correlated with ATF1 ChIP-seq and PBM experiments. Next, we characterized the affinities of ATF1 and ATF3 to 128 genomic CRE and CRE half-site sequences. Our affinity measurements explained that in vivo binding differences between ATF1 and ATF3 to CRE and CRE half-sites are partially mediated by differences in the minor groove width. We believe that QPID would become a central tool for quantitative characterization of biophysical aspects affecting protein-DNA binding.
Affiliation(s)
- Yair Glick
- Mina and Evrard Goodman life science faculty, Bar Ilan University, Ramat-Gan, 5290002, Israel
| | - Yaron Orenstein
- Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, 69978, Israel
| | - Dana Chen
- Mina and Evrard Goodman life science faculty, Bar Ilan University, Ramat-Gan, 5290002, Israel
| | - Dorit Avrahami
- Mina and Evrard Goodman life science faculty, Bar Ilan University, Ramat-Gan, 5290002, Israel
| | - Tsaffrir Zor
- Department of Biochemistry & Molecular Biology, Life Sciences Institute, Tel-Aviv University, Tel-Aviv, 69978, Israel
| | - Ron Shamir
- Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, 69978, Israel
| | - Doron Gerber
- Mina and Evrard Goodman life science faculty, Bar Ilan University, Ramat-Gan, 5290002, Israel
| |
|
164
|
Yang C, Chang CH. Exploring comprehensive within-motif dependence of transcription factor binding in Escherichia coli. Sci Rep 2015; 5:17021. [PMID: 26592556 PMCID: PMC4655474 DOI: 10.1038/srep17021] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/05/2015] [Accepted: 10/16/2015] [Indexed: 01/18/2023]
Abstract
Modeling the binding of transcription factors helps to decipher the control logic behind transcriptional regulatory networks. Position weight matrix is commonly used to describe a binding motif but assumes statistical independence between positions. Although current approaches take within-motif dependence into account for better predictive performance, these models usually rely on prior knowledge and incorporate simple positional dependence to describe binding motifs. The inability to take complex within-motif dependence into account may result in an incomplete representation of binding motifs. In this work, we applied association rule mining techniques and constructed models to explore within-motif dependence for transcription factors in Escherichia coli. Our models can reflect transcription factor-DNA recognition where the explored dependence correlates with the binding specificity. We also propose a graphical representation of the explored within-motif dependence to illustrate the final binding configurations. Understanding the binding configurations also enables us to fine-tune or design transcription factor binding sites, and we attempt to present the configurations through exploring within-motif dependence.
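The independence assumption mentioned in this abstract can be illustrated with a minimal sketch of position weight matrix scoring. The toy sites, the pseudocount, and the uniform background are assumptions chosen for illustration only; none of them come from the cited study, and this is not the association-rule model the authors describe.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length aligned sites."""
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases keep a finite score.
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background) for b in "ACGT"})
    return pwm

def score(pwm, seq):
    """Score a sequence as a sum of per-position terms.

    The plain sum is the independence assumption: the contribution of one
    position never depends on which base occupies another position.
    """
    return sum(column[base] for column, base in zip(pwm, seq))

# Hypothetical aligned binding sites (illustrative, not real data).
toy_sites = ["TGACGTCA", "TGACGTCA", "TGACGCAA", "TTACGTCA"]
pwm = build_pwm(toy_sites)
```

Because the score is additive, a matrix of this kind cannot represent a preference that holds only for a particular combination of bases at two positions, which is the limitation that dependence-aware models aim to address.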
Affiliation(s)
- Chi Yang
- Institute of Biomedical Informatics, National Yang Ming University, Taipei, 11221, Taiwan
| | - Chuan-Hsiung Chang
- Institute of Biomedical Informatics, National Yang Ming University, Taipei, 11221, Taiwan; Center for Systems and Synthetic Biology, National Yang Ming University, Taipei, 11221, Taiwan
| |
|
165
|
Liu L, Zhao W, Zhou X. Modeling co-occupancy of transcription factors using chromatin features. Nucleic Acids Res 2015; 44:e49. [PMID: 26590261 PMCID: PMC4797273 DOI: 10.1093/nar/gkv1281] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 08/22/2015] [Accepted: 11/04/2015] [Indexed: 12/11/2022]
Abstract
Regulation of gene expression requires both transcription factors (TFs) and epigenetic modifications, and interplay between the two types of factors has been discovered. However, the study of relationships between chromatin features and TF–TF co-occupancy remains limited. Here, we revealed the relationship by first illustrating distinct profile patterns of chromatin features related to different binding events, including single TF binding and TF–TF co-occupancy of 71 TFs from five human cell lines. We further implemented statistical analyses to demonstrate the relationship by accurately predicting co-occupancy genome-wide using chromatin features including DNase I hypersensitivity, 11 histone modifications (HMs) and GC content. Remarkably, our results showed that the combination of chromatin features enables accurate predictions across the five cell lines. For individual chromatin features, DNase I enables high and consistent predictions. H3K27ac, H3K4me2, H3K4me3 and H3K9ac are more reliable predictors than other HMs. Although the combination of 11 HMs achieves accurate predictions, their predictive ability varies considerably when a model obtained from one cell line is applied to others, indicating that the relationship between HMs and TF–TF co-occupancy is cell type dependent. GC content is not a reliable predictor, but the addition of GC content to any other features enhances their predictive ability. Together, our results elucidate a strong relationship between TF–TF co-occupancy and chromatin features.
Affiliation(s)
- Liang Liu
- Center for Bioinformatics and Systems Biology, Department of Radiology, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
| | - Weiling Zhao
- Center for Bioinformatics and Systems Biology, Department of Radiology, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
| | - Xiaobo Zhou
- Center for Bioinformatics and Systems Biology, Department of Radiology, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
| |
|
166
|
Nadel J, Athanasiadou R, Lemetre C, Wijetunga NA, Ó Broin P, Sato H, Zhang Z, Jeddeloh J, Montagna C, Golden A, Seoighe C, Greally JM. RNA:DNA hybrids in the human genome have distinctive nucleotide characteristics, chromatin composition, and transcriptional relationships. Epigenetics Chromatin 2015; 8:46. [PMID: 26579211 PMCID: PMC4647656 DOI: 10.1186/s13072-015-0040-6] [Citation(s) in RCA: 121] [Impact Index Per Article: 12.1] [Received: 07/22/2015] [Accepted: 10/29/2015] [Indexed: 01/01/2023]
Abstract
Background: RNA:DNA hybrids represent a non-canonical nucleic acid structure that has been associated with a range of human diseases and potential transcriptional regulatory functions. Mapping of RNA:DNA hybrids in human cells reveals them to have a number of characteristics that give insights into their functions. Results: We find RNA:DNA hybrids to occupy millions of base pairs in the human genome. A directional sequencing approach shows the RNA component of the RNA:DNA hybrid to be purine-rich, indicating a thermodynamic contribution to their in vivo stability. The RNA:DNA hybrids are enriched at loci with decreased DNA methylation and increased DNase hypersensitivity, and within larger domains with characteristics of heterochromatin formation, indicating potential transcriptional regulatory properties. Mass spectrometry studies of chromatin at RNA:DNA hybrids show the presence of the ILF2 and ILF3 transcription factors, supporting a model of certain transcription factors binding preferentially to the RNA:DNA conformation. Conclusions: Overall, there is little to indicate a dependence for RNA:DNA hybrids forming co-transcriptionally, with results from the ribosomal DNA repeat unit instead supporting the intriguing model of RNA generating these structures in trans. The results of the study indicate heterogeneous functions of these genomic elements and new insights into their formation and stability in vivo.
Affiliation(s)
- Julie Nadel
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA
| | - Rodoniki Athanasiadou
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA; Department of Biology, Center for Genomics and Systems Biology, New York University, 12 Waverly Place, New York, NY 10003 USA
| | - Christophe Lemetre
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA; Integrated Genomics Operation, Memorial Sloan-Kettering Cancer Center, New York, NY 10065 USA
| | - N Ari Wijetunga
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA
| | - Pilib Ó Broin
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA
| | - Hanae Sato
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA
| | - Zhengdong Zhang
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA
| | | | - Cristina Montagna
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA
| | - Aaron Golden
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA
| | - Cathal Seoighe
- School of Mathematics, Statistics and Applied Mathematics, National University of Ireland Galway, Galway, Ireland
| | - John M Greally
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA; Department of Genetics, Center for Epigenomics and Division of Computational Genetics, Albert Einstein College of Medicine, 1301 Morris Park Avenue, Bronx, NY 10461 USA
| |
|
167
|
Mondal A, Bhattacherjee A. Searching target sites on DNA by proteins: Role of DNA dynamics under confinement. Nucleic Acids Res 2015; 43:9176-86. [PMID: 26400158 PMCID: PMC4627088 DOI: 10.1093/nar/gkv931] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 06/01/2015] [Revised: 08/15/2015] [Accepted: 09/07/2015] [Indexed: 02/07/2023]
Abstract
DNA-binding proteins (DBPs) rapidly search and specifically bind to their target sites on genomic DNA in order to trigger many cellular regulatory processes. It has been suggested that the facilitation of search dynamics is achieved by combining 3D diffusion with one-dimensional sliding and hopping dynamics of interacting proteins. Although recent studies have advanced the knowledge of molecular determinants that affect one-dimensional search efficiency, the role of the DNA molecule is poorly understood. In this study, by using coarse-grained simulations, we propose that the dynamics of the DNA molecule and its degree of confinement due to cellular crowding concertedly regulate its groove geometry and modulate the inter-communication with DBPs. Under weak confinement, DNA dynamics promote many short, rotation-decoupled sliding events interspersed by hopping dynamics. While this results in faster 1D diffusion, the associated probability of missing targets by jumping over them increases. In contrast, strong confinement favours rotation-coupled sliding to locate targets but lacks structural flexibility to achieve desired specificity. By testing under physiological crowding, our study provides a plausible mechanism for how the DNA molecule may help in maintaining an optimal balance between fast hopping and rotation-coupled sliding dynamics, to locate target sites rapidly and form specific complexes precisely.
Affiliation(s)
- Anupam Mondal
- Center for Computational Biology, Indraprastha Institute of Information Technology (IIIT) Delhi, New Delhi-110020, India
| | - Arnab Bhattacherjee
- Center for Computational Biology, Indraprastha Institute of Information Technology (IIIT) Delhi, New Delhi-110020, India
| |
|
168
|
Jain D, Narayanan N, Nair DT. Plasticity in Repressor-DNA Interactions Neutralizes Loss of Symmetry in Bipartite Operators. J Biol Chem 2015; 291:1235-42. [PMID: 26511320 DOI: 10.1074/jbc.m115.689695] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/02/2015] [Indexed: 11/06/2022]
Abstract
Transcription factor-DNA interactions are central to gene regulation. Many transcription factors regulate multiple target genes and can bind sequences that do not conform strictly to the consensus. To understand the structural mechanism utilized by the transcription regulators to bind diverse target sequences, we have employed the repressor AraR from Bacillus subtilis as a model system. AraR is known to bind to eight different operator sites in the bacterial genome. Although there are differences in the sequences of four of these operators, ORE1, ORX1, ORA1, and ORR3, the AraR-DNA binding domain (AraR-DBD) as well as full-length AraR unexpectedly binds to each of these sequences with similar affinities as measured by fluorescence anisotropy experiments. We have determined crystal structures of AraR-DBD in complex with two different natural operators ORE1 and ORX1 up to 2.07 and 1.97 Å resolution, respectively. These structures were compared with the previously reported structures of AraR-DBD bound to two other natural operators (ORA1 and ORR3). Interactions of two molecules of AraR-DBD with the symmetric operator, ORE1, are identical, but their interaction with the non-symmetric operator ORX1 results in breakdown of the symmetry in protein-DNA interactions. The novel interactions observed are accompanied by local conformational change in the DNA. ChIP-sequencing (ChIP-Seq) data on other transcription factors has shown that they can bind to diverse targets, and hence the plasticity exhibited by AraR may be a general phenomenon. The ability of transcription factors to form alternate interactions may be important for employment in new functions and evolution of novel regulatory circuits.
Affiliation(s)
- Deepti Jain
- From the Transcription Regulation Lab and the National Centre for Biological Sciences (NCBS-TIFR), UAS-GKVK Campus, Bellary Road, Bangalore 560065, and
| | - Naveen Narayanan
- the National Centre for Biological Sciences (NCBS-TIFR), UAS-GKVK Campus, Bellary Road, Bangalore 560065, and the Genomic Integrity and Plasticity Lab, Regional Centre for Biotechnology, NCR Biotech Science Cluster, 3rd Milestone, Faridabad-Gurgaon Expressway, Bhankri Village, Faridabad 121001, Manipal University, Manipal, 576104 Karnataka, India
| | - Deepak T Nair
- the National Centre for Biological Sciences (NCBS-TIFR), UAS-GKVK Campus, Bellary Road, Bangalore 560065, and the Genomic Integrity and Plasticity Lab, Regional Centre for Biotechnology, NCR Biotech Science Cluster, 3rd Milestone, Faridabad-Gurgaon Expressway, Bhankri Village, Faridabad 121001
| |
|
169
|
ChEC-seq kinetics discriminates transcription factor binding sites by DNA sequence and shape in vivo. Nat Commun 2015; 6:8733. [PMID: 26490019 PMCID: PMC4618392 DOI: 10.1038/ncomms9733] [Citation(s) in RCA: 124] [Impact Index Per Article: 12.4] [Received: 05/27/2015] [Accepted: 09/25/2015] [Indexed: 12/31/2022]
Abstract
Chromatin endogenous cleavage (ChEC) uses fusion of a protein of interest to micrococcal nuclease (MNase) to target calcium-dependent cleavage to specific genomic loci in vivo. Here we report the combination of ChEC with high-throughput sequencing (ChEC-seq) to map budding yeast transcription factor (TF) binding. Temporal analysis of ChEC-seq data reveals two classes of sites for TFs, one displaying rapid cleavage at sites with robust consensus motifs and the second showing slow cleavage at largely unique sites with low-scoring motifs. Sites with high-scoring motifs also display asymmetric cleavage, indicating that ChEC-seq provides information on the directionality of TF-DNA interactions. Strikingly, similar DNA shape patterns are observed regardless of motif strength, indicating that the kinetics of ChEC-seq discriminates DNA recognition through sequence and/or shape. We propose that time-resolved ChEC-seq detects both high-affinity interactions of TFs with consensus motifs and sites preferentially sampled by TFs during diffusion and sliding. In chromatin endogenous cleavage (ChEC), micrococcal nuclease (MNase) is fused to a protein of interest and its cleavage is thus targeted to specific genomic loci in vivo. Here, the authors show that time-resolved ChEC-seq (high-throughput sequencing after ChEC) can detect DNA shape patterns regardless of motif strength.
|
170
|
Keilwagen J, Grau J. Varying levels of complexity in transcription factor binding motifs. Nucleic Acids Res 2015; 43:e119. [PMID: 26116565 PMCID: PMC4605289 DOI: 10.1093/nar/gkv577] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Received: 02/10/2015] [Revised: 05/11/2015] [Accepted: 05/21/2015] [Indexed: 11/17/2022]
Abstract
Binding of transcription factors to DNA is one of the keystones of gene regulation. The existence of statistical dependencies between binding site positions is widely accepted, while their relevance for computational predictions has been debated. Building probabilistic models of binding sites that may capture dependencies is still challenging, since the most successful motif discovery approaches require numerical optimization techniques, which are not suited for selecting dependency structures. To overcome this issue, we propose sparse local inhomogeneous mixture (Slim) models that combine putative dependency structures in a weighted manner allowing for numerical optimization of dependency structure and model parameters simultaneously. We find that Slim models yield a substantially better prediction performance than previous models on genomic context protein binding microarray data sets and on ChIP-seq data sets. To elucidate the reasons for the improved performance, we develop dependency logos, which allow for visual inspection of dependency structures within binding sites. We find that the dependency structures discovered by Slim models are highly diverse and highly transcription factor-specific, which emphasizes the need for flexible dependency models. The observed dependency structures range from broad heterogeneities to sparse dependencies between neighboring and non-neighboring binding site positions.
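One simple way to check for a statistical dependency between two binding site positions is to compute the mutual information between the corresponding alignment columns. The sketch below is a generic illustration under that assumption; it is not the Slim model or the dependency logo method described in the abstract, and the two toy alignments are hypothetical.

```python
import math
from collections import Counter

def mutual_information(sites, i, j):
    """Mutual information (in bits) between columns i and j of aligned sites."""
    n = len(sites)
    freq_i = Counter(site[i] for site in sites)
    freq_j = Counter(site[j] for site in sites)
    freq_ij = Counter((site[i], site[j]) for site in sites)
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        # Compare the observed joint frequency with the product of marginals.
        mi += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi

# Hypothetical alignments: in the first, the base at position 1 is fully
# determined by position 0; in the second, the two positions vary independently.
covarying = ["ATGC", "ATGC", "CGGC", "CGGC"]
independent = ["ATGC", "AGGC", "CTGC", "CGGC"]

mi_covarying = mutual_information(covarying, 0, 1)
mi_independent = mutual_information(independent, 0, 1)
```

With so few sequences the estimate is heavily biased, so on real data a value like this would normally be compared against a permutation or small-sample correction before being read as evidence of dependence.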
Affiliation(s)
- Jens Keilwagen
- Institute for Biosafety in Plant Biotechnology, Julius Kühn-Institut (JKI) - Federal Research Centre for Cultivated Plants, D-06484 Quedlinburg, Germany
| | - Jan Grau
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, D-06099 Halle (Saale), Germany
| |
|
171
|
Elmas A, Wang X, Samoilov MS. Reconstruction of novel transcription factor regulons through inference of their binding sites. BMC Bioinformatics 2015; 16:299. [PMID: 26388177 PMCID: PMC4576408 DOI: 10.1186/s12859-015-0685-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/01/2015] [Accepted: 07/24/2015] [Indexed: 02/04/2023]
Abstract
Background: In most sequenced organisms the number of known regulatory genes (e.g., transcription factors (TFs)) vastly exceeds the number of experimentally-verified regulons that could be associated with them. At present, identification of TF regulons is mostly done through comparative genomics approaches. Such methods could miss organism-specific regulatory interactions and often require expensive and time-consuming experimental techniques to generate the underlying data. Results: In this work, we present an efficient algorithm that aims to identify a given transcription factor’s regulon through inference of its unknown binding sites, based on the discovery of its binding motif. The proposed approach relies on computational methods that utilize gene expression data sets and knockout fitness data sets which are available or may be straightforwardly obtained for many organisms. We computationally constructed the profiles of putative regulons for the TFs LexA, PurR and Fur in E. coli K12 and identified their binding motifs. Comparisons with an experimentally-verified database showed high recovery rates of the known regulon members, and indicated good predictions for the newly found genes with high biological significance. The proposed approach is also applicable to novel organisms for predicting unknown regulons of the transcriptional regulators. Results for the hypothetical protein Dde0289 in D. alaskensis include the discovery of a Fis-type TF binding motif. Conclusions: The proposed motif-based regulon inference approach can discover the organism-specific regulatory interactions on a single genome, which may be missed by current comparative genomics techniques due to their limitations.
Affiliation(s)
- Abdulkadir Elmas
- Department of Electrical Engineering, Columbia University, 500W 120th Street, New York, 10027, NY, USA.
| | - Xiaodong Wang
- Department of Electrical Engineering, Columbia University, 500W 120th Street, New York, 10027, NY, USA.
| | - Michael S Samoilov
- Department of Bioengineering, QB3 California Institute for Quantitative Biosciences UC Berkeley, 1700 4th St #214, Berkeley, 94720, California, USA.
| |
|
172
|
Affiliation(s)
- Gary D. Stormo
- Washington University School of Medicine St. Louis Missouri
| |
|
173
|
Rohs R, Machado ACD, Yang L. Exposing the secrets of sex determination. Nat Struct Mol Biol 2015; 22:437-8. [PMID: 26036567 DOI: 10.1038/nsmb.3042] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/09/2022]
Affiliation(s)
- Remo Rohs
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
| | - Ana Carolina Dantas Machado
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
| | - Lin Yang
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
| |
|
174
|
Motif signatures in stretch enhancers are enriched for disease-associated genetic variants. Epigenetics Chromatin 2015; 8:23. [PMID: 26180553 PMCID: PMC4502539 DOI: 10.1186/s13072-015-0015-7] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 06/25/2015] [Accepted: 07/01/2015] [Indexed: 02/07/2023]
Abstract
Background: Stretch enhancers (SEs) are large chromatin-defined regulatory elements that are at least 3,000 base pairs (bps) long, in contrast to the median enhancer length of 800 bps. SEs tend to be cell-type specific, regulate cell-type specific gene expression, and are enriched in disease-associated genetic variants in disease-relevant cell types. Transcription factors (TFs) can bind to enhancers to modulate enhancer activity, and their sequence specificity can be represented by motifs. We hypothesize motifs can provide a biological context for how genetic variants contribute to disease. Results: We integrated chromatin state, gene expression, and chromatin accessibility [measured as DNase I Hypersensitive Sites (DHSs)] maps across nine different cell types. Motif enrichment analyses of chromatin-defined enhancer sequences identify several known cell-type specific “master” factors. Furthermore, de novo motif discovery not only recovers many of these motifs, but also identifies novel non-canonical motifs, providing additional insight into TF binding preferences. Across the length of SEs, motifs are most enriched in DHSs, though relative enrichment is also observed outside of DHSs. Interestingly, we show that single nucleotide polymorphisms associated with diseases or quantitative traits significantly overlap motif occurrences located in SEs, but outside of DHSs. Conclusions: These results reinforce the role of SEs in influencing risk for diseases and suggest an expanded regulatory functional role for motifs that occur outside highly accessible chromatin. Furthermore, the motif signatures generated here expand our understanding of the binding preference of well-characterized TFs.
|
175
|
Dror I, Golan T, Levy C, Rohs R, Mandel-Gutfreund Y. A widespread role of the motif environment in transcription factor binding across diverse protein families. Genome Res 2015; 25:1268-80. [PMID: 26160164 PMCID: PMC4561487 DOI: 10.1101/gr.184671.114] [Citation(s) in RCA: 98] [Impact Index Per Article: 9.8] [Received: 09/21/2014] [Accepted: 07/08/2015] [Indexed: 12/12/2022]
Abstract
Transcriptional regulation requires the binding of transcription factors (TFs) to short sequence-specific DNA motifs, usually located at the gene regulatory regions. Interestingly, based on a vast amount of data accumulated from genomic assays, it has been shown that only a small fraction of all potential binding sites containing the consensus motif of a given TF actually bind the protein. Recent in vitro binding assays, which exclude the effects of the cellular environment, also demonstrate selective TF binding. An intriguing conjecture is that the surroundings of cognate binding sites have unique characteristics that distinguish them from other sequences containing a similar motif that are not bound by the TF. To test this hypothesis, we conducted a comprehensive analysis of the sequence and DNA shape features surrounding the core-binding sites of 239 and 56 TFs extracted from in vitro HT-SELEX binding assays and in vivo ChIP-seq data, respectively. Comparing the nucleotide content of the regions around the TF-bound sites to the counterpart unbound regions containing the same consensus motifs revealed significant differences that extend far beyond the core-binding site. Specifically, the environment of the bound motifs demonstrated unique sequence compositions, DNA shape features, and overall high similarity to the core-binding motif. Notably, the regions around the binding sites of TFs that belong to the same TF families exhibited similar features, with high agreement between the in vitro and in vivo data sets. We propose that these unique features assist in guiding TFs to their cognate binding sites.
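The kind of comparison described here, nucleotide content in the regions flanking a core motif, can be sketched in a few lines. This is an illustrative assumption about how such a feature might be computed, not the authors' pipeline; the function name `flank_gc`, the flank width, and both example sequences are hypothetical, and the toy values say nothing about which composition favors binding.

```python
def flank_gc(seq, core_start, core_len, flank=5):
    """GC fraction of the bases flanking a core motif, excluding the core itself."""
    left = seq[max(0, core_start - flank):core_start]
    right = seq[core_start + core_len:core_start + core_len + flank]
    flanks = left + right
    if not flanks:
        return 0.0
    return sum(base in "GC" for base in flanks) / len(flanks)

# Two hypothetical sequences sharing the same 8-bp core at offset 5
# but differing in the composition of their surroundings.
core = "TGACGTCA"
seq_a = "GGCCG" + core + "CCGGC"
seq_b = "ATATA" + core + "TTAAT"

gc_a = flank_gc(seq_a, core_start=5, core_len=len(core))
gc_b = flank_gc(seq_b, core_start=5, core_len=len(core))
```

In practice a feature like this would be computed over many bound and unbound sites and the two distributions compared statistically, rather than read off single sequences.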
Affiliation(s)
- Iris Dror
- Faculty of Biology, Technion-Israel Institute of Technology, Technion City, Haifa 32000, Israel; Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics, and Computer Science, University of Southern California, Los Angeles, California 90089, USA
| | - Tamar Golan
- Department of Human Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
| | - Carmit Levy
- Department of Human Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
| | - Remo Rohs
- Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics, and Computer Science, University of Southern California, Los Angeles, California 90089, USA
| | - Yael Mandel-Gutfreund
- Faculty of Biology, Technion-Israel Institute of Technology, Technion City, Haifa 32000, Israel
| |
|
176
|
|
177
|
Abe N, Dror I, Yang L, Slattery M, Zhou T, Bussemaker HJ, Rohs R, Mann RS. Deconvolving the recognition of DNA shape from sequence. Cell 2015; 161:307-18. [PMID: 25843630 DOI: 10.1016/j.cell.2015.02.008] [Citation(s) in RCA: 143] [Impact Index Per Article: 14.3] [Received: 08/05/2014] [Revised: 12/08/2014] [Accepted: 01/26/2015] [Indexed: 01/25/2023]
Abstract
Protein-DNA binding is mediated by the recognition of the chemical signatures of the DNA bases and the 3D shape of the DNA molecule. Because DNA shape is a consequence of sequence, it is difficult to dissociate these modes of recognition. Here, we tease them apart in the context of Hox-DNA binding by mutating residues that, in a co-crystal structure, only recognize DNA shape. Complexes made with these mutants lose the preference to bind sequences with specific DNA shape features. Introducing shape-recognizing residues from one Hox protein to another swapped binding specificities in vitro and gene regulation in vivo. Statistical machine learning revealed that the accuracy of binding specificity predictions improves by adding shape features to a model that only depends on sequence, and feature selection identified shape features important for recognition. Thus, shape readout is a direct and independent component of binding site selection by Hox proteins.
Affiliation(s)
- Namiko Abe
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA; Department of Systems Biology, Columbia University, New York, NY 10032, USA
| | - Iris Dror
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA; Department of Biology, Technion - Israel Institute of Technology, Haifa 32000, Israel
| | - Lin Yang
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
| | - Matthew Slattery
- Department of Biomedical Sciences, University of Minnesota Medical School, Duluth, MN 55812, USA
| | - Tianyin Zhou
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
| | - Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, New York, NY 10032, USA
| | - Remo Rohs
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA; Department of Chemistry, University of Southern California, Los Angeles, CA 90089, USA; Department of Physics and Astronomy, University of Southern California, Los Angeles, CA 90089, USA; Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA.
| | - Richard S Mann
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA; Department of Systems Biology, Columbia University, New York, NY 10032, USA.
| |
|