1
Lally P, Gómez-Romero L, Tierrafría VH, Aquino P, Rioualen C, Zhang X, Kim S, Baniulyte G, Plitnick J, Smith C, Babu M, Collado-Vides J, Wade JT, Galagan JE. Predictive biophysical neural network modeling of a compendium of in vivo transcription factor DNA binding profiles for Escherichia coli. Nat Commun 2025; 16:4255. [PMID: 40335485 PMCID: PMC12059191 DOI: 10.1038/s41467-025-58862-8] [Received: 04/19/2024] [Accepted: 04/03/2025]
Abstract
The DNA binding of most Escherichia coli Transcription Factors (TFs) has not been comprehensively mapped, and few have models that can quantitatively predict binding affinity. We report the global mapping of in vivo DNA binding for 139 E. coli TFs using ChIP-Seq. We use these data to train BoltzNet, a novel neural network that predicts TF binding energy from DNA sequence. BoltzNet mirrors a quantitative biophysical model and provides directly interpretable predictions genome-wide at nucleotide resolution. We use BoltzNet to quantitatively design novel binding sites, which we validate with biophysical experiments on purified protein. We generate models for 124 TFs that provide insight into global features of TF binding, including clustering of sites, the role of accessory bases, the relevance of weak sites, and the background affinity of the genome. Our paper provides new paradigms for studying TF-DNA binding and for the development of biophysically motivated neural networks.
Affiliation(s)
- Patrick Lally
- Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA, USA
- Laura Gómez-Romero
- Instituto Nacional de Medicina Genómica, Periférico Sur 4809, Arenal Tepepan, Ciudad de México, México
- Escuela de Medicina y Ciencias de la Salud, Tecnológico de Monterrey, Ciudad de México, México
- Víctor H Tierrafría
- Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA, USA
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca, Morelos, México
- Patricia Aquino
- Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA, USA
- Claire Rioualen
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca, Morelos, México
- Xiaoman Zhang
- Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA, USA
- Sunyoung Kim
- Department of Biochemistry, University of Regina, Regina, Saskatchewan, SK, Canada
- Jonathan Plitnick
- Wadsworth Center, New York State Department of Health, Albany, NY, USA
- Carol Smith
- Wadsworth Center, New York State Department of Health, Albany, NY, USA
- Mohan Babu
- Department of Biochemistry, University of Regina, Regina, Saskatchewan, SK, Canada
- Julio Collado-Vides
- Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA, USA
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca, Morelos, México
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Joseph T Wade
- Wadsworth Center, New York State Department of Health, Albany, NY, USA
- Department of Biomedical Sciences, University at Albany, SUNY, Albany, NY, USA
- James E Galagan
- Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA, USA
- Bioinformatics Program, Boston University, 24 Cummington Mall, Boston, MA, USA
2
Lally P, Gómez-Romero L, Tierrafría VH, Aquino P, Rioualen C, Zhang X, Kim S, Baniulyte G, Plitnick J, Smith C, Babu M, Collado-Vides J, Wade JT, Galagan JE. Predictive Biophysical Neural Network Modeling of a Compendium of in vivo Transcription Factor DNA Binding Profiles for Escherichia coli. bioRxiv 2024:2024.05.23.594371. [PMID: 38826350 PMCID: PMC11142182 DOI: 10.1101/2024.05.23.594371]
Abstract
The DNA binding of most Escherichia coli Transcription Factors (TFs) has not been comprehensively mapped, and few have models that can quantitatively predict binding affinity. We report the global mapping of in vivo DNA binding for 139 E. coli TFs using ChIP-Seq. We used these data to train BoltzNet, a novel neural network that predicts TF binding energy from DNA sequence. BoltzNet mirrors a quantitative biophysical model and provides directly interpretable predictions genome-wide at nucleotide resolution. We used BoltzNet to quantitatively design novel binding sites, which we validated with biophysical experiments on purified protein. We have generated models for 125 TFs that provide insight into global features of TF binding, including clustering of sites, the role of accessory bases, the relevance of weak sites, and the background affinity of the genome. Our paper provides new paradigms for studying TF-DNA binding and for the development of biophysically motivated neural networks.
Affiliation(s)
- Patrick Lally
- Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
- Laura Gómez-Romero
- Instituto Nacional de Medicina Genómica, Periférico Sur 4809, Arenal Tepepan, Ciudad de México 14610, México
- Escuela de Medicina y Ciencias de la Salud, Tecnológico de Monterrey, Ciudad de México, México
- Víctor H. Tierrafría
- Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, México
- Patricia Aquino
- Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
- Claire Rioualen
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, México
- Xiaoman Zhang
- Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
- Sunyoung Kim
- Department of Biochemistry, University of Regina, Regina, Saskatchewan, SK S4S 0A2, Canada
- Jonathan Plitnick
- Wadsworth Center, New York State Department of Health, Albany, NY, USA
- Carol Smith
- Wadsworth Center, New York State Department of Health, Albany, NY, USA
- Mohan Babu
- Department of Biochemistry, University of Regina, Regina, Saskatchewan, SK S4S 0A2, Canada
- Julio Collado-Vides
- Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, México
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Joseph T. Wade
- Wadsworth Center, New York State Department of Health, Albany, NY, USA
- Department of Biomedical Sciences, University at Albany, SUNY, Albany, NY, USA
- James E. Galagan
- Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
- Bioinformatics Program, Boston University, 24 Cummington Mall, Boston, MA 02215
3
Romanov SE, Kalashnikova DA, Laktionov PP. Methods of massive parallel reporter assays for investigation of enhancers. Vavilovskii Zhurnal Genet Selektsii 2021; 25:344-355. [PMID: 34901731 PMCID: PMC8627875 DOI: 10.18699/vj21.038] [Received: 10/23/2020] [Revised: 03/28/2021] [Accepted: 03/28/2021]
Abstract
The correct deployment of genetic programs for development and differentiation relies on finely coordinated regulation of specific gene sets. Genomic regulatory elements play an exceptional role in this process. There are several types of gene regulatory elements, including promoters, enhancers, insulators and silencers. Alterations of gene regulatory elements may cause various pathologies, including cancer, congenital disorders and autoimmune diseases. The development of high-throughput genomic assays has made it possible to significantly accelerate the accumulation of information about the characteristic epigenetic properties of regulatory elements. In combination with high-throughput studies focused on the genome-wide distribution of epigenetic marks, regulatory proteins and the spatial structure of chromatin, this significantly expands the understanding of the principles of epigenetic regulation of genes and allows potential regulatory elements to be searched for in silico. However, common experimental approaches used to study the local characteristics of chromatin have a number of technical limitations that may reduce the reliability of computational identification of genomic regulatory sequences. Given the variable functions of epigenetic determinants and the complex, multicomponent regulation of genomic element activity, functional verification of candidate elements is often required. A plethora of methods have been developed to study the functional role of regulatory elements on a genome scale. The common experimental approaches underlying in silico identification of regulatory elements, and their inherent technical limitations, will be described. The present review is focused on original high-throughput methods of enhancer activity reporter analysis that are currently used to validate predicted regulatory elements and to perform de novo searches.
The methods described make it possible to assess the functional role of the nucleotide sequence of a regulatory element, to determine its exact boundaries, and to assess the influence of the local state of chromatin on enhancer activity and gene expression. These approaches have contributed substantially to the understanding of the fundamental principles of gene regulation.
Affiliation(s)
- S E Romanov
- Novosibirsk State University, Epigenetics Laboratory, Department of Natural Sciences, Novosibirsk, Russia
- Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Genomics Laboratory, Novosibirsk, Russia
- D A Kalashnikova
- Novosibirsk State University, Epigenetics Laboratory, Department of Natural Sciences, Novosibirsk, Russia
- Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Genomics Laboratory, Novosibirsk, Russia
- P P Laktionov
- Novosibirsk State University, Epigenetics Laboratory, Department of Natural Sciences, Novosibirsk, Russia
- Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Genomics Laboratory, Novosibirsk, Russia
4
Genome and sequence determinants governing the expression of horizontally acquired DNA in bacteria. ISME J 2020; 14:2347-2357. [PMID: 32514119 PMCID: PMC7608860 DOI: 10.1038/s41396-020-0696-1] [Received: 10/29/2019] [Revised: 05/22/2020] [Accepted: 05/28/2020]
Abstract
While horizontal gene transfer is prevalent across the biosphere, the regulatory features that enable expression and functionalization of foreign DNA remain poorly understood. Here, we combine high-throughput promoter activity measurements and large-scale genomic analysis of regulatory regions to investigate the cross-compatibility of regulatory elements (REs) in bacteria. Functional characterization of thousands of natural REs in three distinct bacterial species revealed distinct expression patterns according to RE and recipient phylogeny. Host capacity to activate foreign promoters was proportional to their genomic GC content, while many low GC regulatory elements were both broadly active and had more transcription start sites across hosts. The difference in expression capabilities could be explained by the influence of the host GC content on the stringency of the AT-rich canonical σ70 motif necessary for transcription initiation. We further confirm the generalizability of this model and find widespread GC content adaptation of the σ70 motif in a set of 1,545 genomes from all major bacterial phyla. Our analysis identifies a key mechanism by which the strength of the AT-rich σ70 motif relative to a host's genomic GC content governs the capacity for expression of acquired DNA. These findings shed light on regulatory adaptation in the context of evolving genomic composition.
5
Ryan GE, Farley EK. Functional genomic approaches to elucidate the role of enhancers during development. Wiley Interdiscip Rev Syst Biol Med 2020; 12:e1467. [PMID: 31808313 PMCID: PMC7027484 DOI: 10.1002/wsbm.1467] [Received: 06/07/2019] [Revised: 10/02/2019] [Accepted: 10/11/2019]
Abstract
Successful development depends on the precise tissue-specific regulation of genes by enhancers, genetic elements that act as switches to control when and where genes are expressed. Because enhancers are critical for development, and the majority of disease-associated mutations reside within enhancers, it is essential to understand which sequences within enhancers are important for function. Advances in sequencing technology have enabled the rapid generation of genomic data that predict putative active enhancers, but functionally validating these sequences at scale remains a fundamental challenge. Herein, we discuss the power of genome-wide strategies used to identify candidate enhancers, and also highlight limitations and misconceptions that have arisen from these data. We discuss the use of massively parallel reporter assays to test enhancers for function at scale. We also review recent advances in our ability to study gene regulation during development, including CRISPR-based tools to manipulate genomes and single-cell transcriptomics to finely map gene expression. Finally, we look ahead to a synthesis of complementary genomic approaches that will advance our understanding of enhancer function during development. This article is categorized under: Physiology > Mammalian Physiology in Health and Disease; Developmental Biology > Developmental Processes in Health and Disease; Laboratory Methods and Technologies > Genetic/Genomic Methods.
Affiliation(s)
- Genevieve E. Ryan
- Department of Medicine, University of California, San Diego, California
- Division of Biological Sciences, Department of Medicine, University of California, San Diego, California
- Emma K. Farley
- Department of Medicine, University of California, San Diego, California
- Division of Biological Sciences, Department of Medicine, University of California, San Diego, California
6
Martini MC, Zhou Y, Sun H, Shell SS. Defining the Transcriptional and Post-transcriptional Landscapes of Mycobacterium smegmatis in Aerobic Growth and Hypoxia. Front Microbiol 2019; 10:591. [PMID: 30984135 PMCID: PMC6448022 DOI: 10.3389/fmicb.2019.00591] [Received: 09/25/2018] [Accepted: 03/08/2019]
Abstract
The ability of Mycobacterium tuberculosis to infect, proliferate, and survive during long periods in the human lungs largely depends on the rigorous control of gene expression. Transcriptome-wide analyses are key to understanding gene regulation on a global scale. Here, we combine 5′-end-directed libraries with RNAseq expression libraries to gain insight into the transcriptome organization and post-transcriptional mRNA cleavage landscape in mycobacteria during log phase growth and under hypoxia, a physiologically relevant stress condition. Using the model organism Mycobacterium smegmatis, we identified 6,090 transcription start sites (TSSs) with high confidence during log phase growth, of which 67% were categorized as primary TSSs for annotated genes, and the remaining were classified as internal, antisense, or orphan, according to their genomic context. Interestingly, over 25% of the RNA transcripts lack a leader sequence, and of the coding sequences that do have leaders, 53% lack a strong consensus Shine-Dalgarno site. This indicates that like M. tuberculosis, M. smegmatis can initiate translation through multiple mechanisms. Our approach also allowed us to identify over 3,000 RNA cleavage sites, which occur at a novel sequence motif. To our knowledge, this represents the first report of a transcriptome-wide RNA cleavage site map in mycobacteria. The cleavage sites show a positional bias toward mRNA regulatory regions, highlighting the importance of post-transcriptional regulation in gene expression. We show that in low oxygen, a condition associated with the host environment during infection, mycobacteria change their transcriptomic profiles and endonucleolytic RNA cleavage is markedly reduced, suggesting a mechanistic explanation for previous reports of increased mRNA half-lives in response to stress. In addition, a number of TSSs were triggered in hypoxia, 56 of which contain the binding motif for the sigma factor SigF in their promoter regions. 
This suggests that SigF makes direct contributions to transcriptomic remodeling in hypoxia-challenged mycobacteria. Taken together, our data provide a foundation for further study of both transcriptional and posttranscriptional regulation in mycobacteria.
Affiliation(s)
- M Carla Martini
- Department of Biology and Biotechnology, Worcester Polytechnic Institute, Worcester, MA, United States
- Ying Zhou
- Department of Biology and Biotechnology, Worcester Polytechnic Institute, Worcester, MA, United States
- Huaming Sun
- Program in Bioinformatics and Computational Biology, Worcester Polytechnic Institute, Worcester, MA, United States
- Scarlet S Shell
- Department of Biology and Biotechnology, Worcester Polytechnic Institute, Worcester, MA, United States
- Program in Bioinformatics and Computational Biology, Worcester Polytechnic Institute, Worcester, MA, United States
7
Santos-Zavaleta A, Sánchez-Pérez M, Salgado H, Velázquez-Ramírez DA, Gama-Castro S, Tierrafría VH, Busby SJW, Aquino P, Fang X, Palsson BO, Galagan JE, Collado-Vides J. A unified resource for transcriptional regulation in Escherichia coli K-12 incorporating high-throughput-generated binding data into RegulonDB version 10.0. BMC Biol 2018; 16:91. [PMID: 30115066 PMCID: PMC6094552 DOI: 10.1186/s12915-018-0555-y] [Received: 03/02/2018] [Accepted: 07/25/2018]
Abstract
BACKGROUND Our understanding of the regulation of gene expression has benefited from the availability of high-throughput technologies that interrogate the whole genome for the binding of specific transcription factors and gene expression profiles. In the case of widely used model organisms, such as Escherichia coli K-12, the new knowledge gained from these approaches needs to be integrated with the legacy of accumulated knowledge from genetic and molecular biology experiments conducted in the pre-genomic era in order to attain the deepest level of understanding possible based on the available data. RESULTS In this paper, we describe an expansion of RegulonDB, the database containing the rich legacy of decades of classic molecular biology experiments supporting what we know about gene regulation and operon organization in E. coli K-12, to include the genome-wide dataset collections from 32 ChIP and 19 gSELEX publications, in addition to around 60 genome-wide expression profiles relevant to the functional significance of these datasets and used in their curation. Three essential features for the integration of this information coming from different methodological approaches are: first, a controlled vocabulary within an ontology for precisely defining growth conditions; second, the criteria to separate elements with enough evidence to consider them involved in gene regulation from isolated transcription factor binding sites without such support; and third, an expanded computational model supporting this knowledge. Altogether, this constitutes the basis for adequately gathering, comparing, and integrating this information, as needed to manage and access such a wealth of knowledge. CONCLUSIONS This version 10.0 of RegulonDB is a first step toward what should become the unifying access point for current and future knowledge on gene regulation in E. coli K-12.
Furthermore, this model platform and associated methodologies and criteria can be emulated for gathering knowledge on other microbial organisms.
Affiliation(s)
- Alberto Santos-Zavaleta
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Mishael Sánchez-Pérez
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Heladia Salgado
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Socorro Gama-Castro
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Víctor H. Tierrafría
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Patricia Aquino
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Xin Fang
- Department of Bioengineering, University of California San Diego, La Jolla, California, USA
- Bernhard O. Palsson
- Department of Bioengineering, University of California San Diego, La Jolla, California, USA
- Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
- James E. Galagan
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Julio Collado-Vides
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
8
Abstract
CCCTC-binding factor (CTCF) is a conserved, essential regulator of chromatin architecture containing a unique array of 11 zinc fingers (ZFs). Gene duplication and sequence divergence during early amniote evolution generated the CTCF paralog Brother Of the Regulator of Imprinted Sites (BORIS), which has a DNA binding specificity identical to that of CTCF but divergent N- and C-termini. While healthy somatic tissues express only CTCF, CTCF and BORIS are normally co-expressed in meiotic and post-meiotic germ cells, and aberrant activation of BORIS occurs in tumors and some cancer cell lines. This has led to a model in which CTCF and BORIS compete for binding to some but not all genomic target sites; however, regulation of CTCF and BORIS genomic co-occupancy is not well understood. We recently addressed this issue, finding evidence for two major classes of CTCF target sequences: some containing single CTCF target sites (1xCTSes) and others containing two adjacent CTCF motifs (2xCTSes). The functional and chromatin structural features of 2xCTSes are distinct from those of 1xCTS-containing regions bound by a CTCF monomer. We suggest that these previously overlooked classes of CTCF binding regions may have different roles in regulating diverse chromatin-based phenomena, and may impact our understanding of heritable epigenetic regulation in cancer cells and normal germ cells.
Affiliation(s)
- Victor V Lobanenkov
- Molecular Pathology Section, Laboratory of Immunogenetics, National Institute of Allergy and Infectious Diseases, National Institutes of Health, 5601 Fishers Ln, Rockville, MD, USA
- Gabriel E Zentner
- Department of Biology, Indiana University, 915 E 3rd St, Bloomington, IN 47405, USA
9
Stanton KP, Jin J, Lederman RR, Weissman SM, Kluger Y. Ritornello: high fidelity control-free chromatin immunoprecipitation peak calling. Nucleic Acids Res 2017; 45:e173. [PMID: 28981893 PMCID: PMC5716106 DOI: 10.1093/nar/gkx799] [Received: 12/23/2015] [Accepted: 08/30/2017]
Abstract
With the advent of next-generation high-throughput DNA sequencing technologies, omics experiments have become the mainstay for studying diverse biological effects on a genome-wide scale. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is the omics technique that enables genome-wide localization of transcription factor (TF) binding or epigenetic modification events. Since the inception of ChIP-seq in 2007, many methods have been developed to infer ChIP-target binding loci from the resultant reads after mapping them to a reference genome. However, interpreting these data has proven challenging, and as such these algorithms have several shortcomings, including susceptibility to false positives due to artifactual peaks, poor localization of binding sites, and the requirement for a total DNA input control, which increases the cost of performing these experiments. We present Ritornello, a new approach for finding TF-binding sites in ChIP-seq, with roots in digital signal processing, that addresses all of these problems. We show that Ritornello generally performs as well as or better than the peak callers tested and recommended by the ENCODE consortium, but in contrast, Ritornello does not require a matched total DNA input control to avoid false positives, effectively decreasing the sequencing cost to perform ChIP-seq. Ritornello is freely available at https://github.com/KlugerLab/Ritornello.
Affiliation(s)
- Kelly P Stanton
- Department of Pathology, Yale University School of Medicine, 333 Cedar Street, New Haven, CT 06520, USA
- Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA
- Jiaqi Jin
- Department of Genetics, Yale University School of Medicine, 333 Cedar Street, New Haven, CT 06520, USA
- Roy R Lederman
- Program of Applied Mathematics, Yale University, 51 Prospect Street, New Haven, CT 06511, USA
- Department of Mathematics and PACM, Princeton University, Fine Hall, Washington Road, Princeton, NJ 08544-1000, USA
- Sherman M Weissman
- Department of Genetics, Yale University School of Medicine, 333 Cedar Street, New Haven, CT 06520, USA
- Yuval Kluger
- Department of Pathology, Yale University School of Medicine, 333 Cedar Street, New Haven, CT 06520, USA
- Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA
- Program of Applied Mathematics, Yale University, 51 Prospect Street, New Haven, CT 06511, USA
10
Aquino P, Honda B, Jaini S, Lyubetskaya A, Hosur K, Chiu JG, Ekladious I, Hu D, Jin L, Sayeg MK, Stettner AI, Wang J, Wong BG, Wong WS, Alexander SL, Ba C, Bensussen SI, Bernstein DB, Braff D, Cha S, Cheng DI, Cho JH, Chou K, Chuang J, Gastler DE, Grasso DJ, Greifenberger JS, Guo C, Hawes AK, Israni DV, Jain SR, Kim J, Lei J, Li H, Li D, Li Q, Mancuso CP, Mao N, Masud SF, Meisel CL, Mi J, Nykyforchyn CS, Park M, Peterson HM, Ramirez AK, Reynolds DS, Rim NG, Saffie JC, Su H, Su WR, Su Y, Sun M, Thommes MM, Tu T, Varongchayakul N, Wagner TE, Weinberg BH, Yang R, Yaroslavsky A, Yoon C, Zhao Y, Zollinger AJ, Stringer AM, Foster JW, Wade J, Raman S, Broude N, Wong WW, Galagan JE. Coordinated regulation of acid resistance in Escherichia coli. BMC Syst Biol 2017; 11:1. [PMID: 28061857 PMCID: PMC5217608 DOI: 10.1186/s12918-016-0376-y] [Received: 03/05/2016] [Accepted: 12/07/2016]
Abstract
Background Enteric Escherichia coli survives the highly acidic environment of the stomach through multiple acid resistance (AR) mechanisms. The most effective system, AR2, decarboxylates externally derived glutamate to remove cytoplasmic protons and excrete GABA. The first described system, AR1, does not require an external amino acid. Its mechanism has not been determined. The regulation of the multiple AR systems and their coordination with broader cellular metabolism have not been fully explored. Results We utilized a combination of ChIP-Seq and gene expression analysis to experimentally map the regulatory interactions of four TFs: nac, ntrC, ompR, and csiR. Our data identified all previously in vivo confirmed direct interactions and revealed several others previously inferred from gene expression data. Our data demonstrate that nac and csiR directly modulate AR; these data also lead to a regulatory network model in which all four TFs participate in coordinating acid resistance, glutamate metabolism, and nitrogen metabolism. This model predicts a novel mechanism for AR1 by which the decarboxylation enzymes of AR2 are used with internally derived glutamate. This hypothesis makes several testable predictions that we confirmed experimentally. Conclusions Our data suggest that the regulatory network underlying AR is complex and deeply interconnected with the regulation of GABA and glutamate metabolism and of nitrogen metabolism. These connections underlie an experimentally validated model of AR1 in which the decarboxylation enzymes of AR2 are used with internally derived glutamate.
Affiliation(s)
- Patricia Aquino
- Department of Biomedical Engineering, Boston University, Boston, USA
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- Brent Honda
- Department of Biomedical Engineering, Boston University, Boston, USA
- Suma Jaini
- Department of Biomedical Engineering, Boston University, Boston, USA
- Krutika Hosur
- Department of Biomedical Engineering, Boston University, Boston, USA
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- Joanna G Chiu
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- Iriny Ekladious
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- Dongjian Hu
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- Lin Jin
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- Marianna K Sayeg
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- Arion I Stettner
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- Julia Wang
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- Brandon G Wong
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- Winnie S Wong
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- Cong Ba
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- Seth I Bensussen
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- David B Bernstein
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- Dana Braff
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- Susie Cha
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- Daniel I Cheng
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- Jang Hwan Cho
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- Kenny Chou
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- James Chuang
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- Daniel E Gastler
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- Daniel J Grasso
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- Chen Guo
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- Anna K Hawes
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- Divya V Israni
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- Saloni R Jain
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- Jessica Kim
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- Junyu Lei
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
- Hao Li
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
| | - David Li
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
| | - Qian Li
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
| | | | - Ning Mao
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
| | - Salwa F Masud
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
| | - Cari L Meisel
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
| | - Jing Mi
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
| | | | - Minhee Park
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
| | - Hannah M Peterson
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
| | - Alfred K Ramirez
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
| | - Daniel S Reynolds
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
| | - Nae Gyune Rim
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
| | - Jared C Saffie
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
| | - Hang Su
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
| | - Wendell R Su
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
| | - Yaqing Su
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
| | - Meng Sun
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
| | - Meghan M Thommes
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
| | - Tao Tu
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
| | | | - Tyler E Wagner
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
| | | | - Rouhui Yang
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
| | | | - Christine Yoon
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
| | - Yanyu Zhao
- BE605 Course, Biomedical Engineering, Boston University, Boston, USA
| | | | - Anne M Stringer
- Wadsworth Center, New York State Department of Health, Albany, NY, USA
| | - John W Foster
- Department of Microbiology and Immunology, University of South Alabama College of Medicine, Mobile, AL, 36688, USA
| | - Joseph Wade
- Wadsworth Center, New York State Department of Health, Albany, NY, USA.,Department of Biomedical Sciences, University at Albany, Albany, NY, USA
- Sahadevan Raman
- Department of Microbiology and Immunology, University of South Alabama College of Medicine, Mobile, AL, 36688, USA
- Natasha Broude
- Department of Biomedical Engineering, Boston University, Boston, USA
- Wilson W Wong
- Department of Biomedical Engineering, Boston University, Boston, USA
- James E Galagan
- Department of Biomedical Engineering, Boston University, Boston, USA; Bioinformatics Program, Boston University, Boston, USA; National Emerging Infectious Diseases Laboratory, Boston University, Boston, USA
11
Nettling M, Treutler H, Cerquides J, Grosse I. Detecting and correcting the binding-affinity bias in ChIP-seq data using inter-species information. BMC Genomics 2016; 17:347. [PMID: 27165633 PMCID: PMC4862171 DOI: 10.1186/s12864-016-2682-6] [Received: 12/15/2015] [Accepted: 04/28/2016]
Abstract
BACKGROUND: Transcriptional gene regulation is a fundamental process in nature, and the experimental and computational investigation of DNA binding motifs and their binding sites is a prerequisite for elucidating it. ChIP-seq has become the major technology for uncovering genomic regions containing those binding sites, but motifs predicted from these data by traditional computational approaches are distorted by a ubiquitous binding-affinity bias. Here, we present an approach for detecting and correcting this bias using inter-species information. RESULTS: We find that the binding-affinity bias caused by the ChIP-seq experiment in the reference species is stronger than the indirect binding-affinity bias in orthologous regions from phylogenetically related species. We use this difference to develop a phylogenetic footprinting model that is capable of detecting and correcting the binding-affinity bias. We find that this model improves motif prediction and that the corrected motifs are typically softer than those predicted by traditional approaches. CONCLUSIONS: These findings indicate that motifs published in databases and in the literature are artificially sharpened compared to the native motifs. They also indicate that our current understanding of transcriptional gene regulation might be blurred, but that this understanding can be advanced by taking into account the inter-species information available today, and even more so in the future.
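One common way to quantify how "sharp" or "soft" a motif is uses the information content of each column of a position weight matrix (PWM): for DNA against a uniform background, each column contributes 2 bits minus its Shannon entropy. The sketch below illustrates only that general measure, not the phylogenetic footprinting model described above; the function names and the two toy matrices are hypothetical.

```python
import math

def column_information(probs):
    """Information content (bits) of one motif column against a uniform background.

    `probs` holds the probabilities of A, C, G and T at this position and should sum to 1.
    """
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 - entropy  # log2(4) = 2 bits is the maximum for DNA

def motif_information(pwm):
    """Total information content (bits) summed over all columns of a PWM."""
    return sum(column_information(column) for column in pwm)

# Hypothetical 3-column motifs with the same consensus (A, G, C):
# one strongly peaked ("sharp") and one tolerating more variation ("soft").
sharp = [[0.97, 0.01, 0.01, 0.01],
         [0.01, 0.01, 0.97, 0.01],
         [0.01, 0.97, 0.01, 0.01]]
soft = [[0.70, 0.10, 0.10, 0.10],
        [0.10, 0.10, 0.70, 0.10],
        [0.10, 0.70, 0.10, 0.10]]

print(f"sharp motif: {motif_information(sharp):.2f} bits")
print(f"soft motif:  {motif_information(soft):.2f} bits")
```

A lower total means a softer motif, so under this measure a correction that softens motifs would lower their information content.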
Affiliation(s)
- Martin Nettling
- Institute of Computer Science, Martin Luther University, Halle (Saale), Germany.
- Ivo Grosse
- Institute of Computer Science, Martin Luther University, Halle (Saale), Germany; German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
12
Gomes ALC, Wang HH. The Role of Genome Accessibility in Transcription Factor Binding in Bacteria. PLoS Comput Biol 2016; 12:e1004891. [PMID: 27104615 PMCID: PMC4841574 DOI: 10.1371/journal.pcbi.1004891] [Received: 07/23/2015] [Accepted: 03/31/2016]
Abstract
ChIP-seq enables genome-scale identification of regulatory regions that govern gene expression. However, the biological insights generated from ChIP-seq analysis have been limited to predictions of binding sites and cooperative interactions. Furthermore, ChIP-seq data often correlate poorly with in vitro measurements or predicted motifs, highlighting that binding affinity alone is insufficient to explain transcription factor (TF) binding in vivo. One possibility is that binding sites are not equally accessible across the genome. A more comprehensive biophysical representation of TF binding is required to improve our ability to understand, predict, and alter gene expression. Here, we show that genome accessibility is a key parameter that affects TF binding in bacteria. We developed a thermodynamic model that parameterizes ChIP-seq coverage in terms of genome accessibility and binding affinity. The role of genome accessibility is validated using a large-scale ChIP-seq dataset of the M. tuberculosis regulatory network. We find that accounting for genome accessibility yields a model that explains 63% of the ChIP-seq profile variance, whereas a model based on motif score alone explains only 35% of the variance. Moreover, our framework enables de novo ChIP-seq peak prediction and can infer TF-binding peaks in new experimental conditions, reducing the need for additional experiments. We observe that the genome is more accessible in intergenic regions, and that increased accessibility is positively correlated with gene expression and anti-correlated with distance to the origin of replication. Our biophysically motivated model provides a more comprehensive, first-principles description of TF binding in vivo, a step towards a better representation of gene regulation in silico with promising applications in systems biology.
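The idea of describing ChIP-seq signal through both binding affinity and accessibility can be illustrated with a minimal sketch. It assumes a simple two-state (bound/unbound) Boltzmann occupancy multiplied by a per-site accessibility factor between 0 and 1; this functional form, the units, and the parameter names are illustrative assumptions, not the published model. The sketch also includes the standard fraction-of-variance-explained measure (1 - SS_res/SS_tot), the kind of metric behind the 63% versus 35% comparison.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)

def occupancy(delta_g, tf_conc, accessibility, temp_k=310.0):
    """Expected occupancy of a site, scaled by its accessibility (hypothetical model).

    delta_g: binding free energy of the site (kcal/mol; more negative = stronger).
    tf_conc: dimensionless effective TF concentration (assumed units).
    accessibility: fraction of the time the site is available, between 0 and 1.
    """
    weight = tf_conc * math.exp(-delta_g / (R * temp_k))  # Boltzmann weight of the bound state
    p_bound = weight / (1.0 + weight)                     # two-state bound/unbound probability
    return accessibility * p_bound

def variance_explained(observed, predicted):
    """Fraction of the variance in `observed` explained by `predicted`."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((y - mean) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot

# Same hypothetical binding energy, different accessibility.
open_site = occupancy(delta_g=-2.0, tf_conc=0.1, accessibility=1.0)
closed_site = occupancy(delta_g=-2.0, tf_conc=0.1, accessibility=0.3)
print(f"accessible site: {open_site:.2f}, less accessible site: {closed_site:.2f}")
```

In this toy form, two sites with identical motifs (identical `delta_g`) can still yield different signal, which is the kind of discrepancy a motif-score-only model cannot capture.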
Affiliation(s)
- Antonio L. C. Gomes
- Department of Systems Biology, Columbia University, New York, New York, United States of America
- Harris H. Wang
- Department of Systems Biology, Columbia University, New York, New York, United States of America
- Department of Pathology and Cell Biology, Columbia University, New York, New York, United States of America
13
Sharp JD, Singh AK, Park ST, Lyubetskaya A, Peterson MW, Gomes ALC, Potluri LP, Raman S, Galagan JE, Husson RN. Comprehensive Definition of the SigH Regulon of Mycobacterium tuberculosis Reveals Transcriptional Control of Diverse Stress Responses. PLoS One 2016; 11:e0152145. [PMID: 27003599 PMCID: PMC4803200 DOI: 10.1371/journal.pone.0152145] [Received: 12/01/2015] [Accepted: 03/09/2016]
Abstract
Expression of SigH, one of 12 Mycobacterium tuberculosis alternative sigma factors, is induced by heat, oxidative, and nitric oxide stresses. SigH activation has been shown to increase expression of several genes, including genes involved in maintaining redox equilibrium and in protein degradation. However, few of these are known to be directly regulated by SigH. The goal of this project was to comprehensively define the Mycobacterium tuberculosis genes and operons that are directly controlled by SigH in order to gain insight into the role of SigH in regulating M. tuberculosis physiology. We used ChIP-Seq to identify in vivo SigH binding sites throughout the M. tuberculosis genome, followed by quantification of SigH-dependent expression of genes linked to these sites and identification of SigH-regulated promoters. We identified 69 SigH binding sites, located both in intergenic regions and within annotated coding sequences of the M. tuberculosis genome. Forty-one binding sites were linked to genes that showed greater expression following heat stress in a SigH-dependent manner. We identified several genes not previously known to be regulated by SigH, including genes involved in DNA repair, cysteine biosynthesis, and translation, as well as genes of unknown function. Experimental and computational analysis of SigH-regulated promoter sequences within these binding sites identified strong consensus -35 and -10 promoter sequences, with tolerance for non-consensus bases at specific positions. This comprehensive identification and validation of SigH-regulated genes reveals an extended SigH regulon that controls an unexpectedly broad range of stress response functions.
Affiliation(s)
- Jared D. Sharp
- Division of Infectious Diseases, Boston Children’s Hospital and Department of Pediatrics, Harvard Medical School, Boston, Massachusetts 02115, United States of America
- Atul K. Singh
- Division of Infectious Diseases, Boston Children’s Hospital and Department of Pediatrics, Harvard Medical School, Boston, Massachusetts 02115, United States of America
- Sang Tae Park
- National Emerging Infectious Diseases Laboratories, Boston University, Boston, Massachusetts 02118, United States of America
- Anna Lyubetskaya
- Bioinformatics Program, Boston University, Boston, Massachusetts 02215, United States of America
- Matthew W. Peterson
- Department of Microbiology, Boston University, Boston, Massachusetts 02215, United States of America
- Antonio L. C. Gomes
- Bioinformatics Program, Boston University, Boston, Massachusetts 02215, United States of America
- Lakshmi-Prasad Potluri
- Division of Infectious Diseases, Boston Children’s Hospital and Department of Pediatrics, Harvard Medical School, Boston, Massachusetts 02115, United States of America
- Sahadevan Raman
- National Emerging Infectious Diseases Laboratories, Boston University, Boston, Massachusetts 02118, United States of America
- James E. Galagan
- National Emerging Infectious Diseases Laboratories, Boston University, Boston, Massachusetts 02118, United States of America
- Bioinformatics Program, Boston University, Boston, Massachusetts 02215, United States of America
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts 02215, United States of America
- Department of Microbiology, Boston University, Boston, Massachusetts 02215, United States of America
- Robert N. Husson
- Division of Infectious Diseases, Boston Children’s Hospital and Department of Pediatrics, Harvard Medical School, Boston, Massachusetts 02115, United States of America
14
Ranganathan S, Bai G, Lyubetskaya A, Knapp GS, Peterson MW, Gazdik M, C Gomes AL, Galagan JE, McDonough KA. Characterization of a cAMP responsive transcription factor, Cmr (Rv1675c), in TB complex mycobacteria reveals overlap with the DosR (DevR) dormancy regulon. Nucleic Acids Res 2015; 44:134-51. [PMID: 26358810 PMCID: PMC4705688 DOI: 10.1093/nar/gkv889] [Received: 06/15/2015] [Accepted: 08/26/2015]
Abstract
Mycobacterium tuberculosis (Mtb) Cmr (Rv1675c) is a CRP/FNR family transcription factor known to be responsive to cAMP levels and to macrophage infection. However, Cmr's DNA binding properties, cellular targets, and overall role in tuberculosis (TB) complex bacteria have not been characterized. In this study, we used experimental and computational approaches to characterize Cmr's DNA binding properties and identify a putative regulon. Cmr binds a 16-bp palindromic site that includes four highly conserved nucleotides required for DNA binding. Using ChIP-seq, we identified a total of 368 binding sites, distributed in clusters among ∼200 binding regions throughout the Mycobacterium bovis BCG genome. One of the most enriched Cmr binding sites was located upstream of the cmr promoter, and we demonstrated that expression of cmr is autoregulated. cAMP affected Cmr binding at a subset of DNA loci in vivo and in vitro, including multiple sites adjacent to members of the DosR (DevR) dormancy regulon. Our findings of cooperative binding of Cmr to these DNA regions and of Cmr-mediated regulation of the DosR-regulated virulence gene Rv2623 demonstrate the complexity of Cmr-mediated gene regulation and suggest a role for Cmr in the biology of persistent TB infection.
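A "palindromic" DNA site is one that reads the same on both strands, that is, it equals its own reverse complement, as is typical for sites bound by dimeric transcription factors. A minimal sketch of that check, plus a count of positions where the two half-sites fail to pair (one simple way to express tolerance for imperfect palindromes); the 16-bp example site is hypothetical and is not the published Cmr motif.

```python
# Map each base to its Watson-Crick partner (uppercase A/C/G/T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def is_palindromic(seq: str) -> bool:
    """True if the site reads identically on both strands."""
    return seq == reverse_complement(seq)

def half_site_mismatches(seq: str) -> int:
    """Count positions in the first half that do not pair with their mirror position."""
    rc = reverse_complement(seq)
    half = len(seq) // 2
    return sum(1 for a, b in zip(seq[:half], rc[:half]) if a != b)

site = "TGTGATCTAGATCACA"     # hypothetical perfect 16-bp palindrome
variant = "TGTGATCTAGATCACT"  # same site with its last base changed
print(is_palindromic(site), half_site_mismatches(site))
print(is_palindromic(variant), half_site_mismatches(variant))
```

Changing one terminal base breaks exactly one base pair between the half-sites, so the variant is no longer a perfect palindrome but has only one mismatch.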
Affiliation(s)
- Sridevi Ranganathan
- Department of Biomedical Sciences, School of Public Health, University at Albany, SUNY, Albany, NY 12201, USA
- Guangchun Bai
- Wadsworth Center, New York State Department of Health, 120 New Scotland Avenue, PO Box 22002, Albany, NY 12201-2002, USA
- Anna Lyubetskaya
- Bioinformatics Program, Boston University, Boston, MA 02215, USA
- Gwendowlyn S Knapp
- Wadsworth Center, New York State Department of Health, 120 New Scotland Avenue, PO Box 22002, Albany, NY 12201-2002, USA
- Michaela Gazdik
- Department of Biomedical Sciences, School of Public Health, University at Albany, SUNY, Albany, NY 12201, USA
- James E Galagan
- Bioinformatics Program, Boston University, Boston, MA 02215, USA; Department of Microbiology, Boston University, Boston, MA 02215, USA; Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA; National Emerging Infectious Diseases Laboratories, Boston University, Boston, MA 02118, USA
- Kathleen A McDonough
- Department of Biomedical Sciences, School of Public Health, University at Albany, SUNY, Albany, NY 12201, USA; Wadsworth Center, New York State Department of Health, 120 New Scotland Avenue, PO Box 22002, Albany, NY 12201-2002, USA
15
Pugacheva EM, Rivero-Hinojosa S, Espinoza CA, Méndez-Catalá CF, Kang S, Suzuki T, Kosaka-Suzuki N, Robinson S, Nagarajan V, Ye Z, Boukaba A, Rasko JEJ, Strunnikov AV, Loukinov D, Ren B, Lobanenkov VV. Comparative analyses of CTCF and BORIS occupancies uncover two distinct classes of CTCF binding genomic regions. Genome Biol 2015; 16:161. [PMID: 26268681 PMCID: PMC4562119 DOI: 10.1186/s13059-015-0736-8] [Received: 07/01/2015] [Accepted: 07/31/2015]
Abstract
Background: CTCF and BORIS (CTCFL), two paralogous mammalian proteins sharing nearly identical DNA binding domains, are thought to function in a mutually exclusive manner in DNA binding and transcriptional regulation. Results: Here we show that these two proteins co-occupy a specific subset of regulatory elements consisting of clustered CTCF binding motifs (termed 2xCTSes). BORIS occupancy at 2xCTSes is largely invariant in BORIS-positive cancer cells, with the genomic pattern recapitulating the germline-specific BORIS binding to chromatin. In contrast to the single-motif CTCF target sites (1xCTSes), the 2xCTS elements are preferentially found at active promoters and enhancers, both in cancer and germ cells. 2xCTSes are also enriched in genomic regions that escape histone to protamine replacement in human and mouse sperm. Depletion of the BORIS gene leads to altered transcription of a large number of genes and the differentiation of K562 cells, while the ectopic expression of this CTCF paralog leads to specific changes in transcription in MCF7 cells. Conclusions: We discover two functionally and structurally different classes of CTCF binding regions, 2xCTSes and 1xCTSes, revealed by their predisposition to bind BORIS. We propose that 2xCTSes play key roles in the transcriptional program of cancer and germ cells. Electronic supplementary material: The online version of this article (doi:10.1186/s13059-015-0736-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Elena M Pugacheva
- Molecular Pathology Section, Laboratory of Immunogenetics, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD, 20852, USA
- Samuel Rivero-Hinojosa
- Molecular Pathology Section, Laboratory of Immunogenetics, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD, 20852, USA
- Celso A Espinoza
- Ludwig Institute for Cancer Research, 9500 Gilman Drive, La Jolla, CA, 92093, USA; Department of Cellular and Molecular Medicine, Institute of Genomic Medicine, Moores Cancer Center, San Diego School of Medicine, University of California, San Diego, La Jolla, CA, 92093, USA
- Claudia Fabiola Méndez-Catalá
- Molecular Pathology Section, Laboratory of Immunogenetics, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD, 20852, USA
- Sungyun Kang
- Molecular Pathology Section, Laboratory of Immunogenetics, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD, 20852, USA
- Teruhiko Suzuki
- Molecular Pathology Section, Laboratory of Immunogenetics, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD, 20852, USA; Stem Cell Project, Tokyo Metropolitan Institute of Medical Science, Kamikitazawa, Setagaya-ku, Tokyo, Japan
- Natsuki Kosaka-Suzuki
- Molecular Pathology Section, Laboratory of Immunogenetics, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD, 20852, USA
- Susan Robinson
- Molecular Pathology Section, Laboratory of Immunogenetics, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD, 20852, USA
- Vijayaraj Nagarajan
- Bioinformatics and Computational Biosciences Branch, Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Zhen Ye
- Ludwig Institute for Cancer Research, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Abdelhalim Boukaba
- Guangzhou Institutes of Biomedicine and Health, Molecular Epigenetics Laboratory, 190 Kai Yuan Avenue, Science Park, Guangzhou, 510530, China
- John E J Rasko
- Gene and Stem Cell Therapy Program, Centenary Institute, Camperdown, NSW, 2050, Australia; Sydney Medical School, University of Sydney, Sydney, NSW, 2006, Australia; Cell and Molecular Therapies, Royal Prince Alfred Hospital, Camperdown, NSW, 2050, Australia
- Alexander V Strunnikov
- Guangzhou Institutes of Biomedicine and Health, Molecular Epigenetics Laboratory, 190 Kai Yuan Avenue, Science Park, Guangzhou, 510530, China
- Dmitri Loukinov
- Molecular Pathology Section, Laboratory of Immunogenetics, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD, 20852, USA
- Bing Ren
- Ludwig Institute for Cancer Research, 9500 Gilman Drive, La Jolla, CA, 92093, USA; Department of Cellular and Molecular Medicine, Institute of Genomic Medicine, Moores Cancer Center, San Diego School of Medicine, University of California, San Diego, La Jolla, CA, 92093, USA
- Victor V Lobanenkov
- Molecular Pathology Section, Laboratory of Immunogenetics, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD, 20852, USA.
16
Abstract
Recent advances in experimental and computational methodologies are enabling ultra-high resolution genome-wide profiles of protein-DNA binding events. For example, the ChIP-exo protocol precisely characterizes protein-DNA cross-linking patterns by combining chromatin immunoprecipitation (ChIP) with 5' → 3' exonuclease digestion. Similarly, deeply sequenced chromatin accessibility assays (e.g. DNase-seq and ATAC-seq) enable the detection of protected footprints at protein-DNA binding sites. With these techniques and others, we have the potential to characterize the individual nucleotides that interact with transcription factors, nucleosomes, RNA polymerases and other regulatory proteins in a particular cellular context. In this review, we explain the experimental assays and computational analysis methods that enable high-resolution profiling of protein-DNA binding events. We discuss the challenges and opportunities associated with such approaches.
Affiliation(s)
- Shaun Mahony
- Department of Biochemistry & Molecular Biology, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, USA
- B Franklin Pugh
- Department of Biochemistry & Molecular Biology, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, USA
17
Myers KS, Park DM, Beauchene NA, Kiley PJ. Defining bacterial regulons using ChIP-seq. Methods 2015; 86:80-8. [PMID: 26032817 DOI: 10.1016/j.ymeth.2015.05.022] [Received: 03/25/2015] [Revised: 05/22/2015] [Accepted: 05/23/2015]
Abstract
Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is a powerful method that identifies protein-DNA binding sites in vivo. Recent studies have illustrated the value of ChIP-seq in studying transcription factor binding in various bacterial species under a variety of growth conditions. These results show that in addition to identifying binding sites, correlation of ChIP-seq data with expression data can reveal important information about bacterial regulons and regulatory networks. In this chapter, we provide an overview of the current state of knowledge about ChIP-seq methodology in bacteria, from sample preparation to raw data analysis. We also describe visualization and various bioinformatic analyses of processed ChIP-seq data.
Affiliation(s)
- Kevin S Myers
- Laboratory of Genetics, University of Wisconsin - Madison, Madison, WI 53706, USA; Great Lakes Bioenergy Research Center, University of Wisconsin - Madison, Madison, WI 53706, USA
- Dan M Park
- Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
- Nicole A Beauchene
- Department of Biomolecular Chemistry, University of Wisconsin - Madison, Madison, WI 53706, USA
- Patricia J Kiley
- Department of Biomolecular Chemistry, University of Wisconsin - Madison, Madison, WI 53706, USA; Great Lakes Bioenergy Research Center, University of Wisconsin - Madison, Madison, WI 53706, USA.
18
Knapp GS, Lyubetskaya A, Peterson MW, Gomes ALC, Ma Z, Galagan JE, McDonough KA. Role of intragenic binding of cAMP responsive protein (CRP) in regulation of the succinate dehydrogenase genes Rv0249c-Rv0247c in TB complex mycobacteria. Nucleic Acids Res 2015; 43:5377-93. [PMID: 25940627 PMCID: PMC4477654 DOI: 10.1093/nar/gkv420] [Received: 03/07/2015] [Accepted: 04/19/2015]
Abstract
Bacterial pathogens adapt to changing environments within their hosts, and the signaling molecule adenosine 3', 5'-cyclic monophosphate (cAMP) facilitates this process. In this study, we characterized in vivo DNA binding and gene regulation by the cAMP-responsive protein CRP in M. bovis BCG as a model for tuberculosis (TB)-complex bacteria. Chromatin immunoprecipitation followed by deep-sequencing (ChIP-seq) showed that CRP associates with ∼900 DNA binding regions, most of which occur within genes. The most highly enriched binding region was upstream of a putative copper transporter gene (ctpB), and crp-deleted bacteria showed increased sensitivity to copper toxicity. Detailed mutational analysis of four CRP binding sites upstream of the virulence-associated Rv0249c-Rv0247c succinate dehydrogenase genes demonstrated that CRP directly regulates Rv0249c-Rv0247c expression from two promoters, one of which requires sequences intragenic to Rv0250c for maximum expression. The high percentage of intragenic CRP binding sites and our demonstration that these intragenic DNA sequences significantly contribute to biologically relevant gene expression greatly expand the genome space that must be considered for gene regulatory analyses in mycobacteria. These findings also have practical implications for an important bacterial pathogen in which identification of mutations that affect expression of drug target-related genes is widely used for rapid drug resistance screening.
Affiliation(s)
- Gwendowlyn S Knapp
- Wadsworth Center, New York State Department of Health, 120 New Scotland Avenue, PO Box 22002, Albany, NY 12201-2002, USA
- Anna Lyubetskaya
- Bioinformatics Program, Boston University, Boston, MA 02215, USA
- Zhuo Ma
- Wadsworth Center, New York State Department of Health, 120 New Scotland Avenue, PO Box 22002, Albany, NY 12201-2002, USA
- James E Galagan
- Bioinformatics Program, Boston University, Boston, MA 02215, USA; Department of Biomedical Engineering, Boston, MA 02215, USA; Department of Microbiology, Boston University, Boston, MA 02215, USA
- Kathleen A McDonough
- Wadsworth Center, New York State Department of Health, 120 New Scotland Avenue, PO Box 22002, Albany, NY 12201-2002, USA; Department of Biomedical Sciences, University at Albany, SUNY, Albany, NY 12201, USA