1
|
Lally P, Gómez-Romero L, Tierrafría VH, Aquino P, Rioualen C, Zhang X, Kim S, Baniulyte G, Plitnick J, Smith C, Babu M, Collado-Vides J, Wade JT, Galagan JE. Predictive biophysical neural network modeling of a compendium of in vivo transcription factor DNA binding profiles for Escherichia coli. Nat Commun 2025; 16:4255. [PMID: 40335485 PMCID: PMC12059191 DOI: 10.1038/s41467-025-58862-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Accepted: 04/03/2025] [Indexed: 05/09/2025]
Abstract
The DNA binding of most Escherichia coli Transcription Factors (TFs) has not been comprehensively mapped, and few have models that can quantitatively predict binding affinity. We report the global mapping of in vivo DNA binding for 139 E. coli TFs using ChIP-Seq. We use these data to train BoltzNet, a novel neural network that predicts TF binding energy from DNA sequence. BoltzNet mirrors a quantitative biophysical model and provides directly interpretable predictions genome-wide at nucleotide resolution. We use BoltzNet to quantitatively design novel binding sites, which we validate with biophysical experiments on purified protein. We generate models for 124 TFs that provide insight into global features of TF binding, including clustering of sites, the role of accessory bases, the relevance of weak sites, and the background affinity of the genome. Our paper provides new paradigms for studying TF-DNA binding and for the development of biophysically motivated neural networks.
Affiliation(s)
- Patrick Lally
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA, USA
- Laura Gómez-Romero
  - Instituto Nacional de Medicina Genómica, Periférico Sur 4809, Arenal Tepepan, Ciudad de México, México, México
  - Escuela de Medicina y Ciencias de la Salud, Tecnológico de Monterrey, Ciudad de México, México, México
- Víctor H Tierrafría
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA, USA
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca, Morelos, México
- Patricia Aquino
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA, USA
- Claire Rioualen
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca, Morelos, México
- Xiaoman Zhang
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA, USA
- Sunyoung Kim
  - Department of Biochemistry, University of Regina, Regina, Saskatchewan, SK, Canada
- Jonathan Plitnick
  - Wadsworth Center, New York State Department of Health, Albany, NY, USA
- Carol Smith
  - Wadsworth Center, New York State Department of Health, Albany, NY, USA
- Mohan Babu
  - Department of Biochemistry, University of Regina, Regina, Saskatchewan, SK, Canada
- Julio Collado-Vides
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA, USA
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca, Morelos, México
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Joseph T Wade
  - Wadsworth Center, New York State Department of Health, Albany, NY, USA
  - Department of Biomedical Sciences, University at Albany, SUNY, Albany, NY, USA
- James E Galagan
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA, USA
  - Bioinformatics Program, Boston University, 24 Cummington Mall, Boston, MA, USA
|
2
|
Pan RW, Röschinger T, Faizi K, Garcia HG, Phillips R. Deciphering regulatory architectures of bacterial promoters from synthetic expression patterns. PLoS Comput Biol 2024; 20:e1012697. [PMID: 39724021 PMCID: PMC11709304 DOI: 10.1371/journal.pcbi.1012697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2024] [Revised: 01/08/2025] [Accepted: 12/04/2024] [Indexed: 12/28/2024]
Abstract
For the vast majority of genes in sequenced genomes, there is limited understanding of how they are regulated. Without such knowledge, it is not possible to perform a quantitative theory-experiment dialogue on how such genes give rise to physiological and evolutionary adaptation. One category of high-throughput experiments used to understand the sequence-phenotype relationship of the transcriptome is massively parallel reporter assays (MPRAs). However, to improve the versatility and scalability of MPRAs, we need a "theory of the experiment" to help us better understand the impact of various biological and experimental parameters on the interpretation of experimental data. These parameters include binding site copy number, where a large number of specific binding sites may titrate away transcription factors, as well as the presence of overlapping binding sites, which may affect analysis of the degree of mutual dependence between mutations in the regulatory region and expression levels. To that end, in this paper we create tens of thousands of synthetic gene expression outputs for bacterial promoters using both equilibrium and out-of-equilibrium models. These models make it possible to imitate the summary statistics (information footprints and expression shift matrices) used to characterize the output of MPRAs and thus to infer the underlying regulatory architecture. Specifically, we use a more refined implementation of the so-called thermodynamic models in which the binding energies of each sequence variant are derived from energy matrices. Our simulations reveal important effects of the parameters on MPRA data and we demonstrate our ability to optimize MPRA experimental designs with the goal of generating thermodynamic models of the transcriptome with base-pair specificity. 
Further, this approach makes it possible to carefully examine the mapping between mutations in binding sites and their corresponding expression profiles, a tool useful not only for developing a theory of transcription, but also for exploring regulatory evolution.
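The energy-matrix idea mentioned in this abstract can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the matrix values, the wild-type energy, and the names `ENERGY_MATRIX`, `WILD_TYPE_ENERGY`, and `binding_energy` are all assumptions made up for the example. Each row of the matrix holds the energy penalty (in arbitrary units, for example k_BT) for placing each base at one position of the binding site, and the binding energy of a sequence variant is the wild-type energy plus the sum of the per-position penalties.

```python
BASES = "ACGT"

# Hypothetical 4-position energy matrix (made-up numbers, arbitrary energy units).
# Rows are positions in the site; columns follow BASES (A, C, G, T).
# The wild-type base at each position carries a penalty of 0.
ENERGY_MATRIX = [
    [0.0, 1.2, 0.8, 1.5],  # position 0, wild type A
    [0.9, 0.0, 1.1, 0.7],  # position 1, wild type C
    [1.3, 0.6, 0.0, 1.0],  # position 2, wild type G
    [0.5, 1.4, 0.9, 0.0],  # position 3, wild type T
]

# Assumed binding energy of the wild-type site (more negative = tighter binding).
WILD_TYPE_ENERGY = -10.0


def binding_energy(seq, matrix=ENERGY_MATRIX, wt_energy=WILD_TYPE_ENERGY):
    """Return the binding energy of `seq` under an additive energy-matrix model."""
    if len(seq) != len(matrix):
        raise ValueError("sequence length must match the number of matrix rows")
    # Sum the per-position penalties for the bases actually present in `seq`.
    penalty = sum(row[BASES.index(base)] for row, base in zip(matrix, seq))
    return wt_energy + penalty
```

Under these made-up numbers, the wild-type sequence `"ACGT"` returns the wild-type energy unchanged, and every mutation away from it raises the energy, that is, weakens the predicted binding.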
Affiliation(s)
- Rosalind Wenshan Pan
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
- Tom Röschinger
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
- Kian Faizi
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
- Hernan G. Garcia
  - Biophysics Graduate Group, University of California, Berkeley, California, United States of America
  - Department of Physics, University of California, Berkeley, California, United States of America
  - Department of Molecular and Cell Biology, University of California, Berkeley, California, United States of America
  - Institute for Quantitative Biosciences-QB3, University of California, Berkeley, California, United States of America
  - Chan Zuckerberg Biohub-San Francisco, San Francisco, California, United States of America
- Rob Phillips
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
  - Division of Physics, Mathematics, and Astronomy, California Institute of Technology, Pasadena, California, United States of America
|
3
|
Brill J, Nurmi C, Li Y. Elucidating Evolutionary Mechanisms and Variants of the Hammerhead Ribozyme Using In Vitro Selection. Chembiochem 2024; 25:e202400432. [PMID: 39116094 DOI: 10.1002/cbic.202400432] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/15/2024] [Revised: 08/05/2024] [Accepted: 08/07/2024] [Indexed: 08/10/2024]
Abstract
The Hammerhead Ribozyme (HHR) is a ubiquitous RNA enzyme that catalyzes site-specific intramolecular cleavage. While mutations to its catalytic core have traditionally been viewed as detrimental to its activity, several discoveries of naturally occurring variants of the full-length ribozyme challenge this notion, suggesting a deeper understanding of HHR evolution and functionality. By systematically introducing mutations at key nucleotide positions within the catalytic core, we generated single-, double-, and triple-mutation libraries to explore the sequence requirements and evolution of a full-length HHR. In vitro selection revealed many novel hammerhead variants, some of which possess mutations at nucleotides previously considered to be essential. We also demonstrate that the evolutionary trajectory of each nucleotide in the catalytic core directly correlates with their functional importance, potentially giving researchers a novel method to assess the sequence requirements of functional nucleic acids.
Affiliation(s)
- Jake Brill
  - Department of Biochemistry and Biomedical Sciences, McMaster University, 1280 Main Street West, Hamilton, Ontario, L8S 4K1, Canada
- Connor Nurmi
  - Department of Biochemistry and Biomedical Sciences, McMaster University, 1280 Main Street West, Hamilton, Ontario, L8S 4K1, Canada
- Yingfu Li
  - Department of Biochemistry and Biomedical Sciences, McMaster University, 1280 Main Street West, Hamilton, Ontario, L8S 4K1, Canada
|
4
|
Lally P, Gómez-Romero L, Tierrafría VH, Aquino P, Rioualen C, Zhang X, Kim S, Baniulyte G, Plitnick J, Smith C, Babu M, Collado-Vides J, Wade JT, Galagan JE. Predictive Biophysical Neural Network Modeling of a Compendium of in vivo Transcription Factor DNA Binding Profiles for Escherichia coli. bioRxiv 2024:2024.05.23.594371. [PMID: 38826350 PMCID: PMC11142182 DOI: 10.1101/2024.05.23.594371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
The DNA binding of most Escherichia coli Transcription Factors (TFs) has not been comprehensively mapped, and few have models that can quantitatively predict binding affinity. We report the global mapping of in vivo DNA binding for 139 E. coli TFs using ChIP-Seq. We used these data to train BoltzNet, a novel neural network that predicts TF binding energy from DNA sequence. BoltzNet mirrors a quantitative biophysical model and provides directly interpretable predictions genome-wide at nucleotide resolution. We used BoltzNet to quantitatively design novel binding sites, which we validated with biophysical experiments on purified protein. We have generated models for 125 TFs that provide insight into global features of TF binding, including clustering of sites, the role of accessory bases, the relevance of weak sites, and the background affinity of the genome. Our paper provides new paradigms for studying TF-DNA binding and for the development of biophysically motivated neural networks.
Affiliation(s)
- Patrick Lally
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
- Laura Gómez-Romero
  - Instituto Nacional de Medicina Genómica, Periférico Sur 4809, Arenal Tepepan, Ciudad de México 14610, México
  - Escuela de Medicina y Ciencias de la Salud, Tecnológico de Monterrey, Ciudad de México, México
- Víctor H. Tierrafría
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, México
- Patricia Aquino
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
- Claire Rioualen
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, México
- Xiaoman Zhang
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
- Sunyoung Kim
  - Department of Biochemistry, University of Regina, Regina, Saskatchewan, SK S4S 0A2, Canada
- Jonathan Plitnick
  - Wadsworth Center, New York State Department of Health, Albany, NY, USA
- Carol Smith
  - Wadsworth Center, New York State Department of Health, Albany, NY, USA
- Mohan Babu
  - Department of Biochemistry, University of Regina, Regina, Saskatchewan, SK S4S 0A2, Canada
- Julio Collado-Vides
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, México
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Joseph T. Wade
  - Wadsworth Center, New York State Department of Health, Albany, NY, USA
  - Department of Biomedical Sciences, University at Albany, SUNY, Albany, NY, USA
- James E. Galagan
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
  - Bioinformatics Program, Boston University, 24 Cummington Mall, Boston, MA 02215
|
5
|
Schneider TD, Jejjala V. Restriction enzymes use a 24 dimensional coding space to recognize 6 base long DNA sequences. PLoS One 2019; 14:e0222419. [PMID: 31671158 PMCID: PMC6822723 DOI: 10.1371/journal.pone.0222419] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/12/2019] [Accepted: 08/29/2019] [Indexed: 11/19/2022]
Abstract
Restriction enzymes recognize and bind to specific sequences on invading bacteriophage DNA. Like a key in a lock, these proteins require many contacts to specify the correct DNA sequence. Using information theory we develop an equation that defines the number of independent contacts, which is the dimensionality of the binding. We show that EcoRI, which binds to the sequence GAATTC, functions in 24 dimensions. Information theory represents messages as spheres in high dimensional spaces. Better sphere packing leads to better communications systems. The densest known packing of hyperspheres occurs on the Leech lattice in 24 dimensions. We suggest that the single protein EcoRI molecule employs a Leech lattice in its operation. Optimizing density of sphere packing explains why 6 base restriction enzymes are so common.
Affiliation(s)
- Thomas D. Schneider
  - National Institutes of Health, National Cancer Institute, Center for Cancer Research, RNA Biology Laboratory, Frederick, Maryland, United States of America
- Vishnu Jejjala
  - Mandelstam Institute for Theoretical Physics, School of Physics, NITheP, and CoE-MaSS, University of the Witwatersrand, Johannesburg, South Africa
  - David Rittenhouse Laboratory, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
|
6
|
Brown T, Brown N, Stollar EJ. Most yeast SH3 domains bind peptide targets with high intrinsic specificity. PLoS One 2018; 13:e0193128. [PMID: 29470497 PMCID: PMC5823434 DOI: 10.1371/journal.pone.0193128] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 10/10/2017] [Accepted: 02/04/2018] [Indexed: 01/07/2023]
Abstract
A need exists to develop bioinformatics for predicting differences in protein function, especially for members of a domain family who share a common fold, yet are found in a diverse array of proteins. Many domain families have been conserved over large evolutionary spans and representative genomic data during these periods are now available. This allows a simple method for grouping domain sequences to reveal common and unique/specific binding residues. As such, we hypothesize that sequence alignment analysis of the yeast SH3 domain family across ancestral species in the fungal kingdom can determine whether each member encodes specific information to bind unique peptide targets. With this approach, we identify important specific residues for a given domain as those that show little conservation within an alignment of yeast domain family members (paralogs) but are conserved in an alignment of its direct relatives (orthologs). We find most of the yeast SH3 domain family members have maintained unique amino acid conservation patterns that suggest they bind peptide targets with high intrinsic specificity through varying degrees of non-canonical recognition. For a minority of domains, we predict a less diverse binding surface, likely requiring additional factors to bind targets specifically. We observe that our predictions are consistent with high throughput binding data, which suggests our approach can probe intrinsic binding specificity in any other interaction domain family that is maintained during evolution.
Affiliation(s)
- Tom Brown
  - Math and Computer Science Department, Eastern New Mexico University, Portales, NM, United States of America
- Nick Brown
  - Portales High School, Portales, NM, United States of America
- Elliott J. Stollar
  - Physical Sciences Department, Eastern New Mexico University, Portales, NM, United States of America
|
7
|
Lima WR, Martins DC, Parreira KS, Scarpelli P, Santos de Moraes M, Topalis P, Hashimoto RF, Garcia CRS. Genome-wide analysis of the human malaria parasite Plasmodium falciparum transcription factor PfNF-YB shows interaction with a CCAAT motif. Oncotarget 2017; 8:113987-114001. [PMID: 29371963 PMCID: PMC5768380 DOI: 10.18632/oncotarget.23053] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 03/24/2017] [Accepted: 11/26/2017] [Indexed: 12/04/2022]
Abstract
Little is known about transcription factor regulation during the Plasmodium falciparum intraerythrocytic cycle. In order to elucidate the role of the P. falciparum (Pf)NF-YB transcription factor we searched for target genes in the entire genome. PfNF-YB mRNA is highly expressed in late trophozoite and schizont stages relative to the ring stage. In order to determine the candidate genes bound by PfNF-YB a ChIP-on-chip assay was carried out and 297 genes were identified. Ninety nine percent of PfNF-YB binding was to putative promoter regions of protein coding genes of which only 16% comprise proteins of known function. Interestingly, our data reveal that PfNF-YB binding is not exclusively to a canonical CCAAT box motif. PfNF-YB binds to genes coding for proteins implicated in a range of different biological functions, such as replication protein A large subunit (DNA replication), hypoxanthine phosphoribosyltransferase (nucleic acid metabolism) and multidrug resistance protein 2 (intracellular transport).
Affiliation(s)
- Wânia Rezende Lima
  - Departamento de Fisiologia, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil
  - Instituto de Ciências Exatas e Naturais-Medicina, Universidade Federal de Mato Grosso-Campus Rondonópolis, Mato Grosso, Brazil
- David Correa Martins
  - Centro de Matemática, Computação e Cognição, Universidade Federal do ABC, Santo André, Brazil
- Kleber Simônio Parreira
  - Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, Brazil
  - Instituto de Ciências Exatas e Naturais-Medicina, Universidade Federal de Mato Grosso-Campus Rondonópolis, Mato Grosso, Brazil
- Pedro Scarpelli
  - Departamento de Fisiologia, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil
- Miriam Santos de Moraes
  - Departamento de Fisiologia, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil
- Pantelis Topalis
  - Institute of Molecular Biology and Biotechnology, FORTH, Hellas, Greece
- Ronaldo Fumio Hashimoto
  - Departamento de Ciência da Computação, Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, Brazil
- Célia R S Garcia
  - Departamento de Fisiologia, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil
|
8
|
Machha VR, Mikek CG, Wellman S, Lewis EA. Temperature and osmotic stress dependence of the thermodynamics for binding linker histone H1⁰, its carboxyl domain (H1⁰-C) or globular domain (H1⁰-G) to B-DNA. Biochem Biophys Rep 2017; 12:158-165. [PMID: 29090277 PMCID: PMC5645174 DOI: 10.1016/j.bbrep.2017.09.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/04/2017] [Revised: 09/21/2017] [Accepted: 09/25/2017] [Indexed: 12/24/2022]
Abstract
Linker histones (H1) are the basic proteins in higher eukaryotes that are responsible for the final condensation of chromatin. In contrast to the nucleosome core histone proteins, the role of H1 in compacting DNA is not clearly understood. In this study ITC was used to measure the binding constant, enthalpy change, and binding site size for the interactions of H1⁰, or its C-terminal (H1⁰-C) and globular (H1⁰-G) domains to highly polymerized calf-thymus DNA at temperatures from 288 K to 308 K. Heat capacity changes, ΔCp, for these same H1⁰ binding interactions were estimated from the temperature dependence of the enthalpy changes. The enthalpy changes for binding H1⁰, H1⁰-C, or H1⁰-G to CT-DNA are all endothermic at 298 K, becoming more exothermic as the temperature is increased. The ΔH for binding H1⁰-G to CT-DNA is exothermic at temperatures above approximately 300 K. Osmotic stress experiments indicate that the binding of H1⁰ is accompanied by the release of approximately 35 water molecules. We estimate from our naked DNA titration results that the binding of the H1⁰ to the nucleosome places the H1⁰ protein in close contact with approximately 41 DNA bp. The breakdown is that the H1⁰ carboxyl terminus interacts with 28 bp of linker DNA on one side of the nucleosome, the H1⁰ globular domain binds directly to 7 bp of core DNA, and shields another 6 linker DNA bases, 3 bp on either side of the nucleosome where the linker DNA exits the nucleosome core.
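Estimating ΔCp from the temperature dependence of the enthalpy change amounts to taking the slope of ΔH against T, since ΔCp = d(ΔH)/dT. A minimal sketch follows, assuming ΔH is approximately linear in T over the measured range. The function name `heat_capacity_change` and the data points used in the usage note are hypothetical and are not values from the paper.

```python
def heat_capacity_change(temps_kelvin, enthalpies):
    """Estimate ΔCp as the least-squares slope of ΔH versus T.

    `temps_kelvin` and `enthalpies` are equal-length sequences; the result
    has the units of the enthalpies divided by kelvin.
    """
    if len(temps_kelvin) != len(enthalpies) or len(temps_kelvin) < 2:
        raise ValueError("need at least two (T, ΔH) pairs of equal length")
    n = len(temps_kelvin)
    mean_t = sum(temps_kelvin) / n
    mean_h = sum(enthalpies) / n
    # Ordinary least-squares slope: covariance(T, ΔH) / variance(T).
    sxx = sum((t - mean_t) ** 2 for t in temps_kelvin)
    sxy = sum((t - mean_t) * (h - mean_h) for t, h in zip(temps_kelvin, enthalpies))
    if sxx == 0:
        raise ValueError("temperatures must not all be identical")
    return sxy / sxx
```

For example, with made-up enthalpies of 6.0, 4.5, 3.0, 1.5 and 0.0 kcal/mol at 288, 293, 298, 303 and 308 K, the function returns a slope of about -0.3 kcal/(mol·K). A negative ΔCp of this kind corresponds to binding that becomes more exothermic as the temperature rises.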
Affiliation(s)
- V R Machha
  - Division of Hematology, Departments of Internal Medicine and Biochemistry and Molecular Biology, Mayo Clinic, Rochester, MN 55905, USA
- C G Mikek
  - Department of Chemistry, Mississippi State University, Mississippi, MS 39762, USA
- S Wellman
  - Department of Pharmacology and Toxicology, University of Mississippi Medical Center, 2500 N. State Street, Jackson, MS 39216, USA
- E A Lewis
  - Department of Chemistry, Mississippi State University, Mississippi, MS 39762, USA
|
9
|
Stirling F, Bitzan L, O'Keefe S, Redfield E, Oliver JWK, Way J, Silver PA. Rational Design of Evolutionarily Stable Microbial Kill Switches. Mol Cell 2017; 68:686-697.e3. [PMID: 29149596 DOI: 10.1016/j.molcel.2017.10.033] [Citation(s) in RCA: 95] [Impact Index Per Article: 11.9] [Received: 04/10/2017] [Revised: 08/11/2017] [Accepted: 10/24/2017] [Indexed: 12/12/2022]
Abstract
The evolutionary stability of synthetic genetic circuits is key to both the understanding and application of genetic control elements. One useful but challenging situation is a switch between life and death depending on environment. Here are presented "essentializer" and "cryodeath" circuits, which act as kill switches in Escherichia coli. The essentializer element induces cell death upon the loss of a bi-stable cI/Cro memory switch. Cryodeath makes use of a cold-inducible promoter to express a toxin. We employ rational design and a toxin/antitoxin titering approach to produce and screen a small library of potential constructs, in order to select for constructs that are evolutionarily stable. Both kill switches were shown to maintain functionality in vitro for at least 140 generations. Additionally, cryodeath was shown to control the growth environment of a population, with an escape frequency of less than 1 in 10⁵ after 10 days of growth in the mammalian gut.
Affiliation(s)
- Finn Stirling
  - Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Warren Alpert 536, Boston, MA 02115, USA
  - Wyss Institute for Biologically Inspired Engineering, Harvard University, 3 Blackfan Circle, 5th Floor, Boston, MA 02115, USA
- Lisa Bitzan
  - Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Warren Alpert 536, Boston, MA 02115, USA
- Samuel O'Keefe
  - Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Warren Alpert 536, Boston, MA 02115, USA
- Elizabeth Redfield
  - Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Warren Alpert 536, Boston, MA 02115, USA
- John W K Oliver
  - Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Warren Alpert 536, Boston, MA 02115, USA
  - Wyss Institute for Biologically Inspired Engineering, Harvard University, 3 Blackfan Circle, 5th Floor, Boston, MA 02115, USA
- Jeffrey Way
  - Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Warren Alpert 536, Boston, MA 02115, USA
  - Wyss Institute for Biologically Inspired Engineering, Harvard University, 3 Blackfan Circle, 5th Floor, Boston, MA 02115, USA
- Pamela A Silver
  - Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Warren Alpert 536, Boston, MA 02115, USA
  - Wyss Institute for Biologically Inspired Engineering, Harvard University, 3 Blackfan Circle, 5th Floor, Boston, MA 02115, USA
|
10
|
The Developmental Switch in Bacteriophage λ: A Critical Role of the Cro Protein. J Mol Biol 2017; 430:58-68. [PMID: 29158090 DOI: 10.1016/j.jmb.2017.11.005] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 09/28/2017] [Revised: 11/09/2017] [Accepted: 11/14/2017] [Indexed: 11/21/2022]
Abstract
Bacteriophage λ of Escherichia coli has two alternative life cycles after infection: host survival with lysogen formation, or host lysis and phage production. In a lysogen, CI represses the two lytic promoters, pR and pL, and activates its own transcription from the auto-regulated pRM promoter. During induction from the lysogenic to lytic state, CI is inactivated, and the two lytic promoters are de-repressed, allowing for expression of Cro from pR. Cro is known to repress transcription of CI from pRM to prevent lysogeny. We show here that when Cro and CI are both present but at low levels, the low level of Cro initially stimulates the lytic promoters while CI repressor is still present, stimulating the level of Cro to a concentration required for pRM repression. Cro has no stimulatory effect without the presence of CI. We propose that this early auto-activating role of Cro at lower concentrations is essential in the developmental switch to lytic growth, whereas pRM repression by Cro at relatively higher concentrations avoids restoring lysogeny.
|
11
|
Wongrattanakamon P, Lee VS, Nimmanpipug P, Sirithunyalug B, Chansakaow S, Jiranusornkul S. Insight into the molecular mechanism of P-glycoprotein mediated drug toxicity induced by bioflavonoids: an integrated computational approach. Toxicol Mech Methods 2017; 27:253-271. [PMID: 27996361 DOI: 10.1080/15376516.2016.1273428] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 02/07/2023]
Abstract
In this work, molecular docking, pharmacophore modeling and molecular dynamics (MD) simulation were rendered for the mouse P-glycoprotein (P-gp) (code: 4Q9H) and bioflavonoids; amorphigenin, chrysin, epigallocatechin, formononetin and rotenone including a positive control; verapamil to identify protein-ligand interaction features including binding affinities, interaction characteristics, hot-spot amino acid residues and complex stabilities. These flavonoids occupied the same binding site with high binding affinities and shared the same key residues for their binding interactions and the binding region of the flavonoids was revealed that overlapped the ATP binding region with hydrophobic and hydrophilic interactions suggesting a competitive inhibition mechanism of the compounds. Root mean square deviations (RMSDs) analysis of MD trajectories of the protein-ligand complexes and NBD2 residues, and ligands pointed out these residues were stable throughout the duration of MD simulations. Thus, the applied preliminary structure-based molecular modeling approach of interactions between NBD2 and flavonoids may be gainful to realize the intimate inhibition mechanism of P-gp at NBD2 level and on the basis of the obtained data, it can be concluded that these bioflavonoids have the potential to cause herb-drug interactions or be used as lead molecules for the inhibition of P-gp (as anti-multidrug resistance agents) via the NBD2 blocking mechanism in future.
Affiliation(s)
- Pathomwat Wongrattanakamon
  - Laboratory for Molecular Design and Simulation (LMDS), Department of Pharmaceutical Sciences, Faculty of Pharmacy, Chiang Mai University, Chiang Mai, Thailand
- Vannajan Sanghiran Lee
  - Department of Chemistry, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
- Piyarat Nimmanpipug
  - Computational Simulation and Modelling Laboratory (CSML), Department of Chemistry, Faculty of Science, Chiang Mai University, Chiang Mai, Thailand
- Busaban Sirithunyalug
  - Department of Pharmaceutical Sciences, Faculty of Pharmacy, Chiang Mai University, Chiang Mai, Thailand
- Sunee Chansakaow
  - Department of Pharmaceutical Sciences, Faculty of Pharmacy, Chiang Mai University, Chiang Mai, Thailand
- Supat Jiranusornkul
  - Laboratory for Molecular Design and Simulation (LMDS), Department of Pharmaceutical Sciences, Faculty of Pharmacy, Chiang Mai University, Chiang Mai, Thailand
|
12
|
Chattopadhyay A, Zandarashvili L, Luu RH, Iwahara J. Thermodynamic Additivity for Impacts of Base-Pair Substitutions on Association of the Egr-1 Zinc-Finger Protein with DNA. Biochemistry 2016; 55:6467-6474. [PMID: 27933778 DOI: 10.1021/acs.biochem.6b00757] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Abstract
The transcription factor Egr-1 specifically binds as a monomer to its 9 bp target DNA sequence, GCGTGGGCG, via three zinc fingers and plays important roles in the brain and cardiovascular systems. Using fluorescence-based competitive binding assays, we systematically analyzed the impacts of all possible single-nucleotide substitutions in the target DNA sequence and determined the change in binding free energy for each. Then, we measured the changes in binding free energy for sequences with multiple substitutions and compared them with the sum of the changes in binding free energy for each constituent single substitution. For the DNA variants with two or three nucleotide substitutions in the target sequence, we found excellent agreement between the measured and predicted changes in binding free energy. Interestingly, however, we found that this thermodynamic additivity broke down with a larger number of substitutions. For DNA sequences with four or more substitutions, the measured changes in binding free energy were significantly larger than predicted. On the basis of these results, we analyzed the occurrences of high-affinity sequences in the genome and found that the genome contains millions of such sequences that might functionally sequester Egr-1.
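The additivity test described in this abstract compares a measured ΔΔG for a multiply substituted sequence with the sum of the ΔΔG values of its constituent single substitutions. The sketch below shows only the prediction step. The target sequence is the one given in the abstract, but the `SINGLE_DDG` values and the function name `predicted_ddg` are hypothetical placeholders, not measurements from the paper.

```python
TARGET = "GCGTGGGCG"  # 9 bp Egr-1 target sequence, as given in the abstract

# Hypothetical single-substitution ΔΔG values in kcal/mol, keyed by
# (0-based position, substituted base). A real table would hold one entry
# for every possible single-nucleotide substitution.
SINGLE_DDG = {
    (0, "A"): 1.0,
    (3, "C"): 0.5,
    (8, "T"): 1.5,
}


def predicted_ddg(variant, target=TARGET, single_ddg=SINGLE_DDG):
    """Predict ΔΔG of `variant` by summing its single-substitution ΔΔG values."""
    if len(variant) != len(target):
        raise ValueError("variant must have the same length as the target")
    total = 0.0
    for position, (ref_base, alt_base) in enumerate(zip(target, variant)):
        if ref_base != alt_base:
            # Raises KeyError if this substitution is missing from the table.
            total += single_ddg[(position, alt_base)]
    return total
```

Under these placeholder values, the double mutant `"ACGCGGGCG"` (G to A at position 0, T to C at position 3) has a predicted ΔΔG of 1.5 kcal/mol. Additivity holds when the measured value agrees with this sum, and per the abstract it broke down for sequences with four or more substitutions.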
Affiliation(s)
- Abhijnan Chattopadhyay
- Department of Biochemistry & Molecular Biology, Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, Texas 77555, United States
| | - Levani Zandarashvili
- Department of Biochemistry & Molecular Biology, Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, Texas 77555, United States
| | - Ross H Luu
- Department of Biochemistry & Molecular Biology, Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, Texas 77555, United States
| | - Junji Iwahara
- Department of Biochemistry & Molecular Biology, Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, Texas 77555, United States
|
13
|
Choudhury S, Ghosh B, Singh P, Ghosh R, Roy S, Pal SK. Ultrafast differential flexibility of Cro-protein binding domains of two operator DNAs with different sequences. Phys Chem Chem Phys 2016; 18:17983-90. [DOI: 10.1039/c6cp02522f] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/19/2022]
Abstract
The crucial ultrafast domain fluctuation of the operator DNA OR3 over OR2 upon complexation with the repressor Cro-protein dimer has been investigated.
Affiliation(s)
- Susobhan Choudhury
- Department of Chemical, Biological & Macromolecular Sciences, S. N. Bose National Centre for Basic Sciences, Kolkata 700 098, India
| | - Basusree Ghosh
- Division of Structural Biology and Bioinformatics, Indian Institute of Chemical Biology, Kolkata 700 032, India
| | - Priya Singh
- Department of Chemical, Biological & Macromolecular Sciences, S. N. Bose National Centre for Basic Sciences, Kolkata 700 098, India
| | - Raka Ghosh
- Division of Structural Biology and Bioinformatics, Indian Institute of Chemical Biology, Kolkata 700 032, India
| | - Siddhartha Roy
- Division of Structural Biology and Bioinformatics, Indian Institute of Chemical Biology, Kolkata 700 032, India
| | - Samir Kumar Pal
- Department of Chemical, Biological & Macromolecular Sciences, S. N. Bose National Centre for Basic Sciences, Kolkata 700 098, India
|
14
|
Simple Biophysical Model Predicts Faster Accumulation of Hybrid Incompatibilities in Small Populations Under Stabilizing Selection. Genetics 2015; 201:1525-37. [PMID: 26434721 PMCID: PMC4676520 DOI: 10.1534/genetics.115.181685] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 08/05/2015] [Accepted: 09/23/2015] [Indexed: 01/07/2023]
Abstract
Speciation is fundamental to the process of generating the huge diversity of life on Earth. However, we are yet to have a clear understanding of its molecular-genetic basis. Here, we examine a computational model of reproductive isolation that explicitly incorporates a map from genotype to phenotype based on the biophysics of protein–DNA binding. In particular, we model the binding of a protein transcription factor to a DNA binding site and how their independent coevolution, in a stabilizing fitness landscape, of two allopatric lineages leads to incompatibilities. Complementing our previous coarse-grained theoretical results, our simulations give a new prediction for the monomorphic regime of evolution that smaller populations should develop incompatibilities more quickly. This arises as (1) smaller populations have a greater initial drift load, as there are more sequences that bind poorly than well, so fewer substitutions are needed to reach incompatible regions of phenotype space, and (2) slower divergence when the population size is larger than the inverse of discrete differences in fitness. Further, we find longer sequences develop incompatibilities more quickly at small population sizes, but more slowly at large population sizes. The biophysical model thus represents a robust mechanism of rapid reproductive isolation for small populations and large sequences that does not require peak shifts or positive selection. Finally, we show that the growth of DMIs with time is quadratic for small populations, agreeing with Orr’s model, but nonpower law for large populations, with a form consistent with our previous theoretical results.
|
15
|
Evolutionary meandering of intermolecular interactions along the drift barrier. Proc Natl Acad Sci U S A 2014; 112:E30-8. [PMID: 25535374 DOI: 10.1073/pnas.1421641112] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.6] [Indexed: 11/18/2022]
Abstract
Many cellular functions depend on highly specific intermolecular interactions, for example transcription factors and their DNA binding sites, microRNAs and their RNA binding sites, the interfaces between heterodimeric protein molecules, the stems in RNA molecules, and kinases and their response regulators in signal-transduction systems. Despite the need for complementarity between interacting partners, such pairwise systems seem to be capable of high levels of evolutionary divergence, even when subject to strong selection. Such behavior is a consequence of the diminishing advantages of increasing binding affinity between partners, the multiplicity of evolutionary pathways between selectively equivalent alternatives, and the stochastic nature of evolutionary processes. Because mutation pressure toward reduced affinity conflicts with selective pressure for greater interaction, situations can arise in which the expected distribution of the degree of matching between interacting partners is bimodal, even in the face of constant selection. Although biomolecules with larger numbers of interacting partners are subject to increased levels of evolutionary conservation, their more numerous partners need not converge on a single sequence motif or be increasingly constrained in more complex systems. These results suggest that most phylogenetic differences in the sequences of binding interfaces are not the result of adaptive fine tuning but a simple consequence of random genetic drift.
|
16
|
Lewis DD, Villarreal FD, Wu F, Tan C. Synthetic biology outside the cell: linking computational tools to cell-free systems. Front Bioeng Biotechnol 2014; 2:66. [PMID: 25538941 PMCID: PMC4260521 DOI: 10.3389/fbioe.2014.00066] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 09/16/2014] [Accepted: 11/23/2014] [Indexed: 12/22/2022]
Abstract
As mathematical models become more commonly integrated into the study of biology, a common language for describing biological processes is manifesting. Many tools have emerged for the simulation of in vivo synthetic biological systems, with only a few examples of prominent work done on predicting the dynamics of cell-free synthetic systems. At the same time, experimental biologists have begun to study dynamics of in vitro systems encapsulated by amphiphilic molecules, opening the door for the development of a new generation of biomimetic systems. In this review, we explore both in vivo and in vitro models of biochemical networks with a special focus on tools that could be applied to the construction of cell-free expression systems. We believe that quantitative studies of complex cellular mechanisms and pathways in synthetic systems can yield important insights into what makes cells different from conventional chemical systems.
Affiliation(s)
- Daniel D. Lewis
- Integrative Genetics and Genomics, University of California Davis, Davis, CA, USA
- Department of Biomedical Engineering, University of California Davis, Davis, CA, USA
| | | | - Fan Wu
- Department of Biomedical Engineering, University of California Davis, Davis, CA, USA
| | - Cheemeng Tan
- Department of Biomedical Engineering, University of California Davis, Davis, CA, USA
|
17
|
High-resolution specificity from DNA sequencing highlights alternative modes of Lac repressor binding. Genetics 2014; 198:1329-43. [PMID: 25209146 DOI: 10.1534/genetics.114.170100] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Indexed: 12/29/2022]
Abstract
Knowing the specificity of transcription factors is critical to understanding regulatory networks in cells. The lac repressor-operator system has been studied for many years, but not with high-throughput methods capable of determining specificity comprehensively. Details of its binding interaction and its selection of an asymmetric binding site have been controversial. We employed a new method to accurately determine relative binding affinities to thousands of sequences simultaneously, requiring only sequencing of bound and unbound fractions. An analysis of 2560 different DNA sequence variants, including both base changes and variations in operator length, provides a detailed view of lac repressor sequence specificity. We find that the protein can bind with nearly equal affinities to operators of three different lengths, but the sequence preference changes depending on the length, demonstrating alternative modes of interaction between the protein and DNA. The wild-type operator has an odd length, causing the two monomers to bind in alternative modes, making the asymmetric operator the preferred binding site. We tested two other members of the LacI/GalR protein family and find that neither can bind with high affinity to sites with alternative lengths or shows evidence of alternative binding modes. A further comparison with known and predicted motifs suggests that the lac repressor may be unique in this ability and that this may contribute to its selection.
|
18
|
Ahlstrom LS, Miyashita O. Packing interface energetics in different crystal forms of the λ Cro dimer. Proteins 2013; 82:1128-41. [PMID: 24218107 DOI: 10.1002/prot.24478] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 08/09/2013] [Revised: 10/27/2013] [Accepted: 11/04/2013] [Indexed: 12/22/2022]
Abstract
Variation among crystal structures of the λ Cro dimer highlights conformational flexibility. The structures range from a wild type closed to a mutant fully open conformation, but it is unclear if each represents a stable solution state or if one may be the result of crystal packing. Here we use molecular dynamics (MD) simulation to investigate the energetics of crystal packing interfaces and the influence of site-directed mutagenesis on them in order to examine the effect of crystal packing on wild type and mutant Cro dimer conformation. Replica exchange MD of mutant Cro in solution shows that the observed conformational differences between the wild type and mutant protein are not the direct consequence of mutation. Instead, simulation of Cro in different crystal environments reveals that mutation affects the stability of crystal forms. Molecular Mechanics Poisson-Boltzmann Surface Area binding energy calculations reveal the detailed energetics of packing interfaces. Packing interfaces can have diverse properties in strength, energetic components, and some are stronger than the biological dimer interface. Further analysis shows that mutation can strengthen packing interfaces by as much as ∼5 kcal/mol in either crystal environment. Thus, in the case of Cro, mutation provides an additional energetic contribution during crystal formation that may stabilize a fully open higher energy state. Moreover, the effect of mutation in the lattice can extend to packing interfaces not involving mutation sites. Our results provide insight into possible models for the effect of crystallization on Cro conformational dynamics and emphasize careful consideration of protein crystal structures.
Affiliation(s)
- Logan S Ahlstrom
- Department of Chemistry and Biochemistry, University of Arizona, Tucson, Arizona, 85721
|
19
|
Thyme SB, Boissel SJS, Arshiya Quadri S, Nolan T, Baker DA, Park RU, Kusak L, Ashworth J, Baker D. Reprogramming homing endonuclease specificity through computational design and directed evolution. Nucleic Acids Res 2013; 42:2564-76. [PMID: 24270794 PMCID: PMC3936771 DOI: 10.1093/nar/gkt1212] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Indexed: 12/19/2022]
Abstract
Homing endonucleases (HEs) can be used to induce targeted genome modification to reduce the fitness of pathogen vectors such as the malaria-transmitting Anopheles gambiae and to correct deleterious mutations in genetic diseases. We describe the creation of an extensive set of HE variants with novel DNA cleavage specificities using an integrated experimental and computational approach. Using computational modeling and an improved selection strategy, which optimizes specificity in addition to activity, we engineered an endonuclease to cleave in a gene associated with Anopheles sterility and another to cleave near a mutation that causes pyruvate kinase deficiency. In the course of this work we observed unanticipated context-dependence between bases which will need to be mechanistically understood for reprogramming of specificity to succeed more generally.
Affiliation(s)
- Summer B Thyme
- Department of Biochemistry, University of Washington, UW Box 357350, 1705 NE Pacific St., Seattle, WA 98195, USA, Graduate Program in Biomolecular Structure and Design, University of Washington, UW Box 357350, 1705 NE Pacific St., Seattle, WA 98195, USA, Graduate Program in Molecular and Cellular Biology, University of Washington, UW Box 357275, 1959 NE Pacific St., Seattle, WA 98195, USA, Department of Life Sciences, Sir Alexander Fleming Building, Imperial College London, Imperial College Road, London SW7 2AZ, UK, Department of Genetics, University of Cambridge, Downing Street, Cambridge CB1 3QA, UK, Institute for Systems Biology, 401 Terry Avenue N, Seattle, WA 98109, USA and Howard Hughes Medical Institute, University of Washington, UW Box 357350, 1705 NE Pacific St., Seattle, WA 98195, USA
|
20
|
Abstract
The specificity of protein-DNA interactions is most commonly modeled using position weight matrices (PWMs). First introduced in 1982, they have been adapted to many new types of data and many different approaches have been developed to determine the parameters of the PWM. New high-throughput technologies provide a large amount of data rapidly and offer an unprecedented opportunity to determine accurately the specificities of many transcription factors (TFs). But taking full advantage of the new data requires advanced algorithms that take into account the biophysical processes involved in generating the data. The new large datasets can also aid in determining when the PWM model is inadequate and must be extended to provide accurate predictions of binding sites. This article provides a general mathematical description of a PWM and how it is used to score potential binding sites, a brief history of the approaches that have been developed and the types of data that are used with an emphasis on algorithms that we have developed for analyzing high-throughput datasets from several new technologies. It also describes extensions that can be added when the simple PWM model is inadequate and further enhancements that may be necessary. It briefly describes some applications of PWMs in the discovery and modeling of in vivo regulatory networks.
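The idea of scoring potential binding sites with a PWM, as described in this abstract, can be illustrated with a minimal sketch. It assumes a log-odds matrix built from aligned example sites with a pseudocount and a uniform background; the function names (`build_pwm`, `score_site`, `scan`) and these modeling choices are assumptions for illustration, not the specific algorithms developed in the article.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=None):
    """Build a log2-odds weight matrix from aligned, equal-length sites."""
    background = background or {b: 0.25 for b in BASES}
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for i in range(width):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background[b]) for b in BASES})
    return pwm

def score_site(pwm, site):
    """Score one candidate site as the sum of per-position weights."""
    if len(site) != len(pwm):
        raise ValueError("site length must match the PWM width")
    return sum(column[base] for column, base in zip(pwm, site))

def scan(pwm, sequence):
    """Score every window of the sequence; returns (start, score) pairs."""
    width = len(pwm)
    return [(i, score_site(pwm, sequence[i:i + width]))
            for i in range(len(sequence) - width + 1)]
```

Because the score is a plain sum over positions, this sketch embodies the independence assumption of the simple PWM model; the extensions mentioned in the abstract address cases where that assumption is inadequate.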
|
21
|
Gromiha MM. Structure based sequence dependent stiffness scale for trinucleotides: a direct method. J Biol Phys 2013; 26:43-50. [PMID: 23345711 DOI: 10.1023/a:1005250718139] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022]
Abstract
A new set of stiffness parameters for all the 32 trinucleotide units has been set up directly from the three dimensional structures of DNA molecules. It was observed that GAC/GTC is the stiffest trinucleotide and ACC/GGT is the most flexible one. The average stiffness values computed for a set of operator sequences using the new parameters correlate very well with the protein-DNA binding specificity and binding free energy change of 434 repressor and Cro repressor, respectively. The new structure based stiffness scale can explain the protein-DNA binding specificity to the level of 0.92.
Affiliation(s)
- M M Gromiha
- The Institute of Physical and Chemical Research (RIKEN), Tsukuba Life Science Center, 3-1-1 Koyadai, Tsukuba, Ibaraki, 305-0074 Japan
|
22
|
Gromiha MM, Nagarajan R. Computational approaches for predicting the binding sites and understanding the recognition mechanism of protein-DNA complexes. Advances in Protein Chemistry and Structural Biology 2013; 91:65-99. [PMID: 23790211 DOI: 10.1016/b978-0-12-411637-5.00003-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 12/27/2022]
Abstract
Protein-DNA recognition plays an important role in the regulation of gene expression. Understanding the influence of specific residues for protein-DNA interactions and the recognition mechanism of protein-DNA complexes is a challenging task in molecular and computational biology. Several computational approaches have been put forward to tackle these problems from different perspectives: (i) development of databases for the interactions between protein and DNA and binding specificity of protein-DNA complexes, (ii) structural analysis of protein-DNA complexes, (iii) discriminating DNA-binding proteins from amino acid sequence, (iv) prediction of DNA-binding sites and protein-DNA binding specificity using sequence and/or structural information, and (v) understanding the recognition mechanism of protein-DNA complexes. In this review, we focus on all these issues and extensively discuss the advancements on the development of comprehensive bioinformatics databases for protein-DNA interactions, efficient tools for identifying the binding sites, and plausible mechanisms for understanding the recognition of protein-DNA complexes. Further, the available online resources for understanding protein-DNA interactions are collectively listed, which will serve as ready-to-use information for the research community.
Affiliation(s)
- M Michael Gromiha
- Department of Biotechnology, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India.
|
23
|
Brewster RC, Jones DL, Phillips R. Tuning promoter strength through RNA polymerase binding site design in Escherichia coli. PLoS Comput Biol 2012; 8:e1002811. [PMID: 23271961 PMCID: PMC3521663 DOI: 10.1371/journal.pcbi.1002811] [Citation(s) in RCA: 129] [Impact Index Per Article: 9.9] [Received: 04/10/2012] [Accepted: 10/18/2012] [Indexed: 11/18/2022]
Abstract
One of the paramount goals of synthetic biology is to have the ability to tune transcriptional networks to targeted levels of expression at will. As a step in that direction, we have constructed a set of 18 unique binding sites for E. coli RNA Polymerase (RNAP) σ⁷⁰ holoenzyme, designed using a model of sequence-dependent binding energy combined with a thermodynamic model of transcription to produce a targeted level of gene expression. This promoter set allows us to determine the correspondence between the absolute numbers of mRNA molecules or protein products and the predicted promoter binding energies measured in k(B)T energy units. These binding sites adhere on average to the predicted level of gene expression over 3 orders of magnitude in constitutive gene expression, to within a factor of 3 in both protein and mRNA copy number. With these promoters in hand, we then place them under the regulatory control of a bacterial repressor and show that again there is a strict correspondence between the measured and predicted levels of expression, demonstrating the transferability of the promoters to an alternate regulatory context. In particular, our thermodynamic model predicts the expression from our promoters under a range of repressor concentrations between several per cell up to over 100 per cell. After correcting the predicted polymerase binding strength using the data from the unregulated promoter, the thermodynamic model accurately predicts the expression for the simple repression strains to within 30%. Demonstration of modular promoter design, where parts of the circuit (such as RNAP/TF binding strength and transcription factor copy number) can be independently chosen from a stock list and combined to give a predictable result, has important implications as an engineering tool for use in synthetic biology.
Affiliation(s)
- Robert C. Brewster
- Department of Applied Physics, California Institute of Technology, Pasadena, California, United States of America
| | - Daniel L. Jones
- Department of Applied Physics, California Institute of Technology, Pasadena, California, United States of America
| | - Rob Phillips
- Department of Applied Physics, California Institute of Technology, Pasadena, California, United States of America
- Division of Biology, California Institute of Technology, Pasadena, California, United States of America
|
24
|
Debnath S, Roy NS, Bera I, Ghoshal N, Roy S. Indirect read-out of the promoter DNA by RNA polymerase in the closed complex. Nucleic Acids Res 2012; 41:366-77. [PMID: 23118489 PMCID: PMC3592454 DOI: 10.1093/nar/gks1018] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022]
Abstract
Transcription is initiated when RNA polymerase recognizes the duplex promoter DNA in the closed complex. Due to its transient nature, the closed complex has not been well characterized. How the initial promoter recognition occurs may offer important clues to regulation of transcription initiation. In this article, we have carried out single-base pair substitution experiments on two Escherichia coli promoters belonging to two different classes, the -35 and the extended -10, under conditions which stabilize the closed complex. Single-base pair substitution experiments indicate modest base-specific effects on the stability of the closed complex of both promoters. Mutations of base pairs in the -10 region affect the closed complexes of two promoters differently, suggesting different modes of interaction of the RNA polymerase and the promoter in the two closed complexes. Two residues on σ(70) which have been suggested to play important role in promoter recognition, Q437 and R436, were mutated and found to have different effects on the closed-complex stability. DNA circular dichroism (CD) and FRET suggest that the promoter DNA in the closed complex is distorted. Modeling suggests two different orientations of the recognition helix of the RNA polymerase in the closed complex. We propose that the RNA polymerase recognizes the sequence dependent conformation of the promoter DNA in the closed complex.
Affiliation(s)
- Subrata Debnath
- Division of Structural Biology and Bioinformatics, CSIR-Indian Institute of Chemical Biology, 4 Raja Subodh Mullick Road, Kolkata 700032, India
|
25
|
Mazumder A, Maiti A, Roy K, Roy S. A synthetic peptide mimic of λ-Cro shows sequence-specific binding in vitro and in vivo. ACS Chem Biol 2012; 7:1084-94. [PMID: 22480451 DOI: 10.1021/cb200523n] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 01/16/2023]
Abstract
Development of small synthetic transcription factors is important for future cellular engineering and therapeutics. This article describes the chemical synthesis of α-amino-isobutyric acid (Aib) substituted, conformationally constrained, helical peptide mimics of Cro protein from bacteriophage λ that encompasses the DNA recognition elements. The Aib substituted constrained helical peptide monomer shows a moderately reduced dissociation constant compared to the corresponding unsubstituted wild type peptide. A suitably cross-linked dimeric version of the peptide, mimicking the dimeric protein, recapitulates some of the important features of Cro. It binds to the operator site O(R)3, a high affinity Cro binding site in the λ genome, with good affinity and single base-pair discrimination specificity. A dimeric version of an even shorter peptide mimic spanning only the recognition helix of the helix-turn-helix motif of the Cro protein was created following the same design principles. This dimeric peptide binds to O(R)3 with affinity greater than that of the longer version. Chemical shift perturbation experiments show that the binding mode of this peptide dimer to the cognate operator site sequence is similar to the wild type Cro protein. A Green Fluorescent Protein based reporter assay in vivo reveals that the peptide dimer binds the operator site sequences with considerable selectivity and inhibits gene expression. Peptide mimics designed in this way may provide a future framework for creating effective synthetic transcription factors.
Affiliation(s)
- Abhishek Mazumder
- Divisions of Structural Biology and Bioinformatics, CSIR-Indian Institute of Chemical Biology, 4, Raja S.C. Mullick Road, Kolkata 700032, India
| | - Atanu Maiti
- Divisions of Structural Biology and Bioinformatics, CSIR-Indian Institute of Chemical Biology, 4, Raja S.C. Mullick Road, Kolkata 700032, India
| | - Koushik Roy
- Divisions of Structural Biology and Bioinformatics, CSIR-Indian Institute of Chemical Biology, 4, Raja S.C. Mullick Road, Kolkata 700032, India
| | - Siddhartha Roy
- Divisions of Structural Biology and Bioinformatics, CSIR-Indian Institute of Chemical Biology, 4, Raja S.C. Mullick Road, Kolkata 700032, India
|
26
|
Bullwinkle TJ, Samorodnitsky D, Rosati RC, Koudelka GB. Determinants of bacteriophage 933W repressor DNA binding specificity. PLoS One 2012; 7:e34563. [PMID: 22509323 PMCID: PMC3317979 DOI: 10.1371/journal.pone.0034563] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 10/21/2011] [Accepted: 03/06/2012] [Indexed: 11/22/2022]
Abstract
We reported previously that 933W repressor apparently does not cooperatively bind to adjacent sites on DNA and that the relative affinities of 933W repressor for its operators differ significantly from that of any other lambdoid bacteriophage. These findings indicate that the operational details of the lysis-lysogeny switch of bacteriophage 933W are unique among lambdoid bacteriophages. Since the functioning of the lysis-lysogeny switch in 933W bacteriophage uniquely and solely depends on the order of preference of 933W repressor for its operators, we examined the details of how 933W repressor recognizes its DNA sites. To identify the specificity determinants, we first created a molecular model of the 933W repressor-DNA complex and tested the predicted protein-DNA interactions. The results of these studies provide a picture of how 933W repressor recognizes its DNA sites. We also show that, opposite of what is normally observed for lambdoid phages, 933W operator sequences have evolved in such a way that the presence of the most commonly found base sequences at particular operator positions serves to decrease, rather than increase, the affinity of the protein for the site. This finding cautions against assuming that a consensus sequence derived from sequence analysis defines the optimal, highest affinity DNA binding site for a protein.
Affiliation(s)
- Tammy J. Bullwinkle
- Department of Biological Sciences, University at Buffalo, Buffalo, New York, United States of America
| | - Daniel Samorodnitsky
- Department of Biological Sciences, University at Buffalo, Buffalo, New York, United States of America
| | - Rayna C. Rosati
- Department of Biological Sciences, University at Buffalo, Buffalo, New York, United States of America
| | - Gerald B. Koudelka
- Department of Biological Sciences, University at Buffalo, Buffalo, New York, United States of America
|
27
|
Chakrabarti J, Chandra N, Raha P, Roy S. High-affinity quasi-specific sites in the genome: how the DNA-binding proteins cope with them. Biophys J 2011; 101:1123-9. [PMID: 21889449 DOI: 10.1016/j.bpj.2011.07.041] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 04/28/2011] [Revised: 06/25/2011] [Accepted: 07/01/2011] [Indexed: 11/26/2022]
Abstract
Many prokaryotic transcription factors home in on one or a few target sites in the presence of a huge number of nonspecific sites. Our analysis of λ-repressor in the Escherichia coli genome based on single basepair substitution experiments shows the presence of hundreds of sites having binding energy within 3 kcal/mol of the O(R)1 binding energy, and thousands of sites with binding energy above the nonspecific binding energy. The effect of such sites on DNA-based processes has not been fully explored. The presence of such sites dramatically lowers the occupation probability of the specific site far more than if the genome were composed of nonspecific sites only. Our Brownian dynamics studies show that the presence of quasi-specific sites results in very significant kinetic effects as well. In contrast to λ-repressor, the E. coli genome has orders of magnitude fewer quasi-specific sites for GalR, an integral transcription factor, thus causing little competition for the specific site. We propose that GalR and perhaps repressors of the same family have evolved binding modes that lead to much smaller numbers of quasi-specific sites to remove the untoward effects of genomic DNA.
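The claim that quasi-specific sites lower the occupation probability of the specific site can be illustrated with a simple equilibrium sketch. It assumes a single protein molecule distributed over sites according to Boltzmann weights, which is a simplification chosen for illustration and not necessarily the model used in the paper. The function name `specific_site_occupancy` is hypothetical, and any energies passed to it are placeholders.

```python
import math

# RT in kcal/mol at roughly 298 K (assumed temperature for this sketch).
RT = 0.593

def specific_site_occupancy(e_specific, competitor_energies):
    """Probability that a single protein occupies the specific site.

    Energies are binding free energies in kcal/mol; more negative means
    tighter binding. Each site contributes a Boltzmann weight exp(-E/RT).
    """
    w_specific = math.exp(-e_specific / RT)
    w_competitors = sum(math.exp(-e / RT) for e in competitor_energies)
    return w_specific / (w_specific + w_competitors)
```

With placeholder energies, adding a modest number of competitor sites only a few kcal/mol weaker than the specific site reduces the occupancy far more than a much larger pool of weak nonspecific sites, which is the qualitative effect the abstract describes.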
Affiliation(s)
- J Chakrabarti
- Department of Chemical, Biological and Macromolecular Sciences, S. N. Bose National Centre for Basic Sciences, CSIR-Indian Institute of Chemical Biology, Kolkata, India.
|
28
|
Kim JT, Gewehr JE, Martinetz T. BINDING MATRIX: A NOVEL APPROACH FOR BINDING SITE RECOGNITION. J Bioinform Comput Biol 2011; 2:289-307. [PMID: 15297983 DOI: 10.1142/s0219720004000569] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 09/03/2003] [Revised: 12/19/2003] [Accepted: 12/19/2003] [Indexed: 11/18/2022]
Abstract
Recognition of protein-DNA binding sites in genomic sequences is a crucial step for discovering biological functions of genomic sequences. Explosive growth in availability of sequence information has resulted in a demand for binding site detection methods with high specificity. The motivation of the work presented here is to address this demand by a systematic approach based on Maximum Likelihood Estimation. A general framework is developed in which a large class of binding site detection methods can be described in a uniform and consistent way. Protein-DNA binding is determined by binding energy, which is an approximately linear function within the space of sequence words. All matrix based binding word detectors can be regarded as different linear classifiers which attempt to estimate the linear separation implied by the binding energy function. The standard approaches of consensus sequences and profile matrices are described using this framework. A maximum likelihood approach for determining this linear separation leads to a novel matrix type, called the binding matrix. The binding matrix is the most specific matrix based classifier which is consistent with the input set of known binding words. It achieves significant improvements in specificity compared to other matrices. This is demonstrated using 95 sets of experimentally determined binding words provided by the TRANSFAC database.
|
29
|
Zhu XM, Yin L, Hood L, Ao P. ROBUSTNESS, STABILITY AND EFFICIENCY OF PHAGE λ GENETIC SWITCH: DYNAMICAL STRUCTURE ANALYSIS. J Bioinform Comput Biol 2011; 2:785-817. [PMID: 15617166 DOI: 10.1142/s0219720004000946] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.3] [Received: 02/21/2004] [Revised: 09/16/2004] [Accepted: 09/18/2004] [Indexed: 11/18/2022]
Abstract
Based on the dynamical structure theory for complex networks recently developed by one of us and on the physical-chemical models for gene regulation, developed by Shea and Ackers in the 1980s, we formulate a direct and concise mathematical framework for the genetic switch controlling phage λ life cycles, which naturally includes the stochastic effect. The dynamical structure theory states that the dynamics of a complex network is determined by its four elementary components: The dissipation (analogous to degradation), the stochastic force, the driving force determined by a potential, and the transverse force. The potential may be interpreted as a landscape for the phage development in terms of attractive basins, saddle points, peaks and valleys. The dissipation gives rise to the adaptivity of the phage in the landscape defined by the potential: The phage always has the tendency to approach the bottom of the nearby attractive basin. The transverse force tends to keep the network on the equal-potential contour of the landscape. The stochastic fluctuation gives the phage the ability to search around the potential landscape by passing through saddle points. With molecular parameters in our model fixed primarily by the experimental data on wild-type phage and supplemented by data on one mutant, our calculated results on mutants agree quantitatively with the available experimental observations on other mutants for protein number, lysogenization frequency, and a lysis frequency in lysogen culture. The calculation reproduces the observed robustness of the phage λ genetic switch. This is the first mathematical description that successfully represents such a wide variety of major experimental phenomena.
Specifically, we find: (1) The explanation for both the stability and the efficiency of phage λ genetic switch is the exponential dependence of saddle point crossing rate on potential barrier height, a result of the stochastic motion in a landscape; and (2) The positive feedback of cI repressor gene transcription, enhanced by the CI dimer cooperative binding, is the key to the robustness of the phage λ genetic switch against mutations and fluctuations in kinetic parameter values.
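The exponential dependence of the saddle-point crossing rate on barrier height can be illustrated with a generic Arrhenius/Kramers-type expression, rate = prefactor * exp(-barrier / noise). This functional form, the parameter names, and the numbers below are assumptions chosen for illustration; they are not taken from the paper's model.

```python
import math

def escape_rate(barrier_height: float, noise_strength: float, prefactor: float = 1.0) -> float:
    """Assumed Kramers-type rate: exponentially suppressed by the barrier height."""
    return prefactor * math.exp(-barrier_height / noise_strength)

# Doubling a (hypothetical) barrier from 10 to 20 noise units lowers the
# switching rate by a factor of exp(10), roughly 22,000-fold.
low_barrier = escape_rate(10.0, noise_strength=1.0)
high_barrier = escape_rate(20.0, noise_strength=1.0)
print(low_barrier / high_barrier)
```

Under this assumed form, a modest change in barrier height moves the switching rate by orders of magnitude, which is one way to read the abstract's claim that the same mechanism accounts for both stability and efficient switching.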
Affiliation(s)
- X-M Zhu
- GenMath, Corp. 5525 27th Ave.N.E., Seattle, WA 98105, USA
|
30
|
Ahlstrom LS, Miyashita O. Molecular simulation uncovers the conformational space of the λ Cro dimer in solution. Biophys J 2011; 101:2516-24. [PMID: 22098751 DOI: 10.1016/j.bpj.2011.10.016] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 06/22/2011] [Revised: 09/29/2011] [Accepted: 10/11/2011] [Indexed: 01/25/2023] Open
Abstract
The significant variation among solved structures of the λ Cro dimer suggests its flexibility. However, contacts in the crystal lattice could have stabilized a conformation which is unrepresentative of its dominant solution form. Here we report on the conformational space of the Cro dimer in solution using replica exchange molecular dynamics in explicit solvent. The simulated ensemble shows remarkable correlation with available x-ray structures. Network analysis and a free energy surface reveal the predominance of closed and semi-open dimers, with a modest barrier separating these two states. The fully open conformation lies higher in free energy, indicating that it requires stabilization by DNA or crystal contacts. Most NMR models are found to be unstable conformations in solution. Intersubunit salt bridging between Arg(4) and Glu(53) during simulation stabilizes closed conformations. Because a semi-open state is among the low-energy conformations sampled in simulation, we propose that Cro-DNA binding may not entail a large conformational change relative to the dominant dimer forms in solution.
Affiliation(s)
- Logan S Ahlstrom
- Department of Chemistry and Biochemistry, University of Arizona, Tucson, Arizona, USA
|
31
|
Hall BM, Vaughn EE, Begaye AR, Cordes MHJ. Reengineering Cro protein functional specificity with an evolutionary code. J Mol Biol 2011; 413:914-28. [PMID: 21945527 DOI: 10.1016/j.jmb.2011.08.056] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 06/03/2011] [Revised: 08/13/2011] [Accepted: 08/29/2011] [Indexed: 11/17/2022]
Abstract
Cro proteins from different lambdoid bacteriophages are extremely variable in their target consensus DNA sequences and constitute an excellent model for evolution of transcription factor specificity. We experimentally tested a bioinformatically derived evolutionary code relating switches between pairs of amino acids at three recognition helix sites in Cro proteins to switches between pairs of nucleotide bases in the cognate consensus DNA half-sites. We generated all eight possible code variants of bacteriophage λ Cro and used electrophoretic mobility shift assays to compare binding of each variant to its own putative cognate site and to the wild-type cognate site; we also tested the wild-type protein against all eight DNA sites. Each code variant showed stronger binding to its putative cognate site than to the wild-type site, except some variants containing proline at position 27; each also bound its cognate site better than wild-type Cro bound the same site. Most code variants, however, displayed poorer affinity and specificity than wild-type λ Cro. Fluorescence anisotropy assays on λ Cro and the triple code variant (PSQ) against the two cognate sites confirmed the switch in specificity and showed larger apparent effects on binding affinity and specificity. Bacterial one-hybrid assays of λ Cro and PSQ against libraries of sequences with a single randomized half-site showed the expected switches in specificity at two of three coded positions and no clear switches in specificity at noncoded positions. With a few caveats, these results confirm that the proposed Cro evolutionary code can be used to reengineer Cro specificity.
Affiliation(s)
- Branwen M Hall
- Department of Chemistry and Biochemistry, University of Arizona, Tucson, AZ 85721, USA
|
32
|
Multilevel autoregulation of λ repressor protein CI by DNA looping in vitro. Proc Natl Acad Sci U S A 2011; 108:14807-12. [PMID: 21873207 DOI: 10.1073/pnas.1111221108] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Indexed: 11/18/2022] Open
Abstract
The prophage state of bacteriophage λ is extremely stable and is maintained by a highly regulated level of λ repressor protein, CI, which represses lytic functions. CI regulates its own synthesis in a lysogen by activating and repressing its promoter, P(RM). CI participates in long-range interactions involving two regions of widely separated operator sites by generating a loop in the intervening DNA. We investigated the roles of each individual site under conditions that permitted DNA loop formation by using in vitro transcription assays for the first time on supercoiled DNA that mimics the in vivo situation. We confirmed that DNA loops generated by oligomerization of CI bound to its operators influence the autoactivation and autorepression of P(RM) regulation. We additionally report that different configurations of DNA loops are central to this regulation: one configuration further enhances autoactivation and another is essential for autorepression of P(RM).
|
33
|
Abstract
How do complex gene regulatory circuits evolve? These circuits involve many interacting components, which work together to specify patterns of gene expression. They typically include many subtle mechanistic features, but in most cases it is unclear whether these features are essential for the circuit to work at all, or if instead they make a functional circuit work better. In the latter case, such a feature is here termed 'dispensable', and it is plausible that the feature has been added at a late stage in the evolution of the circuit. This review describes experimental tests of this question, using the phage λ gene regulatory circuit. Several features of this circuit are found to be dispensable, in the sense that the circuitry works without these features, though not as well as the wild type. In some cases, second-site suppressor mutations are needed to confer near-normal behavior in the absence of such a feature. These findings are discussed here in the context of a two-stage model for evolution of gene regulatory circuits. In this model, a circuit evolves by assembly of a primitive or basic form, followed by adjustment of parameters and addition of qualitatively new features. Pathways are suggested for the addition of such features to a more basic form. Selected examples in other systems are described. Some of the dispensable features of phage λ may be evolutionary refinements. Finding that a feature is dispensable, however, does not prove that it is a late addition - it is possible that it was essential early in evolution, and became dispensable as the circuit evolved. Conversely, a late addition might have become essential. As ongoing work provides additional examples of dispensable features, it may become clearer how often they represent refinements.
|
34
|
Abstract
Systematic Evolution of Ligands by EXponential enrichment (SELEX) is an experimental procedure that allows extraction, from an initially random pool of oligonucleotides, of the oligomers with a high binding affinity for a given molecular target. The highest affinity binding sequences isolated through SELEX can have numerous research, diagnostic, and therapeutic applications. Recently, important new modifications of the SELEX protocol have been proposed. In particular, a suitably modified SELEX experiment, together with an appropriate computational procedure, allows inference of protein-DNA interaction parameters with unprecedented accuracy. Such inference is possible even when there is no a priori information on transcription factor binding specificity, which allows accurate predictions of binding sites for any transcription factor of interest. In this chapter we discuss how to accurately determine protein-DNA interaction parameters from SELEX experiments. The chapter addresses the experimental and computational procedures needed to generate and analyze appropriate data.
|
35
|
Little JW, Michalowski CB. Stability and instability in the lysogenic state of phage lambda. J Bacteriol 2010; 192:6064-76. [PMID: 20870769 PMCID: PMC2976446 DOI: 10.1128/jb.00726-10] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.7] [Received: 06/22/2010] [Accepted: 09/11/2010] [Indexed: 12/26/2022] Open
Abstract
Complex gene regulatory circuits exhibit emergent properties that are difficult to predict from the behavior of the components. One such property is the stability of regulatory states. Here we analyze the stability of the lysogenic state of phage λ. In this state, the virus maintains a stable association with the host, and the lytic functions of the virus are repressed by the viral CI repressor. This state readily switches to the lytic pathway when the host SOS system is induced. A low level of SOS-dependent switching occurs without an overt stimulus. We found that the intrinsic rate of switching to the lytic pathway, measured in a host lacking the SOS response, was almost undetectably low, probably less than 10^-8 per generation. We surmise that this low rate has not been selected directly during evolution but results from optimizing the rate of switching in a wild-type host over the natural range of SOS-inducing conditions. We also analyzed a mutant, λprm240, in which the promoter controlling CI expression was weakened, rendering lysogens unstable. Strikingly, the intrinsic stability of λprm240 lysogens depended markedly on the growth conditions; lysogens grown in minimal medium were nearly stable but switched at high rates when grown in rich medium. These effects on stability likely reflect corresponding effects on the strength of the prm240 promoter, measured in an uncoupled assay system. Several derivatives of λprm240 with altered stabilities were characterized. This mutant and its derivatives afford a model system for further analysis of stability.
Affiliation(s)
- John W Little
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ 85721, USA.
|
36
|
Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence. Proc Natl Acad Sci U S A 2010; 107:9158-63. [PMID: 20439748 DOI: 10.1073/pnas.1004290107] [Citation(s) in RCA: 227] [Impact Index Per Article: 15.1] [Indexed: 11/18/2022] Open
Abstract
Cells use protein-DNA and protein-protein interactions to regulate transcription. A biophysical understanding of this process has, however, been limited by the lack of methods for quantitatively characterizing the interactions that occur at specific promoters and enhancers in living cells. Here we show how such biophysical information can be revealed by a simple experiment in which a library of partially mutated regulatory sequences are partitioned according to their in vivo transcriptional activities and then sequenced en masse. Computational analysis of the sequence data produced by this experiment can provide precise quantitative information about how the regulatory proteins at a specific arrangement of binding sites work together to regulate transcription. This ability to reliably extract precise information about regulatory biophysics in the face of experimental noise is made possible by a recently identified relationship between likelihood and mutual information. Applying our experimental and computational techniques to the Escherichia coli lac promoter, we demonstrate the ability to identify regulatory protein binding sites de novo, determine the sequence-dependent binding energy of the proteins that bind these sites, and, importantly, measure the in vivo interaction energy between RNA polymerase and a DNA-bound transcription factor. Our approach provides a generally applicable method for characterizing the biophysical basis of transcriptional regulation by a specified regulatory sequence. The principles of our method can also be applied to a wide range of other problems in molecular biology.
|
37
|
Contreras-Moreira B, Sancho J, Angarica VE. Comparison of DNA binding across protein superfamilies. Proteins 2010; 78:52-62. [PMID: 19731374 DOI: 10.1002/prot.22525] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 12/20/2022]
Abstract
Specific protein-DNA interactions are central to a wide group of processes in the cell and have been studied both experimentally and computationally over the years. Despite the increasing collection of protein-DNA complexes, so far only a few studies have aimed at dissecting the structural characteristics of DNA binding among evolutionarily related proteins. Some questions that remain to be answered are: (a) what is the contribution of the different readout mechanisms in members of a given structural superfamily, (b) what is the degree of interface similarity among superfamily members and how this affects binding specificity, (c) how DNA-binding protein superfamilies distribute across taxa, and (d) is there a general or family-specific code for the recognition of DNA. We have recently developed a straightforward method to dissect the interface of protein-DNA complexes at the atomic level and here we apply it to study 175 proteins belonging to nine representative superfamilies. Our results indicate that evolutionarily unrelated DNA-binding domains broadly conserve specificity statistics, such as the ratio of indirect/direct readout and the frequency of atomic interactions, therefore supporting the existence of a set of recognition rules. It is also found that interface conservation follows trends that are superfamily-specific. Finally, this article identifies tendencies in the phylogenetic distribution of transcription factors, which might be related to the evolution of regulatory networks, and postulates that the modular nature of zinc finger proteins can explain its role in large genomes, as it allows for larger binding interfaces in a single protein molecule.
Affiliation(s)
- Bruno Contreras-Moreira
- Estación Experimental de Aula Dei, Consejo Superior de Investigaciones Científicas, Av. Montañana 1.005, Zaragoza, Spain.
|
38
|
Greenspan NS. Cohen's Conjecture, Howard's Hypothesis, and Ptashne's Ptruth: an exploration of the relationship between affinity and specificity. Trends Immunol 2010; 31:138-43. [PMID: 20149744 DOI: 10.1016/j.it.2010.01.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Received: 11/30/2009] [Revised: 01/04/2010] [Accepted: 01/05/2010] [Indexed: 11/29/2022]
Abstract
Both affinity and specificity for ligands directly influence the functions of biological macromolecules. Some investigators assume that there is a consistent relationship between the affinity of a receptor molecule for its cognate ligand(s) and the specificity of that same receptor (affinity for cognate versus non-cognate ligands). However, analysis of the range of physical factors that account for changes in affinity, in any particular direction and to any particular degree, of a receptor for a cognate ligand suggests strongly that such factors can have disparate effects on the affinities of the receptor for different non-cognate ligands. Therefore, there can be no simple relationship between affinity and specificity as defined by relative binding of the receptor to cognate and non-cognate ligands.
Affiliation(s)
- Neil S Greenspan
- Department of Pathology, Case Western Reserve University School of Medicine, Cleveland, Ohio 44106, USA.
|
39
|
Predicting strength and function for promoters of the Escherichia coli alternative sigma factor, sigmaE. Proc Natl Acad Sci U S A 2010; 107:2854-9. [PMID: 20133665 DOI: 10.1073/pnas.0915066107] [Citation(s) in RCA: 85] [Impact Index Per Article: 5.7] [Indexed: 02/07/2023] Open
Abstract
Sequenced bacterial genomes provide a wealth of information but little understanding of transcriptional regulatory circuits largely because accurate prediction of promoters is difficult. We examined two important issues for accurate promoter prediction: (1) the ability to predict promoter strength and (2) the sequence properties that distinguish between active and weak/inactive promoters. We addressed promoter prediction using natural core promoters recognized by the well-studied alternative sigma factor, Escherichia coli sigma(E), as a representative of group 4 sigmas, the largest sigma group. To evaluate the contribution of sequence to promoter strength and function, we used modular position weight matrix models comprised of each promoter motif and a penalty score for suboptimal motif location. We find that a combination of select modules is moderately predictive of promoter strength and that imposing minimal motif scores distinguished active from weak/inactive promoters. The combined -35/-10 score is the most important predictor of activity. Our models also identified key sequence features associated with active promoters. A conserved "AAC" motif in the -35 region is likely to be a general predictor of function for promoters recognized by group 4 sigmas. These results provide valuable insights into sequences that govern promoter strength, distinguish active and inactive promoters for the first time, and are applicable to both in vivo and in vitro measures of promoter strength.
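A modular position weight matrix model with a location penalty, as described in this abstract, might be sketched as below: the promoter score is the sum of the -35 and -10 motif scores minus a penalty that grows with the deviation of the spacer from its preferred length. The toy matrices, the "TCT" placeholder for the -10 motif, the spacer lengths, and the linear penalty are hypothetical; only the "AAC" element in the -35 region comes from the abstract.

```python
def make_pwm(consensus, match=1.0, mismatch=-1.0):
    """Toy PWM: one dict per position, rewarding only the consensus base."""
    return [{b: (match if b == c else mismatch) for b in "ACGT"} for c in consensus]

def pwm_score(site, pwm):
    """Additive PWM score of a site that has the same length as the matrix."""
    return sum(column[base] for base, column in zip(site, pwm))

def promoter_score(seq, pwm35, pwm10, spacer_len, optimal_spacer, penalty_per_bp=1.0):
    """Score a promoter whose -35 motif starts at seq[0] (assumed layout)."""
    end35 = len(pwm35)
    start10 = end35 + spacer_len
    site10 = seq[start10:start10 + len(pwm10)]
    if len(site10) < len(pwm10):
        raise ValueError("sequence too short for this spacer length")
    motif_score = pwm_score(seq[:end35], pwm35) + pwm_score(site10, pwm10)
    # Penalty module: subtract a cost for each base pair of spacer deviation.
    return motif_score - penalty_per_bp * abs(spacer_len - optimal_spacer)

pwm35 = make_pwm("AAC")   # "AAC" is mentioned in the abstract
pwm10 = make_pwm("TCT")   # hypothetical placeholder motif
good = promoter_score("AAC" + "GGGG" + "TCT", pwm35, pwm10, spacer_len=4, optimal_spacer=4)
shifted = promoter_score("AAC" + "GGGGG" + "TCT", pwm35, pwm10, spacer_len=5, optimal_spacer=4)
print(good, shifted)
```

Imposing a minimum value on `pwm_score` for each module, in addition to ranking by the combined score, would correspond to the abstract's use of minimal motif scores to separate active from weak or inactive promoters.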
|
40
|
Zhao Y, Granas D, Stormo GD. Inferring binding energies from selected binding sites. PLoS Comput Biol 2009; 5:e1000590. [PMID: 19997485 PMCID: PMC2777355 DOI: 10.1371/journal.pcbi.1000590] [Citation(s) in RCA: 165] [Impact Index Per Article: 10.3] [Received: 09/22/2009] [Accepted: 11/02/2009] [Indexed: 11/18/2022] Open
Abstract
We employ a biophysical model that accounts for the non-linear relationship between binding energy and the statistics of selected binding sites. The model includes the chemical potential of the transcription factor, non-specific binding affinity of the protein for DNA, as well as sequence-specific parameters that may include non-independent contributions of bases to the interaction. We obtain maximum likelihood estimates for all of the parameters and compare the results to standard probabilistic methods of parameter estimation. On simulated data, where the true energy model is known and samples are generated with a variety of parameter values, we show that our method returns much more accurate estimates of the true parameters and much better predictions of the selected binding site distributions. We also introduce a new high-throughput SELEX (HT-SELEX) procedure to determine the binding specificity of a transcription factor in which the initial randomized library and the selected sites are sequenced with next generation methods that return hundreds of thousands of sites. We show that after a single round of selection our method can estimate binding parameters that give very good fits to the selected site distributions, much better than standard motif identification algorithms.
The DNA binding sites of transcription factors that control gene expression are often predicted based on a collection of known or selected binding sites. The most commonly used methods for inferring the binding site pattern, or sequence motif, assume that the sites are selected in proportion to their affinity for the transcription factor, ignoring the effect of the transcription factor concentration. We have developed a new maximum likelihood approach, in a program called BEEML, that directly takes into account the transcription factor concentration as well as non-specific contributions to the binding affinity, and we show in simulation studies that it gives a much more accurate model of the transcription factor binding sites than previous methods. We also develop a new method for extracting binding sites for a transcription factor from a random pool of DNA sequences, called high-throughput SELEX (HT-SELEX), and we show that after a single round of selection BEEML can obtain an accurate model of the transcription factor binding sites.
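The non-linear relationship between binding energy and site selection can be illustrated with a two-state occupancy function in which a site is bound with probability w·e^mu / (1 + w·e^mu), where w combines a sequence-specific and a non-specific Boltzmann weight. This particular expression, the function name, and the choice of energies in units of kT are assumptions for illustration; the sketch is not BEEML's implementation.

```python
import math

def binding_probability(specific_energy: float, mu: float,
                        nonspecific_energy: float = math.inf) -> float:
    """Assumed occupancy of one site; energies and mu are in units of kT.

    mu plays the role of the transcription factor's chemical potential
    (it rises with concentration). An infinite non-specific energy turns
    the non-specific binding mode off.
    """
    weight = math.exp(-specific_energy) + math.exp(-nonspecific_energy)
    occupancy = weight * math.exp(mu)
    return occupancy / (1.0 + occupancy)

# At low concentration, selection is close to proportional to affinity:
low = binding_probability(0.0, mu=-10.0) / binding_probability(1.0, mu=-10.0)
# Near saturation, the same 1 kT energy difference barely matters:
high = binding_probability(0.0, mu=5.0) / binding_probability(1.0, mu=5.0)
print(low, high)
```

The first ratio is close to e, while the second is close to 1, which is the concentration effect that, according to the abstract, affinity-proportional motif methods ignore.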
Affiliation(s)
- Yue Zhao
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
| | - David Granas
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
| | - Gary D. Stormo
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
|
41
|
Refardt D, Rainey PB. Tuning a genetic switch: experimental evolution and natural variation of prophage induction. Evolution 2009; 64:1086-97. [PMID: 19891623 DOI: 10.1111/j.1558-5646.2009.00882.x] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Indexed: 12/01/2022]
Abstract
Genetic switches allow organisms to modulate their phenotype in response to environmental changes. Understanding the evolutionary processes by which switches are tuned is central to understanding how phenotypic variation is realized. Prophage induction by phage lambda is the classic example of a genetic switch and allows lambda to move between two different modes of transmission: as a lysogen it reproduces vertically as a component of the host genome; as a free phage it reproduces horizontally by infectious epidemic spread. We show that the lambda switch can respond rapidly to selection for alteration in sensitivity and threshold. Sequencing of candidate genes in the genetic circuitry underlying the switch revealed mutations of likely adaptive significance in some, but not all candidates, suggesting that the core genetic circuitry plays a limited role in the fine-tuning of the switch in vivo. The relative ease with which the switch could be tuned by selection was further indicated by extensive variation in sensitivity and threshold of its response function among wild lambdoid phages. Together, our findings emphasize the adaptive significance of a finely tuned switch and draw attention to the selective factors shaping prophage induction in natural phage populations.
Affiliation(s)
- Dominik Refardt
- School of Biological Sciences, The University of Auckland, Auckland, New Zealand.
|
42
|
Abstract
We present a comprehensive, computational study of the properties of bacteriophage lambda mutants designed by Atsumi and Little (2006 Proc. Natl. Acad. Sci. 103 4558-63). These phages underwent a genetic reconstruction where Cro was replaced by a dimeric form of the Lac repressor. To clarify the theoretical characteristics of these mutants, we built a detailed thermodynamic model. The mutants all have a different genetic wiring than the wild-type lambda. One group lacks regulation of P(RM) by the lytic protein. These mutants only exhibit the lysogenic equilibrium, with no transiently active P(R). The other group lacks the negative feedback from CI. In this group, we identify a handful of bi-stable mutants, although the majority only exhibit the lysogenic equilibrium. The experimental identification of functional phages differs from our predictions. From a theoretical perspective, there is no reason why only 4 out of 900 mutants should be functional. The differences between theory and experiment can be explained in two ways: either the view of the lambda phage as a bi-stable system needs to be revised, or the mutants have in fact not undergone a modular replacement, as intended by Atsumi and Little, but constitute instead a wider systemic change.
Affiliation(s)
- Maria Werner
- Department of Computational Biology, KTH-Royal Institute of Technology, Albanova University Center, SE-10691 Stockholm, Sweden.
|
43
|
Homsi DSF, Gupta V, Stormo GD. Modeling the quantitative specificity of DNA-binding proteins from example binding sites. PLoS One 2009; 4:e6736. [PMID: 19707584 PMCID: PMC2726951 DOI: 10.1371/journal.pone.0006736] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 05/28/2009] [Accepted: 07/07/2009] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND The binding of transcription factors to their respective DNA sites is a key component of every regulatory network. Predictions of transcription factor binding sites are usually based on models for transcription factor specificity. These models, in turn, are often based on examples of known binding sites. METHODOLOGY/PRINCIPAL FINDINGS Collections of binding sites are obtained in simulation experiments where the true model for the transcription factor is known and various sampling procedures are employed. We compare the accuracies of three different and commonly used methods for predicting the specificity of the transcription factor based on example binding sites. Different methods for constructing the models can lead to significant differences in the accuracy of the predictions and we show that commonly used methods can be positively misleading, even at large sample sizes and using noise-free data. Methods that minimize the number of predicted binding sequences are often significantly more accurate than the other methods tested. CONCLUSIONS/SIGNIFICANCE Different methods for generating motifs from example binding sites can have significantly different numbers of false positive and false negative predictions. For many different sampling procedures models based on quadratic programming are the most accurate.
Affiliation(s)
- Dana S. F. Homsi
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
| | - Vineet Gupta
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
| | - Gary D. Stormo
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
|
44
|
Monajjemi M, Razavian MH, Mollaamin F, Naderi F, Honarparvar B. A theoretical thermochemical study of solute-solvent dielectric effects in the displacement of codon-anticodon base pairs. RUSSIAN JOURNAL OF PHYSICAL CHEMISTRY A 2008; 82:2277-2285. [DOI: 10.1134/s0036024408130207] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 01/10/2023]
|
45
|
Angarica VE, Pérez AG, Vasconcelos AT, Collado-Vides J, Contreras-Moreira B. Prediction of TF target sites based on atomistic models of protein-DNA complexes. BMC Bioinformatics 2008; 9:436. [PMID: 18922190 PMCID: PMC2585596 DOI: 10.1186/1471-2105-9-436] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.0] [Received: 05/12/2008] [Accepted: 10/16/2008] [Indexed: 11/10/2022] Open
Abstract
Background The specific recognition of genomic cis-regulatory elements by transcription factors (TFs) plays an essential role in the regulation of coordinated gene expression. Studying the mechanisms determining binding specificity in protein-DNA interactions is thus an important goal. Most current approaches for modeling TF specific recognition rely on the knowledge of large sets of cognate target sites and consider only the information contained in their primary sequence. Results Here we describe a structure-based methodology for predicting sequence motifs starting from the coordinates of a TF-DNA complex. Our algorithm combines information regarding the direct and indirect readout of DNA into an atomistic statistical model, which is used to estimate the interaction potential. We first measure the ability of our method to correctly estimate the binding specificities of eight prokaryotic and eukaryotic TFs that belong to different structural superfamilies. Secondly, the method is applied to two homology models, finding that sampling of interface side-chain rotamers remarkably improves the results. Thirdly, the algorithm is compared with a reference structural method based on contact counts, obtaining comparable predictions for the experimental complexes and more accurate sequence motifs for the homology models. Conclusion Our results demonstrate that atomic-detail structural information can be feasibly used to predict TF binding sites. The computational method presented here is universal and might be applied to other systems involving protein-DNA recognition.
Affiliation(s)
- Vladimir Espinosa Angarica
- Departamento de Bioquímica y Biología Molecular y Celular, Facultad de Ciencias, Universidad de Zaragoza, Pedro Cerbuna 12, 50009 Zaragoza, España.
|
46
|
Omagari K, Yoshimura H, Suzuki T, Takano M, Ohmori M, Sarai A. ΔG-based prediction and experimental confirmation of SYCRP1-binding sites on the Synechocystis genome. FEBS J 2008; 275:4786-95. [DOI: 10.1111/j.1742-4658.2008.06618.x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022]
|
47
|
Hall BM, Roberts SA, Heroux A, Cordes MHJ. Two structures of a lambda Cro variant highlight dimer flexibility but disfavor major dimer distortions upon specific binding of cognate DNA. J Mol Biol 2007; 375:802-11. [PMID: 18054042 DOI: 10.1016/j.jmb.2007.10.082] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 08/03/2007] [Revised: 10/29/2007] [Accepted: 10/30/2007] [Indexed: 10/22/2022]
Abstract
Previously reported crystal structures of free and DNA-bound dimers of lambda Cro differ strongly (about 4 Å backbone rmsd), suggesting both flexibility of the dimer interface and induced-fit protein structure changes caused by sequence-specific DNA binding. Here, we present two crystal structures, in space groups P3(2)21 and C2 at 1.35 and 1.40 Å resolution, respectively, of a variant of lambda Cro with three mutations in its recognition helix (Q27P/A29S/K32Q, or PSQ for short). One dimer structure (P3(2)21; PSQ form 1) resembles the DNA-bound wild-type Cro dimer (1.0 Å backbone rmsd), while the other (C2; PSQ form 2) resembles neither unbound (3.6 Å) nor bound (2.4 Å) wild-type Cro. Both PSQ form 2 and unbound wild-type dimer crystals have a similar interdimer beta-sheet interaction between the beta1 strands at the edges of the dimer. In the former, an infinite, open beta-structure along one crystal axis results, while in the latter, a closed tetrameric barrel is formed. Neither the DNA-bound wild-type structure nor PSQ form 1 contains these interdimer interactions. We propose that beta-sheet superstructures resulting from crystal contact interactions distort Cro dimers from their preferred solution conformation, which actually resembles the DNA-bound structure. These results highlight the remarkable flexibility of lambda Cro but also suggest that sequence-specific DNA binding may not induce large changes in the protein structure.
Affiliation(s)
- Branwen M Hall
- Department of Biochemistry and Molecular Biophysics, University of Arizona, Tucson, AZ 85721, USA
|
48
|
Abstract
DNA-protein interactions are fundamental to many biological processes, including the regulation of gene expression. Determining the binding affinities of transcription factors (TFs) to different DNA sequences allows the quantitative modeling of transcriptional regulatory networks and has been a significant technical challenge in molecular biology for many years. A recent paper by Maerkl and Quake [1] demonstrated the use of microfluidic technology for the analysis of DNA-protein interactions. An array of short DNA sequences was spotted onto a glass slide, which was then covered with a microfluidic device allowing each spot to be within a chamber into which the flow of materials was controlled by valves. By trapping the DNA-protein complexes on the surface and measuring their concentrations microscopically, they could determine the binding affinity to a large number of DNA sequences that were varied systematically. They studied four TFs from the basic helix-loop-helix family of proteins, all of which bind to E-box sites with the consensus CAnnTG (where "n" can be any base), and showed that variations in affinity for different sites allow each TF to regulate different genes.
Affiliation(s)
- Gary D Stormo
- Department of Genetics, Washington University School of Medicine, St Louis, MO 63110, USA.
|
49
|
Schubert RA, Dodd IB, Egan JB, Shearwin KE. Cro's role in the CI-Cro bistable switch is critical for λ's transition from lysogeny to lytic development. Genes Dev 2007; 21:2461-72. [PMID: 17908932 PMCID: PMC1993876 DOI: 10.1101/gad.1584907] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.7] [Received: 06/19/2007] [Accepted: 08/07/2007] [Indexed: 11/24/2022]
Abstract
CI represses cro; Cro represses cI. This double negative feedback loop is the core of the classical CI-Cro epigenetic switch of bacteriophage lambda. Despite the classical status of this switch, the role in lambda development of Cro repression of the P(RM) promoter for CI has remained unclear. To address this, we created binding site mutations that strongly impaired Cro repression of P(RM) with only minimal effects on CI regulation of P(RM). These mutations had little impact on lambda development after infection but strongly inhibited the transition from lysogeny to the lytic pathway. We demonstrate that following inactivation of CI by ultraviolet treatment of lysogens, repression of P(RM) by Cro is needed to prevent synthesis of new CI that would otherwise significantly impede lytic development. Thus a bistable CI-Cro circuit reinforces the commitment to a developmental transition.
Affiliation(s)
- Rachel A. Schubert
- Molecular and Biomedical Sciences (Biochemistry), University of Adelaide, Adelaide, SA 5005, Australia
- Ian B. Dodd
- Molecular and Biomedical Sciences (Biochemistry), University of Adelaide, Adelaide, SA 5005, Australia
- J. Barry Egan
- Molecular and Biomedical Sciences (Biochemistry), University of Adelaide, Adelaide, SA 5005, Australia
- Keith E. Shearwin
- Molecular and Biomedical Sciences (Biochemistry), University of Adelaide, Adelaide, SA 5005, Australia
|
50
|
Moroni E, Caselle M, Fogolari F. Identification of DNA-binding protein target sequences by physical effective energy functions: free energy analysis of lambda repressor-DNA complexes. BMC Struct Biol 2007; 7:61. [PMID: 17900341 PMCID: PMC2194778 DOI: 10.1186/1472-6807-7-61] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 02/20/2007] [Accepted: 09/27/2007] [Indexed: 11/26/2022]
Abstract
Background: Specific binding of proteins to DNA is one of the most common ways gene expression is controlled. Although general rules for the DNA-protein recognition can be derived, the ambiguous and complex nature of this mechanism precludes a simple recognition code; therefore, the prediction of DNA target sequences is not straightforward. DNA-protein interactions can be studied using computational methods which can complement the current experimental methods and offer some advantages. In the present work we use physical effective potentials to evaluate the DNA-protein binding affinities for the λ repressor-DNA complex for which structural and thermodynamic experimental data are available. Results: The binding free energy of two molecules can be expressed as the sum of an intermolecular energy (evaluated using a molecular mechanics forcefield), a solvation free energy term and an entropic term. Different solvation models are used including distance dependent dielectric constants, solvent accessible surface tension models and the Generalized Born model. The effect of conformational sampling by Molecular Dynamics simulations on the computed binding energy is assessed; results show that this effect is in general negative and the reproducibility of the experimental values decreases with the increase of simulation time considered. The free energy of binding for non-specific complexes, estimated using the best energetic model, agrees with earlier theoretical suggestions. As a result of these analyses, we propose a protocol for the prediction of DNA-binding target sequences. The possibility of searching regulatory elements within the bacteriophage λ genome using this protocol is explored. Our analysis shows good prediction capabilities, even in the absence of any thermodynamic data and information on the naturally recognized sequence.
Conclusion: This study supports the conclusion that physics-based methods can offer a completely complementary methodology to sequence-based methods for the identification of DNA-binding protein target sequences.
Affiliation(s)
- Elisabetta Moroni
- Dipartimento di Fisica Teorica, Università di Torino and INFN, Via P. Giuria 1, 10125 Torino, Italy
- Dipartimento di Fisica G. Occhialini, Università di Milano-Bicocca and INFN, Piazza delle Scienze 3, 20156 Milano, Italy
- Michele Caselle
- Dipartimento di Fisica Teorica, Università di Torino and INFN, Via P. Giuria 1, 10125 Torino, Italy
- Federico Fogolari
- Dipartimento di Scienze e Tecnologie Biomediche, Università di Udine, P.le Kolbe 4, 33100 Udine, Italy
|