1
Fuqua T, Wagner A. The latent cis-regulatory potential of mobile DNA in Escherichia coli. Nat Commun 2025; 16:4740. [PMID: 40399339 DOI: 10.1038/s41467-025-60023-w] [Received: 03/26/2024] [Accepted: 05/08/2025] [Indexed: 05/23/2025] [Open access]
Abstract
Transposable elements can alter gene regulation in their host genome, either when they integrate into a genome, or when they accrue mutations after integration. However, the extent to which transposons can alter gene expression, as well as the necessary mutational steps, are not well characterized. Here we study the gene regulatory potential of the prominent IS3 family of transposable elements in E. coli. We started with 10 sequences from the ends of 5 IS3 sequences, created 18,537 random mutations in them, and measured their promoter activity using a massively parallel reporter assay. All 10 sequences could evolve de-novo promoter activity from single point mutations. De-novo promoters mostly emerge from existing proto-promoter sequences when mutations create new -10 boxes downstream of preexisting -35 boxes. The ends of IS3s harbor ~1.5 times as many such proto-promoter sequences as the E. coli genome. We also estimate that at least 26% of the 706 characterized IS3s already encode promoters. Our study shows that transposable elements can have a high latent cis-regulatory potential. Our observations can help to explain why mobile DNA may persist in prokaryotic genomes. They also underline the potential use of transposable elements as a substrate for evolving new gene expression.
Affiliation(s)
- Timothy Fuqua
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, Lausanne, Switzerland
- Andreas Wagner
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, Lausanne, Switzerland
  - The Santa Fe Institute, Santa Fe, NM, USA
2
Grah R, Guet CC, Tkačik G, Lagator M. Linking molecular mechanisms to their evolutionary consequences: a primer. Genetics 2025; 229:iyae191. [PMID: 39601269 PMCID: PMC11796464 DOI: 10.1093/genetics/iyae191] [Received: 09/24/2024] [Accepted: 11/13/2024] [Indexed: 11/29/2024] [Open access]
Abstract
A major obstacle to predictive understanding of evolution stems from the complexity of biological systems, which prevents detailed characterization of key evolutionary properties. Here, we highlight some of the major sources of complexity that arise when relating molecular mechanisms to their evolutionary consequences and ask whether accounting for every mechanistic detail is important to accurately predict evolutionary outcomes. To do this, we developed a mechanistic model of a bacterial promoter regulated by 2 proteins, allowing us to connect any promoter genotype to 6 phenotypes that capture the dynamics of gene expression following an environmental switch. Accounting for the mechanisms that govern how this system works enabled us to provide an in-depth picture of how regulated bacterial promoters might evolve. More importantly, we used the model to explore which factors that contribute to the complexity of this system are essential for understanding its evolution, and which can be simplified without information loss. We found that several key evolutionary properties (the distribution of phenotypic and fitness effects of mutations, the evolutionary trajectories during selection for regulation) can be accurately captured without accounting for all, or even most, parameters of the system. Our findings point to the need for a mechanistic approach to studying evolution, as it enables tackling biological complexity and in doing so improves the ability to predict evolutionary outcomes.
Affiliation(s)
- Rok Grah
  - Institute of Science and Technology Austria, Klosterneuburg AT-3400, Austria
- Calin C Guet
  - Institute of Science and Technology Austria, Klosterneuburg AT-3400, Austria
- Gašper Tkačik
  - Institute of Science and Technology Austria, Klosterneuburg AT-3400, Austria
- Mato Lagator
  - Division of Evolution, Infection and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PL, UK
3
Sokolowski TR, Gregor T, Bialek W, Tkačik G. Deriving a genetic regulatory network from an optimization principle. Proc Natl Acad Sci U S A 2025; 122:e2402925121. [PMID: 39752518 PMCID: PMC11725783 DOI: 10.1073/pnas.2402925121] [Received: 02/13/2024] [Accepted: 11/13/2024] [Indexed: 01/11/2025] [Open access]
Abstract
Many biological systems operate near the physical limits to their performance, suggesting that aspects of their behavior and underlying mechanisms could be derived from optimization principles. However, such principles have often been applied only in simplified models. Here, we explore a detailed mechanistic model of the gap gene network in the Drosophila embryo, optimizing its 50+ parameters to maximize the information that gene expression levels provide about nuclear positions. This optimization is conducted under realistic constraints, such as limits on the number of available molecules. Remarkably, the optimal networks we derive closely match the architecture and spatial gene expression profiles observed in the real organism. Our framework quantifies the tradeoffs involved in maximizing functional performance and allows for the exploration of alternative network configurations, addressing the question of which features are necessary and which are contingent. Our results suggest that multiple solutions to the optimization problem might exist across closely related organisms, offering insights into the evolution of gene regulatory networks.
Affiliation(s)
- Thomas R. Sokolowski
  - Institute of Science and Technology Austria, Klosterneuburg AT-3400, Austria
  - Frankfurt Institute for Advanced Studies, Frankfurt am Main DE-60438, Germany
- Thomas Gregor
  - Joseph Henry Laboratory of Physics and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
  - Department of Stem Cell and Developmental Biology, UMR3738, Institut Pasteur, Paris FR-75015, France
- William Bialek
  - Joseph Henry Laboratory of Physics and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
  - Center for Studies in Physics and Biology, Rockefeller University, New York, NY 10065
- Gašper Tkačik
  - Institute of Science and Technology Austria, Klosterneuburg AT-3400, Austria
4
Westmann CA, Goldbach L, Wagner A. The highly rugged yet navigable regulatory landscape of the bacterial transcription factor TetR. Nat Commun 2024; 15:10745. [PMID: 39737967 PMCID: PMC11686294 DOI: 10.1038/s41467-024-54723-y] [Received: 09/07/2023] [Accepted: 11/19/2024] [Indexed: 01/01/2025] [Open access]
Abstract
Transcription factor binding sites (TFBSs) are important sources of evolutionary innovations. Understanding how evolution navigates the sequence space of such sites can be achieved by mapping TFBS adaptive landscapes. In such a landscape, an individual location corresponds to a TFBS bound by a transcription factor. The elevation at that location corresponds to the strength of transcriptional regulation conveyed by the sequence. Here, we develop an in vivo massively parallel reporter assay to map the landscape of bacterial TFBSs. We apply this assay to the TetR repressor, for which few TFBSs are known. We quantify the strength of transcriptional repression for 17,765 TFBSs and show that the resulting landscape is highly rugged, with 2092 peaks. Only a few peaks convey stronger repression than the wild type. Non-additive (epistatic) interactions between mutations are frequent. Despite these hallmarks of ruggedness, most high peaks are evolutionarily accessible. They have large basins of attraction and are reached by around 20% of populations evolving on the landscape. Which high peak is reached during evolution is unpredictable and contingent on the mutational path taken. This in-depth analysis of a prokaryotic gene regulator reveals a landscape that is navigable but much more rugged than the landscapes of eukaryotic regulators.
Affiliation(s)
- Cauã Antunes Westmann
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Winterthurerstrasse 190, Zurich, CH-8057, Switzerland
  - Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, 1015, Lausanne, Switzerland
- Leander Goldbach
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Winterthurerstrasse 190, Zurich, CH-8057, Switzerland
  - Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, 1015, Lausanne, Switzerland
- Andreas Wagner
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Winterthurerstrasse 190, Zurich, CH-8057, Switzerland
  - Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, 1015, Lausanne, Switzerland
  - The Santa Fe Institute, Santa Fe, NM, 87501, USA
5
Karshenas A, Röschinger T, Garcia HG. Predictive Modeling of Gene Expression and Localization of DNA Binding Site Using Deep Convolutional Neural Networks. bioRxiv 2024:2024.12.17.629042. [PMID: 39763851 PMCID: PMC11702772 DOI: 10.1101/2024.12.17.629042] [Indexed: 01/16/2025]
Abstract
Despite the sequencing revolution, large swaths of the genomes sequenced to date lack any information about the arrangement of transcription factor binding sites on regulatory DNA. Massively Parallel Reporter Assays (MPRAs) have the potential to dramatically accelerate our genomic annotations by making it possible to measure the gene expression levels driven by thousands of mutational variants of a regulatory region. However, the interpretation of such data often assumes that each base pair in a regulatory sequence contributes independently to gene expression. To enable the analysis of this data in a manner that accounts for possible correlations between distant bases along a regulatory sequence, we developed the Deep learning Adaptable Regulatory Sequence Identifier (DARSI). This convolutional neural network leverages MPRA data to predict gene expression levels directly from raw regulatory DNA sequences. By harnessing this predictive capacity, DARSI systematically identifies transcription factor binding sites within regulatory regions at single-base pair resolution. To validate its predictions, we benchmarked DARSI against curated databases, confirming its accuracy in predicting transcription factor binding sites. Additionally, DARSI predicted novel unmapped binding sites, paving the way for future experimental efforts to confirm the existence of these binding sites and to identify the transcription factors that target those sites. Thus, by automating and improving the annotation of regulatory regions, DARSI generates experimentally actionable predictions that can feed iterations of the theory-experiment cycle aimed at reaching a predictive understanding of transcriptional control.
Affiliation(s)
- Arman Karshenas
  - Biophysics Graduate Group, University of California at Berkeley, Berkeley, CA, USA
- Tom Röschinger
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Hernan G. Garcia
  - Biophysics Graduate Group, University of California at Berkeley, Berkeley, CA, USA
  - Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, CA, USA
  - Department of Physics, University of California, Berkeley, CA, USA
  - Institute for Quantitative Biosciences-QB3, University of California, Berkeley, CA, USA
  - Chan Zuckerberg Biohub – San Francisco, San Francisco, CA, USA
6
Fuqua T, Sun Y, Wagner A. The emergence and evolution of gene expression in genome regions replete with regulatory motifs. eLife 2024; 13:RP98654. [PMID: 39704646 DOI: 10.7554/elife.98654] [Indexed: 12/21/2024] [Open access]
Abstract
Gene regulation is essential for life and controlled by regulatory DNA. Mutations can modify the activity of regulatory DNA, and also create new regulatory DNA, a process called regulatory emergence. Non-regulatory and regulatory DNA contain motifs to which transcription factors may bind. In prokaryotes, gene expression requires a stretch of DNA called a promoter, which contains two motifs called -10 and -35 boxes. However, these motifs may occur in both promoters and non-promoter DNA in multiple copies. They have been implicated in some studies to improve promoter activity, and in others to repress it. Here, we ask whether the presence of such motifs in different genetic sequences influences promoter evolution and emergence. To understand whether and how promoter motifs influence promoter emergence and evolution, we start from 50 'promoter islands', DNA sequences enriched with -10 and -35 boxes. We mutagenize these starting 'parent' sequences, and measure gene expression driven by 240,000 of the resulting mutants. We find that the probability that mutations create an active promoter varies more than 200-fold, and is not correlated with the number of promoter motifs. For parent sequences without promoter activity, mutations created over 1500 new -10 and -35 boxes at unique positions in the library, but only ~0.3% of these resulted in de-novo promoter activity. Only ~13% of all -10 and -35 boxes contribute to de-novo promoter activity. For parent sequences with promoter activity, mutations created new -10 and -35 boxes in 11 specific positions that partially overlap with preexisting ones to modulate expression. We also find that -10 and -35 boxes do not repress promoter activity. Overall, our work demonstrates how promoter motifs influence promoter emergence and evolution. It has implications for predicting and understanding regulatory evolution, de novo genes, and phenotypic evolution.
Affiliation(s)
- Timothy Fuqua
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, Lausanne, Switzerland
- Yiqiao Sun
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, Lausanne, Switzerland
- Andreas Wagner
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, Lausanne, Switzerland
  - The Santa Fe Institute, Santa Fe, United States
7
Vilar JMG, Saiz L. The unreasonable effectiveness of equilibrium gene regulation through the cell cycle. Cell Syst 2024; 15:639-648.e2. [PMID: 38981487 DOI: 10.1016/j.cels.2024.06.002] [Received: 10/06/2022] [Revised: 06/19/2023] [Accepted: 06/14/2024] [Indexed: 07/11/2024]
Abstract
Systems like the prototypical lac operon can reliably hold repression of transcription upon DNA replication across cell cycles with just 10 repressor molecules per cell and behave as if they were at equilibrium. The origin of this phenomenology is still an unresolved question. Here, we develop a general theory to analyze strong perturbations in quasi-equilibrium systems and use it to quantify the effects of DNA replication in gene regulation. We find a scaling law linking actual with predicted equilibrium transcription via a single kinetic parameter. We show that even the lac operon functions beyond the physical limits of naive regulation through compensatory mechanisms that suppress non-equilibrium effects. Synthetic systems without adjuvant activators, such as the cAMP receptor protein (CRP), lack this reliability. Our results provide a rationale for the function of CRP, beyond just being a tunable activator, as a mitigator of cell cycle perturbations.
Affiliation(s)
- Jose M G Vilar
  - Biofisika Institute (CSIC, UPV/EHU), University of the Basque Country (UPV/EHU), P.O. Box 644, 48080 Bilbao, Spain
  - IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Spain
- Leonor Saiz
  - Department of Biomedical Engineering, University of California, 451 E. Health Sciences Drive, Davis, CA 95616, USA
  - Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
  - Center for Systems Biology Dresden, 01307 Dresden, Germany
8
uz-Zaman MH, D’Alton S, Barrick JE, Ochman H. Promoter recruitment drives the emergence of proto-genes in a long-term evolution experiment with Escherichia coli. PLoS Biol 2024; 22:e3002418. [PMID: 38713714 PMCID: PMC11101190 DOI: 10.1371/journal.pbio.3002418] [Received: 10/18/2023] [Revised: 05/17/2024] [Accepted: 04/18/2024] [Indexed: 05/09/2024] [Open access]
Abstract
The phenomenon of de novo gene birth, the emergence of genes from non-genic sequences, has received considerable attention due to the widespread occurrence of genes that are unique to particular species or genomes. Most instances of de novo gene birth have been recognized through comparative analyses of genome sequences in eukaryotes, despite the abundance of novel, lineage-specific genes in bacteria and the relative ease with which bacteria can be studied in an experimental context. Here, we explore the genetic record of the Escherichia coli long-term evolution experiment (LTEE) for changes indicative of "proto-genic" phases of new gene birth in which non-genic sequences evolve stable transcription and/or translation. Over the time span of the LTEE, non-genic regions are frequently transcribed, translated and differentially expressed, with levels of transcription across low-expressed regions increasing in later generations of the experiment. Proto-genes formed downstream of new mutations result either from insertion element activity or chromosomal translocations that fused preexisting regulatory sequences to regions that were not expressed in the LTEE ancestor. Additionally, we identified instances of proto-gene emergence in which a previously unexpressed sequence was transcribed after formation of an upstream promoter, although such cases were rare compared to those caused by recruitment of preexisting promoters. Tracing the origin of the causative mutations, we discovered that most occurred early in the history of the LTEE, often within the first 20,000 generations, and became fixed soon after emergence. Our findings show that proto-genes emerge frequently within evolving populations, can persist stably, and can serve as potential substrates for new gene formation.
Affiliation(s)
- Md. Hassan uz-Zaman
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
- Simon D’Alton
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
- Jeffrey E. Barrick
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
- Howard Ochman
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
9
Meger AT, Spence MA, Sandhu M, Matthews D, Chen J, Jackson CJ, Raman S. Rugged fitness landscapes minimize promiscuity in the evolution of transcriptional repressors. Cell Syst 2024; 15:374-387.e6. [PMID: 38537640 PMCID: PMC11299162 DOI: 10.1016/j.cels.2024.03.002] [Received: 01/26/2023] [Revised: 09/08/2023] [Accepted: 03/05/2024] [Indexed: 04/20/2024]
Abstract
How a protein's function influences the shape of its fitness landscape, smooth or rugged, is a fundamental question in evolutionary biochemistry. Smooth landscapes arise when incremental mutational steps lead to a progressive change in function, as commonly seen in enzymes and binding proteins. On the other hand, rugged landscapes are poorly understood because of the inherent unpredictability of how sequence changes affect function. Here, we experimentally characterize the entire sequence phylogeny, comprising 1,158 extant and ancestral sequences, of the DNA-binding domain (DBD) of the LacI/GalR transcriptional repressor family. Our analysis revealed an extremely rugged landscape with rapid switching of specificity, even between adjacent nodes. Further, the ruggedness arises due to the necessity of the repressor to simultaneously evolve specificity for asymmetric operators and disfavors potentially adverse regulatory crosstalk. Our study provides fundamental insight into evolutionary, molecular, and biophysical rules of genetic regulation through the lens of fitness landscapes.
Affiliation(s)
- Anthony T Meger
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
- Matthew A Spence
  - Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- Mahakaran Sandhu
  - Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- Dana Matthews
  - Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
- Jackie Chen
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
- Colin J Jackson
  - Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
  - ARC Centre of Excellence for Innovations in Peptide & Protein Science, Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
  - ARC Centre of Excellence for Innovations in Synthetic Biology, Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- Srivatsan Raman
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
  - Department of Bacteriology, University of Wisconsin-Madison, Madison, WI 53706, USA
  - Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
10
Martinez GS, Perez-Rueda E, Kumar A, Dutt M, Maya CR, Ledesma-Dominguez L, Casa PL, Kumar A, de Avila e Silva S, Kelvin DJ. CDBProm: the Comprehensive Directory of Bacterial Promoters. NAR Genom Bioinform 2024; 6:lqae018. [PMID: 38385146 PMCID: PMC10880602 DOI: 10.1093/nargab/lqae018] [Received: 09/12/2023] [Revised: 01/12/2024] [Accepted: 01/29/2024] [Indexed: 02/23/2024] [Open access]
Abstract
The decreasing cost of whole genome sequencing has produced high volumes of genomic information that require annotation. The experimental identification of promoter sequences, pivotal for regulating gene expression, is a laborious and cost-prohibitive task. To expedite this, we introduce the Comprehensive Directory of Bacterial Promoters (CDBProm), a directory of in-silico predicted bacterial promoter sequences. We first identified that an Extreme Gradient Boosting (XGBoost) algorithm would distinguish promoters from random downstream regions with an accuracy of 87%. To capture distinctive promoter signals, we generated a second XGBoost classifier trained on the instances misclassified in our first classifier. The predictor of CDBProm is then fed with over 55 million upstream regions from more than 6000 bacterial genomes. Upon finding potential promoter sequences in upstream regions, each promoter is mapped to the genomic data of the organism, linking the predicted promoter with its coding DNA sequence, and identifying the function of the gene regulated by the promoter. The collection of bacterial promoters available in CDBProm enables the quantitative analysis of a plethora of bacterial promoters. Our collection with over 24 million promoters is publicly available at https://aw.iimas.unam.mx/cdbprom/.
Affiliation(s)
- Gustavo Sganzerla Martinez
  - Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia B3H 4H7, Canada
  - Pediatrics, Izaak Walton Killam (IWK) Health Center. Canadian Center for Vaccinology (CCfV), Halifax, Nova Scotia B3H 4H7, Canada
  - BioForge Canada Limited, Halifax, Nova Scotia B3N 3B9, Canada
- Ernesto Perez-Rueda
  - Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Unidad Académica del Estado de Yucatán, Mérida 97302, Yucatán, Mexico
- Anuj Kumar
  - Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia B3H 4H7, Canada
  - Pediatrics, Izaak Walton Killam (IWK) Health Center. Canadian Center for Vaccinology (CCfV), Halifax, Nova Scotia B3H 4H7, Canada
  - BioForge Canada Limited, Halifax, Nova Scotia B3N 3B9, Canada
- Mansi Dutt
  - Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia B3H 4H7, Canada
  - Pediatrics, Izaak Walton Killam (IWK) Health Center. Canadian Center for Vaccinology (CCfV), Halifax, Nova Scotia B3H 4H7, Canada
  - BioForge Canada Limited, Halifax, Nova Scotia B3N 3B9, Canada
- Cinthia Rodríguez Maya
  - Facultad de Ciencias e Ingeniería, Universidad Nacional Autonoma de Mexico, Mexico City 04510, Mexico
- Leonardo Ledesma-Dominguez
  - Instituto de Investigaciones en Matematicas Aplicadas y en Sistemas, Universidad Nacional Autonoma de Mexico, Mexico City 04510, Mexico
- Pedro Lenz Casa
  - Biotechnology Institute, Universidade de Caxias do Sul, Caxias do Sul, Rio Grande do Sul 95070-560, Brazil
- Aditya Kumar
  - Molecular Biology and Biotechnology, Tezpur University, Tezpur, Assam 784028, India
- Scheila de Avila e Silva
  - Biotechnology Institute, Universidade de Caxias do Sul, Caxias do Sul, Rio Grande do Sul 95070-560, Brazil
- David J Kelvin
  - Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia B3H 4H7, Canada
  - Pediatrics, Izaak Walton Killam (IWK) Health Center. Canadian Center for Vaccinology (CCfV), Halifax, Nova Scotia B3H 4H7, Canada
  - BioForge Canada Limited, Halifax, Nova Scotia B3N 3B9, Canada
11
Deal C, De Wannemaeker L, De Mey M. Towards a rational approach to promoter engineering: understanding the complexity of transcription initiation in prokaryotes. FEMS Microbiol Rev 2024; 48:fuae004. [PMID: 38383636 PMCID: PMC10911233 DOI: 10.1093/femsre/fuae004] [Received: 08/31/2023] [Revised: 01/29/2024] [Accepted: 02/20/2024] [Indexed: 02/23/2024] [Open access]
Abstract
Promoter sequences are important genetic control elements. Through their interaction with RNA polymerase they determine transcription strength and specificity, thereby regulating the first step in gene expression. Consequently, they can be targeted as elements to control predictability and tuneability of a genetic circuit, which is essential in applications such as the development of robust microbial cell factories. This review considers the promoter elements implicated in the three stages of transcription initiation, detailing the complex interplay of sequence-specific interactions that are involved, and highlighting that DNA sequence features beyond the core promoter elements work in a combinatorial manner to determine transcriptional strength. In particular, we emphasize that, aside from promoter recognition, transcription initiation is also defined by the kinetics of open complex formation and promoter escape, which are also known to be highly sequence specific. Significantly, we focus on how insights into these interactions can be manipulated to lay the foundation for a more rational approach to promoter engineering.
Affiliation(s)
- Cara Deal
  - Centre for Synthetic Biology, Ghent University. Coupure Links 653, BE-9000 Ghent, Belgium
- Lien De Wannemaeker
  - Centre for Synthetic Biology, Ghent University. Coupure Links 653, BE-9000 Ghent, Belgium
- Marjan De Mey
  - Centre for Synthetic Biology, Ghent University. Coupure Links 653, BE-9000 Ghent, Belgium
12
Qi Q, Ghaly TM, Rajabal V, Gillings MR, Tetu SG. Dissecting molecular evolution of class 1 integron gene cassettes and identifying their bacterial hosts in suburban creeks via epicPCR. J Antimicrob Chemother 2024; 79:100-111. [PMID: 37962091 DOI: 10.1093/jac/dkad353] [Received: 08/22/2023] [Accepted: 10/30/2023] [Indexed: 11/15/2023] [Open access]
Abstract
OBJECTIVES: Our study aimed to sequence class 1 integrons in uncultured environmental bacterial cells in freshwater from suburban creeks and uncover the taxonomy of their bacterial hosts. We also aimed to characterize integron gene cassettes with altered DNA sequences relative to those from databases or literature and identify key signatures of their molecular evolution.
METHODS: We applied a single-cell fusion PCR-based technique, emulsion, paired isolation and concatenation PCR (epicPCR), to link class 1 integron gene cassette arrays to the phylogenetic markers of their bacterial hosts. The levels of streptomycin resistance conferred by the WT and altered aadA5 and aadA11 gene cassettes that encode aminoglycoside (3″) adenylyltransferases were experimentally quantified in an Escherichia coli host.
RESULTS: Class 1 integron gene cassette arrays were detected in Alphaproteobacteria and Gammaproteobacteria hosts. A subset of three gene cassettes displayed signatures of molecular evolution, namely the gain of a regulatory 5'-untranslated region (5'-UTR), the loss of attC recombination sites between adjacent gene cassettes, and the invasion of a 5'-UTR by an IS element. Notably, our experimental testing of a novel variant of the aadA11 gene cassette demonstrated that gaining the observed 5'-UTR contributed to a 3-fold increase in the MIC of streptomycin relative to the ancestral reference gene cassette in E. coli.
CONCLUSIONS: Dissecting the observed signatures of molecular evolution of class 1 integrons allowed us to explain their effects on antibiotic resistance phenotypes, while identifying their bacterial hosts enabled us to make better inferences on the likely origins of novel gene cassettes and IS that invade known gene cassettes.
Affiliation(s)
- Qin Qi
  - School of Natural Sciences, 14 Eastern Road, Macquarie University, Sydney, NSW, Australia
- Timothy M Ghaly
  - School of Natural Sciences, 14 Eastern Road, Macquarie University, Sydney, NSW, Australia
- Vaheesan Rajabal
  - ARC Centre of Excellence for Synthetic Biology, 14 Eastern Road, Macquarie University, Sydney, NSW, Australia
- Michael R Gillings
  - School of Natural Sciences, 14 Eastern Road, Macquarie University, Sydney, NSW, Australia
  - ARC Centre of Excellence for Synthetic Biology, 14 Eastern Road, Macquarie University, Sydney, NSW, Australia
- Sasha G Tetu
  - School of Natural Sciences, 14 Eastern Road, Macquarie University, Sydney, NSW, Australia
  - ARC Centre of Excellence for Synthetic Biology, 14 Eastern Road, Macquarie University, Sydney, NSW, Australia
|
13
|
Okay S. Fine-Tuning Gene Expression in Bacteria by Synthetic Promoters. Methods Mol Biol 2024; 2844:179-195. [PMID: 39068340 DOI: 10.1007/978-1-0716-4063-0_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2024]
Abstract
Promoters are key genetic elements in the initiation and regulation of gene expression. A limited number of natural promoters has been described for the control of gene expression in synthetic biology applications. Therefore, synthetic promoters have been developed to fine-tune the transcription for the desired amount of gene product. Mostly, synthetic promoters are characterized using promoter libraries that are constructed via mutagenesis of promoter sequences. The strength of promoters in the library is determined according to the expression of a reporter gene such as gfp encoding green fluorescent protein. Gene expression can be controlled using inducers. The majority of the studies on gram-negative bacteria are conducted using the expression system of the model organism Escherichia coli while that of the model organism Bacillus subtilis is mostly used in the studies on gram-positive bacteria. Additionally, synthetic promoters for the cyanobacteria, which are phototrophic microorganisms, are evaluated, especially using the model cyanobacterium Synechocystis sp. PCC 6803. Moreover, a variety of algorithms based on machine learning methods were developed to characterize the features of promoter elements. Some of these in silico models were verified using in vitro or in vivo experiments. Identification of novel synthetic promoters with improved features compared to natural ones contributes much to the synthetic biology approaches in terms of fine-tuning gene expression.
Affiliation(s)
- Sezer Okay
  - Department of Vaccine Technology, Vaccine Institute, Hacettepe University, Ankara, Türkiye
|
14
|
Uz-Zaman MH, D'Alton S, Barrick JE, Ochman H. Promoter capture drives the emergence of proto-genes in Escherichia coli. bioRxiv 2023:2023.11.15.567300. [PMID: 38013999 PMCID: PMC10680751 DOI: 10.1101/2023.11.15.567300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
The phenomenon of de novo gene birth, the emergence of genes from non-genic sequences, has received considerable attention due to the widespread occurrence of genes that are unique to particular species or genomes. Most instances of de novo gene birth have been recognized through comparative analyses of genome sequences in eukaryotes, despite the abundance of novel, lineage-specific genes in bacteria and the relative ease with which bacteria can be studied in an experimental context. Here, we explore the genetic record of the Escherichia coli Long-Term Evolution Experiment (LTEE) for changes indicative of "proto-genic" phases of new gene birth in which non-genic sequences evolve stable transcription and/or translation. Over the time-span of the LTEE, non-genic regions are frequently transcribed, translated and differentially expressed, thereby serving as raw material for new gene emergence. Most proto-genes result either from insertion element activity or chromosomal translocations that fused pre-existing regulatory sequences to regions that were not expressed in the LTEE ancestor. Additionally, we identified instances of proto-gene emergence in which a previously unexpressed sequence was transcribed after formation of an upstream promoter. Tracing the origin of the causative mutations, we discovered that most occurred early in the history of the LTEE, often within the first 20,000 generations, and became fixed soon after emergence. Our findings show that proto-genes emerge frequently within evolving populations, persist stably, and can serve as potential substrates for new gene formation.
|
15
|
Mani S, Tlusty T. Gene birth in a model of non-genic adaptation. BMC Biol 2023; 21:257. [PMID: 37957718 PMCID: PMC10644530 DOI: 10.1186/s12915-023-01745-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2022] [Accepted: 10/24/2023] [Indexed: 11/15/2023] Open
Abstract
BACKGROUND: Over evolutionary timescales, genomic loci can switch between functional and non-functional states through processes such as pseudogenization and de novo gene birth. Particularly, de novo gene birth is a widespread process, and many examples continue to be discovered across diverse evolutionary lineages. However, the general mechanisms that lead to functionalization are poorly understood, and estimated rates of de novo gene birth remain contentious. Here, we address this problem within a model that takes into account mutations and structural variation, allowing us to estimate the likelihood of emergence of new functions at non-functional loci.
RESULTS: Assuming biologically reasonable mutation rates and mutational effects, we find that functionalization of non-genic loci requires the realization of strict conditions. This is in line with the observation that most de novo genes are localized to the vicinity of established genes. Our model also provides an explanation for the empirical observation that emerging proto-genes are often lost despite showing signs of adaptation.
CONCLUSIONS: Our work elucidates the properties of non-genic loci that make them fertile for adaptation, and our results offer mechanistic insights into the process of de novo gene birth.
Affiliation(s)
- Somya Mani
  - Center for Soft and Living Matter, Institute for Basic Science, Ulsan 44919, Republic of Korea
- Tsvi Tlusty
  - Center for Soft and Living Matter, Institute for Basic Science, Ulsan 44919, Republic of Korea
  - Departments of Physics and Chemistry, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
|
16
|
Xu H, Li C, Xu C, Zhang J. Chance promoter activities illuminate the origins of eukaryotic intergenic transcriptions. Nat Commun 2023; 14:1826. [PMID: 37005399 PMCID: PMC10067814 DOI: 10.1038/s41467-023-37610-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/04/2022] [Accepted: 03/23/2023] [Indexed: 04/04/2023] Open
Abstract
It is debated whether the pervasive intergenic transcription from eukaryotic genomes has functional significance or simply reflects the promiscuity of RNA polymerases. We approach this question by comparing chance promoter activities with the expression levels of intergenic regions in the model eukaryote Saccharomyces cerevisiae. We build a library of over 10^5 strains, each carrying a 120-nucleotide, chromosomally integrated, completely random sequence driving the potential transcription of a barcode. Quantifying the RNA concentration of each barcode in two environments reveals that 41-63% of random sequences have significant, albeit usually low, promoter activities. Therefore, even in eukaryotes, where the presence of chromatin is thought to repress transcription, chance transcription is prevalent. We find that only 1-5% of yeast intergenic transcriptions are unattributable to chance promoter activities or neighboring gene expressions, and these transcriptions exhibit higher-than-expected environment-specificity. These findings suggest that only a minute fraction of intergenic transcription is functional in yeast.
Affiliation(s)
- Haiqing Xu
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
  - Department of Biology, Stanford University, Stanford, CA, USA
- Chuan Li
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
  - Microsoft, Redmond, WA, USA
- Chuan Xu
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
  - Bio-X Institutes, Shanghai Jiao Tong University, Shanghai, China
- Jianzhi Zhang
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
|
17
|
The regulon of Brucella abortus two-component system BvrR/BvrS reveals the coordination of metabolic pathways required for intracellular life. PLoS One 2022; 17:e0274397. [PMID: 36129877 PMCID: PMC9491525 DOI: 10.1371/journal.pone.0274397] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/30/2022] [Accepted: 08/26/2022] [Indexed: 11/19/2022] Open
Abstract
Brucella abortus is a facultative intracellular pathogen causing a severe zoonotic disease worldwide. The two-component regulatory system (TCS) BvrR/BvrS of B. abortus is conserved in members of the Alphaproteobacteria class. It is related to the expression of genes required for host interaction and intracellular survival. Here we report that bvrR and bvrS are part of an operon composed of 16 genes encoding functions related to nitrogen metabolism, DNA repair and recombination, cell cycle arrest, and stress response. Synteny of this genomic region within close Alphaproteobacteria members suggests a conserved role in coordinating the expression of carbon and nitrogen metabolic pathways. In addition, we performed a ChIP-Seq analysis after exposure of bacteria to conditions that mimic the intracellular environment. Genes encoding enzymes at metabolic crossroads of the pentose phosphate shunt, gluconeogenesis, cell envelope homeostasis, nucleotide synthesis, cell division, and virulence are BvrR/BvrS direct targets. A 14 bp DNA BvrR binding motif was found and investigated in selected gene targets such as virB1, bvrR, pckA, omp25, and tamA. Understanding gene expression regulation is essential to elucidate how Brucella orchestrates a physiological response leading to a furtive pathogenic strategy.
|
18
|
Abstract
Selection accumulates information in the genome: it guides stochastically evolving populations toward states (genotype frequencies) that would be unlikely under neutrality. This can be quantified as the Kullback-Leibler (KL) divergence between the actual distribution of genotype frequencies and the corresponding neutral distribution. First, we show that this population-level information sets an upper bound on the information at the level of genotype and phenotype, limiting how precisely they can be specified by selection. Next, we study how the accumulation and maintenance of information is limited by the cost of selection, measured as the genetic load or the relative fitness variance, both of which we connect to the control-theoretic KL cost of control. The information accumulation rate is upper bounded by the population size times the cost of selection. This bound is very general, and applies across models (Wright-Fisher, Moran, diffusion) and to arbitrary forms of selection, mutation, and recombination. Finally, the cost of maintaining information depends on how it is encoded: Specifying a single allele out of two is expensive, but one bit encoded among many weakly specified loci (as in a polygenic trait) is cheap.
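As a rough illustration of the KL divergence mentioned in this abstract, the sketch below computes it in bits for two small discrete distributions. It is a minimal sketch, not the authors' method: the function name `kl_divergence_bits` and the two-allele probabilities are hypothetical, and the distributions are simplified to a single locus rather than a full distribution over genotype frequencies.

```python
import math


def kl_divergence_bits(actual, neutral):
    """Return D_KL(actual || neutral) in bits for two discrete distributions.

    Both arguments are sequences of probabilities over the same outcomes.
    Terms with actual probability 0 contribute nothing; if the neutral
    distribution gives probability 0 to an outcome that does occur,
    the divergence is infinite.
    """
    if len(actual) != len(neutral):
        raise ValueError("distributions must have the same number of outcomes")
    total = 0.0
    for p, q in zip(actual, neutral):
        if p == 0.0:
            continue
        if q == 0.0:
            return math.inf
        total += p * math.log2(p / q)
    return total


# Hypothetical example: one locus with two alleles, equally likely under neutrality.
neutral = [0.5, 0.5]
weakly_selected = [0.9, 0.1]   # assumed outcome of moderate selection
fully_specified = [1.0, 0.0]   # one allele fixed with certainty

weak_bits = kl_divergence_bits(weakly_selected, neutral)
full_bits = kl_divergence_bits(fully_specified, neutral)
print(f"weakly selected: {weak_bits:.3f} bits")
print(f"fully specified: {full_bits:.3f} bits")
```

Under these assumptions, fully specifying one allele out of two equally likely alleles corresponds to exactly one bit, while a partially shifted distribution carries less.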
|
19
|
LaFleur TL, Hossain A, Salis HM. Automated model-predictive design of synthetic promoters to control transcriptional profiles in bacteria. Nat Commun 2022; 13:5159. [PMID: 36056029 PMCID: PMC9440211 DOI: 10.1038/s41467-022-32829-5] [Citation(s) in RCA: 85] [Impact Index Per Article: 28.3] [Received: 03/03/2022] [Accepted: 08/19/2022] [Indexed: 12/22/2022] Open
Abstract
Transcription rates are regulated by the interactions between RNA polymerase, sigma factor, and promoter DNA sequences in bacteria. However, it remains unclear how non-canonical sequence motifs collectively control transcription rates. Here, we combine massively parallel assays, biophysics, and machine learning to develop a 346-parameter model that predicts site-specific transcription initiation rates for any σ70 promoter sequence, validated across 22132 bacterial promoters with diverse sequences. We apply the model to predict genetic context effects, design σ70 promoters with desired transcription rates, and identify undesired promoters inside engineered genetic systems. The model provides a biophysical basis for understanding gene regulation in natural genetic systems and precise transcriptional control for engineering synthetic genetic systems.
Affiliation(s)
- Travis L LaFleur
  - Department of Chemical Engineering, Pennsylvania State University, University Park, PA, 16801, USA
- Ayaan Hossain
  - Bioinformatics and Genomics, Pennsylvania State University, University Park, PA, 16801, USA
- Howard M Salis
  - Department of Chemical Engineering, Pennsylvania State University, University Park, PA, 16801, USA
  - Bioinformatics and Genomics, Pennsylvania State University, University Park, PA, 16801, USA
  - Department of Biological Engineering, Pennsylvania State University, University Park, PA, 16801, USA
  - Department of Biomedical Engineering, Pennsylvania State University, University Park, PA, 16801, USA
|
20
|
Zoller B, Gregor T, Tkačik G. Eukaryotic gene regulation at equilibrium, or non? Curr Opin Syst Biol 2022; 31:100435. [PMID: 36590072 PMCID: PMC9802646 DOI: 10.1016/j.coisb.2022.100435] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/17/2023]
Abstract
Models of transcriptional regulation that assume equilibrium binding of transcription factors have been less successful at predicting gene expression from sequence in eukaryotes than in bacteria. This could be due to the non-equilibrium nature of eukaryotic regulation. Unfortunately, the space of possible non-equilibrium mechanisms is vast and predominantly uninteresting. The key question is therefore how this space can be navigated efficiently, to focus on mechanisms and models that are biologically relevant. In this review, we advocate for the normative role of theory (theory that prescribes rather than just describes) in providing such a focus. Theory should expand its remit beyond inferring mechanistic models from data, towards identifying non-equilibrium gene regulatory schemes that may have been evolutionarily selected, despite their energy consumption, because they are precise, reliable, fast, or otherwise outperform regulation at equilibrium. We illustrate our reasoning by toy examples for which we provide simulation code.
Affiliation(s)
- Benjamin Zoller
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
  - Joseph Henry Laboratories of Physics, Princeton University, Princeton, NJ, USA
  - Department of Developmental and Stem Cell Biology UMR3738, Institut Pasteur, Paris, France
- Thomas Gregor
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
  - Joseph Henry Laboratories of Physics, Princeton University, Princeton, NJ, USA
  - Department of Developmental and Stem Cell Biology UMR3738, Institut Pasteur, Paris, France
- Gašper Tkačik
  - Institute of Science and Technology Austria, Klosterneuburg, Austria
|
21
|
Abstract
"De novo" genes evolve from previously non-genic DNA. This strikes many of us as remarkable, because it seems extraordinarily unlikely that random sequence would produce a functional gene. How is this possible? In this two-part review, I first summarize what is known about the origins and molecular functions of the small number of de novo genes for which such information is available. I then speculate on what these examples may tell us about how de novo genes manage to emerge despite what seem like enormous opposing odds.
Affiliation(s)
- Caroline M Weisman
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
|
22
|
Tomanek I, Guet CC. Adaptation dynamics between copy-number and point mutations. eLife 2022; 11:82240. [PMID: 36546673 PMCID: PMC9833825 DOI: 10.7554/elife.82240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Accepted: 12/20/2022] [Indexed: 12/24/2022] Open
Abstract
Together, copy-number and point mutations form the basis for most evolutionary novelty, through the process of gene duplication and divergence. While a plethora of genomic data reveals the long-term fate of diverging coding sequences and their cis-regulatory elements, little is known about the early dynamics around the duplication event itself. In microorganisms, selection for increased gene expression often drives the expansion of gene copy-number mutations, which serves as a crude adaptation, prior to divergence through refining point mutations. Using a simple synthetic genetic reporter system that can distinguish between copy-number and point mutations, we study their early and transient adaptive dynamics in real time in Escherichia coli. We find two qualitatively different routes of adaptation, depending on the level of functional improvement needed. In conditions of high gene expression demand, the two mutation types occur as a combination. However, under low gene expression demand, copy-number and point mutations are mutually exclusive; here, owing to their higher frequency, adaptation is dominated by copy-number mutations, in a process we term amplification hindrance. Ultimately, due to high reversal rates and pleiotropic cost, copy-number mutations may not only serve as a crude and transient adaptation, but also constrain sequence divergence over evolutionary time scales.
Affiliation(s)
- Isabella Tomanek
  - Institute of Science and Technology Austria, Klosterneuburg, Austria
- Călin C Guet
  - Institute of Science and Technology Austria, Klosterneuburg, Austria
|