1
Karollus A, Hingerl J, Gankin D, Grosshauser M, Klemon K, Gagneur J. Species-aware DNA language models capture regulatory elements and their evolution. Genome Biol 2024; 25:83. [PMID: 38566111 PMCID: PMC10985990 DOI: 10.1186/s13059-024-03221-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 08/14/2023] [Accepted: 03/20/2024] [Indexed: 04/04/2024] Open
Abstract
BACKGROUND The rise of large-scale multi-species genome sequencing projects promises to shed new light on how genomes encode gene regulatory instructions. To this end, new algorithms are needed that can leverage conservation to capture regulatory elements while accounting for their evolution. RESULTS Here, we introduce species-aware DNA language models, which we trained on more than 800 species spanning over 500 million years of evolution. Investigating their ability to predict masked nucleotides from context, we show that DNA language models distinguish transcription factor and RNA-binding protein motifs from background non-coding sequence. Owing to their flexibility, DNA language models capture conserved regulatory elements over much further evolutionary distances than sequence alignment would allow. Remarkably, DNA language models reconstruct motif instances bound in vivo better than unbound ones and account for the evolution of motif sequences and their positional constraints, showing that these models capture functional high-order sequence and evolutionary context. We further show that species-aware training yields improved sequence representations for endogenous and MPRA-based gene expression prediction, as well as motif discovery. CONCLUSIONS Collectively, these results demonstrate that species-aware DNA language models are a powerful, flexible, and scalable tool to integrate information from large compendia of highly diverged genomes.
Affiliation(s)
- Alexander Karollus
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
  - Munich Center for Machine Learning, Munich, Germany
- Johannes Hingerl
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Dennis Gankin
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Martin Grosshauser
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Kristian Klemon
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Julien Gagneur
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
  - Munich Center for Machine Learning, Munich, Germany
  - Institute of Human Genetics, School of Medicine and Health, Technical University of Munich, Munich, Germany
  - Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
  - Munich Data Science Institute, Technical University of Munich, Garching, Germany
2
Nambiar A, Dubinkina V, Liu S, Maslov S. FUN-PROSE: A deep learning approach to predict condition-specific gene expression in fungi. PLoS Comput Biol 2023; 19:e1011563. [PMID: 37971967 PMCID: PMC10653424 DOI: 10.1371/journal.pcbi.1011563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2023] [Accepted: 09/30/2023] [Indexed: 11/19/2023] Open
Abstract
The mRNA levels of all genes in a genome are a critical piece of information defining the overall state of the cell in a given environmental condition. Being able to reconstruct such condition-specific expression in fungal genomes is particularly important for metabolically engineering these organisms to produce desired chemicals in industrially scalable conditions. Most previous deep learning approaches focused on predicting the average expression level of a gene based on its promoter sequence, ignoring its variation across different conditions. Here we present FUN-PROSE, a deep learning model trained to predict differential expression of individual genes across various conditions using their promoter sequences and the expression levels of all transcription factors. We train and test our model on three fungal species and obtain correlations between predicted and observed condition-specific gene expression as high as 0.85. We then interpret our model to extract promoter sequence motifs responsible for variable expression of individual genes. We also carried out input feature importance analysis to connect individual transcription factors to their gene targets. A sizeable fraction of both sequence motifs and TF-gene interactions learned by our model agree with previously known biological information, while the rest correspond to either novel biological facts or indirect correlations.
Affiliation(s)
- Ananthan Nambiar
  - Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, Illinois, United States of America
  - Carl R. Woese Institute for Genomic Biology, Urbana, Illinois, United States of America
- Veronika Dubinkina
  - Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, Illinois, United States of America
  - Carl R. Woese Institute for Genomic Biology, Urbana, Illinois, United States of America
  - The Gladstone Institute of Data Science and Biotechnology, San Francisco, California, United States of America
- Simon Liu
  - Carl R. Woese Institute for Genomic Biology, Urbana, Illinois, United States of America
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, United States of America
- Sergei Maslov
  - Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, Illinois, United States of America
  - Carl R. Woese Institute for Genomic Biology, Urbana, Illinois, United States of America
  - Department of Physics, University of Illinois Urbana-Champaign, Urbana, Illinois, United States of America
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, Illinois, United States of America
3
The evolution, evolvability and engineering of gene regulatory DNA. Nature 2022; 603:455-463. [PMID: 35264797 DOI: 10.1038/s41586-022-04506-6] [Citation(s) in RCA: 124] [Impact Index Per Article: 41.3] [Received: 02/08/2021] [Accepted: 02/02/2022] [Indexed: 11/08/2022]
Abstract
Mutations in non-coding regulatory DNA sequences can alter gene expression, organismal phenotype and fitness [1-3]. Constructing complete fitness landscapes, in which DNA sequences are mapped to fitness, is a long-standing goal in biology, but has remained elusive because it is challenging to generalize reliably to vast sequence spaces [4-6]. Here we build sequence-to-expression models that capture fitness landscapes and use them to decipher principles of regulatory evolution. Using millions of randomly sampled promoter DNA sequences and their measured expression levels in the yeast Saccharomyces cerevisiae, we learn deep neural network models that generalize with excellent prediction performance, and enable sequence design for expression engineering. Using our models, we study expression divergence under genetic drift and strong-selection weak-mutation regimes to find that regulatory evolution is rapid and subject to diminishing returns epistasis; that conflicting expression objectives in different environments constrain expression adaptation; and that stabilizing selection on gene expression leads to the moderation of regulatory complexity. We present an approach for using such models to detect signatures of selection on expression from natural variation in regulatory sequences and use it to discover an instance of convergent regulatory evolution. We assess mutational robustness, finding that regulatory mutation effect sizes follow a power law, characterize regulatory evolvability, visualize promoter fitness landscapes, discover evolvability archetypes and illustrate the mutational robustness of natural regulatory sequence populations. Our work provides a general framework for designing regulatory sequences and addressing fundamental questions in regulatory evolution.
4
Gatto V, Binati RL, Lemos Junior WJF, Basile A, Treu L, de Almeida OGG, Innocente G, Campanaro S, Torriani S. New insights into the variability of lactic acid production in Lachancea thermotolerans at the phenotypic and genomic level. Microbiol Res 2020; 238:126525. [PMID: 32593090 DOI: 10.1016/j.micres.2020.126525] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/15/2020] [Revised: 06/03/2020] [Accepted: 06/05/2020] [Indexed: 01/13/2023]
Abstract
Non-conventional yeasts are increasingly applied in the fermented beverage industry to obtain distinctive products with improved quality. Among these yeasts, Lachancea thermotolerans has multiple features of industrial relevance, especially the production of L(+)-lactic acid (LA), useful for the biological acidification of wine and beer. Since little information is available on this peculiar activity, the current study aimed to explore the physiological and genetic variability among L. thermotolerans strains. In a strain collection, mostly isolated from wine, substantial phenotypic diversity was observed, which allowed the selection of a high (SOL13) and a low (COLC27) LA producer. Comparative whole-genome sequencing of these two selected strains and the type strain CBS 6340T showed a high similarity in terms of gene content and functional annotation. Nevertheless, target gene-based analysis revealed variations between high and low producers in the key gene sequences related to LA accumulation. More in-depth investigation of the core promoters and expression analysis of the ldh genes, encoding lactate dehydrogenase, indicated that transcriptional regulation may be the principal cause behind the phenotypic differences. These findings highlighted the usefulness of whole-genome sequencing coupled with expression analysis. They provided crucial genetic insights for a deeper investigation of the intraspecific variability in the LA production pathway.
Affiliation(s)
- Veronica Gatto
  - Department of Biotechnology, University of Verona, 37134, Verona, Italy
- Renato L Binati
  - Department of Biotechnology, University of Verona, 37134, Verona, Italy
- Arianna Basile
  - Department of Biology, University of Padua, 35121, Padua, Italy
- Laura Treu
  - Department of Biology, University of Padua, 35121, Padua, Italy
- Otávio G G de Almeida
  - Faculty of Pharmaceutical Sciences of Ribeirão Preto, University of São Paulo, 14040-900, Ribeirão Preto, Brazil
- Giada Innocente
  - Department of Biotechnology, University of Verona, 37134, Verona, Italy
- Sandra Torriani
  - Department of Biotechnology, University of Verona, 37134, Verona, Italy
5
Deciphering eukaryotic gene-regulatory logic with 100 million random promoters. Nat Biotechnol 2019; 38:56-65. [PMID: 31792407 PMCID: PMC6954276 DOI: 10.1038/s41587-019-0315-8] [Citation(s) in RCA: 161] [Impact Index Per Article: 26.8] [Received: 01/24/2019] [Accepted: 10/16/2019] [Indexed: 11/26/2022]
Abstract
How transcription factors (TFs) interpret cis-regulatory DNA sequence to control gene expression remains unclear, largely because past studies using native and engineered sequences had insufficient scale. Here, we measure the expression output of >100 million synthetic yeast promoter sequences that are fully random. These sequences yield diverse, reproducible expression levels that can be explained by their chance inclusion of functional TF binding sites. We use machine learning to build interpretable models of transcriptional regulation that predict ~94% of the expression driven from independent test promoters and ~89% of the expression driven from native yeast promoter fragments. These models allow us to characterize each TF’s specificity, activity, and interactions with chromatin. TF activity depends on binding-site strand, position, DNA helical face and chromatin context. Notably, expression level is influenced by weak regulatory interactions, which confound designed-sequence studies. Our analyses show that massive-throughput assays of fully random DNA can provide the big data necessary to develop complex, predictive models of gene regulation. Gene expression levels in yeast are predicted using a massive dataset on promoters with random sequences.
6
Mulugeta TD, Nome T, To TH, Gundappa MK, Macqueen DJ, Våge DI, Sandve SR, Hvidsten TR. SalMotifDB: a tool for analyzing putative transcription factor binding sites in salmonid genomes. BMC Genomics 2019; 20:694. [PMID: 31477007 PMCID: PMC6720087 DOI: 10.1186/s12864-019-6051-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 03/08/2019] [Accepted: 08/21/2019] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND Recently developed genome resources in salmonid fish provide tools for studying the genomics underlying a wide range of properties, including life history trait variation in the wild, economically important traits in aquaculture and the evolutionary consequences of whole genome duplications. Although genome assemblies now exist for a number of salmonid species, the lack of regulatory annotations is holding back our mechanistic understanding of how genetic variation in non-coding regulatory regions affects gene expression and the downstream phenotypic effects. RESULTS We present SalMotifDB, a database and associated web and R interface for the analysis of transcription factors (TFs) and their cis-regulatory binding sites in five salmonid genomes. SalMotifDB integrates TF-binding site information for 3072 non-redundant DNA patterns (motifs) assembled from a large number of metazoan motif databases. Through motif matching and TF prediction, we have used these multi-species databases to construct putative regulatory networks in salmonid species. The utility of SalMotifDB is demonstrated by showing that key lipid metabolism regulators are predicted to regulate a set of genes affected by different lipid and fatty acid content in the feed, and by showing that our motif database explains a significant proportion of gene expression divergence in gene duplicates originating from the salmonid-specific whole genome duplication. CONCLUSIONS SalMotifDB is an effective tool for analyzing transcription factors, their binding sites and the resulting gene regulatory networks in salmonid species, and will be an important tool for gaining a better mechanistic understanding of gene regulation and the associated phenotypes in salmonids. SalMotifDB is available at https://salmobase.org/apps/SalMotifDB.
Affiliation(s)
- Teshome Dagne Mulugeta
  - Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Torfinn Nome
  - Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Thu-Hien To
  - Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Manu Kumar Gundappa
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian, UK
- Daniel J Macqueen
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian, UK
- Dag Inge Våge
  - Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Simen Rød Sandve
  - Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Torgeir R Hvidsten
  - Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
7
Sri Theivakadadcham VS, Bergey BG, Rosonina E. Sumoylation of DNA-bound transcription factor Sko1 prevents its association with nontarget promoters. PLoS Genet 2019; 15:e1007991. [PMID: 30763307 PMCID: PMC6392331 DOI: 10.1371/journal.pgen.1007991] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 10/23/2018] [Revised: 02/27/2019] [Accepted: 01/28/2019] [Indexed: 12/30/2022] Open
Abstract
Sequence-specific transcription factors (TFs) represent one of the largest groups of proteins that is targeted for SUMO post-translational modification, in both yeast and humans. SUMO modification can have diverse effects, but recent studies showed that sumoylation reduces the interaction of multiple TFs with DNA in living cells. Whether this relates to a general role for sumoylation in TF binding site selection, however, has not been fully explored because few genome-wide studies aimed at studying such a role have been reported. To address this, we used genome-wide analysis to examine how sumoylation regulates Sko1, a yeast bZIP TF with hundreds of known binding sites. We find that Sko1 is sumoylated at Lys 567 and, although many of its targets are osmoresponse genes, the level of Sko1 sumoylation is not stress-regulated and the modification does not depend or impinge on its phosphorylation by the osmostress kinase Hog1. We show that Sko1 mutants that cannot bind DNA are not sumoylated, but attaching a heterologous DNA binding domain restores the modification, implicating DNA binding as a major determinant for Sko1 sumoylation. Genome-wide chromatin immunoprecipitation (ChIP-seq) analysis shows that a sumoylation-deficient Sko1 mutant displays increased occupancy levels at its numerous binding sites, which inhibits the recruitment of the Hog1 kinase to some induced osmostress genes. This strongly supports a general role for sumoylation in reducing the association of TFs with chromatin. Extending this result, remarkably, sumoylation-deficient Sko1 binds numerous additional promoters that are not normally regulated by Sko1 but contain sequences that resemble the Sko1 binding motif. Our study points to an important role for sumoylation in modulating the interaction of a DNA-bound TF with chromatin to increase the specificity of TF-DNA interactions.
Affiliation(s)
- Emanuel Rosonina
  - Department of Biology, York University, Toronto, Ontario, Canada
8
Knoll ER, Zhu ZI, Sarkar D, Landsman D, Morse RH. Role of the pre-initiation complex in Mediator recruitment and dynamics. eLife 2018; 7:39633. [PMID: 30540252 PMCID: PMC6322861 DOI: 10.7554/elife.39633] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 06/28/2018] [Accepted: 12/12/2018] [Indexed: 12/19/2022] Open
Abstract
The Mediator complex stimulates the cooperative assembly of a pre-initiation complex (PIC) and recruitment of RNA Polymerase II (Pol II) for gene activation. The core Mediator complex is organized into head, middle, and tail modules, and in budding yeast (Saccharomyces cerevisiae), Mediator recruitment has generally been ascribed to sequence-specific activators engaging the tail module triad of Med2-Med3-Med15 at upstream activating sequences (UASs). We show that yeast lacking Med2-Med3-Med15 are viable and that Mediator and PolII are recruited to promoters genome-wide in these cells, albeit at reduced levels. To test whether Mediator might alternatively be recruited via interactions with the PIC, we examined Mediator association genome-wide after depleting PIC components. We found that depletion of Taf1, Rpb3, and TBP profoundly affected Mediator association at active gene promoters, with TBP being critical for transit of Mediator from UAS to promoter, while Pol II and Taf1 stabilize Mediator association at proximal promoters.
Affiliation(s)
- Elisabeth R Knoll
  - Department of Biomedical Sciences, School of Public Health, University at Albany, Albany, United States
- Z Iris Zhu
  - Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, Bethesda, United States
- Debasish Sarkar
  - Wadsworth Center, New York State Department of Health, Albany, United States
- David Landsman
  - Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, Bethesda, United States
- Randall H Morse
  - Department of Biomedical Sciences, School of Public Health, University at Albany, Albany, United States
  - Wadsworth Center, New York State Department of Health, Albany, United States
9
Murarka P, Srivastava P. An improved method for the isolation and identification of unknown proteins that bind to known DNA sequences by affinity capture and mass spectrometry. PLoS One 2018; 13:e0202602. [PMID: 30138440 PMCID: PMC6107227 DOI: 10.1371/journal.pone.0202602] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 05/05/2018] [Accepted: 08/05/2018] [Indexed: 12/13/2022] Open
Abstract
Transcription of a gene can be regulated at many different levels. One such fundamental level is the interaction between protein and DNA. Proteins bind to distinct sites on the DNA to activate, enhance or repress transcription. Despite the importance of this process, very few tools exist to identify the proteins that interact with the chromosome, and most of them are in vitro in nature. Here, we propose an in vivo method for the identification of DNA-binding protein(s) in bacteria, in which the DNA-protein complex formed in vivo is crosslinked by formaldehyde. This complex is then isolated and the bound proteins are identified. The method was used to isolate promoter DNA-binding proteins that bind and regulate the dsz operon in Gordonia sp. IITR 100, which is responsible for biodesulfurization of organosulfurs. The promoter-binding proteins were identified by MALDI ToF MS/MS and the binding was confirmed by gel shift assay. Unlike other reported in vivo methods, this improved method does not require the sequence of the whole genome or a chip and can be scaled up to improve yields.
Affiliation(s)
- Pooja Murarka
  - Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology, New Delhi, India
- Preeti Srivastava
  - Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology, New Delhi, India
10
Zhou S, Sternglanz R, Neiman AM. Developmentally regulated internal transcription initiation during meiosis in budding yeast. PLoS One 2017; 12:e0188001. [PMID: 29136644 PMCID: PMC5685637 DOI: 10.1371/journal.pone.0188001] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 08/16/2017] [Accepted: 10/30/2017] [Indexed: 02/07/2023] Open
Abstract
Sporulation of budding yeast is a developmental process in which cells undergo meiosis to generate stress-resistant progeny. The dynamic nature of the budding yeast meiotic transcriptome has been well established by a number of genome-wide studies. Here we develop an analysis pipeline to systematically identify novel transcription start sites that reside internal to a gene. Application of this pipeline to data from a synchronized meiotic time course reveals over 40 genes that display specific internal initiations in mid-sporulation. Consistent with the time of induction, motif analysis on upstream sequences of these internal transcription start sites reveals a significant enrichment for the binding site of Ndt80, the transcriptional activator of middle sporulation genes. Further examination of one gene, MRK1, demonstrates the Ndt80 binding site is necessary for internal initiation and results in the expression of an N-terminally truncated protein isoform. When the MRK1 paralog RIM11 is downregulated, the MRK1 internal transcript promotes efficient sporulation, indicating functional significance of the internal initiation. Our findings suggest internal transcriptional initiation to be a dynamic, regulated process with potential functional impacts on development.
Affiliation(s)
- Sai Zhou
  - Department of Biochemistry and Cell Biology, Stony Brook University, Stony Brook, NY, United States of America
  - Graduate Program in Genetics, Stony Brook University, Stony Brook, NY, United States of America
- Rolf Sternglanz
  - Department of Biochemistry and Cell Biology, Stony Brook University, Stony Brook, NY, United States of America
- Aaron M. Neiman
  - Department of Biochemistry and Cell Biology, Stony Brook University, Stony Brook, NY, United States of America
11
Machens F, Balazadeh S, Mueller-Roeber B, Messerschmidt K. Synthetic Promoters and Transcription Factors for Heterologous Protein Expression in Saccharomyces cerevisiae. Front Bioeng Biotechnol 2017; 5:63. [PMID: 29098147 PMCID: PMC5653697 DOI: 10.3389/fbioe.2017.00063] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 07/21/2017] [Accepted: 09/29/2017] [Indexed: 12/19/2022] Open
Abstract
Orthogonal systems for heterologous protein expression as well as for the engineering of synthetic gene regulatory circuits in hosts like Saccharomyces cerevisiae depend on synthetic transcription factors (synTFs) and corresponding cis-regulatory binding sites. We have constructed and characterized a set of synTFs based on either transcription activator-like effectors or CRISPR/Cas9, and corresponding small synthetic promoters (synPs) with minimal sequence identity to the host’s endogenous promoters. The resulting collection of functional synTF/synP pairs confers very low background expression under uninduced conditions, while expression output upon induction of the various synTFs covers a wide range and reaches induction factors of up to 400. The broad spectrum of expression strengths that is achieved will be useful for various experimental setups, e.g., the transcriptional balancing of expression levels within heterologous pathways or the construction of artificial regulatory networks. Furthermore, our analyses reveal simple rules that enable the tuning of synTF expression output, thereby allowing easy modification of a given synTF/synP pair. This will make it easier for researchers to construct tailored transcriptional control systems.
Affiliation(s)
- Fabian Machens
  - University of Potsdam, Cell2Fab Research Unit, Potsdam, Germany
- Salma Balazadeh
  - Max-Planck Institute of Molecular Plant Physiology, Potsdam-Golm, Germany
  - Department Molecular Biology, University of Potsdam, Potsdam, Germany
- Bernd Mueller-Roeber
  - Max-Planck Institute of Molecular Plant Physiology, Potsdam-Golm, Germany
  - Department Molecular Biology, University of Potsdam, Potsdam, Germany
12
Du M, Bai L. 3D clustering of co-regulated genes and its effect on gene expression. Curr Genet 2017; 63:1017-1021. [PMID: 28551816 DOI: 10.1007/s00294-017-0712-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 05/15/2017] [Revised: 05/22/2017] [Accepted: 05/24/2017] [Indexed: 01/29/2023]
Abstract
There are extensive long-distance chromosomal interactions in eukaryotic genomes, but to what extent these interactions affect gene expression is not clear. Recent work has identified several cases where clustering of co-regulated genes leads to enhanced gene expression in budding yeast. A similar phenomenon was also observed in mammalian cells. These results challenge widely held views of gene regulation in yeast and further our understanding of how the 3D organization of the genome contributes to gene regulation in eukaryotes.
Affiliation(s)
- Manyu Du
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, State College, PA, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, State College, PA, USA
- Lu Bai
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, State College, PA, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, State College, PA, USA
  - Department of Physics, The Pennsylvania State University, University Park, State College, PA, USA
13
Sun Z, Li Z, Huang J, Zheng B, Zhang L, Wang Z. Genome-wide comparative analysis of LEAFY promoter sequence in angiosperms. Physiol Mol Biol Plants 2017; 23:23-33. [PMID: 28250581 PMCID: PMC5313397 DOI: 10.1007/s12298-016-0393-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/07/2016] [Revised: 11/07/2016] [Accepted: 11/18/2016] [Indexed: 05/11/2023]
Abstract
Regulation of the flowering mechanism is influenced by many environmental factors. Dissecting the regulatory processes upstream of the LFY (LEAFY) gene will help us to understand the molecular mechanisms of floral induction. In total, 53 LFY sequences were identified in 37 species. Of the 53 selected LFY promoters, 47 were analyzed after eliminating the short sequences. Comparative genome studies of LFY promoters among plants showed that a TATA-box was present in all herbaceous plants. The 1345-bp promoter sequence upstream of the hickory LFY gene was cloned and analyzed, together with functional studies. Sequence alignment showed that this region of the hickory LFY promoter has only two conserved auxin response elements (AuxRE), whereas other plants had four. The positions of AuxRE in hickory and walnut were the same, but they differed from the positions in other plants. Furthermore, sequence analysis showed that the promoter has TATA-box and CAAT-box motifs. Deletion analysis of these motifs did not block β-glucuronidase (GUS) activity during the transient expression assay, suggesting that it may be a TATA-less promoter. Low temperature and light significantly induced the full-length promoter, increasing GUS enzymatic activity about twofold, suggesting that these environmental factors induce flowering in hickory.
Affiliation(s)
- Zhichao Sun
- School of Forestry and Biotechnology, Zhejiang Agriculture and Forestry University, Dong Hu Campus, 88 Northern Circle Road, Linan, 311300 China
- Zheng Li
- School of Forestry and Biotechnology, Zhejiang Agriculture and Forestry University, Dong Hu Campus, 88 Northern Circle Road, Linan, 311300 China
- Jianqin Huang
- School of Forestry and Biotechnology, Zhejiang Agriculture and Forestry University, Dong Hu Campus, 88 Northern Circle Road, Linan, 311300 China
- Bingsong Zheng
- School of Forestry and Biotechnology, Zhejiang Agriculture and Forestry University, Dong Hu Campus, 88 Northern Circle Road, Linan, 311300 China
- Liangsheng Zhang
- School of Forestry and Biotechnology, Zhejiang Agriculture and Forestry University, Dong Hu Campus, 88 Northern Circle Road, Linan, 311300 China
- Zhengjia Wang
- School of Forestry and Biotechnology, Zhejiang Agriculture and Forestry University, Dong Hu Campus, 88 Northern Circle Road, Linan, 311300 China
|
14
|
de Jonge WJ, O'Duibhir E, Lijnzaad P, van Leenen D, Groot Koerkamp MJ, Kemmeren P, Holstege FC. Molecular mechanisms that distinguish TFIID housekeeping from regulatable SAGA promoters. EMBO J 2016; 36:274-290. [PMID: 27979920 PMCID: PMC5286361 DOI: 10.15252/embj.201695621] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 08/31/2016] [Revised: 10/18/2016] [Accepted: 11/01/2016] [Indexed: 11/28/2022] Open
Abstract
An important distinction is frequently made between constitutively expressed housekeeping genes versus regulated genes. Although generally characterized by different DNA elements, chromatin architecture and cofactors, it is not known to what degree promoter classes strictly follow regulatability rules and which molecular mechanisms dictate such differences. We show that SAGA‐dominated/TATA‐box promoters are more responsive to changes in the amount of activator, even compared to TFIID/TATA‐like promoters that depend on the same activator Hsf1. Regulatability is therefore an inherent property of promoter class. Further analyses show that SAGA/TATA‐box promoters are more dynamic because TATA‐binding protein recruitment through SAGA is susceptible to removal by Mot1. In addition, the nucleosome configuration upon activator depletion shifts on SAGA/TATA‐box promoters and seems less amenable to preinitiation complex formation. The results explain the fundamental difference between housekeeping and regulatable genes, revealing an additional facet of combinatorial control: an activator can elicit a different response dependent on core promoter class.
Affiliation(s)
- Wim J de Jonge
- Molecular Cancer Research, University Medical Center Utrecht, Utrecht, The Netherlands; Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Eoghan O'Duibhir
- Molecular Cancer Research, University Medical Center Utrecht, Utrecht, The Netherlands
- Philip Lijnzaad
- Molecular Cancer Research, University Medical Center Utrecht, Utrecht, The Netherlands; Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Dik van Leenen
- Molecular Cancer Research, University Medical Center Utrecht, Utrecht, The Netherlands
- Marian JA Groot Koerkamp
- Molecular Cancer Research, University Medical Center Utrecht, Utrecht, The Netherlands; Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Patrick Kemmeren
- Molecular Cancer Research, University Medical Center Utrecht, Utrecht, The Netherlands; Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Frank CP Holstege
- Molecular Cancer Research, University Medical Center Utrecht, Utrecht, The Netherlands; Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
|
15
|
Grünberg S, Henikoff S, Hahn S, Zentner GE. Mediator binding to UASs is broadly uncoupled from transcription and cooperative with TFIID recruitment to promoters. EMBO J 2016; 35:2435-2446. [PMID: 27797823 DOI: 10.15252/embj.201695020] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Received: 06/15/2016] [Revised: 08/30/2016] [Accepted: 09/20/2016] [Indexed: 11/09/2022] Open
Abstract
Mediator is a conserved, essential transcriptional coactivator complex, but its in vivo functions have remained unclear due to conflicting data regarding its genome-wide binding pattern obtained by genome-wide ChIP. Here, we used ChEC-seq, a method orthogonal to ChIP, to generate a high-resolution map of Mediator binding to the yeast genome. We find that Mediator associates with upstream activating sequences (UASs) rather than the core promoter or gene body under all conditions tested. Mediator occupancy is surprisingly correlated with transcription levels at only a small fraction of genes. Using the same approach to map TFIID, we find that TFIID is associated with both TFIID- and SAGA-dependent genes and that TFIID and Mediator occupancy is cooperative. Our results clarify Mediator recruitment and binding to the genome, showing that Mediator binding to UASs is widespread, partially uncoupled from transcription, and mediated in part by TFIID.
Affiliation(s)
- Sebastian Grünberg
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Steven Henikoff
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA; Howard Hughes Medical Institute, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Steven Hahn
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
|
16
|
Zhang P, Du G, Zou H, Xie G, Chen J, Shi Z, Zhou J. Genome-wide mapping of nucleosome positions in Saccharomyces cerevisiae in response to different nitrogen conditions. Sci Rep 2016; 6:33970. [PMID: 27659668 PMCID: PMC5034280 DOI: 10.1038/srep33970] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 05/17/2016] [Accepted: 09/06/2016] [Indexed: 12/31/2022] Open
Abstract
Well-organized chromatin is involved in many aspects of transcriptional regulation and gene expression. We used genome-wide mapping of nucleosomes in response to different nitrogen conditions to determine both nucleosome profiles and gene expression events in Saccharomyces cerevisiae. Nitrogen conditions influence general nucleosome profiles and the expression of nitrogen catabolite repression (NCR) sensitive genes. The nucleosome occupancy of TATA-containing genes was higher compared to TATA-less genes. TATA-less genes with high or low nucleosome occupancy showed a significant change in gene coding regions when shifting cells from glutamine to proline as the sole nitrogen source. Furthermore, a correlation between the expression of nucleosome occupancy induced NCR sensitive genes or TATA containing genes in NCR sensitive genes, and nucleosome prediction was found when cells were cultured in proline or shifted from glutamine to proline as the sole nitrogen source, compared to glutamine. These results also showed that variation in nucleosome occupancy, together with chromatin-dependent transcription factors, could influence the expression of a series of genes involved in the specific regulation of nitrogen utilization.
Affiliation(s)
- Peng Zhang
- Key Laboratory of Industrial Biotechnology, Ministry of Education and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Guocheng Du
- Key Laboratory of Industrial Biotechnology, Ministry of Education and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Huijun Zou
- Zhejiang Guyuelongshan Shaoxing Wine Company, 13 Yangjiang Road, Shaoxing, Zhejiang, China
- Guangfa Xie
- Zhejiang Guyuelongshan Shaoxing Wine Company, 13 Yangjiang Road, Shaoxing, Zhejiang, China
- Jian Chen
- Key Laboratory of Industrial Biotechnology, Ministry of Education and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Zhongping Shi
- Key Laboratory of Industrial Biotechnology, Ministry of Education and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Jingwen Zhou
- Key Laboratory of Industrial Biotechnology, Ministry of Education and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
|
17
|
Shabbir Hussain M, Gambill L, Smith S, Blenner MA. Engineering Promoter Architecture in Oleaginous Yeast Yarrowia lipolytica. ACS Synth Biol 2016; 5:213-23. [PMID: 26635071 DOI: 10.1021/acssynbio.5b00100] [Citation(s) in RCA: 72] [Impact Index Per Article: 8.0] [Indexed: 01/02/2023]
Abstract
Eukaryotic promoters have a complex architecture to control both the strength and timing of gene transcription spanning up to thousands of bases from the initiation site. This complexity makes rational fine-tuning of promoters in fungi difficult to predict; however, this very same complexity enables multiple possible strategies for engineering promoter strength. Here, we studied promoter architecture in the oleaginous yeast, Yarrowia lipolytica. While recent studies have focused on upstream activating sequences, we systematically examined various components common in fungal promoters. Here, we examine several promoter components including upstream activating sequences, proximal promoter sequences, core promoters, and the TATA box in autonomously replicating expression plasmids and integrated into the genome. Our findings show that promoter strength can be fine-tuned through the engineering of the TATA box sequence, core promoter, and upstream activating sequences. Additionally, we identified a previously unreported oleic acid responsive transcription enhancement in the XPR2 upstream activating sequences, which illustrates the complexity of fungal promoters. The promoters engineered here provide new genetic tools for metabolic engineering in Y. lipolytica and provide promoter engineering strategies that may be useful in engineering other non-model fungal systems.
Affiliation(s)
- Murtaza Shabbir Hussain
- Department of Chemical and Biomolecular Engineering and Department of Genetics and Biochemistry, Clemson University, Clemson, South Carolina 29634, United States
- Lauren Gambill
- Department of Chemical and Biomolecular Engineering and Department of Genetics and Biochemistry, Clemson University, Clemson, South Carolina 29634, United States
- Spencer Smith
- Department of Chemical and Biomolecular Engineering and Department of Genetics and Biochemistry, Clemson University, Clemson, South Carolina 29634, United States
- Mark A. Blenner
- Department of Chemical and Biomolecular Engineering and Department of Genetics and Biochemistry, Clemson University, Clemson, South Carolina 29634, United States
|
18
|
Impact of cis-acting elements' frequency in transcription activity in dicot and monocot plants. 3 Biotech 2015; 5:1007-1019. [PMID: 28324408 PMCID: PMC4624133 DOI: 10.1007/s13205-015-0305-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/11/2014] [Accepted: 04/25/2015] [Indexed: 10/27/2022] Open
Abstract
The production of new cultivars via recombinant DNA technology is important in applied agriculture. Promoters play fundamental roles in successful transformation and gene expression. Fragments of the upstream regulatory region of the movement protein gene of the Tomato yellow leaf curl virus (TYLCV; two fragments) and Watermelon chlorotic stunt virus (WmCSV, two fragments) and one fragment of the coat protein putative promoter of TYLCV (CPTY-pro) were isolated to assess their abilities to drive expression in monocot and dicot plants. We used bioinformatic analyses to identify tentative motifs in the fragments. The five promoter fragments were isolated, fused with the GUS reporter gene, and transformed into tomato, watermelon, and rice plantlets via Agrobacterium infiltration. GUS expression driven by each putative promoter was analysed using histochemical and fluorometric analyses. In both dicots and the monocots, the highest level of GUS expression was obtained using a truncated regulatory region from TYLCV (MMPTY-pro) followed by a truncated regulatory region from WmCSV (MMPWm-pro). However, the corresponding full-length fragments from TYLCV and WmCSV showed essentially equivalent expression levels in the fluorometric GUS assay compared with the enhanced Cauliflower mosaic virus e35S-pro. In addition, CPTY-pro showed no expression in either the dicots or the monocot. This study demonstrated that MMPTY-pro and MMPWm-pro may be useful as plant promoters.
|
19
|
Maximal Expression of the Evolutionarily Conserved Slit2 Gene Promoter Requires Sp1. Cell Mol Neurobiol 2015; 36:955-964. [PMID: 26456684 DOI: 10.1007/s10571-015-0281-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 08/01/2015] [Accepted: 10/01/2015] [Indexed: 10/22/2022]
Abstract
Slit2 is a neural axon guidance and chemorepellent protein that stimulates motility in a variety of cell types. The role of Slit2 in neural development and neoplastic growth and migration has been well established, while the genetic mechanisms underlying regulation of the Slit2 gene have not. We identified the core and proximal promoter of Slit2 by mapping multiple transcriptional start sites, analyzing transcriptional activity, and confirming sequence homology for the Slit2 proximal promoter among a number of species. Deletion series and transient transfection identified the Slit2 proximal promoter as within 399 base pairs upstream of the start of transcription. A crucial region for full expression of the Slit2 proximal promoter lies between 399 base pairs and 296 base pairs upstream of the start of transcription. Computer modeling identified three transcription factor-binding consensus sites within this region, of which only site-directed mutagenesis of one of the two identified Sp1 consensus sites inhibited transcriptional activity of the Slit2 proximal promoter (-399 to +253). Bioinformatics analysis of the Slit2 proximal promoter -399 base pair to -296 base pair region shows high sequence conservation over twenty-two species, and that this region follows an expected pattern of sequence divergence through evolution.
|
20
|
The development and characterization of synthetic minimal yeast promoters. Nat Commun 2015; 6:7810. [PMID: 26183606 PMCID: PMC4518256 DOI: 10.1038/ncomms8810] [Citation(s) in RCA: 191] [Impact Index Per Article: 19.1] [Received: 01/19/2015] [Accepted: 06/15/2015] [Indexed: 01/11/2023] Open
Abstract
Synthetic promoters, especially minimally sized, are critical for advancing fungal synthetic biology. Fungal promoters often span hundreds of base pairs, nearly ten times the amount of bacterial counterparts. This size limits large-scale synthetic biology efforts in yeasts. Here we address this shortcoming by establishing a methodical workflow necessary to identify robust minimal core elements that can be linked with minimal upstream activating sequences to develop short, yet strong yeast promoters. Through a series of library-based synthesis, analysis and robustness tests, we create a set of non-homologous, purely synthetic, minimal promoters for yeast. These promoters are comprised of short core elements that are generic and interoperable and 10 bp UAS elements that impart strong, constitutive function. Through this methodology, we are able to generate the shortest fungal promoters to date, which can achieve high levels of both inducible and constitutive expression with up to an 80% reduction in size. Endogenous fungal gene promoters can be hundreds of base pairs long, limiting their use in synthetic biology and biotechnology. Here Redden and Alper screen a library of synthetic promoter elements to generate compact DNA sequences of ∼100 base pairs able to drive high levels of gene expression.
|
21
|
Identifying functional transcription factor binding sites in yeast by considering their positional preference in the promoters. PLoS One 2014; 8:e83791. [PMID: 24386279 PMCID: PMC3873331 DOI: 10.1371/journal.pone.0083791] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/30/2013] [Accepted: 11/08/2013] [Indexed: 11/25/2022] Open
Abstract
Transcription factor binding site (TFBS) identification plays an important role in deciphering gene regulatory codes. With comprehensive knowledge of TFBSs, one can understand molecular mechanisms of gene regulation. In the recent decades, various computational approaches have been proposed to predict TFBSs in the genome. The TFBS dataset of a TF generated by each algorithm is a ranked list of predicted TFBSs of that TF, where top ranked TFBSs are statistically significant ones. However, whether these statistically significant TFBSs are functional (i.e. biologically relevant) is still unknown. Here we develop a post-processor, called the functional propensity calculator (FPC), to assign a functional propensity to each TFBS in the existing computationally predicted TFBS datasets. It is known that functional TFBSs reveal strong positional preference towards the transcriptional start site (TSS). This motivates us to take TFBS position relative to the TSS as the key idea in building our FPC. Based on our calculated functional propensities, the TFBSs of a TF in the original TFBS dataset could be reordered, where top ranked TFBSs are now the ones with high functional propensities. To validate the biological significance of our results, we perform three published statistical tests to assess the enrichment of Gene Ontology (GO) terms, the enrichment of physical protein-protein interactions, and the tendency of being co-expressed. The top ranked TFBSs in our reordered TFBS dataset outperform the top ranked TFBSs in the original TFBS dataset, justifying the effectiveness of our post-processor in extracting functional TFBSs from the original TFBS dataset. More importantly, assigning functional propensities to putative TFBSs enables biologists to easily identify which TFBSs in the promoter of interest are likely to be biologically relevant and are good candidates to do further detailed experimental investigation. 
The FPC is implemented as a web tool at http://santiago.ee.ncku.edu.tw/FPC/.
|
22
|
Yang TH, Wu WS. Inferring functional transcription factor-gene binding pairs by integrating transcription factor binding data with transcription factor knockout data. BMC Systems Biology 2013; 7 Suppl 6:S13. [PMID: 24565265 PMCID: PMC4029220 DOI: 10.1186/1752-0509-7-s6-s13] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 12/29/2022]
Abstract
BACKGROUND Chromatin immunoprecipitation (ChIP) experiments are now the most comprehensive experimental approaches for mapping the binding of transcription factors (TFs) to their target genes. However, ChIP data alone is insufficient for identifying functional binding target genes of TFs for two reasons. First, there is an inherent high false positive/negative rate in ChIP-chip or ChIP-seq experiments. Second, binding signals in the ChIP data do not necessarily imply functionality. METHODS It is known that ChIP-chip data and TF knockout (TFKO) data reveal complementary information on gene regulation. While ChIP-chip data can provide TF-gene binding pairs, TFKO data can provide TF-gene regulation pairs. Therefore, we propose a novel network approach for identifying functional TF-gene binding pairs by integrating the ChIP-chip data with the TFKO data. In our method, a TF-gene binding pair from the ChIP-chip data is regarded to be functional if it also has a high-confidence curated TFKO TF-gene regulatory relation or a deduced hypostatic TF-gene regulatory relation. RESULTS AND CONCLUSIONS We first validated our method on a gathered ground truth set. Then we applied our method to the ChIP-chip data to identify functional TF-gene binding pairs. The biological significance of our identified functional TF-gene binding pairs was shown by assessing their functional enrichment, the prevalence of protein-protein interaction, and expression coherence. Our results outperformed the results of three existing methods across all measures. Our identified functional targets of TFs also showed statistical significance over the randomly assigned TF-gene pairs. We also showed that our method is dataset independent and can apply to ChIP-seq data and the E. coli genome. Finally, we provided an example showing the biological applicability of our notion.
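The integration rule stated in the abstract above (a ChIP binding pair counts as functional when a TFKO-derived or deduced regulatory relation supports it) can be illustrated as a set operation. This is only a minimal sketch, not the authors' network method: the function name `functional_binding_pairs`, the TF and gene labels, and the toy data are all hypothetical, and how the deduced relations are obtained is not modeled here.

```python
# Minimal sketch of the integration rule, under stated assumptions.
# All names and data below are hypothetical and for illustration only.

def functional_binding_pairs(chip_pairs, tfko_pairs, deduced_pairs=frozenset()):
    """Keep ChIP (TF, gene) pairs supported by a TFKO or deduced regulation pair."""
    # A regulatory relation may be curated from TFKO data or deduced indirectly.
    supported = set(tfko_pairs) | set(deduced_pairs)
    # A binding pair is called functional only if it is also a regulation pair.
    return set(chip_pairs) & supported


# Toy input: (TF, gene) tuples.
chip_pairs = {("TF_A", "gene1"), ("TF_A", "gene2"), ("TF_B", "gene3")}
tfko_pairs = {("TF_A", "gene1"), ("TF_B", "gene4")}
deduced_pairs = {("TF_B", "gene3")}

functional = functional_binding_pairs(chip_pairs, tfko_pairs, deduced_pairs)
print(sorted(functional))
```

In this toy case, `("TF_A", "gene2")` is bound but has no supporting regulatory evidence, so it is filtered out, while `("TF_B", "gene4")` is regulated but not bound and therefore never enters the result.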
|
23
|
de Boer CG, van Bakel H, Tsui K, Li J, Morris QD, Nislow C, Greenblatt JF, Hughes TR. A unified model for yeast transcript definition. Genome Res 2013; 24:154-66. [PMID: 24170600 PMCID: PMC3875857 DOI: 10.1101/gr.164327.113] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 11/24/2022]
Abstract
Identifying genes in the genomic context is central to a cell's ability to interpret the genome. Yet, in general, the signals used to define eukaryotic genes are poorly described. Here, we derived simple classifiers that identify where transcription will initiate and terminate using nucleic acid sequence features detectable by the yeast cell, which we integrate into a Unified Model (UM) that models transcription as a whole. The cis-elements that denote where transcription initiates function primarily through nucleosome depletion, and, using a synthetic promoter system, we show that most of these elements are sufficient to initiate transcription in vivo. Hrp1 binding sites are the major characteristic of terminators; these binding sites are often clustered in terminator regions and can terminate transcription bidirectionally. The UM predicts global transcript structure by modeling transcription of the genome using a hidden Markov model whose emissions are the outputs of the initiation and termination classifiers. We validated the novel predictions of the UM with available RNA-seq data and tested it further by directly comparing the transcript structure predicted by the model to the transcription generated by the cell for synthetic DNA segments of random design. We show that the UM identifies transcription start sites more accurately than the initiation classifier alone, indicating that the relative arrangement of promoter and terminator elements influences their function. Our model presents a concrete description of how the cell defines transcript units, explains the existence of nongenic transcripts, and provides insight into genome evolution.
|
24
|
Lusk RW, Eisen MB. Spatial promoter recognition signatures may enhance transcription factor specificity in yeast. PLoS One 2013; 8:e53778. [PMID: 23320104 PMCID: PMC3540036 DOI: 10.1371/journal.pone.0053778] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/14/2012] [Accepted: 12/04/2012] [Indexed: 11/26/2022] Open
Abstract
The short length and high degeneracy of sites recognized by DNA-binding transcription factors limit the amount of information they can carry, and individual sites are rarely sufficient to mediate the regulation of specific targets. Computational analysis of microbial genomes has suggested that many factors function optimally when in a particular orientation and position with respect to their target promoters. To investigate this further, we developed and trained spatial models of binding site positioning and applied them to the genome of the yeast Saccharomyces cerevisiae. We found evidence of non-random organization of sites within promoters, differences in binding site density, or both for thirty-eight transcription factors. We show that these signatures allow transcription factors with substantial differences in binding site specificity to share similar promoter specificities. We illustrate how spatial information dictating the positioning and density of binding sites can in principle increase the information available to the organism for differentiating a transcription factor’s true targets, and we indicate how this information could potentially be leveraged for the same purpose in bioinformatic analyses.
Affiliation(s)
- Richard W. Lusk
- Department of Ecology & Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, United States of America
- Michael B. Eisen
- Department of Molecular & Cell Biology, University of California, Berkeley, California, United States of America
- Howard Hughes Medical Institute, University of California, Berkeley, California, United States of America
|
25
|
Tsai ZTY, Tsai HK, Cheng JH, Lin CH, Tsai YF, Wang D. Evolution of cis-regulatory elements in yeast de novo and duplicated new genes. BMC Genomics 2012; 13:717. [PMID: 23256513 PMCID: PMC3553024 DOI: 10.1186/1471-2164-13-717] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 08/07/2012] [Accepted: 12/18/2012] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND New genes that originate from non-coding DNA rather than being duplicated from parent genes are called de novo genes. Their short evolution time and lack of parent genes provide a chance to study the evolution of cis-regulatory elements in the initial stage of gene emergence. Although a few reports have discussed cis-regulatory elements in new genes, knowledge of the characteristics of these elements in de novo genes is lacking. Here, we conducted a comprehensive investigation to depict the emergence and establishment of cis-regulatory elements in de novo yeast genes. RESULTS In a genome-wide investigation, we found that the number of transcription factor binding sites (TFBSs) in de novo genes of S. cerevisiae increased rapidly and quickly became comparable to the number of TFBSs in established genes. This phenomenon might have resulted from certain characteristics of de novo genes; namely, a relatively frequent gain of TFBSs, an unexpectedly high number of preexisting TFBSs, or lower selection pressure in the promoter regions of the de novo genes. Furthermore, we identified differences in the promoter architecture between de novo genes and duplicated new genes, suggesting that distinct regulatory strategies might be employed by genes of different origin. Finally, our functional analyses of the yeast de novo genes revealed that they might be related to reproduction. CONCLUSIONS Our observations showed that de novo genes and duplicated new genes possess mutually distinct regulatory characteristics, implying that these two types of genes might have different roles in evolution.
|
26
|
Pachkov M, Balwierz PJ, Arnold P, Ozonov E, van Nimwegen E. SwissRegulon, a database of genome-wide annotations of regulatory sites: recent updates. Nucleic Acids Res 2012. [PMID: 23180783 PMCID: PMC3531101 DOI: 10.1093/nar/gks1145] [Citation(s) in RCA: 109] [Impact Index Per Article: 8.4] [Indexed: 12/31/2022] Open
Abstract
Identification of genomic regulatory elements is essential for understanding the dynamics of cellular processes. This task has been substantially facilitated by the availability of genome sequences for many species and high-throughput data of transcripts and transcription factor (TF) binding. However, rigorous computational methods are necessary to derive accurate genome-wide annotations of regulatory sites from such data. SwissRegulon (http://swissregulon.unibas.ch) is a database containing genome-wide annotations of regulatory motifs, promoters and TF binding sites (TFBSs) in promoter regions across model organisms. Its binding site predictions were obtained with rigorous Bayesian probabilistic methods that operate on orthologous regions from related genomes, and use explicit evolutionary models to assess the evidence of purifying selection on each site. New in the current version of SwissRegulon is a curated collection of 190 mammalian regulatory motifs associated with ∼340 TFs, and TFBS annotations across a curated set of ∼35 000 promoters in both human and mouse. Predictions of TFBSs for Saccharomyces cerevisiae have also been significantly extended and now cover 158 of yeast’s ∼180 TFs. All data are accessible through both an easily navigable genome browser with search functions, and as flat files that can be downloaded for further analysis.
Affiliation(s)
- Mikhail Pachkov
- Biozentrum, University of Basel, and Swiss Institute of Bioinformatics, Klingelbergstrasse 50/70, CH-4056 Basel, Switzerland
|