1
Beernink BM, Vogel JP, Lei L. Enhancers in Plant Development, Adaptation and Evolution. Plant Cell Physiol 2025; 66:461-476. [PMID: 39412125 PMCID: PMC12085095 DOI: 10.1093/pcp/pcae121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2024] [Revised: 09/13/2024] [Accepted: 10/09/2024] [Indexed: 05/18/2025]
Abstract
Understanding plant responses to developmental and environmental cues is crucial for studying morphological divergence and local adaptation. Gene expression changes, governed by cis-regulatory modules (CRMs) including enhancers, are a major source of plant phenotypic variation. However, while genome-wide approaches have revealed thousands of putative enhancers in mammals, far fewer have been identified and functionally characterized in plants. This review provides an overview of how enhancers function to control gene regulation, methods to predict DNA sequences that may have enhancer activity, methods utilized to functionally validate enhancers and the current knowledge of enhancers in plants, including how they impact plant development, response to environment and evolutionary adaptation.
Affiliation(s)
- Bliss M Beernink
  - U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- John P Vogel
  - U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Li Lei
  - U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
2
Jatzlau J, Do SN, Mees RA, Mendez PL, Khan RJ, Maas L, Ruiz L, Martin-Malpartida P, Macias MJ, Knaus P. Rare but specific: 5-bp composite motifs define SMAD binding in BMP signaling. BMC Biol 2025; 23:79. [PMID: 40082964 PMCID: PMC11907993 DOI: 10.1186/s12915-025-02183-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/16/2024] [Accepted: 03/03/2025] [Indexed: 03/16/2025] Open
Abstract
BACKGROUND Receptor-activated SMADs trimerize with SMAD4 to regulate context-dependent target gene expression. However, the presence of a single SMAD1/5/8 binding motif in cis-regulatory elements alone does not trigger transcription in native contexts. We hypothesize that binding to composite motifs in which at least two SMAD binding sites are in close proximity would be enough to induce transcription, as this scenario allows the simultaneous interaction of at least two SMAD proteins, thereby increasing specificity and affinity. RESULTS Using more than 65 distinct firefly luciferase constructs, we delineated the minimal requirements for BMP-induced gene activation. We propose a model in which two SMAD-MH1 domains bind a SMAD-composite motif in a back-to-back fashion with a 5-bp distance between the SMAD-motifs on opposing DNA strands. However, screening of SMAD1-bound regions across a variety of cell types highlights that these composite motifs are extremely uncommon, accounting for less than 1% of SMAD1 binding events. CONCLUSIONS Deviations from these minimal requirements prevent transcription and underline the need for co-transcription factors to achieve gene activation.
Affiliation(s)
- Jerome Jatzlau
  - Institute of Chemistry and Biochemistry, Freie Universitaet Berlin, Berlin, Germany
- Sophie-Nhi Do
  - Institute of Chemistry and Biochemistry, Freie Universitaet Berlin, Berlin, Germany
- Rebeca A Mees
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Baldiri Reixac, 10, Barcelona, 08028, Spain
- Paul-Lennard Mendez
  - Institute of Chemistry and Biochemistry, Freie Universitaet Berlin, Berlin, Germany
- Rameez Jabeer Khan
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Baldiri Reixac, 10, Barcelona, 08028, Spain
- Lukas Maas
  - Institute of Chemistry and Biochemistry, Freie Universitaet Berlin, Berlin, Germany
- Lidia Ruiz
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Baldiri Reixac, 10, Barcelona, 08028, Spain
- Pau Martin-Malpartida
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Baldiri Reixac, 10, Barcelona, 08028, Spain
- Maria J Macias
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Baldiri Reixac, 10, Barcelona, 08028, Spain
  - ICREA, Passeig Lluís Companys 23, Barcelona, 08010, Spain
- Petra Knaus
  - Institute of Chemistry and Biochemistry, Freie Universitaet Berlin, Berlin, Germany
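The composite-motif model in the Jatzlau et al. abstract above (two SMAD motifs on opposing strands, 5 bp apart) can be illustrated with a small sequence scan. This is only a sketch under stated assumptions: the function names are hypothetical, the "5-bp distance" is interpreted here as the number of bases between the end of the first motif and the start of the second, and the motif strings used in the example are placeholders for illustration, since the abstract does not give the motif sequences.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T/N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_composite_sites(seq: str, motif_fwd: str, motif_rev: str, gap: int = 5) -> list[int]:
    """Find start positions of motif_fwd on the given strand that are followed,
    exactly `gap` bases downstream, by motif_rev on the opposite strand.

    The gap definition (bases between the two motifs) is an assumption made
    for this sketch, not a definition taken from the paper.
    """
    seq = seq.upper()
    # A motif on the opposite strand appears as its reverse complement here.
    rev_on_this_strand = reverse_complement(motif_rev)
    hits = []
    start = seq.find(motif_fwd)
    while start != -1:
        second = start + len(motif_fwd) + gap
        if seq[second:second + len(rev_on_this_strand)] == rev_on_this_strand:
            hits.append(start)
        start = seq.find(motif_fwd, start + 1)
    return hits


# Placeholder motifs and a synthetic sequence, chosen only for illustration.
example = "TT" + "GGCGCC" + "AAAAA" + reverse_complement("GTCT") + "TT"
print(find_composite_sites(example, "GGCGCC", "GTCT", gap=5))  # [2]
```

Exact string matching keeps the sketch short; a real scan would more likely score positions against position weight matrices and tolerate mismatches.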
3
Batool F, Shireen H, Malik MF, Abrar M, Abbasi AA. The combinatorial binding syntax of transcription factors in forebrain-specific enhancers. Biol Open 2025; 14:BIO061751. [PMID: 39976127 PMCID: PMC11876843 DOI: 10.1242/bio.061751] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2024] [Accepted: 01/24/2025] [Indexed: 02/21/2025] Open
Abstract
Tissue-specific gene regulation in mammals involves the coordinated binding of multiple transcription factors (TFs). Using the forebrain as a model, we investigated the syntax of TF occupancy to determine tissue-specific enhancer regions. We analyzed forebrain-exclusive enhancers from the VISTA Enhancer Browser and a curated set of 23 TFs relevant to forebrain development and disease. Our findings revealed multiple distinct patterns of combinatorial TF binding, with the HES5-FOXP2-GATA3 triad being the most frequent in forebrain-specific enhancers. This syntactic structure was detected in 2614 enhancers from a genome-wide catalog of 25,000 predicted human forebrain enhancers. Notably, this catalog represents a computationally predicted dataset, distinct from the in vivo validated set of enhancers obtained from the VISTA Enhancer Browser. The shortlisted 2614 enhancers were further analyzed using genome-wide epigenetic data and evaluated for evolutionary conservation and disease relevance. Our findings highlight the value of these 2614 enhancers in forebrain-specific gene regulation and provide a framework for discovering tissue-specific enhancers, enhancing the understanding of enhancer function.
Affiliation(s)
- Fatima Batool
  - National Center for Bioinformatics, Program of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad 45320, Pakistan
- Huma Shireen
  - National Center for Bioinformatics, Program of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad 45320, Pakistan
- Muhammad Faizan Malik
  - National Center for Bioinformatics, Program of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad 45320, Pakistan
- Muhammad Abrar
  - National Center for Bioinformatics, Program of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad 45320, Pakistan
- Amir Ali Abbasi
  - National Center for Bioinformatics, Program of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad 45320, Pakistan
4
Pampari A, Shcherbina A, Kvon EZ, Kosicki M, Nair S, Kundu S, Kathiria AS, Risca VI, Kuningas K, Alasoo K, Greenleaf WJ, Pennacchio LA, Kundaje A. ChromBPNet: bias factorized, base-resolution deep learning models of chromatin accessibility reveal cis-regulatory sequence syntax, transcription factor footprints and regulatory variants. bioRxiv 2025:2024.12.25.630221. [PMID: 39829783 PMCID: PMC11741299 DOI: 10.1101/2024.12.25.630221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2025]
Abstract
Despite extensive mapping of cis-regulatory elements (cREs) across cellular contexts with chromatin accessibility assays, the sequence syntax and genetic variants that regulate transcription factor (TF) binding and chromatin accessibility at context-specific cREs remain elusive. We introduce ChromBPNet, a deep learning DNA sequence model of base-resolution accessibility profiles that detects, learns and deconvolves assay-specific enzyme biases from regulatory sequence determinants of accessibility, enabling robust discovery of compact TF motif lexicons, cooperative motif syntax and precision footprints across assays and sequencing depths. Extensive benchmarks show that ChromBPNet, despite its lightweight design, is competitive with much larger contemporary models at predicting variant effects on chromatin accessibility, pioneer TF binding and reporter activity across assays, cell contexts and ancestry, while providing interpretation of disrupted regulatory syntax. ChromBPNet also helps prioritize and interpret regulatory variants that influence complex traits and rare diseases, thereby providing a powerful lens to decode regulatory DNA and genetic variation.
Affiliation(s)
- Anusri Pampari
  - Department of Computer Science, Stanford University, Stanford, CA 94305
- Anna Shcherbina
  - Department of Biomedical Data Sciences, Stanford University, Stanford, CA 94305
- Evgeny Z. Kvon
  - Department of Developmental and Cell Biology, University of California, Irvine, CA 92697, USA
- Michael Kosicki
  - Environmental Genomics & System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Surag Nair
  - Department of Computer Science, Stanford University, Stanford, CA 94305
- Soumya Kundu
  - Department of Computer Science, Stanford University, Stanford, CA 94305
- Kaur Alasoo
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
- William James Greenleaf
  - Department of Genetics, Stanford University, Stanford, CA 94305
  - Department of Applied Physics, Stanford University, Stanford, California 94305, USA
- Len A. Pennacchio
  - Environmental Genomics & System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Anshul Kundaje
  - Department of Computer Science, Stanford University, Stanford, CA 94305
  - Department of Genetics, Stanford University, Stanford, CA 94305
5
Baniulyte G, McCann AA, Woodstock DL, Sammons MA. Crosstalk between paralogs and isoforms influences p63-dependent regulatory element activity. Nucleic Acids Res 2024; 52:13812-13831. [PMID: 39565223 DOI: 10.1093/nar/gkae1143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Revised: 10/04/2024] [Accepted: 11/01/2024] [Indexed: 11/21/2024] Open
Abstract
The p53 family of transcription factors (p53, p63 and p73) regulates diverse organismal processes including tumor suppression, maintenance of genome integrity and the development of skin and limbs. Crosstalk between transcription factors with highly similar DNA binding profiles, like those in the p53 family, can dramatically alter gene regulation. While p53 is primarily associated with transcriptional activation, p63 mediates both activation and repression. The specific mechanisms controlling p63-dependent gene regulatory activity are not well understood. Here, we use massively parallel reporter assays (MPRA) to investigate how local DNA sequence context influences p63-dependent transcriptional activity. Most regulatory elements with a p63 response element motif (p63RE) activate transcription, although binding of the p63 paralog, p53, drives a substantial proportion of that activity. p63RE sequence content and co-enrichment with other known activating and repressing transcription factors, including lineage-specific factors, correlate with differential p63RE-mediated activities. p63 isoforms dramatically alter transcriptional behavior, primarily shifting inactive regulatory elements towards high p63-dependent activity. Our analysis provides novel insight into how local sequence and cellular context influence p63-dependent behaviors and highlights the key, yet still understudied, role of transcription factor paralogs and isoforms in controlling gene regulatory element activity.
Affiliation(s)
- Gabriele Baniulyte
  - Department of Biological Sciences and The RNA Institute, University at Albany, State University of New York, 1400 Washington Ave, Albany, NY 12222, USA
- Abby A McCann
  - Department of Biological Sciences and The RNA Institute, University at Albany, State University of New York, 1400 Washington Ave, Albany, NY 12222, USA
- Dana L Woodstock
  - Department of Biological Sciences and The RNA Institute, University at Albany, State University of New York, 1400 Washington Ave, Albany, NY 12222, USA
- Morgan A Sammons
  - Department of Biological Sciences and The RNA Institute, University at Albany, State University of New York, 1400 Washington Ave, Albany, NY 12222, USA
6
Maritato R, Medugno A, D'Andretta E, De Riso G, Lupo M, Botta S, Marrocco E, Renda M, Sofia M, Mussolino C, Bacci ML, Surace EM. A DNA base-specific sequence interposed between CRX and NRL contributes to RHODOPSIN expression. Sci Rep 2024; 14:26313. [PMID: 39487168 PMCID: PMC11530525 DOI: 10.1038/s41598-024-76664-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2024] [Accepted: 10/15/2024] [Indexed: 11/04/2024] Open
Abstract
Gene expression emerges from DNA sequences through the interaction of transcription factors (TFs) with DNA cis-regulatory sequences. In eukaryotes, TFs bind to transcription factor binding sites (TFBSs) with differential affinities, enabling cell-specific gene expression. In this view, DNA enables TF binding along a continuum ranging from low to high affinity depending on its sequence composition; however, it is not known whether evolution has entailed a further level of entanglement between DNA-protein interaction. Here we found that the composition and length (22 bp) of the DNA sequence interposed between the CRX and NRL retinal TFs in the proximal promoter of RHODOPSIN (RHO) largely controls the expression levels of RHO. Mutagenesis of CRX-NRL DNA linking sequences (here termed "DNA-linker") results in uncorrelated gene expression variation. In contrast, mutual exchange of naturally occurring divergent human and mouse Rho cis-regulatory elements conferred similar yet species-specific Rho expression levels. Two orthogonal DNA-binding proteins targeted to the DNA-linker either activate or repress the expression of Rho depending on the DNA-linker orientation relative to the CRX and NRL binding sites. These results argue that, in this instance, DNA itself contributes to CRX and NRL activities through a code based on specific base sequences of a defined length, ultimately determining optimal RHO expression levels.
Affiliation(s)
- Rosa Maritato
  - Department of Translational Medicine, University of Naples Federico II, Naples, Italy
- Alessia Medugno
  - Department of Translational Medicine, University of Naples Federico II, Naples, Italy
- Emanuela D'Andretta
  - Department of Translational Medicine, University of Naples Federico II, Naples, Italy
- Giulia De Riso
  - Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Naples, Italy
  - AOU Federico II, Naples, Italy
- Mariangela Lupo
  - Telethon Institute of Genetics and Medicine (TIGEM), Pozzuoli, Italy
- Salvatore Botta
  - Department of Translational Medical Science, University of Campania Luigi Vanvitelli, Naples, Italy
- Elena Marrocco
  - Telethon Institute of Genetics and Medicine (TIGEM), Pozzuoli, Italy
- Mario Renda
  - Telethon Institute of Genetics and Medicine (TIGEM), Pozzuoli, Italy
- Martina Sofia
  - Telethon Institute of Genetics and Medicine (TIGEM), Pozzuoli, Italy
- Maria Laura Bacci
  - Department of Veterinary Medical Sciences, University of Bologna, Bologna, Italy
- Enrico Maria Surace
  - Department of Translational Medicine, University of Naples Federico II, Naples, Italy
7
Kliesmete Z, Orchard P, Lee VYK, Geuder J, Krauß SM, Ohnuki M, Jocher J, Vieth B, Enard W, Hellmann I. Evidence for compensatory evolution within pleiotropic regulatory elements. Genome Res 2024; 34:1528-1539. [PMID: 39255977 PMCID: PMC11534155 DOI: 10.1101/gr.279001.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Accepted: 08/19/2024] [Indexed: 09/12/2024]
Abstract
Pleiotropy, measured as expression breadth across tissues, is one of the best predictors for protein sequence and expression conservation. In this study, we investigated its effect on the evolution of cis-regulatory elements (CREs). To this end, we carefully reanalyzed the Epigenomics Roadmap data for nine fetal tissues, assigning a measure of pleiotropic degree to nearly half a million CREs. To assess the functional conservation of CREs, we generated ATAC-seq and RNA-seq data from humans and macaques. We found that more pleiotropic CREs exhibit greater conservation in accessibility, and the mRNA expression levels of the associated genes are more conserved. This trend of higher conservation for higher degrees of pleiotropy persists when analyzing the transcription factor binding repertoire. In contrast, simple DNA sequence conservation of orthologous sites between species tends to be even lower for pleiotropic CREs than for species-specific CREs. Combining various lines of evidence, we propose that the lack of sequence conservation in functionally conserved pleiotropic CREs is owing to within-element compensatory evolution. In summary, our findings suggest that pleiotropy is also a good predictor for the functional conservation of CREs, even though this is not reflected in the sequence conservation of pleiotropic CREs.
Affiliation(s)
- Zane Kliesmete
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians Universität München, 82152 Munich, Germany
- Peter Orchard
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians Universität München, 82152 Munich, Germany
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109-2218, USA
- Victor Yan Kin Lee
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians Universität München, 82152 Munich, Germany
  - Section for Molecular Ecology and Evolution, Globe Institute, University of Copenhagen, 1350 Copenhagen, Denmark
- Johanna Geuder
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians Universität München, 82152 Munich, Germany
- Simon M Krauß
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians Universität München, 82152 Munich, Germany
  - Department of Hematology, Cell Therapy, Hemostaseology and Infectious Diseases, University Leipzig Medical Center, 04103 Leipzig, Germany
- Mari Ohnuki
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians Universität München, 82152 Munich, Germany
  - Faculty of Medicine, Institute for the Advanced Study of Human Biology (ASHBi), Kyoto University, Kyoto 606-8501, Japan
- Jessica Jocher
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians Universität München, 82152 Munich, Germany
- Beate Vieth
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians Universität München, 82152 Munich, Germany
- Wolfgang Enard
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians Universität München, 82152 Munich, Germany
- Ines Hellmann
  - Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians Universität München, 82152 Munich, Germany
8
Jores T, Tonnies J, Mueth NA, Romanowski A, Fields S, Cuperus JT, Queitsch C. Plant enhancers exhibit both cooperative and additive interactions among their functional elements. Plant Cell 2024; 36:2570-2586. [PMID: 38513612 PMCID: PMC11218779 DOI: 10.1093/plcell/koae088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 02/16/2024] [Accepted: 03/04/2024] [Indexed: 03/23/2024]
Abstract
Enhancers are cis-regulatory elements that shape gene expression in response to numerous developmental and environmental cues. In animals, several models have been proposed to explain how enhancers integrate the activity of multiple transcription factors. However, it remains largely unclear how plant enhancers integrate transcription factor activity. Here, we use Plant STARR-seq to characterize 3 light-responsive plant enhancers (AB80, Cab-1, and rbcS-E9) derived from genes associated with photosynthesis. Saturation mutagenesis revealed mutations, many of which clustered in short regions, that strongly reduced enhancer activity in the light, in the dark, or in both conditions. When tested in the light, these mutation-sensitive regions did not function on their own; rather, cooperative interactions with other such regions were required for full activity. Epistatic interactions occurred between mutations in adjacent mutation-sensitive regions, and the spacing and order of mutation-sensitive regions in synthetic enhancers affected enhancer activity. In contrast, when tested in the dark, mutation-sensitive regions acted independently and additively in conferring enhancer activity. Taken together, this work demonstrates that plant enhancers show evidence for both cooperative and additive interactions among their functional elements. This knowledge can be harnessed to design strong, condition-specific synthetic enhancers.
Affiliation(s)
- Tobias Jores
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Institute of Synthetic Biology, Heinrich Heine University Düsseldorf, Düsseldorf 40225, Germany
  - Cluster of Excellence on Plant Science (CEPLAS), Heinrich Heine University Düsseldorf, Düsseldorf 40225, Germany
- Jackson Tonnies
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Graduate Program in Biology, University of Washington, Seattle, WA 98195, USA
- Nicholas A Mueth
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Andrés Romanowski
  - Molecular Biology Group, Plant Sciences, Wageningen University & Research, 6708 PB Wageningen, the Netherlands
- Stanley Fields
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Department of Medicine, University of Washington, Seattle, WA 98195, USA
- Josh T Cuperus
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Christine Queitsch
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, USA
9
Moeckel C, Mouratidis I, Chantzi N, Uzun Y, Georgakopoulos-Soares I. Advances in computational and experimental approaches for deciphering transcriptional regulatory networks: Understanding the roles of cis-regulatory elements is essential, and recent research utilizing MPRAs, STARR-seq, CRISPR-Cas9, and machine learning has yielded valuable insights. Bioessays 2024; 46:e2300210. [PMID: 38715516 PMCID: PMC11444527 DOI: 10.1002/bies.202300210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 04/22/2024] [Accepted: 04/23/2024] [Indexed: 05/16/2024]
Abstract
Understanding the influence of cis-regulatory elements on gene regulation poses numerous challenges given complexities stemming from variations in transcription factor (TF) binding, chromatin accessibility, structural constraints, and cell-type differences. This review discusses the role of gene regulatory networks in enhancing understanding of transcriptional regulation and covers construction methods ranging from expression-based approaches to supervised machine learning. Additionally, key experimental methods, including MPRAs and CRISPR-Cas9-based screening, which have significantly contributed to understanding TF binding preferences and cis-regulatory element functions, are explored. Lastly, the potential of machine learning and artificial intelligence to unravel cis-regulatory logic is analyzed. These computational advances have far-reaching implications for precision medicine, therapeutic target discovery, and the study of genetic variations in health and disease.
Affiliation(s)
- Camille Moeckel
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ioannis Mouratidis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Nikol Chantzi
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Yasin Uzun
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
  - Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
10
Vaknin I, Willinger O, Mandl J, Heuberger H, Ben-Ami D, Zeng Y, Goldberg S, Orenstein Y, Amit R. A universal system for boosting gene expression in eukaryotic cell-lines. Nat Commun 2024; 15:2394. [PMID: 38493141 PMCID: PMC10944472 DOI: 10.1038/s41467-024-46573-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Accepted: 03/04/2024] [Indexed: 03/18/2024] Open
Abstract
We demonstrate a transcriptional regulatory design algorithm that can boost expression in yeast and mammalian cell lines. The system consists of a simplified transcriptional architecture composed of a minimal core promoter and a synthetic upstream regulatory region (sURS) composed of up to three motifs selected from a list of 41 motifs conserved in the eukaryotic lineage. The sURS system was first characterized using an oligo-library containing 189,990 variants. We validate the resultant expression model using a set of 43 unseen sURS designs. The validation sURS experiments indicate that a generic set of grammar rules for boosting and attenuation may exist in yeast cells. Finally, we demonstrate that this generic set of grammar rules functions similarly in mammalian CHO-K1 and HeLa cells. Consequently, our work provides a design algorithm for boosting the expression of promoters used for expressing industrially relevant proteins in yeast and mammalian cell lines.
Affiliation(s)
- Inbal Vaknin
  - Department of Biotechnology and Food Engineering, Technion, Haifa, Israel
- Or Willinger
  - Department of Biotechnology and Food Engineering, Technion, Haifa, Israel
- Jonathan Mandl
  - Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
- Hadar Heuberger
  - School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Dan Ben-Ami
  - School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Yi Zeng
  - Department of Biotechnology and Food Engineering, Technion, Haifa, Israel
- Sarah Goldberg
  - Department of Biotechnology and Food Engineering, Technion, Haifa, Israel
- Yaron Orenstein
  - Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
  - The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel
- Roee Amit
  - Department of Biotechnology and Food Engineering, Technion, Haifa, Israel
  - The Russell Berrie Nanotechnology Institute, Technion, Haifa, Israel
11
Mahendrawada L, Warfield L, Donczew R, Hahn S. Surprising connections between DNA binding and function for the near-complete set of yeast transcription factors. bioRxiv 2023:2023.07.25.550593. [PMID: 37546716 PMCID: PMC10402042 DOI: 10.1101/2023.07.25.550593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
DNA sequence-specific transcription factors (TFs) modulate transcription and chromatin architecture, acting from regulatory sites in enhancers and promoters of eukaryotic genes. How TFs locate their DNA targets and how multiple TFs cooperate to regulate individual genes is still unclear. Most yeast TFs are thought to regulate transcription via binding to upstream activating sequences, situated within a few hundred base pairs upstream of the regulated gene. While this model has been validated for individual TFs and specific genes, it has not been tested in a systematic way with the large set of yeast TFs. Here, we have integrated information on the binding and expression targets for the near-complete set of yeast TFs. While we found many instances of functional TF binding sites in upstream regulatory regions, we found many more instances that do not fit this model. In many cases, rapid TF depletion affects gene expression where there is no detectable binding of that TF to the upstream region of the affected gene. In addition, for most TFs, only a small fraction of bound TFs regulates the nearby gene, showing that TF binding does not automatically correspond to regulation of the linked gene. Finally, we found that only a small percentage of TFs are exclusively strong activators or repressors, with most TFs having dual function. Overall, our comprehensive mapping of TF binding and regulatory targets has both confirmed known TF relationships and revealed surprising properties of TF function.
12
Smith GD, Ching WH, Cornejo-Páramo P, Wong ES. Decoding enhancer complexity with machine learning and high-throughput discovery. Genome Biol 2023; 24:116. [PMID: 37173718 PMCID: PMC10176946 DOI: 10.1186/s13059-023-02955-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 07/18/2022] [Accepted: 04/28/2023] [Indexed: 05/15/2023] Open
Abstract
Enhancers are genomic DNA elements controlling spatiotemporal gene expression. Their flexible organization and functional redundancies make deciphering their sequence-function relationships challenging. This article provides an overview of the current understanding of enhancer organization and evolution, with an emphasis on factors that influence these relationships. Technological advancements, particularly in machine learning and synthetic biology, are discussed in light of how they provide new ways to understand this complexity. Exciting opportunities lie ahead as we continue to unravel the intricacies of enhancer function.
Affiliation(s)
- Gabrielle D Smith
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
| | - Wan Hern Ching
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
| | - Paola Cornejo-Páramo
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
| | - Emily S Wong
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia.
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia.
| |
|
13
|
Georgakopoulos-Soares I, Deng C, Agarwal V, Chan CSY, Zhao J, Inoue F, Ahituv N. Transcription factor binding site orientation and order are major drivers of gene regulatory activity. Nat Commun 2023; 14:2333. [PMID: 37087538 PMCID: PMC10122648 DOI: 10.1038/s41467-023-37960-5] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 10/04/2022] [Accepted: 04/06/2023] [Indexed: 04/24/2023] Open
Abstract
The gene regulatory code and grammar remain largely unknown, precluding our ability to link phenotype to genotype in regulatory sequences. Here, using a massively parallel reporter assay (MPRA) of 209,440 sequences, we examine all possible pair and triplet combinations, permutations and orientations of eighteen liver-associated transcription factor binding sites (TFBS). We find that TFBS orientation and order have a major effect on gene regulatory activity. Corroborating these results with genomic analyses, we find clear human promoter TFBS orientation biases and similar TFBS orientation and order transcriptional effects in an MPRA that tested 164,307 liver candidate regulatory elements. Additionally, by adding TFBS orientation to a model that predicts expression from sequence we improve performance by 7.7%. Collectively, our results show that TFBS orientation and order have a significant effect on gene regulatory activity and need to be considered when analyzing the functional effect of variants on the activity of these sequences.
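To give a feel for how quickly such a design space grows, the following minimal Python sketch enumerates ordered pairs and triplets of distinct binding sites together with every strand-orientation assignment. It is an illustration under stated assumptions, not the authors' library design: the site names are hypothetical placeholders, repeated sites within one arrangement are excluded by assumption, and the counts it prints are not expected to match the 209,440 sequences reported in the abstract, which presumably reflect further design factors that are not modelled here.

```python
from itertools import permutations, product


def enumerate_arrangements(sites, k):
    """Yield each ordered arrangement of k distinct sites with one strand per site."""
    for order in permutations(sites, k):         # order matters: (A, B) differs from (B, A)
        for strands in product("+-", repeat=k):  # every site may sit on either strand
            yield tuple(zip(order, strands))


# Hypothetical placeholder names standing in for eighteen TFBS.
sites = [f"TFBS{i:02d}" for i in range(1, 19)]

n_pairs = sum(1 for _ in enumerate_arrangements(sites, 2))
n_triplets = sum(1 for _ in enumerate_arrangements(sites, 3))
print(n_pairs, n_triplets)
```

Under these assumptions the pair count is 18 × 17 × 2² and the triplet count is 18 × 17 × 16 × 2³, so order and orientation alone multiply the number of constructs well beyond the number of unordered site combinations.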
Affiliation(s)
- Ilias Georgakopoulos-Soares
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA.
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA.
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA.
| | - Chengyu Deng
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
| | - Vikram Agarwal
- mRNA Center of Excellence, Sanofi Pasteur Inc., Waltham, MA, USA
| | - Candace S Y Chan
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
| | - Jingjing Zhao
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
| | - Fumitaka Inoue
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
| | - Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA.
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA.
| |
|
14
|
Martinez-Corral R, Park M, Biette KM, Friedrich D, Scholes C, Khalil AS, Gunawardena J, DePace AH. Transcriptional kinetic synergy: A complex landscape revealed by integrating modeling and synthetic biology. Cell Syst 2023; 14:324-339.e7. [PMID: 37080164 PMCID: PMC10472254 DOI: 10.1016/j.cels.2023.02.003] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 11/30/2021] [Revised: 08/22/2022] [Accepted: 02/10/2023] [Indexed: 04/22/2023]
Abstract
Transcription factors (TFs) control gene expression, often acting synergistically. Classical thermodynamic models offer a biophysical explanation for synergy based on binding cooperativity and regulated recruitment of RNA polymerase. Because transcription requires polymerase to transition through multiple states, recent work suggests that "kinetic synergy" can arise through TFs acting on distinct steps of the transcription cycle. These types of synergy are not mutually exclusive and are difficult to disentangle conceptually and experimentally. Here, we model and build a synthetic circuit in which TFs bind to a single shared site on DNA, such that TFs cannot synergize by simultaneous binding. We model mRNA production as a function of both TF binding and regulation of the transcription cycle, revealing a complex landscape dependent on TF concentration, DNA binding affinity, and regulatory activity. We use synthetic TFs to confirm that the transcription cycle must be integrated with recruitment for a quantitative understanding of gene regulation.
Affiliation(s)
| | - Minhee Park
- Biological Design Center, Boston University, Boston, MA 02215, USA; Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
| | - Kelly M Biette
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
| | - Dhana Friedrich
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
| | - Clarissa Scholes
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
| | - Ahmad S Khalil
- Biological Design Center, Boston University, Boston, MA 02215, USA; Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
| | - Jeremy Gunawardena
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
| | - Angela H DePace
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA.
| |
|
15
|
Reiter F, de Almeida BP, Stark A. Enhancers display constrained sequence flexibility and context-specific modulation of motif function. Genome Res 2023; 33:346-358. [PMID: 36941077 PMCID: PMC10078294 DOI: 10.1101/gr.277246.122] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 08/26/2022] [Accepted: 02/14/2023] [Indexed: 03/23/2023]
Abstract
The information about when and where each gene is to be expressed is mainly encoded in the DNA sequence of enhancers, sequence elements that comprise binding sites (motifs) for different transcription factors (TFs). Most of the research on enhancer sequences has been focused on TF motif presence, whereas the enhancer syntax, that is, the flexibility of important motif positions and how the sequence context modulates the activity of TF motifs, remains poorly understood. Here, we explore the rules of enhancer syntax by a two-pronged approach in Drosophila melanogaster S2 cells: we (1) replace important TF motifs by all possible 65,536 eight-nucleotide-long sequences and (2) paste eight important TF motif types into 763 positions within 496 enhancers. These complementary strategies reveal that enhancers display constrained sequence flexibility and the context-specific modulation of motif function. Important motifs can be functionally replaced by hundreds of sequences constituting several distinct motif types, but these are only a fraction of all possible sequences and motif types. Moreover, TF motifs contribute with different intrinsic strengths that are strongly modulated by the enhancer sequence context (the flanking sequence, the presence and diversity of other motif types, and the distance between motifs), such that not all motif types can work in all positions. The context-specific modulation of motif function is also a hallmark of human enhancers, as we demonstrate experimentally. Overall, these two general principles of enhancer sequences are important to understand and predict enhancer function during development, evolution, and in disease.
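The figure of 65,536 follows from the four-letter DNA alphabet: 4^8 = 65,536 distinct eight-nucleotide sequences. As a minimal sketch of the first strategy, the Python snippet below enumerates every 8-mer and substitutes each one into a fixed window of a sequence. The example sequence, the window position and the function names are hypothetical and chosen only for illustration; they are not taken from the study.

```python
from itertools import product


def all_kmers(k, alphabet="ACGT"):
    """Return every length-k sequence over the alphabet (4**k for DNA)."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def substitute(seq, start, kmer):
    """Replace the window of seq beginning at start with kmer, keeping the length."""
    return seq[:start] + kmer + seq[start + len(kmer):]


kmers = all_kmers(8)

# Hypothetical 20-nt sequence; positions 6-13 play the role of the motif window.
enhancer = "TTGACGTCATGCGATAAGCT"
variants = [substitute(enhancer, 6, kmer) for kmer in kmers]
print(len(kmers), len(variants))
```

Each variant differs from the others only inside the window, so any difference in measured activity can be attributed to the substituted 8-mer in that particular sequence context.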
Affiliation(s)
- Franziska Reiter
- Research Institute of Molecular Pathology, Vienna BioCenter, Campus-Vienna-BioCenter 1, 1030 Vienna, Austria
- Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, 1030 Vienna, Austria
| | - Bernardo P de Almeida
- Research Institute of Molecular Pathology, Vienna BioCenter, Campus-Vienna-BioCenter 1, 1030 Vienna, Austria
- Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, 1030 Vienna, Austria
| | - Alexander Stark
- Research Institute of Molecular Pathology, Vienna BioCenter, Campus-Vienna-BioCenter 1, 1030 Vienna, Austria;
- Medical University of Vienna, Vienna BioCenter, 1030 Vienna, Austria
| |
|
16
|
Zhao Y. TFSyntax: a database of transcription factors binding syntax in mammalian genomes. Nucleic Acids Res 2022; 51:D306-D314. [PMID: 36200824 PMCID: PMC9825613 DOI: 10.1093/nar/gkac849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2022] [Revised: 09/10/2022] [Accepted: 09/21/2022] [Indexed: 01/29/2023] Open
Abstract
In mammals, transcriptional factors (TFs) drive gene expression by binding to regulatory elements in a cooperative manner. Deciphering the rules of such cooperation is crucial to obtain a full understanding of cellular homeostasis and development. Although this is a long-standing topic, there is no comprehensive database for biologists to access the syntax of TF binding sites. Here we present TFSyntax (https://tfsyntax.zhaopage.com), a database focusing on the arrangement of TF binding sites. TFSyntax maps the binding motif of 1299 human TFs and 890 mouse TFs across 382 cells and tissues, representing the most comprehensive TF binding map to date. In addition to location, TFSyntax defines motif positional preference, density and colocalization within accessible elements. Powered by a series of functional modules based on web interface, users can freely search, browse, analyze, and download data of interest. With comprehensive characterization of TF binding syntax across distinct tissues and cell types, TFSyntax represents a valuable resource and platform for studying the mechanism of transcriptional regulation and exploring how regulatory DNA variants cause disease.
Affiliation(s)
- Yongbing Zhao
- To whom correspondence should be addressed. Tel: +1 301 480 5852;
| |
|
17
|
Sipani R, Joshi R. Hox genes collaborate with helix-loop-helix factor Grainyhead to promote neuroblast apoptosis along the anterior-posterior axis of the Drosophila larval central nervous system. Genetics 2022; 222:6632667. [DOI: 10.1093/genetics/iyac101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Accepted: 06/21/2022] [Indexed: 11/14/2022] Open
Abstract
Hox genes code for a family of homeodomain (HD)-containing transcription factors that use the TALE-HD-containing factors Pbx/Exd and Meis/Hth to specify the development of the anterior-posterior (AP) axis of an organism. However, the absence of TALE-HD-containing factors from specific tissues emphasizes the need to identify and validate new Hox cofactors. In the Drosophila central nervous system (CNS), Hox genes execute segment-specific apoptosis of neural stem cells (neuroblasts, NBs) and neurons. In abdominal segments of the larval CNS, the Hox gene Abdominal-A (AbdA) mediates NB apoptosis with the help of Exd and the bHLH factor Grainyhead (Grh) using a 717 bp apoptotic enhancer. In this study, we show that this enhancer is critical for abdominal NB apoptosis and relies on two separable sets of DNA binding motifs responsible for its initiation and maintenance. Our results also show that AbdA and Grh interact through their highly conserved DNA binding domains, and that the DNA binding specificity of AbdA-HD is important for it to interact with Grh and essential for it to execute NB apoptosis in the CNS. We also establish that Grh is required for Hox-dependent NB apoptosis in Labial and Sex Combs Reduced (Scr) expressing regions of the CNS, and that it can physically interact with all the Hox proteins in vitro. Our biochemical and functional data collectively support the idea that Grh can function as a Hox cofactor and help Hox proteins carry out their in vivo roles during development.
Affiliation(s)
- Rashmi Sipani
- Laboratory of Drosophila Neural Development, Centre for DNA Fingerprinting and Diagnostics (CDFD) , Inner Ring Road, Uppal, Hyderabad-500039. India
- Graduate Studies, Manipal Academy of Higher Education , Manipal 576104, India
| | - Rohit Joshi
- Laboratory of Drosophila Neural Development, Centre for DNA Fingerprinting and Diagnostics (CDFD) , Inner Ring Road, Uppal, Hyderabad-500039. India
| |
|
18
|
DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers. Nat Genet 2022; 54:613-624. [PMID: 35551305 DOI: 10.1038/s41588-022-01048-5] [Citation(s) in RCA: 124] [Impact Index Per Article: 41.3] [Received: 09/20/2021] [Accepted: 03/08/2022] [Indexed: 02/06/2023]
Abstract
Enhancer sequences control gene expression and comprise binding sites (motifs) for different transcription factors (TFs). Despite extensive genetic and computational studies, the relationship between DNA sequence and regulatory activity is poorly understood, and de novo enhancer design has been challenging. Here, we built a deep-learning model, DeepSTARR, to quantitatively predict the activities of thousands of developmental and housekeeping enhancers directly from DNA sequence in Drosophila melanogaster S2 cells. The model learned relevant TF motifs and higher-order syntax rules, including functionally nonequivalent instances of the same TF motif that are determined by motif-flanking sequence and intermotif distances. We validated these rules experimentally and demonstrated that they can be generalized to humans by testing more than 40,000 wildtype and mutant Drosophila and human enhancers. Finally, we designed and functionally validated synthetic enhancers with desired activities de novo.
|
19
|
Schmitz RJ, Grotewold E, Stam M. Cis-regulatory sequences in plants: Their importance, discovery, and future challenges. Plant Cell 2022; 34:718-741. [PMID: 34918159 PMCID: PMC8824567 DOI: 10.1093/plcell/koab281] [Citation(s) in RCA: 166] [Impact Index Per Article: 55.3] [Received: 06/04/2021] [Accepted: 10/20/2021] [Indexed: 05/19/2023]
Abstract
The identification and characterization of cis-regulatory DNA sequences and how they function to coordinate responses to developmental and environmental cues is of paramount importance to plant biology. Key to these regulatory processes are cis-regulatory modules (CRMs), which include enhancers and silencers. Despite the extraordinary advances in high-quality sequence assemblies and genome annotations, the identification and understanding of CRMs, and how they regulate gene expression, lag significantly behind. This is especially true for their distinguishing characteristics and activity states. Here, we review the current knowledge on CRMs and breakthrough technologies enabling identification, characterization, and validation of CRMs; we compare the genomic distributions of CRMs with respect to their target genes between different plant species, and discuss the role of transposable elements harboring CRMs in the evolution of gene expression. This is an exciting time to study cis-regulomes in plants; however, significant existing challenges need to be overcome to fully understand and appreciate the role of CRMs in plant biology and in crop improvement.
Affiliation(s)
- Robert J Schmitz
- Department of Genetics, University of Georgia, Athens, Georgia 30602, USA
| | - Erich Grotewold
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, USA
| | | |
|
20
|
Ray-Jones H, Spivakov M. Transcriptional enhancers and their communication with gene promoters. Cell Mol Life Sci 2021; 78:6453-6485. [PMID: 34414474 PMCID: PMC8558291 DOI: 10.1007/s00018-021-03903-w] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 05/12/2021] [Revised: 07/08/2021] [Accepted: 07/19/2021] [Indexed: 12/13/2022]
Abstract
Transcriptional enhancers play a key role in the initiation and maintenance of gene expression programmes, particularly in metazoa. How these elements control their target genes in the right place and time is one of the most pertinent questions in functional genomics, with wide implications for most areas of biology. Here, we synthesise classic and recent evidence on the regulatory logic of enhancers, including the principles of enhancer organisation, factors that facilitate and delimit enhancer-promoter communication, and the joint effects of multiple enhancers. We show how modern approaches building on classic insights have begun to unravel the complexity of enhancer-promoter relationships, paving the way towards a quantitative understanding of gene control.
Affiliation(s)
- Helen Ray-Jones
- MRC London Institute of Medical Sciences, London, W12 0NN, UK
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College, London, W12 0NN, UK
| | - Mikhail Spivakov
- MRC London Institute of Medical Sciences, London, W12 0NN, UK.
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College, London, W12 0NN, UK.
| |
|
21
|
Gillard GB, Grønvold L, Røsæg LL, Holen MM, Monsen Ø, Koop BF, Rondeau EB, Gundappa MK, Mendoza J, Macqueen DJ, Rohlfs RV, Sandve SR, Hvidsten TR. Comparative regulomics supports pervasive selection on gene dosage following whole genome duplication. Genome Biol 2021; 22:103. [PMID: 33849620 PMCID: PMC8042706 DOI: 10.1186/s13059-021-02323-0] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 08/27/2020] [Accepted: 03/23/2021] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND: Whole genome duplication (WGD) events have played a major role in eukaryotic genome evolution, but the consequence of these extreme events in adaptive genome evolution is still not well understood. To address this knowledge gap, we used a comparative phylogenetic model and transcriptomic data from seven species to infer selection on gene expression in duplicated genes (ohnologs) following the salmonid WGD 80-100 million years ago.
RESULTS: We find rare cases of tissue-specific expression evolution but pervasive expression evolution affecting many tissues, reflecting strong selection on maintenance of genome stability following genome doubling. Ohnolog expression levels have evolved mostly asymmetrically, by diverting one ohnolog copy down a path towards lower expression and possible pseudogenization. Loss of expression in one ohnolog is significantly associated with transposable element insertions in promoters and likely driven by selection on gene dosage including selection on stoichiometric balance. We also find symmetric expression shifts, and these are associated with genes under strong evolutionary constraints such as ribosome subunit genes. This possibly reflects selection operating to achieve a gene dose reduction while avoiding accumulation of "toxic mutations". Mechanistically, ohnolog regulatory divergence is dictated by the number of bound transcription factors in promoters, with transposable elements being one likely source of novel binding sites driving tissue-specific gains in expression.
CONCLUSIONS: Our results imply pervasive adaptive expression evolution following WGD to overcome the immediate challenges posed by genome doubling and to exploit the long-term genetic opportunities for novel phenotype evolution.
Affiliation(s)
- Gareth B. Gillard
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
| | - Lars Grønvold
- Center for Integrative Genetics, Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
| | - Line L. Røsæg
- Center for Integrative Genetics, Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
| | - Matilde Mengkrog Holen
- Center for Integrative Genetics, Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
| | - Øystein Monsen
- Center for Integrative Genetics, Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
| | - Ben F. Koop
- Department of Biology, University of Victoria, Victoria, Canada
| | - Eric B. Rondeau
- Department of Biology, University of Victoria, Victoria, Canada
| | - Manu Kumar Gundappa
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian, UK
| | - John Mendoza
- Department of Computer Science, San Francisco State University, San Francisco, USA
| | - Daniel J. Macqueen
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian, UK
| | - Rori V. Rohlfs
- Department of Biology, San Francisco State University, San Francisco, USA
| | - Simen R. Sandve
- Center for Integrative Genetics, Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
| | - Torgeir R. Hvidsten
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
| |
|
22
|
Transcriptional Silencers: Driving Gene Expression with the Brakes On. Trends Genet 2021; 37:514-527. [PMID: 33712326 DOI: 10.1016/j.tig.2021.02.002] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 11/25/2020] [Revised: 02/01/2021] [Accepted: 02/02/2021] [Indexed: 12/15/2022]
Abstract
Silencers are regulatory DNA elements that reduce transcription from their target promoters; they are the repressive counterparts of enhancers. Although discovered decades ago, and despite evidence of their importance in development and disease, silencers have been much less studied than enhancers. Recently, however, a series of papers have reported systematic studies of silencers in various model systems. Silencers are often bifunctional regulatory elements that can also act as enhancers, depending on cellular context, and are enriched for expression quantitative trait loci (eQTLs) and disease-associated variants. There is not yet evidence of a 'silencer chromatin signature', in the distribution of histone modifications or associated proteins, that is common to all silencers; instead, silencers may fall into various subclasses, acting by distinct (and possibly overlapping) mechanisms.
|
23
|
Jindal GA, Farley EK. Enhancer grammar in development, evolution, and disease: dependencies and interplay. Dev Cell 2021; 56:575-587. [PMID: 33689769 PMCID: PMC8462829 DOI: 10.1016/j.devcel.2021.02.016] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Received: 10/13/2020] [Revised: 02/15/2021] [Accepted: 02/16/2021] [Indexed: 12/19/2022]
Abstract
Each language has standard books describing that language's grammatical rules. Biologists have searched for similar, albeit more complex, principles relating enhancer sequence to gene expression. Here, we review the literature on enhancer grammar. We introduce dependency grammar, a model where enhancers encode information based on dependencies between enhancer features shaped by mechanistic, evolutionary, and biological constraints. Classifying enhancers based on the types of dependencies may identify unifying principles relating enhancer sequence to gene expression. Such rules would allow us to read the instructions for development within genomes and pinpoint causal enhancer variants underlying disease and evolutionary changes.
Affiliation(s)
- Granton A Jindal
- Division of Cardiology, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Division of Biological Sciences, Section of Molecular Biology, University of California San Diego, La Jolla, CA 92093, USA
| | - Emma K Farley
- Division of Cardiology, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Division of Biological Sciences, Section of Molecular Biology, University of California San Diego, La Jolla, CA 92093, USA.
| |
|
24
|
Avsec Ž, Weilert M, Shrikumar A, Krueger S, Alexandari A, Dalal K, Fropf R, McAnany C, Gagneur J, Kundaje A, Zeitlinger J. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat Genet 2021; 53:354-366. [PMID: 33603233 PMCID: PMC8812996 DOI: 10.1038/s41588-021-00782-6] [Citation(s) in RCA: 321] [Impact Index Per Article: 80.3] [Received: 07/19/2020] [Accepted: 01/07/2021] [Indexed: 01/30/2023]
Abstract
The arrangement (syntax) of transcription factor (TF) binding motifs is an important part of the cis-regulatory code, yet remains elusive. We introduce a deep learning model, BPNet, that uses DNA sequence to predict base-resolution chromatin immunoprecipitation (ChIP)-nexus binding profiles of pluripotency TFs. We develop interpretation tools to learn predictive motif representations and identify soft syntax rules for cooperative TF binding interactions. Strikingly, Nanog preferentially binds with helical periodicity, and TFs often cooperate in a directional manner, which we validate using clustered regularly interspaced short palindromic repeat (CRISPR)-induced point mutations. Our model represents a powerful general approach to uncover the motifs and syntax of cis-regulatory sequences in genomics data.
Affiliation(s)
- Žiga Avsec
- Department of Informatics, Technical University of Munich, Garching, Germany; Graduate School of Quantitative Biosciences (QBM), Ludwig-Maximilians-Universität München, Munich, Germany; currently at DeepMind, London, UK
| | - Melanie Weilert
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Avanti Shrikumar
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Sabrina Krueger
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Amr Alexandari
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Khyati Dalal
- Stowers Institute for Medical Research, Kansas City, MO, USA; The University of Kansas Medical Center, Kansas City, KS, USA
| | - Robin Fropf
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Charles McAnany
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Julien Gagneur
- Department of Informatics, Technical University of Munich, Garching, Germany
| | - Anshul Kundaje
- Department of Computer Science, Stanford University, Stanford, CA, USA; Department of Genetics, Stanford University, Stanford, CA, USA (corresponding author)
| | - Julia Zeitlinger
- Stowers Institute for Medical Research, Kansas City, MO, USA; The University of Kansas Medical Center, Kansas City, KS, USA (corresponding author)
| |
|
25
|
Koo PK, Ploenzke M. Improving representations of genomic sequence motifs in convolutional networks with exponential activations. Nat Mach Intell 2021; 3:258-266. [PMID: 34322657 DOI: 10.1038/s42256-020-00291-x] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Indexed: 02/07/2023]
Abstract
Deep convolutional neural networks (CNNs) trained on regulatory genomic sequences tend to build representations in a distributed manner, making it a challenge to extract learned features that are biologically meaningful, such as sequence motifs. Here we perform a comprehensive analysis on synthetic sequences to investigate the role that CNN activations have on model interpretability. We show that applying an exponential activation to first-layer filters consistently leads to interpretable and robust representations of motifs compared to other commonly used activations. Strikingly, we demonstrate that CNNs with better test performance do not necessarily yield more interpretable representations with attribution methods. We find that CNNs with exponential activations significantly improve the efficacy of recovering biologically meaningful representations with attribution methods. We demonstrate that these results generalise to real DNA sequences across several in vivo datasets. Together, this work demonstrates how a small modification to existing CNNs, i.e. setting exponential activations in the first layer, can significantly improve the robustness and interpretability of learned representations directly in convolutional filters and indirectly with attribution methods.
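As a rough illustration of what "an exponential activation on a first-layer filter" means mechanically, the pure-Python sketch below scans a one-hot encoded DNA sequence with a single hand-written filter and compares exponential and ReLU outputs. Everything in it is an assumption made for illustration: the filter weights are hard-coded rather than learned, the sequence and motif are invented, and only a forward pass is shown, so it does not reproduce the training experiments described in the abstract.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows (A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


def motif_filter(motif, match=1.0, mismatch=-1.0):
    """Build a hypothetical filter that rewards the motif base at each position."""
    return [[match if base == b else mismatch for b in BASES] for base in motif]


def conv1d_scan(x, filt):
    """Slide one filter along the sequence (valid cross-correlation, stride 1)."""
    width = len(filt)
    return [
        sum(x[i + j][c] * filt[j][c] for j in range(width) for c in range(4))
        for i in range(len(x) - width + 1)
    ]


seq = "ACGTTGCATTAC"  # invented sequence containing TGCA at index 4
pre = conv1d_scan(one_hot(seq), motif_filter("TGCA"))

exp_out = [math.exp(v) for v in pre]   # exponential activation
relu_out = [max(0.0, v) for v in pre]  # ReLU, shown for comparison

best = max(range(len(pre)), key=pre.__getitem__)
print(best, pre[best], exp_out[best], relu_out[best])
```

Because the exponential grows multiplicatively with the pre-activation, a full motif match stands out far more sharply against partial matches than it does under ReLU, which gives some intuition for the abstract's point about first-layer filters, although the published result concerns trained networks rather than fixed filters.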
Affiliation(s)
- Peter K Koo
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Matt Ploenzke
- Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
| |
|
26
|
Harmston N. Regulation in common: Sponge to zebrafish. Science 2020; 370:657-658. [PMID: 33154124 DOI: 10.1126/science.abe9317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
Affiliation(s)
- Nathan Harmston
- Science Division, Yale-NUS College, 16 College Avenue West #01-220, 138527, Singapore; Programme in Cancer and Stem Cell Biology, Duke-NUS Medical School, 169857, Singapore
| |
|
27
|
Chen L, Capra JA. Learning and interpreting the gene regulatory grammar in a deep learning framework. PLoS Comput Biol 2020; 16:e1008334. [PMID: 33137083 PMCID: PMC7660921 DOI: 10.1371/journal.pcbi.1008334] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 12/16/2019] [Revised: 11/12/2020] [Accepted: 09/12/2020] [Indexed: 12/12/2022] Open
Abstract
Deep neural networks (DNNs) have achieved state-of-the-art performance in identifying gene regulatory sequences, but they have provided limited insight into the biology of regulatory elements due to the difficulty of interpreting the complex features they learn. Several models of how combinatorial binding of transcription factors, i.e. the regulatory grammar, drives enhancer activity have been proposed, ranging from the flexible TF billboard model to the stringent enhanceosome model. However, there is limited knowledge of the prevalence of these (or other) sequence architectures across enhancers. Here we perform several hypothesis-driven analyses to explore the ability of DNNs to learn the regulatory grammar of enhancers. We created synthetic datasets based on existing hypotheses about combinatorial transcription factor binding site (TFBS) patterns, including homotypic clusters, heterotypic clusters, and enhanceosomes, from real TF binding motifs from diverse TF families. We then trained deep residual neural networks (ResNets) to model the sequences under a range of scenarios that reflect real-world multi-label regulatory sequence prediction tasks. We developed a gradient-based unsupervised clustering method to extract the patterns learned by the ResNet models. We demonstrated that simulated regulatory grammars are best learned in the penultimate layer of the ResNets, and the proposed method can accurately retrieve the regulatory grammar even when there is heterogeneity in the enhancer categories and a large fraction of TFBS outside of the regulatory grammar. However, we also identify common scenarios where ResNets fail to learn simulated regulatory grammars. Finally, we applied the proposed method to mouse developmental enhancers and were able to identify the components of a known heterotypic TF cluster. 
Our results provide a framework for interpreting the regulatory rules learned by ResNets, and they demonstrate that the ability and efficiency of ResNets in learning the regulatory grammar depends on the nature of the prediction task.
Affiliation(s)
- Ling Chen
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, United States of America
| | - John A. Capra
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, United States of America
- Vanderbilt Genetics Institute and Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States of America
- Department of Computer Science, Vanderbilt University, Nashville, TN, United States of America
| |
|
28
|
Zeitlinger J. Seven myths of how transcription factors read the cis-regulatory code. CURRENT OPINION IN SYSTEMS BIOLOGY 2020; 23:22-31. [PMID: 33134611 PMCID: PMC7592701 DOI: 10.1016/j.coisb.2020.08.002] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Indexed: 02/07/2023]
Abstract
Genomics data are now being generated at large quantities, of exquisite high resolution and from single cells. They offer a unique opportunity to develop powerful machine learning algorithms, including neural networks, to uncover the rules of the cis-regulatory code. However, current modeling assumptions are often not based on state-of-the-art knowledge of the cis-regulatory code from transcription, developmental genetics, imaging and structural studies. Here I aim to fill this gap by giving a brief historical overview of the field, describing common misconceptions and providing knowledge that might help to guide computational approaches. I will describe the principles and mechanisms involved in the combinatorial requirement of transcription factor binding motifs for enhancer activity, including the role of chromatin accessibility, repressors and low-affinity motifs in the cis-regulatory code. Deciphering the cis-regulatory code would unlock an enormous amount of regulatory information in the genome and would allow us to locate cis-regulatory genetic variants involved in development and disease.
Affiliation(s)
- Julia Zeitlinger
- Stowers Institute for Medical Research, Kansas City, MO, USA
- The University of Kansas Medical Center, Kansas City, KS, USA
| |
|
29
|
Liu J, Shively CA, Mitra RD. Quantitative analysis of transcription factor binding and expression using calling cards reporter arrays. Nucleic Acids Res 2020; 48:e50. [PMID: 32133534 PMCID: PMC7229839 DOI: 10.1093/nar/gkaa141] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/18/2019] [Revised: 01/31/2020] [Accepted: 02/25/2020] [Indexed: 12/13/2022] Open
Abstract
We report a tool, Calling Cards Reporter Arrays (CCRA), that measures transcription factor (TF) binding and the consequences on gene expression for hundreds of synthetic promoters in yeast. Using Cbf1p and MAX, we demonstrate that the CCRA method is able to detect small changes in binding free energy with a sensitivity comparable to in vitro methods, enabling the measurement of energy landscapes in vivo. We then demonstrate the quantitative analysis of cooperative interactions by measuring Cbf1p binding at synthetic promoters with multiple sites. We find that the cooperativity between Cbf1p dimers varies sinusoidally with a period of 10.65 bp and energetic cost of 1.37 kBT for sites that are positioned ‘out of phase’. Finally, we characterize the binding and expression of a group of TFs, Tye7p, Gcr1p and Gcr2p, that act together as a ‘TF collective’, an important but poorly characterized model of TF cooperativity. We demonstrate that Tye7p often binds promoters without its recognition site because it is recruited by other collective members, whereas these other members require their recognition sites, suggesting a hierarchy where these factors recruit Tye7p but not vice versa. Our experiments establish CCRA as a useful tool for quantitative investigations into TF binding and function.
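To make the reported periodicity concrete, the following sketch evaluates a simple cosine penalty using the two numbers stated in the abstract (period 10.65 bp, cost 1.37 k_B T when out of phase). The cosine form, the zero phase offset and the function names are assumptions made for illustration; the abstract states only that the dependence is sinusoidal, not the exact parameterisation the authors fitted.

```python
import math

PERIOD_BP = 10.65      # helical period reported in the abstract
MAX_COST_KBT = 1.37    # cost for 'out of phase' sites, in units of k_B*T

def phase_penalty(spacing_bp, phase_offset_bp=0.0):
    """Assumed cosine form: 0 when in phase, MAX_COST_KBT half a period out of phase."""
    angle = 2.0 * math.pi * (spacing_bp - phase_offset_bp) / PERIOD_BP
    return 0.5 * MAX_COST_KBT * (1.0 - math.cos(angle))

def relative_binding_weight(spacing_bp):
    """Boltzmann factor exp(-dG / k_B T) relative to an in-phase arrangement."""
    return math.exp(-phase_penalty(spacing_bp))

# Hypothetical spacings, measured relative to an in-phase reference arrangement.
for spacing in (0.0, PERIOD_BP / 4, PERIOD_BP / 2, PERIOD_BP):
    print(f"{spacing:6.2f} bp  penalty={phase_penalty(spacing):.2f} kBT  "
          f"weight={relative_binding_weight(spacing):.2f}")
```

Under these assumptions, a penalty of 1.37 k_B T corresponds to roughly a four-fold reduction in the Boltzmann weight of the out-of-phase arrangement relative to the in-phase one.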
Affiliation(s)
- Jiayue Liu
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, MO 63108, USA; The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, MO 63108, USA
| | - Christian A Shively
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, MO 63108, USA; The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, MO 63108, USA
| | - Robi D Mitra
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, MO 63108, USA; The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, MO 63108, USA; McDonnell Genome Institute, Washington University School of Medicine in St. Louis, St. Louis, MO 63108, USA
| |
|
30
|
Amano T. Gene regulatory landscape of the sonic hedgehog locus in embryonic development. Dev Growth Differ 2020; 62:334-342. [PMID: 32343848 DOI: 10.1111/dgd.12668] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 02/02/2020] [Revised: 03/31/2020] [Accepted: 04/15/2020] [Indexed: 12/22/2022]
Abstract
The organs of vertebrate species display a wide variety of morphology. A remaining challenge in evolutionary developmental biology is to elucidate how vertebrate lineages acquire distinct morphological features. Developmental programs are driven by spatiotemporal regulation of gene expression controlled by hundreds of thousands of cis-regulatory elements. Changes in the regulatory elements caused by the introduction of genetic variants can confer regulatory innovation that may underlie morphological novelties. Recent advances in sequencing technology have revealed a number of potential regulatory variants that can alter gene expression patterns. However, a limited number of studies demonstrate causal dependence between genetic and morphological changes. Regulation of Shh expression is a good model to understand how multiple regulatory elements organize tissue-specific gene expression patterns. This model also provides insights into how evolution of molecular traits, such as gene regulatory networks, leads to phenotypic novelty.
Affiliation(s)
- Takanori Amano
- Next Generation Human Disease Model Team, RIKEN BioResource Research Center, Tsukuba, Japan
| |
|
31
|
Catizone AN, Uzunbas GK, Celadova P, Kuang S, Bose D, Sammons MA. Locally acting transcription factors regulate p53-dependent cis-regulatory element activity. Nucleic Acids Res 2020; 48:4195-4213. [PMID: 32133495 PMCID: PMC7192610 DOI: 10.1093/nar/gkaa147] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/13/2019] [Revised: 01/27/2020] [Accepted: 02/26/2020] [Indexed: 01/03/2023] Open
Abstract
The master tumor suppressor p53 controls transcription of a wide-ranging gene network involved in apoptosis, cell cycle arrest, DNA damage repair, and senescence. Recent studies revealed pervasive binding of p53 to cis-regulatory elements (CREs), which are non-coding segments of DNA that spatially and temporally control transcription through the combinatorial binding of local transcription factors. Although the role of p53 as a strong trans-activator of gene expression is well known, the co-regulatory factors and local sequences acting at p53-bound CREs are comparatively understudied. We designed and executed a massively parallel reporter assay (MPRA) to investigate the effect of transcription factor binding motifs and local sequence context on p53-bound CRE activity. Our data indicate that p53-bound CREs are both positively and negatively affected by alterations in local sequence context and changes to co-regulatory TF motifs. Our data suggest p53 has the flexibility to cooperate with a variety of transcription factors in order to regulate CRE activity. By utilizing different sets of co-factors across CREs, we hypothesize that global p53 activity is guarded against loss of any one regulatory partner, allowing for dynamic and redundant control of p53-mediated transcription.
Affiliation(s)
- Allison N Catizone
- Department of Biological Sciences and the RNA Institute, University at Albany, State University of New York, Albany, NY, USA
| | - Gizem Karsli Uzunbas
- Department of Biological Sciences and the RNA Institute, University at Albany, State University of New York, Albany, NY, USA
| | - Petra Celadova
- Sheffield Institute For Nucleic Acids (SInFoNiA) and Department of Molecular Biology and Biotechnology, The University of Sheffield, Firth Court, Western Bank, Sheffield S10 2TN, UK
| | - Sylvia Kuang
- Department of Biological Sciences and the RNA Institute, University at Albany, State University of New York, Albany, NY, USA
| | - Daniel Bose
- Sheffield Institute For Nucleic Acids (SInFoNiA) and Department of Molecular Biology and Biotechnology, The University of Sheffield, Firth Court, Western Bank, Sheffield S10 2TN, UK
| | - Morgan A Sammons
- Department of Biological Sciences and the RNA Institute, University at Albany, State University of New York, Albany, NY, USA
| |
|
32
|
King DM, Hong CKY, Shepherdson JL, Granas DM, Maricque BB, Cohen BA. Synthetic and genomic regulatory elements reveal aspects of cis-regulatory grammar in mouse embryonic stem cells. eLife 2020; 9:41279. [PMID: 32043966 PMCID: PMC7077988 DOI: 10.7554/elife.41279] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 08/20/2018] [Accepted: 02/07/2020] [Indexed: 01/08/2023] Open
Abstract
In embryonic stem cells (ESCs), a core transcription factor (TF) network establishes the gene expression program necessary for pluripotency. To address how interactions between four key TFs contribute to cis-regulation in mouse ESCs, we assayed two massively parallel reporter assay (MPRA) libraries composed of binding sites for SOX2, POU5F1 (OCT4), KLF4, and ESRRB. Comparisons between synthetic cis-regulatory elements and genomic sequences with comparable binding site configurations revealed some aspects of a regulatory grammar. The expression of synthetic elements is influenced by both the number and arrangement of binding sites. This grammar plays only a small role for genomic sequences, as the relative activities of genomic sequences are best explained by the predicted occupancy of binding sites, regardless of binding site identity and positioning. Our results suggest that the effects of transcription factor binding sites (TFBS) are influenced by the order and orientation of sites, but that in the genome the overall occupancy of TFs is the primary determinant of activity.
Transcription factors are proteins that flip genetic switches; their role is to control when and where genes are active. They do this by binding to short stretches of DNA called cis-regulatory sequences. Each sequence can have several binding sites for different transcription factors, but it is largely unclear whether the transcription factors binding to the same regulatory sequence actually work together. It is possible that each transcription factor may work independently and there only needs to be a critical mass of transcription factors bound to throw the genetic switch. If this is the case, the most important features of a cis-regulatory sequence should be the number of binding sites it contains, and how tightly the transcription factors bind to those sites. The more transcription factors and the more strongly they bind, the more active the gene should be.
An alternative option is that certain transcription factors may work better together, enhancing each other's effects such that the total effect is more than the sum of its parts. If this is true, the order, orientation and spacing of the binding sites within a sequence should matter more than the number. One way to distinguish between these possibilities is to study mouse embryonic stem cells, which have a core set of four transcription factors. Looking directly at a real genome, however, can be confusing and it is difficult to measure the effects of different cis-regulatory sequences because genes differ in so many other ways. To tackle this problem, King et al. created a synthetic set of cis-regulatory sequences based on the four core transcription factors found in mouse stem cells. The synthetic set had every combination of two, three or four of the binding sites, with each site either facing forwards or backwards along the DNA strand. King et al. attached each of the synthetic cis-regulatory sequences to a reporter gene to find out how well each sequence performed. This revealed that the cis-regulatory sequences with the most binding sites and the tightest binding affinities work best, suggesting that transcription factors mainly work independently. There was evidence of some interaction between some transcription factors, because, of the synthetic sequences with four binding sites, some worked better than others, and there were patterns in the most effective binding site combinations. However, these effects were small and when King et al. went on to test sequences from the real mouse genome, the most important factor by far was the number of binding sites. Synthetic libraries of DNA sequences allow researchers to examine gene regulation more clearly than is possible in real genomes. Yet this approach does have its limitations and it is impossible to capture every type of cis-regulatory sequence in one library.
The next step to extend this work is to combine the two approaches, taking sequences from the real genome and manipulating them one by one. This could help to unravel the rules that govern how cis-regulatory sequences work in real cells.
Affiliation(s)
- Dana M King
- Edison Center for Genome Sciences and Systems Biology, Washington University in St. Louis, St. Louis, United States; Department of Genetics, Washington University in St. Louis, St. Louis, United States
| | - Clarice Kit Yee Hong
- Edison Center for Genome Sciences and Systems Biology, Washington University in St. Louis, St. Louis, United States; Department of Genetics, Washington University in St. Louis, St. Louis, United States
| | - James L Shepherdson
- Edison Center for Genome Sciences and Systems Biology, Washington University in St. Louis, St. Louis, United States; Department of Genetics, Washington University in St. Louis, St. Louis, United States
| | - David M Granas
- Edison Center for Genome Sciences and Systems Biology, Washington University in St. Louis, St. Louis, United States; Department of Genetics, Washington University in St. Louis, St. Louis, United States
| | - Brett B Maricque
- Edison Center for Genome Sciences and Systems Biology, Washington University in St. Louis, St. Louis, United States; Department of Genetics, Washington University in St. Louis, St. Louis, United States
| | - Barak A Cohen
- Edison Center for Genome Sciences and Systems Biology, Washington University in St. Louis, St. Louis, United States; Department of Genetics, Washington University in St. Louis, St. Louis, United States
| |
|
33
|
Deciphering eukaryotic gene-regulatory logic with 100 million random promoters. Nat Biotechnol 2019; 38:56-65. [PMID: 31792407 PMCID: PMC6954276 DOI: 10.1038/s41587-019-0315-8] [Citation(s) in RCA: 161] [Impact Index Per Article: 26.8] [Received: 01/24/2019] [Accepted: 10/16/2019] [Indexed: 11/26/2022]
Abstract
How transcription factors (TFs) interpret cis-regulatory DNA sequence to control gene expression remains unclear, largely because past studies using native and engineered sequences had insufficient scale. Here, we measure the expression output of >100 million synthetic yeast promoter sequences that are fully random. These sequences yield diverse, reproducible expression levels that can be explained by their chance inclusion of functional TF binding sites. We use machine learning to build interpretable models of transcriptional regulation that predict ~94% of the expression driven from independent test promoters and ~89% of the expression driven from native yeast promoter fragments. These models allow us to characterize each TF’s specificity, activity, and interactions with chromatin. TF activity depends on binding-site strand, position, DNA helical face and chromatin context. Notably, expression level is influenced by weak regulatory interactions, which confound designed-sequence studies. Our analyses show that massive-throughput assays of fully random DNA can provide the big data necessary to develop complex, predictive models of gene regulation.
Gene expression levels in yeast are predicted using a massive dataset on promoters with random sequences.
|
34
|
Orlomoski R, Bogle A, Loss J, Simons R, Dresch JM, Drewell RA, Spratt DE. Rapid and efficient purification of Drosophila homeodomain transcription factors for biophysical characterization. Protein Expr Purif 2019; 158:9-14. [PMID: 30738927 DOI: 10.1016/j.pep.2019.02.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2019] [Accepted: 02/03/2019] [Indexed: 10/27/2022]
Abstract
Homeodomain transcription factors (HD TFs) are a large class of evolutionarily conserved DNA binding proteins that contain a basic 60-amino acid region required for binding to specific DNA sites. In Drosophila melanogaster, many of these HD TFs are expressed in the early embryo and control transcription of target genes in development through their interaction with cis-regulatory modules. Previous studies where some of the Drosophila HD TFs were purified required the use of strong denaturants (i.e. 6 M urea) and multiple chromatography columns, making the downstream biochemical examination of the isolated protein difficult. To circumvent these obstacles, we have developed a streamlined expression and purification protocol to produce large yields of Drosophila HD TFs. Using the HD TFs FUSHI-TARAZU (FTZ), ANTENNAPEDIA (ANTP), ABDOMINAL-A (ABD-A), ABDOMINAL-B (ABD-B), and ULTRABITHORAX (UBX) as examples, we demonstrate that our 3-day protocol involving the overexpression of His6-SUMO fusion constructs in E. coli followed by a Ni2+-IMAC, SUMO-tag cleavage with the SUMO protease Ulp1, and a heparin column purification produces pure, soluble protein in biological buffers around pH 7 in the absence of denaturants. Electrophoretic mobility shift assays (EMSA) confirm that the purified HD proteins are functional and nuclear magnetic resonance (NMR) spectra confirm that the purified HDs are well-folded. These purified HD TFs can be used in future biophysical experiments to structurally and biochemically characterize how and why these HD TFs bind to different DNA sequences and further probe how nucleotide differences contribute to TF-DNA specificity in the HD family.
Affiliation(s)
- Rachel Orlomoski
- Gustaf H. Carlson School of Chemistry & Biochemistry, Clark University, 950 Main St, Worcester, MA, 01610, USA; Department of Biology, Clark University, 950 Main St, Worcester, MA, 01610, USA
| | - Aaron Bogle
- Gustaf H. Carlson School of Chemistry & Biochemistry, Clark University, 950 Main St, Worcester, MA, 01610, USA; Department of Biology, Clark University, 950 Main St, Worcester, MA, 01610, USA
| | - Jeanmarie Loss
- Gustaf H. Carlson School of Chemistry & Biochemistry, Clark University, 950 Main St, Worcester, MA, 01610, USA; Department of Biology, Clark University, 950 Main St, Worcester, MA, 01610, USA
| | - Rylee Simons
- Gustaf H. Carlson School of Chemistry & Biochemistry, Clark University, 950 Main St, Worcester, MA, 01610, USA; Department of Biology, Clark University, 950 Main St, Worcester, MA, 01610, USA
| | - Jacqueline M Dresch
- Department of Math & Computer Science, Clark University, 950 Main St, Worcester, MA, 01610, USA
| | - Robert A Drewell
- Department of Biology, Clark University, 950 Main St, Worcester, MA, 01610, USA
| | - Donald E Spratt
- Gustaf H. Carlson School of Chemistry & Biochemistry, Clark University, 950 Main St, Worcester, MA, 01610, USA
| |
|
35
|
Bentovim L, Harden TT, DePace AH. Transcriptional precision and accuracy in development: from measurements to models and mechanisms. Development 2017; 144:3855-3866. [PMID: 29089359 PMCID: PMC5702068 DOI: 10.1242/dev.146563] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 12/19/2022]
Abstract
During development, genes are transcribed at specific times, locations and levels. In recent years, the emergence of quantitative tools has significantly advanced our ability to measure transcription with high spatiotemporal resolution in vivo. Here, we highlight recent studies that have used these tools to characterize transcription during development, and discuss the mechanisms that contribute to the precision and accuracy of the timing, location and level of transcription. We attempt to disentangle the discrepancies in how physicists and biologists use the term ‘precision’ to facilitate interactions using a common language. We also highlight selected examples in which the coupling of mathematical modeling with experimental approaches has provided important mechanistic insights, and call for a more expansive use of mathematical modeling to exploit the wealth of quantitative data and advance our understanding of animal transcription.
Summary: This Review highlights how high-resolution quantitative tools and theoretical models have formed our current view of the mechanisms determining precision and accuracy in the timing, location and level of transcription in the Drosophila embryo.
Affiliation(s)
- Lital Bentovim
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
| | - Timothy T Harden
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
| | - Angela H DePace
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
| |
|
36
|
Brown AJ, Gibson SJ, Hatton D, James DC. In silico design of context-responsive mammalian promoters with user-defined functionality. Nucleic Acids Res 2017; 45:10906-10919. [PMID: 28977454 PMCID: PMC5737543 DOI: 10.1093/nar/gkx768] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 05/23/2017] [Accepted: 08/22/2017] [Indexed: 12/19/2022] Open
Abstract
Comprehensive de novo-design of complex mammalian promoters is restricted by unpredictable combinatorial interactions between constituent transcription factor regulatory elements (TFREs). In this study, we show that modular binding sites that do not function cooperatively can be identified by analyzing host cell transcription factor expression profiles, and subsequently testing cognate TFRE activities in varying homotypic and heterotypic promoter architectures. TFREs that displayed position-insensitive, additive function within a specific expression context could be rationally combined together in silico to create promoters with highly predictable activities. As TFRE order and spacing did not affect the performance of these TFRE-combinations, compositions could be specifically arranged to preclude the formation of undesirable sequence features. This facilitated simple in silico-design of promoters with context-required, user-defined functionalities. To demonstrate this, we de novo-created promoters for biopharmaceutical production in CHO cells that exhibited precisely designed activity dynamics and long-term expression-stability, without causing observable retroactive effects on cellular performance. The design process described can be utilized for applications requiring context-responsive, customizable promoter function, particularly where co-expression of synthetic TFs is not suitable. Although the synthetic promoter structure utilized does not closely resemble native mammalian architectures, our findings also provide additional support for a flexible billboard model of promoter regulation.
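The additive, position-insensitive behaviour described above can be sketched as a sum of per-element contributions, so that any reordering of the same elements yields the same predicted activity. The element names, activity values and basal level below are hypothetical placeholders, not data or a model from the paper.

```python
from itertools import permutations

# Hypothetical per-copy contributions (arbitrary units); not values from the paper.
TFRE_ACTIVITY = {"TFRE_A": 4.0, "TFRE_B": 2.5, "TFRE_C": 1.0}

def predicted_activity(design, basal=0.5):
    """Additive model: basal level plus the sum of per-site contributions.

    Order and spacing are deliberately ignored, mirroring the
    position-insensitive assumption.
    """
    return basal + sum(TFRE_ACTIVITY[name] for name in design)

# One hypothetical promoter design: two copies of TFRE_A, one each of B and C.
design = ("TFRE_A", "TFRE_B", "TFRE_A", "TFRE_C")

# Every distinct ordering of the same elements gets the same prediction.
orderings = set(permutations(design))
activities = {predicted_activity(order) for order in orderings}
print(len(orderings), "orderings ->", activities)
```

Because the prediction depends only on composition, a designer working under this assumption is free to pick whichever ordering avoids unwanted sequence features, which is the practical point made in the abstract.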
Affiliation(s)
- Adam J Brown
- Department of Chemical and Biological Engineering, University of Sheffield, Mappin St., Sheffield S1 3JD, UK
| | - Suzanne J Gibson
- Biopharmaceutical Development, MedImmune, Cambridge CB21 6GH, UK
| | - Diane Hatton
- Biopharmaceutical Development, MedImmune, Cambridge CB21 6GH, UK
| | - David C James
- Department of Chemical and Biological Engineering, University of Sheffield, Mappin St., Sheffield S1 3JD, UK
| |
|
37
|
Khoueiry P, Girardot C, Ciglar L, Peng PC, Gustafson EH, Sinha S, Furlong EE. Uncoupling evolutionary changes in DNA sequence, transcription factor occupancy and enhancer activity. eLife 2017; 6. [PMID: 28792889 PMCID: PMC5550276 DOI: 10.7554/elife.28440] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 05/08/2017] [Accepted: 07/21/2017] [Indexed: 12/15/2022] Open
Abstract
Sequence variation within enhancers plays a major role in both evolution and disease, yet its functional impact on transcription factor (TF) occupancy and enhancer activity remains poorly understood. Here, we assayed the binding of five essential TFs over multiple stages of embryogenesis in two distant Drosophila species (with 1.4 substitutions per neutral site), identifying thousands of orthologous enhancers with conserved or diverged combinatorial occupancy. We used these binding signatures to dissect two properties of developmental enhancers: (1) potential TF cooperativity, using signatures of co-associations and co-divergence in TF occupancy. This revealed conserved combinatorial binding despite sequence divergence, suggesting protein-protein interactions sustain conserved collective occupancy. (2) Enhancer in-vivo activity, revealing orthologous enhancers with conserved activity despite divergence in TF occupancy. Taken together, we identify enhancers with diverged motifs yet conserved occupancy and others with diverged occupancy yet conserved activity, emphasising the need to functionally measure the effect of divergence on enhancer activity.
Affiliation(s)
- Pierre Khoueiry
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Charles Girardot
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Lucia Ciglar
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Pei-Chen Peng
- Carl R. Woese Institute of Genomic Biology, University of Illinois, Champaign, United States
| | - E Hilary Gustafson
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Saurabh Sinha
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany; Carl R. Woese Institute of Genomic Biology, University of Illinois, Champaign, United States
| | - Eileen Em Furlong
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| |
|
38
|
Vockley CM, McDowell IC, D'Ippolito AM, Reddy TE. A long-range flexible billboard model of gene activation. Transcription 2017; 8:261-267. [PMID: 28598247 DOI: 10.1080/21541264.2017.1317694] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 10/19/2022] Open
Abstract
Gene regulation is fundamentally important for the coordination of diverse biologic processes including homeostasis and responses to developmental and environmental stimuli. Transcription factor (TF) binding sites are one of the major functional subunits of gene regulation. They are arranged in cis-regulatory modules (CRMs) that can be more active than the sum of their individual effects. Recently, we described a mechanism of glucocorticoid (GC)-induced gene regulation in which the glucocorticoid receptor (GR) binds coordinately to multiple CRMs that are 10s of kilobases apart in the genome. In those results, the minority of GR binding sites appear to involve direct TF:DNA interactions. Meanwhile, other GR binding sites in a cluster interact with those direct binding sites to tune their gene regulatory activity. Here, we consider the implications of those and related results in the context of existing models of gene regulation. Based on our analyses, we propose that the billboard and regulatory grammar models of cis-regulatory element activity be expanded to consider the influence of long-range interactions between cis-regulatory modules.
Affiliation(s)
- Christopher M Vockley
- Department of Biostatistics & Bioinformatics, Duke University, Durham, NC, USA; Center for Genomic & Computational Biology, Duke University, Durham, NC, USA
| | - Ian C McDowell
- Center for Genomic & Computational Biology, Duke University, Durham, NC, USA; Program in Computational Biology & Bioinformatics, Duke University, Durham, NC, USA
| | - Antony M D'Ippolito
- Center for Genomic & Computational Biology, Duke University, Durham, NC, USA; University Program in Genetics & Genomics, Duke University, Durham, NC, USA
| | - Timothy E Reddy
- Department of Biostatistics & Bioinformatics, Duke University, Durham, NC, USA; Center for Genomic & Computational Biology, Duke University, Durham, NC, USA
|
39
|
Enhancer decommissioning by Snail1-induced competitive displacement of TCF7L2 and down-regulation of transcriptional activators results in EPHB2 silencing. BIOCHIMICA ET BIOPHYSICA ACTA-GENE REGULATORY MECHANISMS 2016; 1859:1353-1367. [PMID: 27504909 DOI: 10.1016/j.bbagrm.2016.08.002] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 04/05/2016] [Revised: 07/25/2016] [Accepted: 08/04/2016] [Indexed: 12/20/2022]
Abstract
Transcriptional silencing is a major cause for the inactivation of tumor suppressor genes; however, the underlying mechanisms are only poorly understood. The EPHB2 gene encodes a receptor tyrosine kinase that controls epithelial cell migration and allocation in intestinal crypts. Through its ability to restrict cell spreading, EPHB2 functions as a tumor suppressor in colorectal cancer whose expression is frequently lost as tumors progress to the carcinoma stage. Previously we reported that EPHB2 expression depends on a transcriptional enhancer whose activity is diminished in EPHB2 non-expressing cells. Here we investigated the mechanisms that lead to EPHB2 enhancer inactivation. We show that expression of EPHB2 and SNAIL1 - an inducer of epithelial-mesenchymal transition (EMT) - is anti-correlated in colorectal cancer cell lines and tumors. In a cellular model of Snail1-induced EMT, we observe that features of active chromatin at the EPHB2 enhancer are diminished upon expression of murine Snail1. We identify the transcription factors FOXA1, MYB, CDX2 and TCF7L2 as EPHB2 enhancer factors and demonstrate that Snail1 indirectly inactivates the EPHB2 enhancer by downregulation of FOXA1 and MYB. In addition, Snail1 induces the expression of Lymphoid enhancer factor 1 (LEF1) which competitively displaces TCF7L2 from the EPHB2 enhancer. In contrast to TCF7L2, however, LEF1 appears to repress the EPHB2 enhancer. Our findings underscore the importance of transcriptional enhancers for gene regulation under physiological and pathological conditions and show that SNAIL1 employs a combinatorial mechanism to inactivate the EPHB2 enhancer based on activator deprivation and competitive displacement of transcription factors.
|
40
|
Fiore C, Cohen BA. Interactions between pluripotency factors specify cis-regulation in embryonic stem cells. Genome Res 2016; 26:778-86. [PMID: 27197208 PMCID: PMC4889965 DOI: 10.1101/gr.200733.115] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 10/12/2015] [Accepted: 04/13/2016] [Indexed: 01/06/2023]
Abstract
We investigated how interactions between pluripotency transcription factors (TFs) affect cis-regulation. We created hundreds of synthetic cis-regulatory elements (CREs) comprised of combinations of binding sites for pluripotency TFs and measured their expression in mouse embryonic stem (ES) cells. A thermodynamic model that incorporates interactions between TFs explains a large portion (72%) of the variance in expression of these CREs. These interactions include three favorable heterotypic interactions between TFs. The model also predicts an unfavorable homotypic interaction between TFs, helping to explain the observation that homotypic chains of binding sites express at low levels. We further investigated the expression driven by CREs comprised of homotypic chains of KLF4 binding sites. Our results suggest that KLF homologs make unique contributions to regulation by these CREs. We conclude that a specific set of interactions between pluripotency TFs plays a large role in setting the levels of expression driven by CREs in ES cells.
Affiliation(s)
- Chris Fiore
- Center for Genome Sciences and Systems Biology, Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| | - Barak A Cohen
- Center for Genome Sciences and Systems Biology, Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
|
41
|
Quantitatively predictable control of Drosophila transcriptional enhancers in vivo with engineered transcription factors. Nat Genet 2016; 48:292-8. [DOI: 10.1038/ng.3509] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 08/30/2015] [Accepted: 01/15/2016] [Indexed: 12/13/2022]
|
42
|
Zhao B, Kokoza VA, Saha TT, Wang S, Roy S, Raikhel AS. Regulation of the gut-specific carboxypeptidase: a study using the binary Gal4/UAS system in the mosquito Aedes aegypti. INSECT BIOCHEMISTRY AND MOLECULAR BIOLOGY 2014; 54:1-10. [PMID: 25152428 PMCID: PMC4426967 DOI: 10.1016/j.ibmb.2014.08.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 04/15/2014] [Revised: 07/31/2014] [Accepted: 08/03/2014] [Indexed: 05/26/2023]
Abstract
Pathogen transmission by mosquitoes is tightly linked to blood feeding which, in turn, is required for egg development. Studies of these processes would greatly benefit from genetic methods, such as the binary Gal4/UAS system. The latter has been well established for model organisms, but its availability is limited for mosquitoes. The objective of this study was to develop the blood-meal-activated, gut-specific Gal4/UAS system for the yellow-fever mosquito Aedes aegypti and utilize it to investigate the regulation of gut-specific gene expression. A 1.1-kb, 5′ upstream region of the carboxypeptidase A (CP) gene was used to genetically engineer the CP-Gal4 driver mosquito line. The CP-Gal4 specifically activated the Enhanced Green Fluorescent Protein (EGFP) reporter only after blood feeding in the gut of the CP-Gal4 > UAS-EGFP female Ae. aegypti. We used this system to study the regulation of CP gene expression. In vitro treatments with either amino acids (AAs) or insulin stimulated expression of the CP-Gal4 > UAS-EGFP transgene; no effect was observed with 20-hydroxyecdysone (20E) treatments. The transgene activation by AAs and insulin was blocked by rapamycin, the inhibitor of the Target-of-Rapamycin (TOR) kinase. RNA interference (RNAi) silencing of the insulin receptor (IR) reduced the expression of the CP-Gal4 > UAS-EGFP transgene. Thus, in vitro and in vivo experiments have revealed that insulin and TOR pathways control expression of the digestive enzyme CP. In contrast, 20E, the major regulator of post-blood-meal vitellogenic events in female mosquitoes, has no role in regulating the expression of this gene. This novel CP-Gal4/UAS system permits functional testing of midgut-specific genes that are involved in blood digestion and interaction with pathogens in Ae. aegypti mosquitoes.
Affiliation(s)
- Bo Zhao
- Department of Entomology, University of California Riverside, Riverside, CA 92521, USA; Graduate Program in Genetics, Genomics and Bioinformatics, University of California Riverside, Riverside, CA 92521, USA.
| | - Vladimir A Kokoza
- Department of Entomology, University of California Riverside, Riverside, CA 92521, USA; The Institute for Integrative Genome Biology, University of California Riverside, Riverside, CA 92521, USA.
| | - Tusar T Saha
- Department of Entomology, University of California Riverside, Riverside, CA 92521, USA; The Institute for Integrative Genome Biology, University of California Riverside, Riverside, CA 92521, USA.
| | - Stephanie Wang
- Honors Undergraduate Program, University of California Riverside, Riverside, CA 92521, USA.
| | - Sourav Roy
- Department of Entomology, University of California Riverside, Riverside, CA 92521, USA; The Institute for Integrative Genome Biology, University of California Riverside, Riverside, CA 92521, USA.
| | - Alexander S Raikhel
- Department of Entomology, University of California Riverside, Riverside, CA 92521, USA; The Institute for Integrative Genome Biology, University of California Riverside, Riverside, CA 92521, USA.
|
43
|
Waardenberg AJ, Ramialison M, Bouveret R, Harvey RP. Genetic networks governing heart development. Cold Spring Harb Perspect Med 2014; 4:a013839. [PMID: 25280899 PMCID: PMC4208705 DOI: 10.1101/cshperspect.a013839] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Indexed: 11/25/2022]
Abstract
Animal genomes contain a code for construction of the body plan from a fertilized egg. Understanding how genome information is deciphered to create the complex multilayered regulatory systems that drive organismal development, and which become altered in disease, is one of the greatest challenges in the biological sciences. The development of methods that effectively represent and communicate the complexity inherent in gene regulatory networks remains a major barrier. This review introduces the philosophy of systems biology and discusses recent progress in understanding the development of the heart at a systems biology level.
Affiliation(s)
- Ashley J Waardenberg
- Victor Chang Cardiac Research Institute, Darlinghurst, New South Wales 2010, Australia
| | - Mirana Ramialison
- Victor Chang Cardiac Research Institute, Darlinghurst, New South Wales 2010, Australia; St. Vincent's Clinical School, University of New South Wales Medicine, Kensington, New South Wales 2052, Australia; Stem Cells Australia, Melbourne Brain Centre, University of Melbourne, Victoria 3010, Australia
| | - Romaric Bouveret
- Victor Chang Cardiac Research Institute, Darlinghurst, New South Wales 2010, Australia; St. Vincent's Clinical School, University of New South Wales Medicine, Kensington, New South Wales 2052, Australia
| | - Richard P Harvey
- Victor Chang Cardiac Research Institute, Darlinghurst, New South Wales 2010, Australia; St. Vincent's Clinical School, University of New South Wales Medicine, Kensington, New South Wales 2052, Australia; School of Biotechnology and Biomolecular Sciences, University of New South Wales Faculty of Science, New South Wales 2052, Australia; Stem Cells Australia, Melbourne Brain Centre, University of Melbourne, Victoria 3010, Australia
|
44
|
Slattery M, Zhou T, Yang L, Dantas Machado AC, Gordân R, Rohs R. Absence of a simple code: how transcription factors read the genome. Trends Biochem Sci 2014; 39:381-99. [PMID: 25129887 DOI: 10.1016/j.tibs.2014.07.002] [Citation(s) in RCA: 367] [Impact Index Per Article: 33.4] [Received: 06/03/2014] [Revised: 07/11/2014] [Accepted: 07/15/2014] [Indexed: 12/21/2022]
Abstract
Transcription factors (TFs) influence cell fate by interpreting the regulatory DNA within a genome. TFs recognize DNA in a specific manner; the mechanisms underlying this specificity have been identified for many TFs based on 3D structures of protein-DNA complexes. More recently, structural views have been complemented with data from high-throughput in vitro and in vivo explorations of the DNA-binding preferences of many TFs. Together, these approaches have greatly expanded our understanding of TF-DNA interactions. However, the mechanisms by which TFs select in vivo binding sites and alter gene expression remain unclear. Recent work has highlighted the many variables that influence TF-DNA binding, while demonstrating that a biophysical understanding of these many factors will be central to understanding TF function.
Affiliation(s)
- Matthew Slattery
- Department of Biomedical Sciences, University of Minnesota Medical School, Duluth, MN 55812, USA; Developmental Biology Center, University of Minnesota, Minneapolis, MN 55455, USA.
| | - Tianyin Zhou
- Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
| | - Lin Yang
- Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
| | - Ana Carolina Dantas Machado
- Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
| | - Raluca Gordân
- Center for Genomic and Computational Biology, Departments of Biostatistics and Bioinformatics, Computer Science, and Molecular Genetics and Microbiology, Duke University, Durham, NC 27708, USA.
| | - Remo Rohs
- Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA.
|
45
|
Regulatory codewords. Nat Genet 2014; 46:801. [DOI: 10.1038/ng.3059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
|
46
|
A comparison of midline and tracheal gene regulation during Drosophila development. PLoS One 2014; 9:e85518. [PMID: 24465586 PMCID: PMC3896416 DOI: 10.1371/journal.pone.0085518] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 09/15/2013] [Accepted: 11/28/2013] [Indexed: 11/19/2022] Open
Abstract
Within the Drosophila embryo, two related bHLH-PAS proteins, Single-minded and Trachealess, control development of the central nervous system midline and the trachea, respectively. These two proteins are bHLH-PAS transcription factors and independently form heterodimers with another bHLH-PAS protein, Tango. During early embryogenesis, expression of Single-minded is restricted to the midline and Trachealess to the trachea and salivary glands, whereas Tango is ubiquitously expressed. Both Single-minded/Tango and Trachealess/Tango heterodimers bind to the same DNA sequence, called the CNS midline element (CME) within cis-regulatory sequences of downstream target genes. While Single-minded/Tango and Trachealess/Tango activate some of the same genes in their respective tissues during embryogenesis, they also activate a number of different genes restricted to only certain tissues. The goal of this research is to understand how these two related heterodimers bind different enhancers to activate different genes, thereby regulating the development of functionally diverse tissues. Existing data indicates that Single-minded and Trachealess may bind to different co-factors restricted to various tissues, causing them to interact with the CME only within certain sequence contexts. This would lead to the activation of different target genes in different cell types. To understand how the context surrounding the CME is recognized by different bHLH-PAS heterodimers and their co-factors, we identified and analyzed novel enhancers that drive midline and/or tracheal expression and compared them to previously characterized enhancers. In addition, we tested expression of synthetic reporter genes containing the CME flanked by different sequences. Taken together, these experiments identify elements overrepresented within midline and tracheal enhancers and suggest that sequences immediately surrounding a CME help dictate whether a gene is expressed in the midline or trachea.
|
47
|
Novel Genetic and Molecular Tools for the Investigation and Control of Dengue Virus Transmission by Mosquitoes. CURRENT TROPICAL MEDICINE REPORTS 2014; 1:21-31. [PMID: 24693489 DOI: 10.1007/s40475-013-0007-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 10/25/2022]
Abstract
Aedes aegypti is the principal vector of dengue virus (DENV) throughout the tropical world. This anthropophilic mosquito species needs to be persistently infected with DENV before it can transmit the virus through its saliva to a new vertebrate host. In the mosquito, DENV is confronted with several innate immune pathways, among which RNA interference is considered the most important. The Ae. aegypti genome project opened the doors for advanced molecular studies on pathogen-vector interactions including genetic manipulation of the vector for basic research and vector control purposes. Thus, Ae. aegypti has become the primary model for studying vector competence for arboviruses at the molecular level. Here, we present recent findings regarding DENV-mosquito interactions, emphasizing how innate immune responses modulate DENV infections in Ae. aegypti. We also describe the latest advancements in genetic manipulation of Ae. aegypti and discuss how this technology can be used to investigate vector transmission of DENV at the molecular level and to control transmission of the virus in the field.
|
48
|
Teng L, He B, Gao P, Gao L, Tan K. Discover context-specific combinatorial transcription factor interactions by integrating diverse ChIP-Seq data sets. Nucleic Acids Res 2013; 42:e24. [PMID: 24217919 PMCID: PMC3936738 DOI: 10.1093/nar/gkt1105] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 02/06/2023] Open
Abstract
Combinatorial interactions among transcription factors (TFs) are critical for integrating diverse intrinsic and extrinsic signals, fine-tuning regulatory output and increasing the robustness and plasticity of regulatory systems. Current knowledge about combinatorial regulation is rather limited due to the lack of suitable experimental technologies and bioinformatics tools. The rapid accumulation of ChIP-Seq data has provided genome-wide occupancy maps for a large number of TFs and chromatin modification marks for identifying enhancers without knowing individual TF binding sites. Integration of the two data types has not been researched extensively, resulting in underused data and missed opportunities. We describe a novel method for discovering frequent combinatorial occupancy patterns by multiple TFs at enhancers. Our method is based on probabilistic item set mining and takes into account uncertainty in both types of ChIP-Seq data. By joint analysis of 108 TFs in four human cell types, we found that cell-type-specific interactions among TFs are abundant and that the majority of enhancers have flexible architecture. We show that several families of transposable elements disproportionally overlap with enhancers with combinatorial patterns, suggesting that these transposable element families play an important role in the evolution of combinatorial regulation.
Affiliation(s)
- Li Teng
- Department of Internal Medicine, University of Iowa, Iowa City, IA 52242, USA, Interdisciplinary Graduate Program in Genetics, University of Iowa, Iowa City, IA 52242, USA and Department of Biomedical Engineering, University of Iowa, Iowa City, IA 52242, USA
|
49
|
Harmston N, Baresic A, Lenhard B. The mystery of extreme non-coding conservation. Philos Trans R Soc Lond B Biol Sci 2013; 368:20130021. [PMID: 24218634 PMCID: PMC3826495 DOI: 10.1098/rstb.2013.0021] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.3] [Indexed: 12/12/2022] Open
Abstract
Regions of several dozen to several hundred base pairs of extreme conservation have been found in non-coding regions in all metazoan genomes. The distribution of these elements within and across genomes has suggested that many have roles as transcriptional regulatory elements in multi-cellular organization, differentiation and development. Currently, there is no known mechanism or function that would account for this level of conservation at the observed evolutionary distances. Previous studies have found that, while these regions are under strong purifying selection, and not mutational coldspots, deletion of entire regions in mice does not necessarily lead to identifiable changes in phenotype during development. These opposing findings lead to several questions regarding their functional importance and why they are under strong selection in the first place. In this perspective, we discuss the methods and techniques used in identifying and dissecting these regions, their observed patterns of conservation, and review the current hypotheses on their functional significance.
Affiliation(s)
- Nathan Harmston
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London and MRC Clinical Sciences Centre, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, UK
|
50
|
Massively parallel decoding of mammalian regulatory sequences supports a flexible organizational model. Nat Genet 2013; 45:1021-1028. [PMID: 23892608 DOI: 10.1038/ng.2713] [Citation(s) in RCA: 173] [Impact Index Per Article: 14.4] [Received: 02/07/2013] [Accepted: 06/28/2013] [Indexed: 12/12/2022]
Abstract
Despite continual progress in the cataloging of vertebrate regulatory elements, little is known about their organization and regulatory architecture. Here we describe a massively parallel experiment to systematically test the impact of copy number, spacing, combination and order of transcription factor binding sites on gene expression. A complex library of ∼5,000 synthetic regulatory elements containing patterns from 12 liver-specific transcription factor binding sites was assayed in mice and in HepG2 cells. We find that certain transcription factors act as direct drivers of gene expression in homotypic clusters of binding sites, independent of spacing between sites, whereas others function only synergistically. Heterotypic enhancers are stronger than their homotypic analogs and favor specific transcription factor binding site combinations, mimicking putative native enhancers. Exhaustive testing of binding site permutations suggests that there is flexibility in binding site order. Our findings provide quantitative support for a flexible model of regulatory element activity and suggest a framework for the design of synthetic tissue-specific enhancers.
|