1. La Fleur A, Shi Y, Seelig G. Decoding biology with massively parallel reporter assays and machine learning. Genes Dev 2024; 38:843-865. [PMID: 39362779 PMCID: PMC11535156 DOI: 10.1101/gad.351800.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2024]
Abstract
Massively parallel reporter assays (MPRAs) are powerful tools for quantifying the impacts of sequence variation on gene expression. Reading out molecular phenotypes with sequencing enables interrogating the impact of sequence variation beyond genome scale. Machine learning models integrate and codify information learned from MPRAs and enable generalization by predicting sequences outside the training data set. Models can provide a quantitative understanding of cis-regulatory codes controlling gene expression, enable variant stratification, and guide the design of synthetic regulatory elements for applications from synthetic biology to mRNA and gene therapy. This review focuses on cis-regulatory MPRAs, particularly those that interrogate cotranscriptional and post-transcriptional processes: alternative splicing, cleavage and polyadenylation, translation, and mRNA decay.
Affiliation(s)
- Alyssa La Fleur
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Yongsheng Shi
  - Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, California 92697, USA
- Georg Seelig
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
  - Department of Electrical & Computer Engineering, University of Washington, Seattle, Washington 98195, USA
2. Geisberg JV, Moqtaderi Z, Struhl K. Chromatin regulates alternative polyadenylation via the RNA polymerase II elongation rate. Proc Natl Acad Sci U S A 2024; 121:e2405827121. [PMID: 38748572 PMCID: PMC11127049 DOI: 10.1073/pnas.2405827121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2024] [Accepted: 04/15/2024] [Indexed: 05/22/2024] Open
Abstract
The RNA polymerase II (Pol II) elongation rate influences poly(A) site selection, with slow and fast Pol II derivatives causing upstream and downstream shifts, respectively, in poly(A) site utilization. In yeast, depletion of either of the histone chaperones FACT or Spt6 causes an upstream shift of poly(A) site use that strongly resembles the poly(A) profiles of slow Pol II mutant strains. Like slow Pol II mutant strains, FACT- and Spt6-depleted cells exhibit Pol II processivity defects, indicating that both Spt6 and FACT stimulate the Pol II elongation rate. Poly(A) profiles of some genes show atypical downstream shifts; this subset of genes overlaps well for FACT- or Spt6-depleted strains but is different from the atypical genes in Pol II speed mutant strains. In contrast, depletion of histone H3 or H4 causes a downstream shift of poly(A) sites for most genes, indicating that nucleosomes inhibit the Pol II elongation rate in vivo. Thus, chromatin-based control of the Pol II elongation rate is a potential mechanism, distinct from direct effects on the cleavage/polyadenylation machinery, to regulate alternative polyadenylation in response to genetic or environmental changes.
Affiliation(s)
- Joseph V. Geisberg
  - Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115
- Zarmik Moqtaderi
  - Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115
- Kevin Struhl
  - Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115
3. Karollus A, Hingerl J, Gankin D, Grosshauser M, Klemon K, Gagneur J. Species-aware DNA language models capture regulatory elements and their evolution. Genome Biol 2024; 25:83. [PMID: 38566111 PMCID: PMC10985990 DOI: 10.1186/s13059-024-03221-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 03/20/2024] [Indexed: 04/04/2024] Open
Abstract
BACKGROUND The rise of large-scale multi-species genome sequencing projects promises to shed new light on how genomes encode gene regulatory instructions. To this end, new algorithms are needed that can leverage conservation to capture regulatory elements while accounting for their evolution. RESULTS Here, we introduce species-aware DNA language models, which we trained on more than 800 species spanning over 500 million years of evolution. Investigating their ability to predict masked nucleotides from context, we show that DNA language models distinguish transcription factor and RNA-binding protein motifs from background non-coding sequence. Owing to their flexibility, DNA language models capture conserved regulatory elements over much further evolutionary distances than sequence alignment would allow. Remarkably, DNA language models reconstruct motif instances bound in vivo better than unbound ones and account for the evolution of motif sequences and their positional constraints, showing that these models capture functional high-order sequence and evolutionary context. We further show that species-aware training yields improved sequence representations for endogenous and MPRA-based gene expression prediction, as well as motif discovery. CONCLUSIONS Collectively, these results demonstrate that species-aware DNA language models are a powerful, flexible, and scalable tool to integrate information from large compendia of highly diverged genomes.
Affiliation(s)
- Alexander Karollus
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
  - Munich Center for Machine Learning, Munich, Germany
- Johannes Hingerl
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Dennis Gankin
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Martin Grosshauser
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Kristian Klemon
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Julien Gagneur
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
  - Munich Center for Machine Learning, Munich, Germany
  - Institute of Human Genetics, School of Medicine and Health, Technical University of Munich, Munich, Germany
  - Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
  - Munich Data Science Institute, Technical University of Munich, Garching, Germany
4. Zhou J, Li QQ. Stress responses of plants through transcriptome plasticity by mRNA alternative polyadenylation. Molecular Horticulture 2023; 3:19. [PMID: 37789388 PMCID: PMC10536700 DOI: 10.1186/s43897-023-00066-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/19/2023] [Accepted: 09/07/2023] [Indexed: 10/05/2023]
Abstract
The sessile nature of plants confines their responsiveness to changing environmental conditions. Gene expression regulation becomes a paramount mechanism for plants to adjust their physiological and morphological behaviors. Alternative polyadenylation (APA) is known for its capacity to augment transcriptome diversity and plasticity, thereby furnishing an additional set of tools for modulating gene expression. APA has also been demonstrated to exhibit intimate associations with plant stress responses. In this study, we review APA dynamic features and consequences in plants subjected to both biotic and abiotic stresses. These include adverse environmental conditions, such as cadmium toxicity, high salt, hypoxia, oxidative stress, cold, and heat shock, as well as pathogen attacks in the form of bacterial, fungal, and viral infections. We analyzed the overarching research framework employed to elucidate plant APA response and the alignment of polyadenylation site transitions with the modulation of gene expression levels within the ambit of each stress condition. We also proposed a general APA model where trans-acting factors, including poly(A) factors, epigenetic regulators, RNA m6A modification factors, and phase separation proteins, assume pivotal roles in APA-related transcriptome plasticity during stress response in plants.
Affiliation(s)
- Jiawen Zhou
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystem, College of the Environment and Ecology, Xiamen University, Xiamen, 361102, Fujian, China
- Qingshun Quinn Li
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystem, College of the Environment and Ecology, Xiamen University, Xiamen, 361102, Fujian, China
  - Biomedical Sciences, College of Dental Medicine, Western University of Health Sciences, Pomona, CA, 91766, USA
5. Controlling gene expression with deep generative design of regulatory DNA. Nat Commun 2022; 13:5099. [PMID: 36042233 PMCID: PMC9427793 DOI: 10.1038/s41467-022-32818-8] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 01/27/2022] [Accepted: 08/18/2022] [Indexed: 11/25/2022] Open
Abstract
Design of de novo synthetic regulatory DNA is a promising avenue to control gene expression in biotechnology and medicine. Using mutagenesis typically requires screening sizable random DNA libraries, which limits the designs to span merely a short section of the promoter and restricts their control of gene expression. Here, we prototype a deep learning strategy based on generative adversarial networks (GAN) by learning directly from genomic and transcriptomic data. Our ExpressionGAN can traverse the entire regulatory sequence-expression landscape in a gene-specific manner, generating regulatory DNA with prespecified target mRNA levels spanning the whole gene regulatory structure including coding and adjacent non-coding regions. Despite high sequence divergence from natural DNA, in vivo measurements show that 57% of the highly-expressed synthetic sequences surpass the expression levels of highly-expressed natural controls. This demonstrates the applicability and relevance of deep generative design to expand our knowledge and control of gene expression regulation in any desired organism, condition or tissue. Here the authors present ExpressionGAN, a generative adversarial network that uses genomic and transcriptomic data to generate regulatory sequences.
6. The evolution, evolvability and engineering of gene regulatory DNA. Nature 2022; 603:455-463. [PMID: 35264797 DOI: 10.1038/s41586-022-04506-6] [Citation(s) in RCA: 122] [Impact Index Per Article: 40.7] [Received: 02/08/2021] [Accepted: 02/02/2022] [Indexed: 11/08/2022]
Abstract
Mutations in non-coding regulatory DNA sequences can alter gene expression, organismal phenotype and fitness [1-3]. Constructing complete fitness landscapes, in which DNA sequences are mapped to fitness, is a long-standing goal in biology, but has remained elusive because it is challenging to generalize reliably to vast sequence spaces [4-6]. Here we build sequence-to-expression models that capture fitness landscapes and use them to decipher principles of regulatory evolution. Using millions of randomly sampled promoter DNA sequences and their measured expression levels in the yeast Saccharomyces cerevisiae, we learn deep neural network models that generalize with excellent prediction performance, and enable sequence design for expression engineering. Using our models, we study expression divergence under genetic drift and strong-selection weak-mutation regimes to find that regulatory evolution is rapid and subject to diminishing returns epistasis; that conflicting expression objectives in different environments constrain expression adaptation; and that stabilizing selection on gene expression leads to the moderation of regulatory complexity. We present an approach for using such models to detect signatures of selection on expression from natural variation in regulatory sequences and use it to discover an instance of convergent regulatory evolution. We assess mutational robustness, finding that regulatory mutation effect sizes follow a power law, characterize regulatory evolvability, visualize promoter fitness landscapes, discover evolvability archetypes and illustrate the mutational robustness of natural regulatory sequence populations. Our work provides a general framework for designing regulatory sequences and addressing fundamental questions in regulatory evolution.
7. Savinov A, Brandsen BM, Angell BE, Cuperus JT, Fields S. Effects of sequence motifs in the yeast 3' untranslated region determined from massively parallel assays of random sequences. Genome Biol 2021; 22:293. [PMID: 34663436 PMCID: PMC8522215 DOI: 10.1186/s13059-021-02509-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/26/2021] [Accepted: 09/30/2021] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND The 3' untranslated region (UTR) plays critical roles in determining the level of gene expression through effects on activities such as mRNA stability and translation. Functional elements within this region have largely been identified through analyses of native genes, which contain multiple co-evolved sequence features. RESULTS To explore the effects of 3' UTR sequence elements outside of native sequence contexts, we analyze hundreds of thousands of random 50-mers inserted into the 3' UTR of a reporter gene in the yeast Saccharomyces cerevisiae. We determine relative protein expression levels from the fitness of transformants in a growth selection. We find that the consensus 3' UTR efficiency element significantly boosts expression, independent of sequence context; on the other hand, the consensus positioning element has only a small effect on expression. Some sequence motifs that are binding sites for Puf proteins substantially increase expression in the library, despite these proteins generally being associated with post-transcriptional downregulation of native mRNAs. Our measurements also allow a systematic examination of the effects of point mutations within efficiency element motifs across diverse sequence backgrounds. These mutational scans reveal the relative in vivo importance of individual bases in the efficiency element, which likely reflects their roles in binding the Hrp1 protein involved in cleavage and polyadenylation. CONCLUSIONS The regulatory effects of some 3' UTR sequence features, like the efficiency element, are consistent regardless of sequence context. In contrast, the consequences of other 3' UTR features appear to be strongly dependent on their evolved context within native genes.
Affiliation(s)
- Andrew Savinov
  - Department of Genome Sciences, University of Washington, Box 355065, Seattle, WA, 98195, USA
  - Present address: Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA
- Benjamin M Brandsen
  - Department of Genome Sciences, University of Washington, Box 355065, Seattle, WA, 98195, USA
  - Department of Chemistry and Biochemistry, Creighton University, Omaha, NE, 68178, USA
- Brooke E Angell
  - Department of Genome Sciences, University of Washington, Box 355065, Seattle, WA, 98195, USA
  - Present address: Interdisciplinary Biological Sciences Graduate Program, Northwestern University, Evanston, IL, 60208, USA
- Josh T Cuperus
  - Department of Genome Sciences, University of Washington, Box 355065, Seattle, WA, 98195, USA
- Stanley Fields
  - Department of Genome Sciences, University of Washington, Box 355065, Seattle, WA, 98195, USA
  - Department of Medicine, University of Washington, Box 357720, Seattle, WA, 98195, USA
8. Griesemer D, Xue JR, Reilly SK, Ulirsch JC, Kukreja K, Davis JR, Kanai M, Yang DK, Butts JC, Guney MH, Luban J, Montgomery SB, Finucane HK, Novina CD, Tewhey R, Sabeti PC. Genome-wide functional screen of 3'UTR variants uncovers causal variants for human disease and evolution. Cell 2021; 184:5247-5260.e19. [PMID: 34534445 PMCID: PMC8487971 DOI: 10.1016/j.cell.2021.08.025] [Citation(s) in RCA: 103] [Impact Index Per Article: 25.8] [Received: 12/12/2020] [Revised: 05/25/2021] [Accepted: 08/19/2021] [Indexed: 12/11/2022]
Abstract
3' untranslated region (3'UTR) variants are strongly associated with human traits and diseases, yet few have been causally identified. We developed the massively parallel reporter assay for 3'UTRs (MPRAu) to sensitively assay 12,173 3'UTR variants. We applied MPRAu to six human cell lines, focusing on genetic variants associated with genome-wide association studies (GWAS) and human evolutionary adaptation. MPRAu expands our understanding of 3'UTR function, suggesting that simple sequences predominately explain 3'UTR regulatory activity. We adapt MPRAu to uncover diverse molecular mechanisms at base pair resolution, including an adenylate-uridylate (AU)-rich element of LEPR linked to potential metabolic evolutionary adaptations in East Asians. We nominate hundreds of 3'UTR causal variants with genetically fine-mapped phenotype associations. Using endogenous allelic replacements, we characterize one variant that disrupts a miRNA site regulating the viral defense gene TRIM14 and one that alters PILRB abundance, nominating a causal variant underlying transcriptional changes in age-related macular degeneration.
Affiliation(s)
- Dustin Griesemer
  - Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA 02115, USA; Department of Anesthesiology, Perioperative, and Pain Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA
- James R Xue
  - Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02143, USA
- Steven K Reilly
  - Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02143, USA
- Jacob C Ulirsch
  - Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA 02115, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Kalki Kukreja
  - Department of Molecular and Cell Biology, Harvard University, Cambridge, MA 02138, USA
- Joe R Davis
  - BigHat Biosciences, San Carlos, CA 94070, USA
- Masahiro Kanai
  - Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA 02115, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- David K Yang
  - Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA
- John C Butts
  - The Jackson Laboratory, Bar Harbor, ME 04609, USA; Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME 04469, USA
- Mehmet H Guney
  - Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01655, USA
- Jeremy Luban
  - Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01655, USA; Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01655, USA
- Stephen B Montgomery
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Hilary K Finucane
  - Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Carl D Novina
  - Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department of Cancer Immunology and Virology, Dana-Farber Cancer Institute, Boston, MA 02115, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
- Ryan Tewhey
  - The Jackson Laboratory, Bar Harbor, ME 04609, USA; Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME 04469, USA; Tufts University School of Medicine, Boston, MA 02111, USA
- Pardis C Sabeti
  - Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02143, USA; Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
9. A broad analysis of splicing regulation in yeast using a large library of synthetic introns. PLoS Genet 2021; 17:e1009805. [PMID: 34570750 PMCID: PMC8496845 DOI: 10.1371/journal.pgen.1009805] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 06/02/2021] [Revised: 10/07/2021] [Accepted: 09/03/2021] [Indexed: 11/19/2022] Open
Abstract
RNA splicing is a key process in eukaryotic gene expression, in which an intron is spliced out of a pre-mRNA molecule to eventually produce a mature mRNA. Most intron-containing genes are constitutively spliced, hence efficient splicing of an intron is crucial for efficient regulation of gene expression. Here we use a large synthetic oligo library of ~20,000 variants to explore how different intronic sequence features affect splicing efficiency and mRNA expression levels in S. cerevisiae. Introns are defined by three functional sites, the 5’ donor site, the branch site, and the 3’ acceptor site. Using a combinatorial design of synthetic introns, we demonstrate how non-consensus splice site sequences in each of these sites affect splicing efficiency. We then show that S. cerevisiae splicing machinery tends to select alternative 3’ splice sites downstream of the original site, and we suggest that this tendency created a selective pressure, leading to the avoidance of cryptic splice site motifs near introns’ 3’ ends. We further use natural intronic sequences from other yeast species, whose splicing machineries have diverged to various extents, to show how intron architectures in the various species have been adapted to the organism’s splicing machinery. We suggest that the observed tendency for cryptic splicing is a result of a loss of a specific splicing factor, U2AF1. Lastly, we show that synthetic sequences containing two introns give rise to alternative RNA isoforms in S. cerevisiae, demonstrating that merely a synthetic fusion of two introns might suffice to facilitate alternative splicing in yeast. Our study reveals novel mechanisms by which introns are shaped in evolution to allow cells to regulate their transcriptome. In addition, it provides a valuable resource to study the regulation of constitutive and alternative splicing in a model organism.
RNA splicing is a process in which parts of a new pre-mRNA are spliced out of the mRNA molecule to eventually produce a mature mRNA. Those RNA segments that are spliced out are termed introns, and they are found in most genes in eukaryotic organisms. Hence regulation of this process has a major role in the control of gene expression. The budding yeast S. cerevisiae is a popular model organism for eukaryotic cell biology, but in terms of splicing it differs, as it has only a few intron-containing genes. Nevertheless, this species has been used to study basic principles of splicing regulation based on its ~300 introns. Here we used the technology of a large synthetic genetic library to introduce many new intron-containing genes to the yeast genome, to explore splicing regulation at a wider scope than was possible so far. Reassuringly, our results confirm known regulatory mechanisms, and further expand our understanding of splicing regulation, specifically how the yeast splicing machinery interacts with the end of introns, and how through evolution introns have evolved to avoid unwanted misidentifications of this end. We further demonstrate the potential of the yeast splicing machinery to alternatively splice a two-intron gene, which is common in other eukaryotes but rare in yeast. Our work presents a first-of-its-kind resource for the systematic study of splicing in live cells.
10. Zrimec J, Buric F, Kokina M, Garcia V, Zelezniak A. Learning the Regulatory Code of Gene Expression. Front Mol Biosci 2021; 8:673363. [PMID: 34179082 PMCID: PMC8223075 DOI: 10.3389/fmolb.2021.673363] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/27/2021] [Accepted: 05/24/2021] [Indexed: 11/13/2022] Open
Abstract
Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology.
Affiliation(s)
- Jan Zrimec
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Filip Buric
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Mariia Kokina
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
- Victor Garcia
  - School of Life Sciences and Facility Management, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Aleksej Zelezniak
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Science for Life Laboratory, Stockholm, Sweden
11.
Abstract
The stability of RNA transcripts is regulated by signals within their sequences, but the identity of those signals remains elusive in many biological systems. Recently introduced massively parallel tools for the analysis of regulatory RNA sequences provide the ability to detect functional cis-regulatory sequences of post-transcriptional RNA regulation at a much larger scale and resolution than before. Their application formulates the underlying sequence-based rules and predicts the impact of genetic variations. Here, we describe the application of UTR-Seq as a strategy to uncover cis-regulatory signals of RNA stability during early zebrafish embryogenesis. The method combines massively parallel reporter assays (MPRA) with computational regression models. It surveys the effect of tens of thousands of regulatory sequences on RNA stability and analyzes the results via regression models to identify sequence signals that impact RNA stability and to predict the in vivo effect of sequence variations.
12. Zrimec J, Börlin CS, Buric F, Muhammad AS, Chen R, Siewers V, Verendel V, Nielsen J, Töpel M, Zelezniak A. Deep learning suggests that gene expression is encoded in all parts of a co-evolving interacting gene regulatory structure. Nat Commun 2020; 11:6141. [PMID: 33262328 PMCID: PMC7708451 DOI: 10.1038/s41467-020-19921-4] [Citation(s) in RCA: 85] [Impact Index Per Article: 17.0] [Received: 12/18/2019] [Accepted: 11/02/2020] [Indexed: 12/31/2022] Open
Abstract
Understanding the genetic regulatory code governing gene expression is an important challenge in molecular biology. However, how individual coding and non-coding regions of the gene regulatory structure interact and contribute to mRNA expression levels remains unclear. Here we apply deep learning to over 20,000 mRNA datasets to examine the genetic regulatory code controlling mRNA abundance in 7 model organisms ranging from bacteria to human. In all organisms, we can predict mRNA abundance directly from DNA sequence, with up to 82% of the variation of transcript levels encoded in the gene regulatory structure. By searching for DNA regulatory motifs across the gene regulatory structure, we discover that motif interactions could explain the whole dynamic range of mRNA levels. Co-evolution across coding and non-coding regions suggests that it is not single motifs or regions, but the entire gene regulatory structure and specific combination of regulatory elements that define gene expression levels.
Collapse
Affiliation(s)
- Jan Zrimec
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
| | - Christoph S Börlin
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Novo Nordisk Foundation Center for Biosustainability, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
| | - Filip Buric
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
| | - Azam Sheikh Muhammad
- Computer Science and Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
| | - Rhongzen Chen
- Computer Science and Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
| | - Verena Siewers
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Novo Nordisk Foundation Center for Biosustainability, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
| | - Vilhelm Verendel
- Computer Science and Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
| | - Jens Nielsen
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Novo Nordisk Foundation Center for Biosustainability, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
| | - Mats Töpel
- Department of Marine Sciences, University of Gothenburg, Box 461, SE-405 30, Gothenburg, Sweden
- Gothenburg Global Biodiversity Center (GGBC), Box 461, 40530, Gothenburg, Sweden
| | - Aleksej Zelezniak
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden.
- Science for Life Laboratory, Tomtebodavägen 23a, SE-171 65, Stockholm, Sweden.
| |
|
13
|
Brion C, Lutz SM, Albert FW. Simultaneous quantification of mRNA and protein in single cells reveals post-transcriptional effects of genetic variation. eLife 2020; 9:60645. [PMID: 33191917 PMCID: PMC7707838 DOI: 10.7554/elife.60645] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 07/02/2020] [Accepted: 11/14/2020] [Indexed: 01/27/2023] Open
Abstract
Trans-acting DNA variants may specifically affect mRNA or protein levels of genes located throughout the genome. However, prior work compared trans-acting loci mapped in separate studies, many of which had limited statistical power. Here, we developed a CRISPR-based system for simultaneous quantification of mRNA and protein of a given gene via dual fluorescent reporters in single, live cells of the yeast Saccharomyces cerevisiae. In large populations of recombinant cells from a cross between two genetically divergent strains, we mapped 86 trans-acting loci affecting the expression of ten genes. Less than 20% of these loci had concordant effects on mRNA and protein of the same gene. Most loci influenced protein but not mRNA of a given gene. One locus harbored a premature stop variant in the YAK1 kinase gene that had specific effects on protein or mRNA of dozens of genes. These results demonstrate complex, post-transcriptional genetic effects on gene expression.
Affiliation(s)
- Christian Brion
- Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, United States
| | - Sheila M Lutz
- Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, United States
| | - Frank Wolfgang Albert
- Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, United States
| |
|
14
|
Renganaath K, Chong R, Day L, Kosuri S, Kruglyak L, Albert FW. Systematic identification of cis-regulatory variants that cause gene expression differences in a yeast cross. eLife 2020; 9:e62669. [PMID: 33179598 PMCID: PMC7685706 DOI: 10.7554/elife.62669] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 09/02/2020] [Accepted: 11/11/2020] [Indexed: 02/06/2023] Open
Abstract
Sequence variation in regulatory DNA alters gene expression and shapes genetically complex traits. However, the identification of individual, causal regulatory variants is challenging. Here, we used a massively parallel reporter assay to measure the cis-regulatory consequences of 5832 natural DNA variants in the promoters of 2503 genes in the yeast Saccharomyces cerevisiae. We identified 451 causal variants, which underlie genetic loci known to affect gene expression. Several promoters harbored multiple causal variants. In five promoters, pairs of variants showed non-additive, epistatic interactions. Causal variants were enriched at conserved nucleotides, tended to have low derived allele frequency, and were depleted from promoters of essential genes, which is consistent with the action of negative selection. Causal variants were also enriched for alterations in transcription factor binding sites. Models integrating these features provided modest, but statistically significant, ability to predict causal variants. This work revealed a complex molecular basis for cis-acting regulatory variation.
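The non-additive interactions mentioned in this abstract can be made concrete with a small sketch. It assumes, purely for illustration, that expression is measured on a log scale for four promoter haplotypes (reference, each single variant, and the double variant). The function name and all numbers are hypothetical and are not taken from the study.

```python
def epistasis(ref: float, single_a: float, single_b: float, double_ab: float) -> float:
    """Return the deviation of the double variant from additivity.

    All inputs are expression values on a log scale (an assumption of this
    sketch). A result of 0 means the two variants combine additively; any
    other value indicates a non-additive (epistatic) interaction.
    """
    effect_a = single_a - ref        # effect of variant A alone
    effect_b = single_b - ref        # effect of variant B alone
    effect_ab = double_ab - ref      # effect of both variants together
    return effect_ab - (effect_a + effect_b)


if __name__ == "__main__":
    # Hypothetical log-expression values, for illustration only.
    print(epistasis(ref=0.0, single_a=1.0, single_b=1.0, double_ab=2.0))   # additive pair
    print(epistasis(ref=0.0, single_a=0.5, single_b=0.25, double_ab=1.5))  # non-additive pair
```

In practice a nonzero value would only be called epistasis after accounting for measurement noise, which this sketch deliberately ignores.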
Affiliation(s)
- Kaushik Renganaath
- Department of Genetics, Cell Biology, & Development, University of Minnesota, Minneapolis, United States
| | - Rockie Chong
- Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, United States
| | - Laura Day
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States
- Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, United States
| | - Sriram Kosuri
- Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, United States
| | - Leonid Kruglyak
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States
- Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, United States
| | - Frank W Albert
- Department of Genetics, Cell Biology, & Development, University of Minnesota, Minneapolis, United States
| |
|
15
|
Lee JW, Lee MW, Ha JS, Kim DS, Jin E, Lee HG, Oh HM. Development of a species-specific transformation system using the novel endogenous promoter calreticulin from oleaginous microalgae Ettlia sp. Sci Rep 2020; 10:13947. [PMID: 32811857 PMCID: PMC7434781 DOI: 10.1038/s41598-020-70503-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/17/2019] [Accepted: 07/30/2020] [Indexed: 12/22/2022] Open
Abstract
Microalgae not only serve as raw materials for biofuel but also have uses in the food, pharmaceutical, and cosmetic industries. However, regulated gene expression in microalgae has only been achieved in a few strains due to the lack of genome information and unstable transformation. This study developed a species-specific transformation system for an oleaginous microalga, Ettlia sp. YC001, using electroporation. The electroporation was optimized using three parameters (waveform, field strength, and number of pulses), and the final selection was a 5 kV cm⁻¹ field strength using an exponential decay wave with one pulse. A new strong endogenous promoter CRT (Pcrt) was identified using transcriptome and quantitative PCR analysis of highly expressed genes during the late exponential growth phase. The activities of this promoter were characterized using a codon optimized cyan fluorescent protein (CFP) as a reporter. The expression of CFP was similar under Pcrt and under the constitutive promoter psaD (PpsaD). The developed transformation system using electroporation with the endogenous promoter is simple to prepare, is easy to operate with high repetition, and utilizes a species-specific vector for high expression. This system could be used not only in molecular studies on microalgae but also in various industrial applications of microalgae.
Affiliation(s)
- Jun-Woo Lee
- Cell Factory Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, Republic of Korea
- Department of Life Science, Hanyang University, Seoul, Republic of Korea
| | - Min-Woo Lee
- Cell Factory Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, Republic of Korea
- Department of Environmental Biotechnology, University of Science and Technology (UST), Daejeon, Republic of Korea
| | - Ji-San Ha
- Department of Biological Sciences, Sungkyunkwan University, Suwon, Republic of Korea
| | - Dae-Soo Kim
- Rare Disease Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, Republic of Korea
| | - EonSeon Jin
- Department of Life Science, Hanyang University, Seoul, Republic of Korea
| | - Hyung-Gwan Lee
- Cell Factory Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, Republic of Korea.
- Department of Environmental Biotechnology, University of Science and Technology (UST), Daejeon, Republic of Korea.
| | - Hee-Mock Oh
- Cell Factory Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, Republic of Korea.
- Department of Environmental Biotechnology, University of Science and Technology (UST), Daejeon, Republic of Korea.
| |
|
16
|
Transcriptional control of gene expression in Pichia pastoris by manipulation of terminators. Appl Microbiol Biotechnol 2020; 104:7841-7851. [DOI: 10.1007/s00253-020-10785-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/24/2020] [Revised: 07/03/2020] [Accepted: 07/13/2020] [Indexed: 12/11/2022]
|
17
|
Ipa1 Is an RNA Polymerase II Elongation Factor that Facilitates Termination by Maintaining Levels of the Poly(A) Site Endonuclease Ysh1. Cell Rep 2020; 26:1919-1933.e5. [PMID: 30759400 PMCID: PMC7236606 DOI: 10.1016/j.celrep.2019.01.051] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/03/2018] [Revised: 12/05/2018] [Accepted: 01/15/2019] [Indexed: 02/08/2023] Open
Abstract
The yeast protein Ipa1 was recently discovered to interact with the Ysh1 endonuclease of the pre-mRNA cleavage and polyadenylation (C/P) machinery, and Ipa1 mutation impairs 3′ end processing. We report that Ipa1 globally promotes proper transcription termination and poly(A) site selection, but with variable effects on genes depending upon the specific configurations of polyadenylation signals. Our findings suggest that the role of Ipa1 in termination is mediated through interaction with Ysh1, since Ipa1 mutation leads to a decrease in Ysh1 and poor recruitment of the C/P complex to a transcribed gene. The Ipa1 association with transcriptionally active chromatin resembles that of elongation factors, and the mutant shows defective Pol II elongation kinetics in vivo. Ysh1 overexpression in the Ipa1 mutant rescues the termination defect, but not the mutant’s sensitivity to 6-azauracil, an indicator of defective elongation. Our findings support a model in which an Ipa1/Ysh1 complex helps coordinate transcription elongation and 3′ end processing.
The essential, uncharacterized Ipa1 protein was recently discovered to interact with the Ysh1 endonuclease of the pre-mRNA cleavage and polyadenylation machinery. Pearson et al. propose that the Ipa1/Ysh1 interaction provides the cell with a means to coordinate and regulate transcription elongation with 3′ end processing in accordance with the cell’s needs.
|
18
|
de Jongh RP, van Dijk AD, Julsing MK, Schaap PJ, de Ridder D. Designing Eukaryotic Gene Expression Regulation Using Machine Learning. Trends Biotechnol 2020; 38:191-201. [DOI: 10.1016/j.tibtech.2019.07.007] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 05/08/2019] [Revised: 07/12/2019] [Accepted: 07/19/2019] [Indexed: 12/11/2022]
|
19
|
Jia XJ, Du Y, Jiang HJ, Li YZ, Xu YN, Si SY, Wang L, Hong B. Identification of Novel Compounds Enhancing SR-BI mRNA Stability through High-Throughput Screening. SLAS DISCOVERY 2019; 25:397-408. [PMID: 31858876 DOI: 10.1177/2472555219894543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Atherosclerosis is the pathological basis of most cardiovascular diseases. Reverse cholesterol transport (RCT) is a main mechanism of cholesterol homeostasis and involves the direct transport of high-density lipoprotein (HDL) cholesteryl ester by selective cholesterol uptake. Hepatic scavenger receptor class B member 1 (SR-BI) overexpression can effectively promote RCT and reduce atherosclerosis. SR-BI may be an important target for prevention or treatment of atherosclerotic disease. In our study, we inserted human SR-BI mRNA 3' untranslated region (3'UTR) downstream of the luciferase reporter gene, to establish a high-throughput screening model based on stably transfected HepG2 cells and to screen small-molecule compounds that can significantly enhance the mRNA stability of the SR-BI gene. Through multiple screenings of 25 755 compounds, the top five active compounds that have similar structures were obtained, with a positive rate of 0.19%. The five positive compounds could enhance the SR-BI expression and uptake of DiI-HDL in the hepatocyte HepG2. E238B-63 could also effectively extend the half-life of SR-BI mRNA and enhance the SR-BI mRNA and protein level and the uptake of DiI-HDL in hepatocytes in a time-dependent and dose-dependent manner. The structure-activity relationship analysis showed that the structure N-(3-hydroxy-2-pyridyl) carboxamide is possibly the key pharmacophore of the active compound, providing reference for acquiring candidate compounds with better activity. The positive small molecular compounds obtained in this study might become new drug candidates or lead compounds for the treatment of cardiovascular diseases and contribute to the further study of the posttranscriptional regulation mechanism of the SR-BI gene.
Affiliation(s)
- Xiao-Jian Jia
- Shenzhen Kangning Hospital & Shenzhen Mental Health Center, Shenzhen University Health Science Center, Shenzhen, PR China; NHC Key Laboratory of Biotechnology of Antibiotics, Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, PR China
| | - Yu Du
- NHC Key Laboratory of Biotechnology of Antibiotics, Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, PR China
| | - Hua-Jun Jiang
- NHC Key Laboratory of Biotechnology of Antibiotics, Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, PR China
| | - Yong-Zhen Li
- NHC Key Laboratory of Biotechnology of Antibiotics, Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, PR China
| | - Yan-Ni Xu
- NHC Key Laboratory of Biotechnology of Antibiotics, Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, PR China
| | - Shu-Yi Si
- NHC Key Laboratory of Biotechnology of Antibiotics, Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, PR China
| | - Li Wang
- NHC Key Laboratory of Biotechnology of Antibiotics, Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, PR China
| | - Bin Hong
- NHC Key Laboratory of Biotechnology of Antibiotics, Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, PR China
| |
|
20
|
Schikora-Tamarit MÀ, Lopez-Grado I Salinas G, Gonzalez-Navasa C, Calderón I, Marcos-Fa X, Sas M, Carey LB. Promoter Activity Buffering Reduces the Fitness Cost of Misregulation. Cell Rep 2019; 24:755-765. [PMID: 30021171 DOI: 10.1016/j.celrep.2018.06.059] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 12/06/2017] [Revised: 05/04/2018] [Accepted: 06/14/2018] [Indexed: 01/21/2023] Open
Abstract
Organisms regulate gene expression through changes in the activity of transcription factors (TFs). In yeast, the response of genes to changes in TF activity is generally assumed to be encoded in the promoter. To directly test this assumption, we chose 42 genes and, for each, replaced the promoter with a synthetic inducible promoter and measured how protein expression changes as a function of TF activity. Most genes exhibited gene-specific TF dose-response curves not due to differences in mRNA stability, translation, or protein stability. Instead, most genes have an intrinsic ability to buffer the effects of promoter activity. This can be encoded in the open reading frame and the 3' end of genes and can be implemented by both autoregulatory feedback and by titration of limiting trans regulators. We show experimentally and computationally that, when misexpression of a gene is deleterious, this buffering insulates cells from fitness defects due to misregulation.
Affiliation(s)
- Miquel Àngel Schikora-Tamarit
- Systems Bioengineering Program, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Carrer Dr. Aiguader 88, 08003 Barcelona, Spain
| | - Guillem Lopez-Grado I Salinas
- Systems Bioengineering Program, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Carrer Dr. Aiguader 88, 08003 Barcelona, Spain
| | - Carolina Gonzalez-Navasa
- Systems Bioengineering Program, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Carrer Dr. Aiguader 88, 08003 Barcelona, Spain
| | - Irene Calderón
- Systems Bioengineering Program, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Carrer Dr. Aiguader 88, 08003 Barcelona, Spain
| | - Xavi Marcos-Fa
- Systems Bioengineering Program, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Carrer Dr. Aiguader 88, 08003 Barcelona, Spain
| | - Miquel Sas
- Systems Bioengineering Program, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Carrer Dr. Aiguader 88, 08003 Barcelona, Spain
| | - Lucas B Carey
- Systems Bioengineering Program, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Carrer Dr. Aiguader 88, 08003 Barcelona, Spain.
| |
|
21
|
Deciphering eukaryotic gene-regulatory logic with 100 million random promoters. Nat Biotechnol 2019; 38:56-65. [PMID: 31792407 PMCID: PMC6954276 DOI: 10.1038/s41587-019-0315-8] [Citation(s) in RCA: 159] [Impact Index Per Article: 26.5] [Received: 01/24/2019] [Accepted: 10/16/2019] [Indexed: 11/26/2022]
Abstract
How transcription factors (TFs) interpret cis-regulatory DNA sequence to control gene expression remains unclear, largely because past studies using native and engineered sequences had insufficient scale. Here, we measure the expression output of >100 million synthetic yeast promoter sequences that are fully random. These sequences yield diverse, reproducible expression levels that can be explained by their chance inclusion of functional TF binding sites. We use machine learning to build interpretable models of transcriptional regulation that predict ~94% of the expression driven from independent test promoters and ~89% of the expression driven from native yeast promoter fragments. These models allow us to characterize each TF’s specificity, activity, and interactions with chromatin. TF activity depends on binding-site strand, position, DNA helical face and chromatin context. Notably, expression level is influenced by weak regulatory interactions, which confound designed-sequence studies. Our analyses show that massive-throughput assays of fully random DNA can provide the big data necessary to develop complex, predictive models of gene regulation. Gene expression levels in yeast are predicted using a massive dataset on promoters with random sequences.
|
22
|
Esposito D, Weile J, Shendure J, Starita LM, Papenfuss AT, Roth FP, Fowler DM, Rubin AF. MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biol 2019; 20:223. [PMID: 31679514 PMCID: PMC6827219 DOI: 10.1186/s13059-019-1845-6] [Citation(s) in RCA: 154] [Impact Index Per Article: 25.7] [Received: 03/31/2019] [Accepted: 10/01/2019] [Indexed: 11/10/2022] Open
Abstract
Multiplex assays of variant effect (MAVEs), such as deep mutational scans and massively parallel reporter assays, test thousands of sequence variants in a single experiment. Despite the importance of MAVE data for basic and clinical research, there is no standard resource for their discovery and distribution. Here, we present MaveDB ( https://www.mavedb.org ), a public repository for large-scale measurements of sequence variant impact, designed for interoperability with applications to interpret these datasets. We also describe the first such application, MaveVis, which retrieves, visualizes, and contextualizes variant effect maps. Together, the database and applications will empower the community to mine these powerful datasets.
Affiliation(s)
- Daniel Esposito
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
| | - Jochen Weile
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Lea M Starita
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Anthony T Papenfuss
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
- Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, Australia
- Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
| | - Frederick P Roth
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada.
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada.
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada.
- Department of Computer Science, University of Toronto, Toronto, ON, Canada.
- Canadian Institute for Advanced Research, Toronto, ON, Canada.
| | - Douglas M Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Canadian Institute for Advanced Research, Toronto, ON, Canada.
- Department of Bioengineering, University of Washington, Seattle, WA, USA.
| | - Alan F Rubin
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia.
- Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia.
- Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia.
| |
|
23
|
Vainberg Slutskin I, Weinberger A, Segal E. Sequence determinants of polyadenylation-mediated regulation. Genome Res 2019; 29:1635-1647. [PMID: 31530582 PMCID: PMC6771402 DOI: 10.1101/gr.247312.118] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 12/09/2018] [Accepted: 08/13/2019] [Indexed: 12/31/2022]
Abstract
The cleavage and polyadenylation reaction is a crucial step in transcription termination and pre-mRNA maturation in human cells. Despite extensive research, the encoding of polyadenylation-mediated regulation of gene expression within the DNA sequence is not well understood. Here, we utilized a massively parallel reporter assay to inspect the effect of over 12,000 rationally designed polyadenylation sequences (PASs) on reporter gene expression and cleavage efficiency. We find that the PAS sequence can modulate gene expression by over five orders of magnitude. By using a uniquely designed scanning mutagenesis data set, we gain mechanistic insight into various modes of action by which the cleavage efficiency affects the sensitivity or robustness of the PAS to mutation. Furthermore, we employ motif discovery to identify both known and novel sequence motifs associated with PAS-mediated regulation. By leveraging the large scale of our data, we train a deep learning model for the highly accurate prediction of RNA levels from DNA sequence alone (R = 0.83). Moreover, we devise unique approaches for predicting exact cleavage sites for our reporter constructs and for endogenous transcripts. Taken together, our results expand our understanding of PAS-mediated regulation, and provide an unprecedented resource for analyzing and predicting PAS for regulatory genomics applications.
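As a rough illustration of the sequence-to-expression setup this abstract describes, the sketch below shows one common way to turn a DNA sequence into model input (one-hot encoding) and to score predictions against measurements with Pearson's correlation. It is not the authors' pipeline: the encoding choice, the function names, and the interpretation of the reported R as Pearson's r are all assumptions made here for illustration.

```python
import math

BASES = "ACGT"  # column order of the one-hot encoding (an arbitrary choice)


def one_hot(seq: str) -> list[list[float]]:
    """Encode a DNA sequence as one row of four 0/1 values per nucleotide.

    Characters outside A, C, G, T (for example N) become an all-zero row.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    print(one_hot("AATAAA"))  # a canonical polyadenylation signal hexamer
    # Hypothetical measured vs. predicted values, for illustration only.
    print(pearson_r([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```

If the reported R is indeed Pearson's r, a value of 0.83 would correspond to roughly 69% of variance explained, since 0.83² ≈ 0.69.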
Affiliation(s)
- Ilya Vainberg Slutskin
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel; Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
| | - Adina Weinberger
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel; Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
| | - Eran Segal
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel; Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
| |
|
24
|
Kinney JB, McCandlish DM. Massively Parallel Assays and Quantitative Sequence-Function Relationships. Annu Rev Genomics Hum Genet 2019; 20:99-127. [PMID: 31091417 DOI: 10.1146/annurev-genom-083118-014845] [Citation(s) in RCA: 96] [Impact Index Per Article: 16.0] [Indexed: 11/09/2022]
Abstract
Over the last decade, a rich variety of massively parallel assays have revolutionized our understanding of how biological sequences encode quantitative molecular phenotypes. These assays include deep mutational scanning, high-throughput SELEX, and massively parallel reporter assays. Here, we review these experimental methods and how the data they produce can be used to quantitatively model sequence-function relationships. In doing so, we touch on a diverse range of topics, including the identification of clinically relevant genomic variants, the modeling of transcription factor binding to DNA, the functional and evolutionary landscapes of proteins, and cis-regulatory mechanisms in both transcription and mRNA splicing. We further describe a unified conceptual framework and a core set of mathematical modeling strategies that studies in these diverse areas can make use of. Finally, we highlight key aspects of experimental design and mathematical modeling that are important for the results of such studies to be interpretable and reproducible.
Affiliation(s)
- Justin B Kinney
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| | - David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| |
|
25
|
Qiu C, Kaplan CD. Functional assays for transcription mechanisms in high-throughput. Methods 2019; 159-160:115-123. [PMID: 30797033 PMCID: PMC6589137 DOI: 10.1016/j.ymeth.2019.02.017] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 01/29/2019] [Accepted: 02/18/2019] [Indexed: 01/12/2023] Open
Abstract
Dramatic increases in the scale of programmed synthesis of nucleic acid libraries coupled with deep sequencing have powered advances in understanding nucleic acid and protein biology. Biological systems centering on nucleic acids or encoded proteins greatly benefit from such high-throughput studies, given that large DNA variant pools can be synthesized and DNA, or RNA products of transcription, can be easily analyzed by deep sequencing. Here we review the scope of various high-throughput functional assays for studies of nucleic acids and proteins in general, followed by discussion of how these types of study have yielded insights into the RNA Polymerase II (Pol II) active site as an example. We discuss methodological considerations in the design and execution of these experiments that should be valuable to studies in any system.
Affiliation(s)
- Chenxi Qiu
- Department of Medicine, Division of Translational Therapeutics, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA; Cancer Research Institute, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
| | - Craig D Kaplan
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA.
| |
|
26
|
Baeza-Centurion P, Miñana B, Schmiedel JM, Valcárcel J, Lehner B. Combinatorial Genetics Reveals a Scaling Law for the Effects of Mutations on Splicing. Cell 2019; 176:549-563.e23. [PMID: 30661752 DOI: 10.1016/j.cell.2018.12.010] [Citation(s) in RCA: 73] [Impact Index Per Article: 12.2] [Received: 02/26/2018] [Revised: 08/29/2018] [Accepted: 12/07/2018] [Indexed: 02/08/2023]
Abstract
Despite a wealth of molecular knowledge, quantitative laws for accurate prediction of biological phenomena remain rare. Alternative pre-mRNA splicing is an important regulated step in gene expression frequently perturbed in human disease. To understand the combined effects of mutations during evolution, we quantified the effects of all possible combinations of exonic mutations accumulated during the emergence of an alternatively spliced human exon. This revealed that mutation effects scale non-monotonically with the inclusion level of an exon, with each mutation having maximum effect at a predictable intermediate inclusion level. This scaling is observed genome-wide for cis and trans perturbations of splicing, including for natural and disease-associated variants. Mathematical modeling suggests that competition between alternative splice sites is sufficient to cause this non-linearity in the genotype-phenotype map. Combining the global scaling law with specific pairwise interactions between neighboring mutations allows accurate prediction of the effects of complex genotype changes involving >10 mutations.
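The non-monotonic scaling described in this abstract can be illustrated with a minimal sketch. It assumes, for illustration only, that a mutation shifts exon inclusion by a fixed amount on a logit scale; this is a simple stand-in and not the authors' fitted model. The function names and the shift of 1.0 are hypothetical.

```python
import math


def psi_after_mutation(psi_start: float, delta_logit: float) -> float:
    """Inclusion level (PSI, strictly between 0 and 1) after an additive logit shift."""
    logit = math.log(psi_start / (1.0 - psi_start))
    return 1.0 / (1.0 + math.exp(-(logit + delta_logit)))


def mutation_effect(psi_start: float, delta_logit: float) -> float:
    """Change in PSI caused by the same logit shift at a given starting PSI."""
    return psi_after_mutation(psi_start, delta_logit) - psi_start


if __name__ == "__main__":
    # The same hypothetical mutation applied to exons with different starting PSI.
    for psi in (0.05, 0.25, 0.50, 0.75, 0.95):
        print(f"starting PSI {psi:.2f}: change {mutation_effect(psi, 1.0):+.3f}")
```

Under this assumption the same mutation changes inclusion most for exons at intermediate starting levels and least for exons that are almost always or almost never included, which is the qualitative pattern the abstract reports.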
Affiliation(s)
- Pablo Baeza-Centurion
- Systems Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003 Barcelona, Spain
| | - Belén Miñana
- Gene Regulation, Stem Cells and Cancer Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003 Barcelona, Spain
| | - Jörn M Schmiedel
- Systems Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003 Barcelona, Spain
| | - Juan Valcárcel
- Gene Regulation, Stem Cells and Cancer Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluís Companys 23, 08010 Barcelona, Spain.
| | - Ben Lehner
- Systems Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003 Barcelona, Spain; Gene Regulation, Stem Cells and Cancer Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003 Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluís Companys 23, 08010 Barcelona, Spain.
| |
Collapse
|
27
|
Weingarten-Gabbay S, Nir R, Lubliner S, Sharon E, Kalma Y, Weinberger A, Segal E. Systematic interrogation of human promoters. Genome Res 2019; 29:171-183. [PMID: 30622120 PMCID: PMC6360817 DOI: 10.1101/gr.236075.118] [Citation(s) in RCA: 68] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2018] [Accepted: 12/05/2018] [Indexed: 12/19/2022]
Abstract
Despite much research, our understanding of the architecture and cis-regulatory elements of human promoters is still lacking. Here, we devised a high-throughput assay to quantify the activity of approximately 15,000 fully designed sequences that we integrated and expressed from a fixed location within the human genome. We used this method to investigate thousands of native promoters and preinitiation complex (PIC) binding regions followed by in-depth characterization of the sequence motifs underlying promoter activity, including core promoter elements and TF binding sites. We find that core promoters drive transcription mostly unidirectionally and that sequences originating from promoters exhibit stronger activity than those originating from enhancers. By testing multiple synthetic configurations of core promoter elements, we dissect the motifs that positively and negatively regulate transcription as well as the effect of their combinations and distances, including a 10-bp periodicity in the optimal distance between the TATA and the initiator. By comprehensively screening 133 TF binding sites, we find that in contrast to core promoters, TF binding sites maintain similar activity levels in both orientations, supporting a model by which divergent transcription is driven by two distinct unidirectional core promoters sharing bidirectional TF binding sites. Finally, we find a striking agreement between the effect of binding site multiplicity of individual TFs in our assay and their tendency to appear in homotypic clusters throughout the genome. Overall, our study systematically assays the elements that drive expression in core and proximal promoter regions and sheds light on organization principles of regulatory regions in the human genome.
Affiliation(s)
- Shira Weingarten-Gabbay
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel; Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Ronit Nir
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel; Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Shai Lubliner
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel; Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Eilon Sharon
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel; Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Yael Kalma
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel; Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Adina Weinberger
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel; Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Eran Segal
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel; Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
28
Eldarov MA, Beletsky AV, Tanashchuk TN, Kishkovskaya SA, Ravin NV, Mardanov AV. Whole-Genome Analysis of Three Yeast Strains Used for Production of Sherry-Like Wines Revealed Genetic Traits Specific to Flor Yeasts. Front Microbiol 2018; 9:965. [PMID: 29867869 PMCID: PMC5962777 DOI: 10.3389/fmicb.2018.00965] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 11/30/2017] [Accepted: 04/25/2018] [Indexed: 12/31/2022] Open
Abstract
Flor yeast strains represent a specialized group of Saccharomyces cerevisiae yeasts used for biological wine aging. We have sequenced the genomes of three flor strains originating from different geographic regions and used for production of sherry-like wines in Russia. According to the obtained phylogeny of 118 yeast strains, flor strains form a very tight cluster adjacent to the main wine clade. SNP analysis versus available genomes of wine and flor strains revealed 2,270 genetic variants in 1,337 loci specific to flor strains. Gene ontology analysis in combination with gene content evaluation revealed a complex landscape of possibly adaptive genetic changes in flor yeast, related to genes associated with cell morphology, mitotic cell cycle, ion homeostasis, DNA repair, carbohydrate metabolism, lipid metabolism, and cell wall biogenesis. Pangenomic analysis discovered the presence of several well-known "non-reference" loci of potential industrial importance. Events of gene loss included deletions of asparaginase genes, the maltose utilization locus, and the FRE-FIT locus involved in iron transport. The latter, in combination with a flor-yeast-specific mutation in the Aft1 transcription factor gene, is likely to be responsible for the discovered phenotype of increased iron sensitivity and improved iron uptake of the analyzed strains. Expansion of the coding region of the FLO11 flocculin gene and alteration of the balance between members of the FLO gene family are likely to positively affect the well-known propensity of flor strains for velum formation. Our study provides new insights into the nature of genetic variation in flor yeast strains and demonstrates that different adaptive properties of flor yeast strains could have evolved through different mechanisms of genetic variation.
Affiliation(s)
- Mikhail A. Eldarov
- Institute of Bioengineering, Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia
- Alexey V. Beletsky
- Institute of Bioengineering, Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia
- Tatiana N. Tanashchuk
- All-Russian National Research Institute of Viticulture and Winemaking “Magarach” of the Russian Academy of Sciences, Yalta, Russia
- Svetlana A. Kishkovskaya
- All-Russian National Research Institute of Viticulture and Winemaking “Magarach” of the Russian Academy of Sciences, Yalta, Russia
- Nikolai V. Ravin
- Institute of Bioengineering, Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia
- Andrey V. Mardanov
- Institute of Bioengineering, Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia
29
Espinar L, Schikora Tamarit MÀ, Domingo J, Carey LB. Promoter architecture determines cotranslational regulation of mRNA. Genome Res 2018; 28:509-518. [PMID: 29567675 PMCID: PMC5880241 DOI: 10.1101/gr.230458.117] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 09/21/2017] [Accepted: 02/27/2018] [Indexed: 01/08/2023]
Abstract
Information that regulates gene expression is encoded throughout each gene, but whether different regulatory regions can be understood in isolation or whether they interact is unknown. Here we measure mRNA levels for 10,000 open reading frames (ORFs) transcribed from either an inducible or constitutive promoter. We find that the strength of cotranslational regulation on mRNA levels is determined by promoter architecture. By using a novel computational genetic screen of 6402 RNA-seq experiments, we identify the RNA helicase Dbp2 as the mechanism by which cotranslational regulation is reduced specifically for inducible promoters. Finally, we find that for constitutive genes, but not inducible genes, most of the information encoding regulation of mRNA levels in response to changes in growth rate is encoded in the ORF and not in the promoter. Thus, the ORF sequence is a major regulator of gene expression, and a nonlinear interaction between promoters and ORFs determines mRNA levels.
Affiliation(s)
- Lorena Espinar
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain
- Júlia Domingo
- Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain; EMBL-CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, 08003 Barcelona, Spain
- Lucas B Carey
- Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
30
Unraveling the determinants of microRNA mediated regulation using a massively parallel reporter assay. Nat Commun 2018; 9:529. [PMID: 29410437 PMCID: PMC5802814 DOI: 10.1038/s41467-018-02980-z] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 04/28/2017] [Accepted: 01/11/2018] [Indexed: 12/16/2022] Open
Abstract
Despite extensive research, the sequence features affecting microRNA-mediated regulation are not well understood, limiting our ability to predict gene expression levels in both native and synthetic sequences. Here we employed a massively parallel reporter assay to investigate the effect of over 14,000 rationally designed 3′ UTR sequences on reporter construct repression. We found that multiple factors, including microRNA identity, hybridization energy, target accessibility, and target multiplicity, can be manipulated to achieve a predictable, up to 57-fold, change in protein repression. Moreover, we predict protein repression and RNA levels with high accuracy (R = 0.84 and R = 0.80, respectively) using only 3′ UTR sequence, as well as the effect of mutation in native 3′ UTRs on protein repression (R = 0.63). Taken together, our results elucidate the effect of different sequence features on miRNA-mediated regulation and demonstrate the predictability of their effect on gene expression with applications in regulatory genomics and synthetic biology. MiRNAs are known regulators of gene expression. Here the authors perform a large-scale massively parallel reporter assay to investigate the effect of a large number of designed 3′ UTR sequences on reporter expression and assess how features of miRNA regulatory elements affect miRNA-mediated repression.
31
Rabani M, Pieper L, Chew GL, Schier AF. A Massively Parallel Reporter Assay of 3' UTR Sequences Identifies In Vivo Rules for mRNA Degradation. Mol Cell 2017; 68:1083-1094.e5. [PMID: 29225039 DOI: 10.1016/j.molcel.2017.11.014] [Citation(s) in RCA: 72] [Impact Index Per Article: 9.0] [Received: 08/09/2017] [Revised: 10/13/2017] [Accepted: 11/10/2017] [Indexed: 12/31/2022]
Abstract
The stability of mRNAs is regulated by signals within their sequences, but a systematic and predictive understanding of the underlying sequence rules remains elusive. Here we introduce UTR-seq, a combination of massively parallel reporter assays and regression models, to survey the dynamics of tens of thousands of 3' UTR sequences during early zebrafish embryogenesis. UTR-seq revealed two temporal degradation programs: a maternally encoded early-onset program and a late-onset program that accelerated degradation after zygotic genome activation. Three signals regulated early-onset rates: stabilizing poly-U and UUAG sequences and destabilizing GC-rich signals. Three signals explained late-onset degradation: miR-430 seeds, AU-rich sequences, and Pumilio recognition sites. Sequence-based regression models translated 3' UTRs into their unique decay patterns and predicted the in vivo effect of sequence signals on mRNA stability. Their application led to the successful design of artificial 3' UTRs that conferred specific mRNA dynamics. UTR-seq provides a general strategy to uncover the rules of RNA cis regulation.
Affiliation(s)
- Michal Rabani
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA.
- Lindsey Pieper
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
- Guo-Liang Chew
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
- Alexander F Schier
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA; FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA; Center for Brain Science, Harvard University, Cambridge, MA 02138, USA; The Broad Institute, Cambridge, MA 02140, USA.
32
Sanfilippo P, Wen J, Lai EC. Landscape and evolution of tissue-specific alternative polyadenylation across Drosophila species. Genome Biol 2017; 18:229. [PMID: 29191225 PMCID: PMC5707805 DOI: 10.1186/s13059-017-1358-0] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 06/02/2017] [Accepted: 11/08/2017] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND: Drosophila melanogaster has one of the best-described transcriptomes of any multicellular organism. Nevertheless, the paucity of 3'-sequencing data in this species precludes comprehensive assessment of alternative polyadenylation (APA), which is subject to broad tissue-specific control. RESULTS: Here, we generate deep 3'-sequencing data from 23 developmental stages, tissues, and cell lines of D. melanogaster, yielding a comprehensive atlas of ~62,000 polyadenylated ends. These data broadly extend the annotated transcriptome, identify ~40,000 novel 3' termini, and reveal that two-thirds of Drosophila genes are subject to APA. Furthermore, we dramatically expand the numbers of genes known to be subject to tissue-specific APA, such as 3' untranslated region (UTR) lengthening in head and 3' UTR shortening in testis, and characterize new tissue and developmental 3' UTR patterns. Our thorough 3' UTR annotations permit reassessment of post-transcriptional regulatory networks, via conserved miRNA and RNA binding protein sites. To evaluate the evolutionary conservation and divergence of APA patterns, we generate developmental and tissue-specific 3'-seq libraries from Drosophila yakuba and Drosophila virilis. We document broadly analogous tissue-specific APA trends in these species, but also observe significant alterations in 3' end usage across orthologs. We exploit the population of functionally evolving poly(A) sites to gain clear evidence that evolutionary divergence in core polyadenylation signal (PAS) and downstream sequence element (DSE) motifs drives broad alterations in 3' UTR isoform expression across the Drosophila phylogeny. CONCLUSIONS: These data provide a critical resource for the Drosophila community and offer many insights into the complex control of alternative tissue-specific 3' UTR formation and its consequences for post-transcriptional regulatory networks.
Affiliation(s)
- Piero Sanfilippo
- Department of Developmental Biology, Sloan-Kettering Institute, New York, New York, 10065, USA
- Louis V. Gerstner, Jr. Graduate School of Biomedical Sciences, Memorial Sloan Kettering Cancer Center, New York, New York, 10065, USA
- Jiayu Wen
- Department of Developmental Biology, Sloan-Kettering Institute, New York, New York, 10065, USA
- Present address: Biochemistry and Biomedical Sciences, Research School of Biology, ANU College of Science, The Australian National University, Canberra, ACT 2601, Australia
- Eric C Lai
- Department of Developmental Biology, Sloan-Kettering Institute, New York, New York, 10065, USA.
- Louis V. Gerstner, Jr. Graduate School of Biomedical Sciences, Memorial Sloan Kettering Cancer Center, New York, New York, 10065, USA.
33
Cuperus JT, Groves B, Kuchina A, Rosenberg AB, Jojic N, Fields S, Seelig G. Deep learning of the regulatory grammar of yeast 5' untranslated regions from 500,000 random sequences. Genome Res 2017; 27:2015-2024. [PMID: 29097404 PMCID: PMC5741052 DOI: 10.1101/gr.224964.117] [Citation(s) in RCA: 126] [Impact Index Per Article: 15.8] [Received: 05/12/2017] [Accepted: 10/18/2017] [Indexed: 11/25/2022]
Abstract
Our ability to predict protein expression from DNA sequence alone remains poor, reflecting our limited understanding of cis-regulatory grammar and hampering the design of engineered genes for synthetic biology applications. Here, we generate a model that predicts the protein expression of the 5′ untranslated region (UTR) of mRNAs in the yeast Saccharomyces cerevisiae. We constructed a library of half a million 50-nucleotide-long random 5′ UTRs and assayed their activity in a massively parallel growth selection experiment. The resulting data allow us to quantify the impact on protein expression of Kozak sequence composition, upstream open reading frames (uORFs), and secondary structure. We trained a convolutional neural network (CNN) on the random library and showed that it performs well at predicting the protein expression of both a held-out set of the random 5′ UTRs and native S. cerevisiae 5′ UTRs. The model was additionally used to computationally evolve highly active 5′ UTRs. We confirmed experimentally that the great majority of the evolved sequences led to higher protein expression rates than the starting sequences, demonstrating the predictive power of this model.
Affiliation(s)
- Josh T Cuperus
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
- Benjamin Groves
- Department of Electrical Engineering, University of Washington, Seattle, Washington 98195, USA
- Anna Kuchina
- Department of Electrical Engineering, University of Washington, Seattle, Washington 98195, USA
- Alexander B Rosenberg
- Department of Electrical Engineering, University of Washington, Seattle, Washington 98195, USA
- Stanley Fields
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA; Department of Medicine, University of Washington, Seattle, Washington 98195, USA
- Georg Seelig
- Department of Electrical Engineering, University of Washington, Seattle, Washington 98195, USA; Department of Computer Science & Engineering, University of Washington, Seattle, Washington 98195, USA
34
Cheng J, Maier KC, Avsec Ž, Rus P, Gagneur J. Cis-regulatory elements explain most of the mRNA stability variation across genes in yeast. RNA 2017; 23:1648-1659. [PMID: 28802259 PMCID: PMC5648033 DOI: 10.1261/rna.062224.117] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 05/24/2017] [Accepted: 07/31/2017] [Indexed: 05/09/2023]
Abstract
The stability of mRNA is one of the major determinants of gene expression. Although a wealth of sequence elements regulating mRNA stability has been described, their quantitative contributions to half-life are unknown. Here, we built a quantitative model for Saccharomyces cerevisiae based on functional mRNA sequence features that explains 59% of the half-life variation between genes and predicts half-life at a median relative error of 30%. The model revealed a new destabilizing 3' UTR motif, ATATTC, which we functionally validated. Codon usage proves to be the major determinant of mRNA stability. Nonetheless, single-nucleotide variations have the largest effect when occurring on 3' UTR motifs or upstream AUGs. Analyzing mRNA half-life data of 34 knockout strains showed that the effect of codon usage requires not only functional decapping and deadenylation but also the 5'-to-3' exonuclease Xrn1 and the nonsense-mediated decay genes, but not no-go decay. Altogether, this study quantitatively delineates the contributions of mRNA sequence features to stability in yeast, reveals their functional dependencies on degradation pathways, and allows accurate prediction of half-life from mRNA sequence.
Affiliation(s)
- Jun Cheng
- Department of Informatics, Technical University of Munich, 85748 Garching, Germany
- Graduate School of Quantitative Biosciences (QBM), Ludwig-Maximilians-Universität München, 81377 München, Germany
- Kerstin C Maier
- Department of Molecular Biology, Max Planck Institute for Biophysical Chemistry, 37077 Göttingen, Germany
- Žiga Avsec
- Department of Informatics, Technical University of Munich, 85748 Garching, Germany
- Graduate School of Quantitative Biosciences (QBM), Ludwig-Maximilians-Universität München, 81377 München, Germany
- Petra Rus
- Department of Molecular Biology, Max Planck Institute for Biophysical Chemistry, 37077 Göttingen, Germany
- Julien Gagneur
- Department of Informatics, Technical University of Munich, 85748 Garching, Germany
- Graduate School of Quantitative Biosciences (QBM), Ludwig-Maximilians-Universität München, 81377 München, Germany
35
Starita LM, Ahituv N, Dunham MJ, Kitzman JO, Roth FP, Seelig G, Shendure J, Fowler DM. Variant Interpretation: Functional Assays to the Rescue. Am J Hum Genet 2017; 101:315-325. [PMID: 28886340 PMCID: PMC5590843 DOI: 10.1016/j.ajhg.2017.07.014] [Citation(s) in RCA: 246] [Impact Index Per Article: 30.8] [Indexed: 11/24/2022] Open
Abstract
Classical genetic approaches for interpreting variants, such as case-control or co-segregation studies, require finding many individuals with each variant. Because the overwhelming majority of variants are present in only a few living humans, this strategy has clear limits. Fully realizing the clinical potential of genetics requires that we accurately infer pathogenicity even for rare or private variation. Many computational approaches to predicting variant effects have been developed, but they can identify only a small fraction of pathogenic variants with the high confidence that is required in the clinic. Experimentally measuring a variant's functional consequences can provide clearer guidance, but individual assays performed only after the discovery of the variant are both time and resource intensive. Here, we discuss how multiplex assays of variant effect (MAVEs) can be used to measure the functional consequences of all possible variants in disease-relevant loci for a variety of molecular and cellular phenotypes. The resulting large-scale functional data can be combined with machine learning and clinical knowledge for the development of "lookup tables" of accurate pathogenicity predictions. A coordinated effort to produce, analyze, and disseminate large-scale functional data generated by multiplex assays could be essential to addressing the variant-interpretation crisis.
Affiliation(s)
- Lea M Starita
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.
- Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, CA 94158, USA
- Maitreya J Dunham
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Jacob O Kitzman
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Bioinformatics & Computational Medicine, University of Michigan, Ann Arbor, MI 48109, USA
- Frederick P Roth
- Donnelly Centre and Departments of Molecular Genetics and Computer Science, University of Toronto, Toronto, ON M5S 3E1, Canada; Lunenfeld-Tanenbaum Research Institute, Mt. Sinai Hospital, Toronto, ON M5G 1X5, Canada; Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Canadian Institute for Advanced Research, Toronto, ON M5G 1Z8, Canada
- Georg Seelig
- Department of Electrical Engineering, University of Washington, Seattle, WA 98195, USA; Department of Computer Science & Engineering, University of Washington, Seattle, WA 98195, USA
- Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Howard Hughes Medical Institute, Seattle, WA 98195, USA
- Douglas M Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Department of Bioengineering, University of Washington, Seattle, WA 98195, USA.
36
Predicting synonymous codon usage and optimizing the heterologous gene for expression in E. coli. Sci Rep 2017; 7:9926. [PMID: 28855614 PMCID: PMC5577221 DOI: 10.1038/s41598-017-10546-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 01/19/2017] [Accepted: 08/11/2017] [Indexed: 11/27/2022] Open
Abstract
Of the 20 common amino acids, 18 are encoded by multiple synonymous codons. These synonymous codons are not redundant; in fact, all codons contribute substantially to protein expression, structure, and function. In this study, the codon usage pattern of E. coli genes was learned from sequenced E. coli genomes. A machine-learning-based method, Presyncodon, was proposed to predict synonymous codon selection in E. coli based on the learned codon usage patterns of each residue in the context of its specific fragment. The prediction results indicate that Presyncodon can predict synonymous codon selection for E. coli genes with high accuracy. Two reporter genes (egfp and mApple) were designed by the method with a combination of low- and high-frequency-usage codons. The fluorescence intensity of eGFP and mApple expressed from the genes designed by this method was about 2.3- and 1.7-fold greater, respectively, than that from genes with only high-frequency-usage codons in E. coli. Therefore, both low- and high-frequency-usage codons make positive contributions to the functional expression of heterologous proteins. This method could be used to design synthetic genes for heterologous gene expression in biotechnology.
37
Sewell JA, Fuxman Bass JI. Cellular network perturbations by disease-associated variants. Curr Opin Syst Biol 2017; 3:60-66. [PMID: 29057377 DOI: 10.1016/j.coisb.2017.04.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]
Abstract
Genetic and genome-wide association studies (GWAS) have identified a myriad of human disease-associated genomic variants. However, these studies do not reveal the mechanisms by which these variants perturb cellular networks, a necessary step to intervene and improve disease outcomes. This has been challenging because multiple variants are present in haplotype blocks, thereby confounding the identification of causal variants, and because most reside in noncoding regions. Here, we review recent advances in the identification of functional variants and gene-variant associations. In addition, we examine approaches used to study perturbations in protein-protein and protein-DNA interactions associated with disease, and discuss how these perturbations affect cellular networks.
Affiliation(s)
- Jared A Sewell
- Department of Biology, Boston University, Boston, MA 02215, USA
38
Landgraf D, Huh D, Hallacli E, Lindquist S. Scarless Gene Tagging with One-Step Transformation and Two-Step Selection in Saccharomyces cerevisiae and Schizosaccharomyces pombe. PLoS One 2016; 11:e0163950. [PMID: 27736907 PMCID: PMC5063382 DOI: 10.1371/journal.pone.0163950] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 03/15/2016] [Accepted: 09/16/2016] [Indexed: 11/24/2022] Open
Abstract
Gene tagging with fluorescent proteins is commonly applied to investigate the localization and dynamics of proteins in their cellular environment. Ideally, a fluorescent tag is genetically inserted at the endogenous locus at the N- or C-terminus of the gene of interest without disrupting regulatory sequences, including the 5’ and 3’ untranslated regions (UTRs), and without introducing any extraneous unwanted “scar” sequences, which may create unpredictable transcriptional or translational effects. We present a reliable, low-cost, and highly efficient method for the construction of such scarless C-terminal and N-terminal fusions with fluorescent proteins in yeast. The method relies on sequential positive and negative selection and uses an integration cassette with long flanking regions, which is assembled by two-step PCR, to increase the homologous recombination frequency. The method also enables scarless tagging of essential genes with no need for a complementing plasmid. To further ease high-throughput strain construction, we have computationally automated design of the primers, applied the primer design code to all open reading frames (ORFs) of the budding yeast Saccharomyces cerevisiae (S. cerevisiae) and the fission yeast Schizosaccharomyces pombe (S. pombe), and provide here the computed sequences. To illustrate the scarless N- and C-terminal gene tagging methods in S. cerevisiae, we tagged various genes including the E3 ubiquitin ligase RSP5, the proteasome subunit PRE1, and the eleven Rab GTPases with yeast codon-optimized mNeonGreen or mCherry; several of these represent essential genes. We also implemented the scarless C-terminal gene tagging method in the distantly related organism S. pombe using kanMX6 and HSV1tk as positive and negative selection markers, respectively, as well as ura4. The scarless gene tagging methods presented here are widely applicable to visualize and investigate the functional roles of proteins in living cells.
Affiliation(s)
- Dirk Landgraf
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, United States of America
| | - Dann Huh
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
| | - Erinc Hallacli
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, United States of America
| | - Susan Lindquist
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, United States of America
- Howard Hughes Medical Institute, Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| |
|
39
|
The power of multiplexed functional analysis of genetic variants. Nat Protoc 2016; 11:1782-7. [PMID: 27583640 DOI: 10.1038/nprot.2016.135] [Citation(s) in RCA: 110] [Impact Index Per Article: 12.2] [Received: 06/08/2016] [Accepted: 07/13/2016] [Indexed: 12/30/2022]
Abstract
New technologies have recently enabled saturation mutagenesis and functional analysis of nearly all possible variants of regulatory elements or proteins of interest in single experiments. Here we discuss the past, present, and future of such multiplexed (functional) assays for variant effects (MAVEs). MAVEs provide detailed insight into sequence-function relationships, and they may prove critical for the prospective clinical interpretation of genetic variants.
|
40
|
Lappalainen T. Functional genomics bridges the gap between quantitative genetics and molecular biology. Genome Res 2015; 25:1427-31. [PMID: 26430152 PMCID: PMC4579327 DOI: 10.1101/gr.190983.115] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Indexed: 12/21/2022]
Abstract
Deep characterization of molecular function of genetic variants in the human genome is becoming increasingly important for understanding genetic associations to disease and for learning to read the regulatory code of the genome. In this paper, I discuss how recent advances in both quantitative genetics and molecular biology have contributed to understanding functional effects of genetic variants, lessons learned from eQTL studies, and future challenges in this field.
Affiliation(s)
- Tuuli Lappalainen
- New York Genome Center, New York, New York 10013, USA; Department of Systems Biology, Columbia University, New York, New York 10032, USA
| |
|
41
|
Peterman N, Levine E. Sort-seq under the hood: implications of design choices on large-scale characterization of sequence-function relations. BMC Genomics 2016; 17:206. [PMID: 26956374 PMCID: PMC4784318 DOI: 10.1186/s12864-016-2533-5] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 11/06/2015] [Accepted: 02/25/2016] [Indexed: 12/22/2022] Open
Abstract
Background: Sort-seq is an effective approach for simultaneous activity measurements in a large-scale library, combining flow cytometry, deep sequencing, and statistical inference. Such assays enable the characterization of functional landscapes at unprecedented scale for a wide-reaching array of biological molecules and functionalities in vivo. Applications of sort-seq range from footprinting to establishing quantitative models of biological systems and rational design of synthetic genetic elements. Nearly as diverse are implementations of this technique, reflecting key design choices with extensive impact on the scope and accuracy of the results. Yet how to make these choices remains unclear. Here we investigate the effects of alternative sort-seq designs and inference methods on the information output using mathematical formulation and simulations. Results: We identify key intrinsic properties of any system of interest with practical implications for sort-seq assays, depending on the experimental goals. The fluorescence range and cell-to-cell variability specify the number of sorted populations needed for quantitative measurements that are precise and unbiased. These factors also indicate cases where an enrichment-based approach that uses a single sorted population can offer satisfactory results. These predictions of our model are corroborated using re-analysis of published data. We explore implications of these results for quantitative modeling and library design. Conclusions: Sort-seq assays can be streamlined by reducing the number of sorted populations, saving considerable resources. Simple preliminary experiments can guide optimal experiment design, minimizing cost while maintaining the maximal information output and avoiding latent biases. These insights can facilitate future applications of this highly adaptable technique.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2533-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Neil Peterman
- Department of Physics and FAS Center for Systems Biology, Harvard University, 17 Oxford St., Cambridge, MA, USA
| | - Erel Levine
- Department of Physics and FAS Center for Systems Biology, Harvard University, 17 Oxford St., Cambridge, MA, USA.
| |
|
42
|
Khurana E, Fu Y, Chakravarty D, Demichelis F, Rubin MA, Gerstein M. Role of non-coding sequence variants in cancer. Nat Rev Genet 2016; 17:93-108. [PMID: 26781813 DOI: 10.1038/nrg.2015.17] [Citation(s) in RCA: 319] [Impact Index Per Article: 35.4] [Indexed: 02/07/2023]
Abstract
Patients with cancer carry somatic sequence variants in their tumour in addition to the germline variants in their inherited genome. Although variants in protein-coding regions have received the most attention, numerous studies have noted the importance of non-coding variants in cancer. Moreover, the overwhelming majority of variants, both somatic and germline, occur in non-coding portions of the genome. We review the current understanding of non-coding variants in cancer, including the great diversity of the mutation types (from single nucleotide variants to large genomic rearrangements) and the wide range of mechanisms by which they affect gene expression to promote tumorigenesis, such as disrupting transcription factor-binding sites or functions of non-coding RNAs. We highlight specific case studies of somatic and germline variants, and discuss how non-coding variants can be interpreted on a large scale through computational and experimental methods.
Affiliation(s)
- Ekta Khurana
- Meyer Cancer Center, Weill Cornell Medical College, New York, New York 10065, USA; Institute for Precision Medicine, Weill Cornell Medical College, New York, New York 10065, USA; Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York 10021, USA; Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York 10065, USA
| | - Yao Fu
- Bina Technologies, Roche Sequencing, Redwood City, California 94065, USA
| | - Dimple Chakravarty
- Institute for Precision Medicine, Weill Cornell Medical College, New York, New York 10065, USA; Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, New York 10065, USA
| | - Francesca Demichelis
- Institute for Precision Medicine, Weill Cornell Medical College, New York, New York 10065, USA; Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York 10021, USA; Centre for Integrative Biology, University of Trento, 38123 Trento, Italy
| | - Mark A Rubin
- Meyer Cancer Center, Weill Cornell Medical College, New York, New York 10065, USA; Institute for Precision Medicine, Weill Cornell Medical College, New York, New York 10065, USA; Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, New York 10065, USA
| | - Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA; Department of Computer Science, Yale University, New Haven, Connecticut 06520, USA
| |
|
43
|
Promoter and Terminator Discovery and Engineering. Advances in Biochemical Engineering/Biotechnology 2016; 162:21-44. [PMID: 27277391 DOI: 10.1007/10_2016_8] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 12/10/2022]
Abstract
Control of gene expression is crucial to optimize metabolic pathways and synthetic gene networks. Promoters and terminators are stretches of DNA upstream and downstream (respectively) of genes that control both the rate at which the gene is transcribed and the rate at which mRNA is degraded. As a result, both of these elements control net protein expression from a synthetic construct. Thus, it is highly important to discover and engineer promoters and terminators with desired characteristics. This chapter highlights various approaches taken to catalogue these important synthetic elements. Specifically, early strategies have focused largely on semi-rational techniques such as saturation mutagenesis to diversify native promoters and terminators. Next, in an effort to reduce the length of the synthetic biology design cycle, efforts in the field have turned towards the rational design of synthetic promoters and terminators. In this vein, we cover recently developed methods such as hybrid engineering, high throughput characterization, and thermodynamic modeling which allow finer control in the rational design of novel promoters and terminators. Emphasis is placed on the methodologies used and this chapter showcases the utility of these methods across multiple host organisms.
|