1
Struhl K. Yeast molecular genetic tricks to study gene regulation. Genetics 2025; 230:iyaf041. [PMID: 40152592 DOI: 10.1093/genetics/iyaf041] [Indexed: 03/29/2025]
Abstract
The Genetics Society of America's (GSA) Edward Novitski Prize is awarded to researchers for extraordinary creativity and intellectual ingenuity in genetics research. Struhl is being recognized for his pioneering work cloning a functional eukaryotic gene in E. coli, defining its promoter and regulatory region, and using random DNA and amino acid sequences to define determinants of specificity. The award also recognizes other key scientific contributions, including Struhl's discovery of the sequences and protein interactions required for transcriptional activation and repression, and his demonstration that nucleosome-free regions are important for transcription initiation.
Affiliation(s)
- Kevin Struhl
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Harvard University, Boston, MA 02115, USA
2
Meneu L, Chapard C, Serizay J, Westbrook A, Routhier E, Ruault M, Perrot M, Minakakis A, Girard F, Bignaud A, Even A, Gourgues G, Libri D, Lartigue C, Piazza A, Thierry A, Taddei A, Beckouët F, Mozziconacci J, Koszul R. Sequence-dependent activity and compartmentalization of foreign DNA in a eukaryotic nucleus. Science 2025; 387:eadm9466. [PMID: 39913590 DOI: 10.1126/science.adm9466] [Received: 11/22/2023] [Revised: 09/26/2024] [Accepted: 11/21/2024] [Indexed: 04/23/2025]
Abstract
In eukaryotes, DNA-associated protein complexes coevolve with genomic sequences to orchestrate chromatin folding. We investigate the relationship between DNA sequence and the spontaneous loading and activity of chromatin components in the absence of coevolution. Using bacterial genomes integrated into Saccharomyces cerevisiae (bacteria and yeast diverged more than 2 billion years ago), we show that nucleosomes, cohesins, and associated transcriptional machinery can lead to the formation of two different chromatin archetypes, one transcribed and the other silent, independently of heterochromatin formation. These two archetypes also form on eukaryotic exogenous sequences, depend on sequence composition, and can be predicted using neural networks trained on the native genome. They do not mix in the nucleus, leading to a bipartite nuclear compartmentalization reminiscent of the organization of vertebrate nuclei.
Affiliation(s)
- Léa Meneu
- Institut Pasteur, CNRS UMR 3525, Université Paris Cité, Unité Régulation Spatiale des Génomes, Paris, France
- Sorbonne Université, College Doctoral
- Christophe Chapard
- Institut Pasteur, CNRS UMR 3525, Université Paris Cité, Unité Régulation Spatiale des Génomes, Paris, France
- Jacques Serizay
- Institut Pasteur, CNRS UMR 3525, Université Paris Cité, Unité Régulation Spatiale des Génomes, Paris, France
- Alex Westbrook
- Sorbonne Université, College Doctoral
- Laboratoire Structure et Instabilité des génomes, UMR 7196, Muséum National d'Histoire Naturelle, Paris, France
- Etienne Routhier
- Sorbonne Université, College Doctoral
- Laboratoire Structure et Instabilité des génomes, UMR 7196, Muséum National d'Histoire Naturelle, Paris, France
- Laboratoire de Physique Théorique de la Matière Condensée, Sorbonne Université, CNRS, Paris, France
- Myriam Ruault
- Institut Curie, PSL University, Sorbonne Université, CNRS UMR 3664 Nuclear Dynamics, Paris, France
- Manon Perrot
- Institut Pasteur, CNRS UMR 3525, Université Paris Cité, Unité Régulation Spatiale des Génomes, Paris, France
- Sorbonne Université, College Doctoral
- Alexandros Minakakis
- Institut de Génétique Moléculaire de Montpellier, Univ Montpellier, CNRS, Montpellier, France
- Fabien Girard
- Institut Pasteur, CNRS UMR 3525, Université Paris Cité, Unité Régulation Spatiale des Génomes, Paris, France
- Amaury Bignaud
- Institut Pasteur, CNRS UMR 3525, Université Paris Cité, Unité Régulation Spatiale des Génomes, Paris, France
- Sorbonne Université, College Doctoral
- Antoine Even
- Institut Curie, PSL University, Sorbonne Université, CNRS UMR 3664 Nuclear Dynamics, Paris, France
- Géraldine Gourgues
- Univ. Bordeaux, INRAE, Biologie du Fruit et Pathologie, UMR 1332, Villenave d'Ornon, France
- Domenico Libri
- Institut de Génétique Moléculaire de Montpellier, Univ Montpellier, CNRS, Montpellier, France
- Carole Lartigue
- Univ. Bordeaux, INRAE, Biologie du Fruit et Pathologie, UMR 1332, Villenave d'Ornon, France
- Aurèle Piazza
- Institut Pasteur, CNRS UMR 3525, Université Paris Cité, Unité Régulation Spatiale des Génomes, Paris, France
- Agnès Thierry
- Institut Pasteur, CNRS UMR 3525, Université Paris Cité, Unité Régulation Spatiale des Génomes, Paris, France
- Angela Taddei
- Institut Curie, PSL University, Sorbonne Université, CNRS UMR 3664 Nuclear Dynamics, Paris, France
- Frédéric Beckouët
- Molecular, Cellular and Developmental biology unit (MCD), Centre de Biologie Intégrative (CBI), Université de Toulouse, CNRS, UPS, Toulouse, France
- Julien Mozziconacci
- Laboratoire Structure et Instabilité des génomes, UMR 7196, Muséum National d'Histoire Naturelle, Paris, France
- Laboratoire de Physique Théorique de la Matière Condensée, Sorbonne Université, CNRS, Paris, France
- UAR 2700 2AD, Muséum National d'Histoire Naturelle, Paris, France
- Romain Koszul
- Institut Pasteur, CNRS UMR 3525, Université Paris Cité, Unité Régulation Spatiale des Génomes, Paris, France
3
Mick S, Carroll C, Uriostegui-Arcos M, Fiszbein A. Hybrid exons evolved by coupling transcription initiation and splicing at the nucleotide level. Nucleic Acids Res 2025; 53:gkae1251. [PMID: 39739742 PMCID: PMC11797052 DOI: 10.1093/nar/gkae1251] [Received: 10/12/2023] [Revised: 11/27/2024] [Accepted: 12/05/2024] [Indexed: 01/02/2025]
Abstract
Exons within transcripts are traditionally classified as first, internal or last exons, each governed by different regulatory mechanisms. We recently described the widespread usage of 'hybrid' exons that serve as terminal or internal exons in different transcripts. Here, we employ an interpretable deep learning pipeline to dissect the sequence features governing the co-regulation of transcription initiation and splicing in hybrid exons. Using ENCODE data from human tissues, we identified 80 000 hybrid first-internal exons. These exons often possess a relaxed chromatin state, allowing transcription initiation within the gene body. Interestingly, transcription start sites of hybrid exons are typically centered at the 3' splice site, suggesting tight coupling between splicing and transcription initiation. We identified two subcategories of hybrid exons: the majority resemble internal exons, maintaining strong 3' splice sites, while a minority show enrichment in promoter elements, resembling first exons. Diving into the evolution of their sequences, we found that human hybrid exons with orthologous first exons in other species usually gained 3' splice sites or whole exons upstream, while those with orthologous internal exons often gained promoter elements. Overall, our findings unveil the intricate regulatory landscape of hybrid exons and reveal stronger connections between transcription initiation and RNA splicing than previously acknowledged.
Affiliation(s)
- Steven T Mick
- Biology Department, Boston University, 24 Cummington Ave., Boston, 02215, USA
- Christine L Carroll
- Biology Department, Boston University, 24 Cummington Ave., Boston, 02215, USA
- Ana Fiszbein
- Biology Department, Boston University, 24 Cummington Ave., Boston, 02215, USA
- Computing & Data Sciences, Boston University, 665 Commonwealth Ave., Boston, 02215, USA
4
Geisberg JV, Moqtaderi Z, Struhl K. Location of polyadenylation sites within 3' untranslated regions is linked to biological function in yeast. Genetics 2024; 228:iyae163. [PMID: 39383179 PMCID: PMC11631516 DOI: 10.1093/genetics/iyae163] [Received: 07/02/2024] [Accepted: 10/07/2024] [Indexed: 10/11/2024]
Abstract
Expression of a typical yeast gene results in ∼50 3' mRNA isoforms that are distinguished by the locations of poly(A) sites within the 3' untranslated regions (3' UTRs). The location of poly(A) sites with respect to the translational termination codon varies considerably among genes, but whether this has any functional significance is poorly understood. Using hierarchical clustering of 3' UTRs, we identify eight classes of S. cerevisiae genes based on their poly(A) site locations. Genes involved in related biological functions (GO categories) are uniquely over-represented in six of these classes. Similar analysis of S. pombe genes reveals three classes of 3' UTRs, all of which show over-representation of functionally related genes. Remarkably, S. cerevisiae and S. pombe homologs share related patterns of poly(A) site locations. These observations suggest that the location of poly(A) sites within 3' UTRs has biological significance.
Affiliation(s)
- Joseph V Geisberg
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Zarmik Moqtaderi
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Kevin Struhl
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
5
Gvozdenov Z, Peng AYT, Biswas A, Barcutean Z, Gestaut D, Frydman J, Struhl K, Freeman BC. TRiC/CCT Chaperonin Governs RNA Polymerase II Activity in the Nucleus to Support RNA Homeostasis. bioRxiv: the preprint server for biology 2024:2024.09.26.615188. [PMID: 39386699 PMCID: PMC11463447 DOI: 10.1101/2024.09.26.615188] [Indexed: 10/12/2024]
Abstract
The chaperonin TRiC/CCT is a large hetero-oligomeric ring structure that is essential in eukaryotes. Although present in the nucleus, TRiC/CCT is typically considered to function in the cytosol, where it mediates nascent polypeptide folding and the assembly/disassembly of protein complexes. Here, we investigated the nuclear role of TRiC/CCT. Inactivation of TRiC/CCT resulted in a significant increase in the production of nascent RNA, leading to the accumulation of noncoding transcripts. The influence on transcription was not due to cytoplasmic TRiC/CCT activities or other nuclear proteins, as the effect was observed when TRiC/CCT was evicted from the nucleus and restricted to the cytoplasm. Rather, our data support a direct role of TRiC/CCT in regulating RNA polymerase II activity, as the chaperonin modulated nascent RNA production both in vivo and in vitro. Overall, our studies reveal a new avenue by which TRiC/CCT contributes to cell homeostasis by regulating the activity of nuclear RNA polymerase II.
6
7
Camellato BR, Brosh R, Ashe HJ, Maurano MT, Boeke JD. Synthetic reversed sequences reveal default genomic states. Nature 2024; 628:373-380. [PMID: 38448583 PMCID: PMC11006607 DOI: 10.1038/s41586-024-07128-2] [Received: 12/27/2022] [Accepted: 01/29/2024] [Indexed: 03/08/2024]
Abstract
Pervasive transcriptional activity is observed across diverse species. The genomes of extant organisms have undergone billions of years of evolution, making it unclear whether these genomic activities represent effects of selection or 'noise' (refs 1-4). Characterizing default genome states could help determine whether pervasive transcriptional activity has biological meaning. Here we addressed this question by introducing a synthetic 101-kb locus into the genomes of Saccharomyces cerevisiae and Mus musculus and characterizing genomic activity. The locus was designed by reversing but not complementing human HPRT1, including its flanking regions, thus retaining basic features of the natural sequence but ablating evolved coding or regulatory information. We observed widespread activity of both reversed and native HPRT1 loci in yeast, despite the lack of evolved yeast promoters. By contrast, the reversed locus displayed no activity at all in mouse embryonic stem cells, and instead exhibited repressive chromatin signatures. The repressive signature was alleviated in a locus variant lacking CpG dinucleotides; nevertheless, this variant was also transcriptionally inactive. These results show that synthetic genomic sequences that lack coding information are active in yeast, but inactive in mouse embryonic stem cells, consistent with a major difference in 'default genomic states' between these two divergent eukaryotic cell types, with implications for understanding pervasive transcription, horizontal transfer of genetic information and the birth of new genes.
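The difference between reversing a sequence and reverse-complementing it is easy to miss. A reverse complement is simply the other strand of the same DNA, so it carries the original information; a reversal reads the same bases in the opposite order on the same strand and yields a genuinely new sequence. The minimal Python sketch below illustrates this on a hypothetical toy sequence (the helper names are illustrative and not from the paper, and only uppercase A/C/G/T are handled).

```python
from collections import Counter

# Watson-Crick complement for uppercase A/C/G/T only (toy assumption: no N or IUPAC codes).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse(seq: str) -> str:
    """Reverse the base order without complementing: a different sequence."""
    return seq[::-1]


def reverse_complement(seq: str) -> str:
    """Opposite strand of the same double-stranded DNA (information preserved)."""
    return seq.translate(COMPLEMENT)[::-1]


def dinucleotide_counts(seq: str) -> Counter:
    """Count overlapping dinucleotides, e.g. 'CG' for CpG."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


toy = "ATGGCGTACCA"  # hypothetical toy sequence
rev = reverse(toy)
rc = reverse_complement(toy)
print(rev, rc)  # ACCATGCGGTA TGGTACGCCAT

# Reversal keeps length and base composition...
print(Counter(rev) == Counter(toy))  # True
# ...but turns every dinucleotide XY into YX, so the CpG count of the
# reversed sequence equals the GpC count of the original.
print(dinucleotide_counts(rev)["CG"] == dinucleotide_counts(toy)["GC"])  # True
```

The last check is a reminder that "retaining basic features" under reversal applies cleanly to base composition and length, while dinucleotide identities are swapped (XY becomes YX).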
Affiliation(s)
- Ran Brosh
- Institute for Systems Genetics, NYU Langone Health, New York, NY, USA
- Hannah J Ashe
- Institute for Systems Genetics, NYU Langone Health, New York, NY, USA
- Matthew T Maurano
- Institute for Systems Genetics, NYU Langone Health, New York, NY, USA
- Department of Pathology, NYU Langone Health, New York, NY, USA
- Jef D Boeke
- Institute for Systems Genetics, NYU Langone Health, New York, NY, USA
- Department of Biochemistry and Molecular Pharmacology, NYU Langone Health, New York, NY, USA
- Department of Biomedical Engineering, NYU Tandon School of Engineering, New York, NY, USA
8
Struhl K. How is polyadenylation restricted to 3'-untranslated regions? Yeast 2024; 41:186-191. [PMID: 38041485 PMCID: PMC11001523 DOI: 10.1002/yea.3915] [Received: 08/04/2023] [Revised: 10/30/2023] [Accepted: 11/21/2023] [Indexed: 12/03/2023]
Abstract
Polyadenylation occurs at numerous sites within 3'-untranslated regions (3'-UTRs) but rarely within coding regions. How does Pol II travel through long coding regions without generating poly(A) sites, yet then permit promiscuous polyadenylation once it reaches the 3'-UTR? The cleavage/polyadenylation (CpA) machinery preferentially associates with 3'-UTRs, but it is unknown how its recruitment is restricted to 3'-UTRs during Pol II elongation. Unlike coding regions, 3'-UTRs have long AT-rich stretches of DNA that may be important for restricting polyadenylation to 3'-UTRs. Recognition of the 3'-UTR could occur at the DNA (AT-rich), RNA (AU-rich), or RNA:DNA hybrid (rU:dA- and/or rA:dT-rich) level. Based on the nucleic acid critical for 3'-UTR recognition, there are three classes of models, not mutually exclusive, for how the CpA machinery is selectively recruited to 3'-UTRs, thereby restricting where polyadenylation occurs: (1) RNA-based models suggest that the CpA complex binds, directly or indirectly through one or more intermediary proteins, long AU-rich stretches that are exposed after Pol II passes through these regions. (2) DNA-based models suggest that the AT-rich sequence affects nucleosome depletion or the elongating Pol II machinery, resulting in dissociation of some elongation factors and subsequent recruitment of the CpA machinery. (3) RNA:DNA hybrid models suggest that preferential destabilization of the Pol II elongation complex at rU:dA- and/or rA:dT-rich duplexes bridging the nucleotide addition and RNA exit sites permits preferential association of the CpA machinery with 3'-UTRs. Experiments to provide evidence for one or more of these models are suggested.
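The premise that 3'-UTRs carry long AT-rich stretches that coding regions lack can be explored with a simple sliding-window scan. The Python sketch below is only an illustration: the window size and AT threshold are arbitrary assumptions rather than values from the paper, and the function names are hypothetical.

```python
def at_fraction_windows(seq: str, window: int):
    """Yield (start, fraction of A/T) for every full-length window of seq."""
    seq = seq.upper()
    if len(seq) < window:
        return
    at = sum(base in "AT" for base in seq[:window])
    yield 0, at / window
    for start in range(1, len(seq) - window + 1):
        # Slide by one base: drop the base leaving on the left, add the one entering on the right.
        at += (seq[start + window - 1] in "AT") - (seq[start - 1] in "AT")
        yield start, at / window


def at_rich_regions(seq: str, window: int = 10, threshold: float = 0.8):
    """Merge overlapping windows with AT fraction >= threshold into (start, end) intervals, end exclusive.

    window and threshold are illustrative defaults, not published cutoffs.
    """
    regions = []
    for start, frac in at_fraction_windows(seq, window):
        if frac >= threshold:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Window overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions


# Hypothetical toy sequence: an AT-rich core flanked by GC-rich sequence.
toy = "GCGCGCGCGC" + "ATATTAAATT" + "GCGCGCGCGC"
print(at_rich_regions(toy))  # [(8, 22)]
```

The reported interval extends slightly beyond the pure AT core because windows that straddle the boundary still pass the threshold; a real analysis would need to choose the window and cutoff deliberately, and the same scan applies to AU content in RNA after converting U to T.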
Affiliation(s)
- Kevin Struhl
- Dept. Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115
9
Luthra I, Jensen C, Chen XE, Salaudeen AL, Rafi AM, de Boer CG. Regulatory activity is the default DNA state in eukaryotes. Nat Struct Mol Biol 2024; 31:559-567. [PMID: 38448573 DOI: 10.1038/s41594-024-01235-4] [Received: 11/29/2023] [Accepted: 01/29/2024] [Indexed: 03/08/2024]
Abstract
Genomes encode genes and non-coding DNA, both capable of transcriptional activity. However, unlike canonical genes, many transcripts from non-coding DNA have limited evidence of conservation or function. Here, to determine how much biological noise is expected from non-genic sequences, we quantify the regulatory activity of evolutionarily naive DNA using RNA-seq in yeast and computational predictions in humans. In yeast, more than 99% of naive DNA bases were transcribed. Unlike the evolved transcriptome, naive transcripts frequently overlapped with opposite-sense transcripts, suggesting selection favored coherent gene structures in the yeast genome. In humans, regulation-associated chromatin activity is predicted to be common in naive dinucleotide-content-matched randomized DNA. In these predictions, naive and evolved DNA have similar co-occurrence and cell-type specificity of chromatin marks, challenging these as indicators of selection. However, in both yeast and humans, extremely high activities were rare in naive DNA, suggesting they result from selection. Overall, basal regulatory activity seems to be the default, which selection can hone to evolve a function or, if detrimental, repress.
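The abstract does not say how the dinucleotide-content-matched randomized DNA was generated. One classic way to make such sequences is the Euler-path shuffle of Altschul and Erickson: treat every overlapping dinucleotide XY as a directed edge X→Y, so the original sequence is an Euler path, then draw a random Euler path through the same multigraph. The Python sketch below implements that general idea; it is not claimed to be the authors' procedure, and all names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def dinucleotide_counts(seq: str) -> Counter:
    """Count overlapping dinucleotides."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def _reaches_end(v, last_exit, end):
    """Follow chosen last-exit edges from v; True if they reach `end` without a cycle."""
    seen = set()
    while v != end:
        if v in seen:
            return False
        seen.add(v)
        v = last_exit[v]
    return True


def dinucleotide_shuffle(seq: str, rng: random.Random) -> str:
    """Random sequence with the same dinucleotide counts and the same first/last base."""
    if len(seq) < 3:
        return seq
    # Each overlapping dinucleotide XY becomes a directed edge X -> Y.
    out_edges = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        out_edges[a].append(b)
    end = seq[-1]
    # Choose a random "last exit" edge for every vertex except the end vertex and
    # retry until those edges form a tree leading to `end` (otherwise the walk could strand edges).
    while True:
        last_exit = {v: rng.choice(nbrs) for v, nbrs in out_edges.items() if v != end}
        if all(_reaches_end(v, last_exit, end) for v in last_exit):
            break
    # Randomly order the remaining edges at each vertex, keeping the last exit last.
    order = {}
    for v, nbrs in out_edges.items():
        rest = list(nbrs)
        if v in last_exit:
            rest.remove(last_exit[v])
        rng.shuffle(rest)
        if v in last_exit:
            rest.append(last_exit[v])
        order[v] = rest
    # Walk the edges in the chosen order, starting from the original first base.
    used = {v: 0 for v in order}
    v = seq[0]
    out = [v]
    for _ in range(len(seq) - 1):
        nxt = order[v][used[v]]
        used[v] += 1
        out.append(nxt)
        v = nxt
    return "".join(out)


seq = "ATGCGCGATATTACGGCATCGATTTAGCCGA"  # hypothetical toy sequence
shuffled = dinucleotide_shuffle(seq, random.Random(1))
print(dinucleotide_counts(shuffled) == dinucleotide_counts(seq))  # True
```

Because the edge multiset and both endpoints are kept, base composition is preserved as well; the rejection loop on the last-exit tree is the simplest correct way to avoid walks that strand unused edges, at the cost of occasional retries.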
Affiliation(s)
- Ishika Luthra
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Cassandra Jensen
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Xinyi E Chen
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Asfar Lathif Salaudeen
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Abdul Muntakim Rafi
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Carl G de Boer
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
10
de Boer CG, Taipale J. Hold out the genome: a roadmap to solving the cis-regulatory code. Nature 2024; 625:41-50. [PMID: 38093018 DOI: 10.1038/s41586-023-06661-w] [Received: 02/17/2023] [Accepted: 09/20/2023] [Indexed: 01/05/2024]
Abstract
Gene expression is regulated by transcription factors that work together to read cis-regulatory DNA sequences. The 'cis-regulatory code' (how cells interpret DNA sequences to determine when, where and how much genes should be expressed) has proven to be exceedingly complex. Recently, advances in the scale and resolution of functional genomics assays and machine learning have enabled substantial progress towards deciphering this code. However, the cis-regulatory code will probably never be solved if models are trained only on genomic sequences; regions of homology can easily lead to overestimation of predictive performance, and our genome is too short and has insufficient sequence diversity to learn all relevant parameters. Fortunately, randomly synthesized DNA sequences enable testing a far larger sequence space than exists in our genomes, and designed DNA sequences enable targeted queries to maximally improve the models. As the same biochemical principles are used to interpret DNA regardless of its source, models trained on these synthetic data can predict genomic activity, often better than genome-trained models. Here we provide an outlook on the field, and propose a roadmap towards solving the cis-regulatory code by a combination of machine learning and massively parallel assays using synthetic DNA.
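The warning that homologous regions inflate predictive performance suggests removing held-out sequences that resemble training sequences before evaluating a model. The Python sketch below is a crude illustration that assumes shared k-mers as a stand-in for homology (real pipelines typically rely on alignment-based similarity); k, the cutoff, and the function names are arbitrary, hypothetical choices.

```python
def kmer_set(seq: str, k: int) -> set:
    """All distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def drop_similar_test_sequences(train, test, k=8, max_shared=0.0):
    """Keep test sequences whose fraction of k-mers also seen in training is <= max_shared."""
    train_kmers = set()
    for s in train:
        train_kmers |= kmer_set(s, k)
    kept = []
    for s in test:
        ks = kmer_set(s, k)
        if not ks:  # shorter than k: nothing to compare, so keep it
            kept.append(s)
            continue
        shared = len(ks & train_kmers) / len(ks)
        if shared <= max_shared:
            kept.append(s)
    return kept


# Hypothetical toy data: the first test sequence differs from training by one base.
train = ["ACGTACGTACGTAAAA"]
test = ["ACGTACGTACGTAAAC", "GGGGCCCCTTTTGGGG"]
print(drop_similar_test_sequences(train, test))  # ['GGGGCCCCTTTTGGGG']
```

A strict cutoff such as max_shared=0.0 discards near-duplicates like the first test sequence (6 of its 7 distinct 8-mers occur in training), which is the kind of leakage that would otherwise make a model look better than it is on genuinely new sequence.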
Affiliation(s)
- Carl G de Boer
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Jussi Taipale
- Applied Tumor Genomics Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
- Department of Biochemistry, University of Cambridge, Cambridge, UK