1
Reimão-Pinto MM, Castillo-Hair SM, Seelig G, Schier AF. The regulatory landscape of 5' UTRs in translational control during zebrafish embryogenesis. Dev Cell 2025; 60:1498-1515.e8. [PMID: 39818206 DOI: 10.1016/j.devcel.2024.12.038] [Received: 12/14/2023] [Revised: 07/22/2024] [Accepted: 12/19/2024] [Indexed: 01/18/2025]
Abstract
The 5' UTRs of mRNAs are critical for translation regulation during development, but their in vivo regulatory features are poorly characterized. Here, we report the regulatory landscape of 5' UTRs during early zebrafish embryogenesis using a massively parallel reporter assay of 18,154 sequences coupled to polysome profiling. We found that the 5' UTR suffices to confer temporal dynamics to translation initiation and identified 86 motifs enriched in 5' UTRs with distinct ribosome recruitment capabilities. A quantitative deep learning model, Danio Optimus 5-Prime (DaniO5P), identified a combined role for 5' UTR length, translation initiation site context, upstream AUGs, and sequence motifs on ribosome recruitment. DaniO5P predicts the activities of maternal and zygotic 5' UTR isoforms and indicates that modulating 5' UTR length and motif grammar contributes to translation initiation dynamics. This study provides a first quantitative model of 5' UTR-based translation regulation in development and lays the foundation for identifying the underlying molecular effectors.
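The sequence features named in this abstract (5' UTR length, upstream AUGs, and initiation site context) can be illustrated with a short sketch. This is not DaniO5P, which is a deep learning model; the function name `utr_features` and the choice of reported fields are assumptions made here purely for illustration.

```python
def utr_features(utr: str) -> dict:
    """Compute a few hand-crafted 5' UTR features (illustrative only).

    `utr` is the sequence upstream of the main start codon, so the main
    AUG is assumed to begin immediately after the last base of `utr`.
    """
    # Normalise to upper-case RNA so DNA input (with T) is also accepted.
    utr = utr.upper().replace("T", "U")

    # Start positions of every upstream AUG inside the 5' UTR.
    uaug_positions = [i for i in range(len(utr) - 2) if utr[i:i + 3] == "AUG"]

    # A uAUG is in frame with the main start codon when the distance
    # from it to the end of the UTR is a multiple of three.
    in_frame = [p for p in uaug_positions if (len(utr) - p) % 3 == 0]

    return {
        "length": len(utr),
        "n_uaug": len(uaug_positions),
        "n_uaug_in_frame": len(in_frame),
        # Base at position -3 relative to the main AUG (part of the
        # initiation site context), or None for very short UTRs.
        "minus3": utr[-3] if len(utr) >= 3 else None,
    }


if __name__ == "__main__":
    print(utr_features("GGAUGCCACC"))
```

Features like these are the kind of inputs a simple baseline might use; a model trained on the raw sequence, as the abstract describes, would learn its own representation instead.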
Affiliation(s)
- Sebastian M Castillo-Hair
- Department of Electrical & Computer Engineering, University of Washington, Seattle, WA 98195, USA; eScience Institute, University of Washington, Seattle, WA 98195, USA
- Georg Seelig
- Department of Electrical & Computer Engineering, University of Washington, Seattle, WA 98195, USA; Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA 98195, USA
- Alexander F Schier
- Biozentrum, University of Basel, 4056 Basel, Switzerland; Allen Discovery Center for Cell Lineage Tracing, Seattle, WA 98195, USA.
2
Thompson M, Martín M, Olmo TS, Rajesh C, Koo PK, Bolognesi B, Lehner B. Massive experimental quantification allows interpretable deep learning of protein aggregation. Sci Adv 2025; 11:eadt5111. [PMID: 40305601 PMCID: PMC12042874 DOI: 10.1126/sciadv.adt5111] [Received: 09/29/2024] [Accepted: 03/26/2025] [Indexed: 05/02/2025]
Abstract
Protein aggregation is a pathological hallmark of more than 50 human diseases and a major problem for biotechnology. Methods have been proposed to predict aggregation from sequence, but these have been trained and evaluated on small and biased experimental datasets. Here we directly address this data shortage by experimentally quantifying the aggregation of >100,000 protein sequences. This unprecedented dataset reveals the limited performance of existing computational methods and allows us to train CANYA, a convolution-attention hybrid neural network that accurately predicts aggregation from sequence. We adapt genomic neural network interpretability analyses to reveal CANYA's decision-making process and learned grammar. Our results illustrate the power of massive experimental analysis of random sequence-spaces and provide an interpretable and robust neural network model to predict aggregation.
Affiliation(s)
- Mike Thompson
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Mariano Martín
- Institute for Bioengineering of Catalonia (IBEC), Barcelona Institute of Science and Technology, Barcelona 08028, Spain
- Trinidad Sanmartín Olmo
- Institute for Bioengineering of Catalonia (IBEC), Barcelona Institute of Science and Technology, Barcelona 08028, Spain
- Chandana Rajesh
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Peter K. Koo
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Benedetta Bolognesi
- Institute for Bioengineering of Catalonia (IBEC), Barcelona Institute of Science and Technology, Barcelona 08028, Spain
- Ben Lehner
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Universitat Pompeu Fabra (UPF), Barcelona 08002, Spain
- ICREA, Pg. Lluis Companys 23, Barcelona 08010, Spain
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1RQ, UK
3
Capitanchik C, Wilkins OG, Wagner N, Gagneur J, Ule J. From computational models of the splicing code to regulatory mechanisms and therapeutic implications. Nat Rev Genet 2025; 26:171-190. [PMID: 39358547 DOI: 10.1038/s41576-024-00774-2] [Accepted: 08/27/2024] [Indexed: 10/04/2024]
Abstract
Since the discovery of RNA splicing and its role in gene expression, researchers have sought a set of rules, an algorithm or a computational model that could predict the splice isoforms, and their frequencies, produced from any transcribed gene in a specific cellular context. Over the past 30 years, these models have evolved from simple position weight matrices to deep-learning models capable of integrating sequence data across vast genomic distances. Most recently, new model architectures are moving the field closer to context-specific alternative splicing predictions, and advances in sequencing technologies are expanding the type of data that can be used to inform and interpret such models. Together, these developments are driving improved understanding of splicing regulatory mechanisms and emerging applications of the splicing code to the rational design of RNA- and splicing-based therapeutics.
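The "simple position weight matrices" this abstract takes as the starting point of splicing models can be sketched briefly. The example below builds a log-odds matrix from a handful of aligned sites and scans a sequence for the best-scoring window. The toy sites, the uniform background, the pseudocount, and the function names are all assumptions for illustration and are not taken from the review.

```python
import math

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds position weight matrix from equal-length sites."""
    width = len(sites[0])
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        # Smoothed frequency of each base, compared with the background.
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in ALPHABET
        })
    return pwm


def best_site(pwm, seq):
    """Return (score, start) of the highest-scoring window in `seq`."""
    width = len(pwm)
    scored = [
        (sum(pwm[j][seq[i + j]] for j in range(width)), i)
        for i in range(len(seq) - width + 1)
    ]
    return max(scored)


if __name__ == "__main__":
    # Hypothetical aligned 5' splice-site-like sequences (toy data).
    toy_sites = ["GTAAGT", "GTGAGT", "GTAAGA", "GTAAGT"]
    pwm = build_pwm(toy_sites)
    print(best_site(pwm, "CCAGGTAAGTCC"))
```

A matrix like this scores each position independently, which is the limitation that the later models described in the abstract address by integrating sequence context over longer distances.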
Affiliation(s)
- Charlotte Capitanchik
- The Francis Crick Institute, London, UK
- UK Dementia Research Institute at King's College London, London, UK
- Department of Basic and Clinical Neuroscience, Institute of Psychiatry Psychology & Neuroscience, King's College London, London, UK
- Oscar G Wilkins
- The Francis Crick Institute, London, UK
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
- Nils Wagner
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Helmholtz Association - Munich School for Data Science (MUDS), Munich, Germany
- Julien Gagneur
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany.
- Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany.
- Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany.
- Jernej Ule
- The Francis Crick Institute, London, UK.
- UK Dementia Research Institute at King's College London, London, UK.
- Department of Basic and Clinical Neuroscience, Institute of Psychiatry Psychology & Neuroscience, King's College London, London, UK.
- National Institute of Chemistry, Ljubljana, Slovenia.
4
Sun P, Wang X, Wang S, Jia X, Feng S, Chen J, Fang Y. Bipolar disorder: Construction and analysis of a joint diagnostic model using random forest and feedforward neural networks. IBRO Neurosci Rep 2024; 17:145-153. [PMID: 39206162 PMCID: PMC11350441 DOI: 10.1016/j.ibneur.2024.07.007] [Received: 12/25/2023] [Revised: 07/22/2024] [Accepted: 07/30/2024] [Indexed: 09/04/2024]
Abstract
Background: To construct a diagnostic model for Bipolar Disorder (BD) depressive phase using peripheral tissue RNA data from patients and combining Random Forest with Feedforward Neural Network methods.
Methods: Datasets GSE23848, GSE39653, and GSE69486 were selected, and differential gene expression analysis was conducted using the limma package in R. Key genes from the differentially expressed genes were identified using the Random Forest method. These key genes' expression levels in each sample were used to train a Feedforward Neural Network model. Techniques like L1 regularization, early stopping, and dropout layers were employed to prevent model overfitting. Model performance was then validated, followed by GO, KEGG, and protein-protein interaction network analyses.
Results: The final model was a Feedforward Neural Network with two hidden layers and two dropout layers, comprising 2345 trainable parameters. Model performance on the validation set, assessed through 1000 bootstrap resampling iterations, demonstrated a specificity of 0.769 (95% CI 0.571-1.000), sensitivity of 0.818 (95% CI 0.533-1.000), AUC value of 0.832 (95% CI 0.642-0.979), and accuracy of 0.792 (95% CI 0.625-0.958). Enrichment analysis of key genes indicated no significant enrichment in any known pathways.
Conclusion: Key genes with biological significance were identified based on the decrease in Gini coefficient within the Random Forest model. The combined use of Random Forest and Feedforward Neural Network to establish a diagnostic model showed good classification performance in Bipolar Disorder.
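The confidence intervals reported in this abstract come from 1000 bootstrap resampling iterations on the validation set. A minimal sketch of one common way to do this, a percentile bootstrap over sensitivity and specificity, is shown below. The abstract does not say which bootstrap variant the authors used, so the percentile method, the function names, and the toy labels are assumptions for illustration only.

```python
import random


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def bootstrap_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap intervals for sensitivity and specificity."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(y_true)
    sens_samples, spec_samples = [], []
    for _ in range(n_boot):
        # Resample validation cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        t = [y_true[i] for i in idx]
        p = [y_pred[i] for i in idx]
        if len(set(t)) < 2:
            continue  # skip resamples that lack one of the two classes
        sens, spec = sensitivity_specificity(t, p)
        sens_samples.append(sens)
        spec_samples.append(spec)

    def percentile_interval(samples):
        ordered = sorted(samples)
        lower = ordered[int((alpha / 2) * (len(ordered) - 1))]
        upper = ordered[int((1 - alpha / 2) * (len(ordered) - 1))]
        return lower, upper

    return percentile_interval(sens_samples), percentile_interval(spec_samples)


if __name__ == "__main__":
    # Toy labels, not data from the study.
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    preds = [1, 1, 1, 0, 0, 0, 0, 1]
    print(sensitivity_specificity(truth, preds))
    print(bootstrap_ci(truth, preds))
```

With a small validation set, intervals of this kind are wide and can reach 1.000 at the upper bound, which is consistent with the ranges quoted in the abstract.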
Affiliation(s)
- Ping Sun
- Qingdao Mental Health Center, Shandong 266034, China
- Clinical Research Center, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200030, China
- Xiangwen Wang
- Qingdao Mental Health Center, Shandong 266034, China
- School of Mental Health, Research Institute of Mental Health, Jining Medical University, Shandong 272002, China
- Shenghai Wang
- Qingdao Mental Health Center, Shandong 266034, China
- Xueyu Jia
- Department of Medicine, Qingdao University, Shandong 266000, China
- Shunkang Feng
- Qingdao Mental Health Center, Shandong 266034, China
- Jun Chen
- Clinical Research Center, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200030, China
- Department of Psychiatry & Affective Disorders Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Shanghai Key Laboratory of Psychotic Disorders, Shanghai 201108, China
- Yiru Fang
- Clinical Research Center, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200030, China
- Department of Psychiatry & Affective Disorders Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Shanghai Key Laboratory of Psychotic Disorders, Shanghai 201108, China
- State Key Laboratory of Neuroscience, Shanghai Institutes for Biological Sciences, CAS, Shanghai 200031, China
5
Wu J, Xiao Y, Liu Y, Wen L, Jin C, Liu S, Paul S, He C, Regev O, Fei J. Dynamics of RNA localization to nuclear speckles are connected to splicing efficiency. Sci Adv 2024; 10:eadp7727. [PMID: 39413186 PMCID: PMC11482332 DOI: 10.1126/sciadv.adp7727] [Received: 04/11/2024] [Accepted: 09/11/2024] [Indexed: 10/18/2024]
Abstract
Nuclear speckles are nuclear membraneless organelles in higher eukaryotic cells playing a vital role in gene expression. Using an in situ reverse transcription-based sequencing method, we study nuclear speckle-associated human transcripts. Our data indicate the existence of three gene groups whose transcripts demonstrate different speckle localization properties: stably enriched in nuclear speckles, transiently enriched in speckles at the pre-messenger RNA stage, and not enriched. We find that stably enriched transcripts contain inefficiently excised introns and that disruption of nuclear speckles specifically affects splicing of speckle-enriched transcripts. We further reveal RNA sequence features contributing to transcript speckle localization, indicating a tight interplay between transcript speckle enrichment, genome organization, and splicing efficiency. Collectively, our data highlight a role of nuclear speckles in both co- and posttranscriptional splicing regulation. Last, we show that genes with stably enriched transcripts are over-represented among genes with heat shock-up-regulated intron retention, hinting at a connection between speckle localization and cellular stress response.
Affiliation(s)
- Jinjun Wu
- Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, IL 60637, USA
- Yu Xiao
- Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, IL 60637, USA
- Department of Chemistry, The University of Chicago, Chicago, IL 60637, USA
- Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL 60637, USA
- Howard Hughes Medical Institute, The University of Chicago, 929 East 57th Street, Chicago, IL 60637, USA
- Yunzheng Liu
- Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, IL 60637, USA
- Li Wen
- Department of Physics, The University of Chicago, Chicago, IL 60637, USA
- Chuanyang Jin
- Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA
- Shun Liu
- Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, IL 60637, USA
- Department of Chemistry, The University of Chicago, Chicago, IL 60637, USA
- Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL 60637, USA
- Howard Hughes Medical Institute, The University of Chicago, 929 East 57th Street, Chicago, IL 60637, USA
- Sneha Paul
- Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, IL 60637, USA
- Chuan He
- Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, IL 60637, USA
- Department of Chemistry, The University of Chicago, Chicago, IL 60637, USA
- Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL 60637, USA
- Howard Hughes Medical Institute, The University of Chicago, 929 East 57th Street, Chicago, IL 60637, USA
- Oded Regev
- Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA
- Jingyi Fei
- Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, IL 60637, USA
- Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL 60637, USA
6
La Fleur A, Shi Y, Seelig G. Decoding biology with massively parallel reporter assays and machine learning. Genes Dev 2024; 38:843-865. [PMID: 39362779 PMCID: PMC11535156 DOI: 10.1101/gad.351800.124] [Indexed: 10/05/2024]
Abstract
Massively parallel reporter assays (MPRAs) are powerful tools for quantifying the impacts of sequence variation on gene expression. Reading out molecular phenotypes with sequencing enables interrogating the impact of sequence variation beyond genome scale. Machine learning models integrate and codify information learned from MPRAs and enable generalization by predicting sequences outside the training data set. Models can provide a quantitative understanding of cis-regulatory codes controlling gene expression, enable variant stratification, and guide the design of synthetic regulatory elements for applications from synthetic biology to mRNA and gene therapy. This review focuses on cis-regulatory MPRAs, particularly those that interrogate cotranscriptional and post-transcriptional processes: alternative splicing, cleavage and polyadenylation, translation, and mRNA decay.
Affiliation(s)
- Alyssa La Fleur
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Yongsheng Shi
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, California 92697, USA
- Georg Seelig
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Department of Electrical & Computer Engineering, University of Washington, Seattle, Washington 98195, USA
7
Thompson M, Martín M, Olmo TS, Rajesh C, Koo PK, Bolognesi B, Lehner B. Massive experimental quantification of amyloid nucleation allows interpretable deep learning of protein aggregation. bioRxiv 2024:2024.07.13.603366. [PMID: 39071305 PMCID: PMC11275847 DOI: 10.1101/2024.07.13.603366] [Indexed: 07/30/2024]
Abstract
Protein aggregation is a pathological hallmark of more than fifty human diseases and a major problem for biotechnology. Methods have been proposed to predict aggregation from sequence, but these have been trained and evaluated on small and biased experimental datasets. Here we directly address this data shortage by experimentally quantifying the amyloid nucleation of >100,000 protein sequences. This unprecedented dataset reveals the limited performance of existing computational methods and allows us to train CANYA, a convolution-attention hybrid neural network that accurately predicts amyloid nucleation from sequence. We adapt genomic neural network interpretability analyses to reveal CANYA's decision-making process and learned grammar. Our results illustrate the power of massive experimental analysis of random sequence-spaces and provide an interpretable and robust neural network model to predict amyloid nucleation.
Affiliation(s)
- Mike Thompson
- Systems and Synthetic Biology, Centre for Genomic Regulation, The Barcelona Institute for Science and Technology (BIST), Barcelona, Spain
- Mariano Martín
- Institute for Bioengineering of Catalonia (IBEC), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Trinidad Sanmartín Olmo
- Institute for Bioengineering of Catalonia (IBEC), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Chandana Rajesh
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Peter K. Koo
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Benedetta Bolognesi
- Institute for Bioengineering of Catalonia (IBEC), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Ben Lehner
- Systems and Synthetic Biology, Centre for Genomic Regulation, The Barcelona Institute for Science and Technology (BIST), Barcelona, Spain
- University Pompeu Fabra (UPF), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
8
Li J, Zhou Y, Chen SJ. Embracing exascale computing in nucleic acid simulations. Curr Opin Struct Biol 2024; 87:102847. [PMID: 38815519 PMCID: PMC11283969 DOI: 10.1016/j.sbi.2024.102847] [Received: 12/17/2023] [Revised: 04/18/2024] [Accepted: 05/09/2024] [Indexed: 06/01/2024]
Abstract
This mini-review reports the recent advances in biomolecular simulations, particularly for nucleic acids, and provides the potential effects of the emerging exascale computing on nucleic acid simulations, emphasizing the need for advanced computational strategies to fully exploit this technological frontier. Specifically, we introduce recent breakthroughs in computer architectures for large-scale biomolecular simulations and review the simulation protocols for nucleic acids regarding force fields, enhanced sampling methods, coarse-grained models, and interactions with ligands. We also explore the integration of machine learning methods into simulations, which promises to significantly enhance the predictive modeling of biomolecules and the analysis of complex data generated by the exascale simulations. Finally, we discuss the challenges and perspectives for biomolecular simulations as we enter the dawning exascale computing era.
Affiliation(s)
- Jun Li
- Department of Physics, Department of Biochemistry and Institute for Data Science and Informatics, University of Missouri, 223 Physics Bldg., Columbia, 65211, MO, USA
- Yuanzhe Zhou
- Department of Physics, Department of Biochemistry and Institute for Data Science and Informatics, University of Missouri, 223 Physics Bldg., Columbia, 65211, MO, USA
- Shi-Jie Chen
- Department of Physics, Department of Biochemistry and Institute for Data Science and Informatics, University of Missouri, 223 Physics Bldg., Columbia, 65211, MO, USA.
9
Paul S, Arias MA, Wen L, Liao SE, Zhang J, Wang X, Regev O, Fei J. RNA molecules display distinctive organization at nuclear speckles. iScience 2024; 27:109603. [PMID: 38638569 PMCID: PMC11024929 DOI: 10.1016/j.isci.2024.109603] [Received: 08/09/2023] [Revised: 01/05/2024] [Accepted: 03/25/2024] [Indexed: 04/20/2024]
Abstract
RNA molecules often play critical roles in assisting the formation of membraneless organelles in eukaryotic cells. Yet, little is known about the organization of RNAs within membraneless organelles. Here, using super-resolution imaging and nuclear speckles as a model system, we demonstrate that different sequence domains of RNA transcripts exhibit differential spatial distributions within speckles. Specifically, we image transcripts containing a region enriched in binding motifs of serine/arginine-rich (SR) proteins and another region enriched in binding motifs of heterogeneous nuclear ribonucleoproteins (hnRNPs). We show that these transcripts localize to the outer shell of speckles, with the SR motif-rich region localizing closer to the speckle center relative to the hnRNP motif-rich region. Further, we identify that this intra-speckle RNA organization is driven by the strength of RNA-protein interactions inside and outside speckles. Our results hint at novel functional roles of nuclear speckles and likely other membraneless organelles in organizing RNA substrates for biochemical reactions.
Affiliation(s)
- Sneha Paul
- Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, IL 60637, USA
- Mauricio A. Arias
- Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA
- Institute for System Genetics, NYU Langone Health, New York, NY 10016, USA
- Li Wen
- Department of Physics, The University of Chicago, Chicago, IL 60637, USA
- Susan E. Liao
- Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA
- Jiacheng Zhang
- Graduate Program in Biophysical Sciences, The University of Chicago, Chicago, IL 60637, USA
- Xiaoshu Wang
- The College, The University of Chicago, Chicago, IL 60637, USA
- Oded Regev
- Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA
- Jingyi Fei
- Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, IL 60637, USA
- Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL 60637, USA