1
Riley AT, Robson JM, Ulanova A, Green AA. Generative and predictive neural networks for the design of functional RNA molecules. Nat Commun 2025; 16:4155. [PMID: 40320400 PMCID: PMC12050331 DOI: 10.1038/s41467-025-59389-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Accepted: 04/16/2025] [Indexed: 05/08/2025] Open
Abstract
RNA is a remarkably versatile molecule that has been engineered for applications in therapeutics, diagnostics, and in vivo information-processing systems. However, the complex relationship between the sequence, structure, and function of RNA often necessitates extensive experimental screening of candidate sequences. Here we present a generalized, efficient neural network architecture that utilizes the sequence and structure of RNA molecules (SANDSTORM) to inform functional predictions across a diverse range of settings. We pair these predictive models with generative adversarial RNA design networks (GARDN), allowing the generative modelling of a diverse range of functional RNA molecules with targeted experimental attributes. This approach enables the design of novel sequence candidates that outperform those encountered during training or returned by classical thermodynamic algorithms, and can be deployed using as few as 384 example sequences. SANDSTORM and GARDN thus represent powerful new predictive and generative tools for the development of RNA molecules with improved function.
Affiliation(s)
- Aidan T Riley
- Department of Biomedical Engineering, Boston University, Boston, MA, USA
- Biological Design Center, Boston University, Boston, MA, USA
- James M Robson
- Department of Biomedical Engineering, Boston University, Boston, MA, USA
- Biological Design Center, Boston University, Boston, MA, USA
- Aiganysh Ulanova
- College of Arts and Sciences, Biochemistry and Molecular Biology Program, Boston University, Boston, MA, USA
- Faculty of Computing and Data Sciences, Boston University, Boston, MA, USA
- Alexander A Green
- Department of Biomedical Engineering, Boston University, Boston, MA, USA
- Biological Design Center, Boston University, Boston, MA, USA
- Molecular Biology, Cell Biology & Biochemistry Program, Graduate School of Arts and Sciences, Boston University, Boston, MA, USA
2
Jin L, Zhou Y, Zhang S, Chen SJ. mRNA vaccine sequence and structure design and optimization: Advances and challenges. J Biol Chem 2025; 301:108015. [PMID: 39608721 PMCID: PMC11728972 DOI: 10.1016/j.jbc.2024.108015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2024] [Revised: 11/13/2024] [Accepted: 11/16/2024] [Indexed: 11/30/2024] Open
Abstract
Messenger RNA (mRNA) vaccines have emerged as a powerful tool against communicable diseases and cancers, as demonstrated by their huge success during the coronavirus disease 2019 (COVID-19) pandemic. Despite these outstanding achievements, mRNA vaccines still face challenges such as stringent storage requirements, insufficient antigen expression, and unexpected immune responses. Since the intrinsic properties of mRNA molecules significantly impact vaccine performance, optimizing mRNA design is crucial in preclinical development. In this review, we outline four key principles for optimal mRNA sequence design: enhancing ribosome loading and translation efficiency through untranslated region (UTR) optimization, improving translation efficiency via codon optimization, increasing structural stability by refining the global RNA sequence, and extending in-cell lifetime and expression fidelity by adjusting local RNA structures. We also explore recent advancements in computational models for designing and optimizing mRNA vaccine sequences following these principles. By integrating current mRNA knowledge, addressing challenges, and examining advanced computational methods, this review aims to promote the application of computational approaches in mRNA vaccine development and inspire novel solutions to existing obstacles.
Affiliation(s)
- Lei Jin
- Department of Physics and Astronomy, University of Missouri, Columbia, Missouri, USA
- Yuanzhe Zhou
- Department of Physics and Astronomy, University of Missouri, Columbia, Missouri, USA
- Sicheng Zhang
- Department of Physics and Astronomy, University of Missouri, Columbia, Missouri, USA
- Shi-Jie Chen
- Department of Physics and Astronomy, University of Missouri, Columbia, Missouri, USA; Department of Biochemistry, MU Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri, USA
3
Nahali S, Safari L, Khanteymoori A, Huang J. StructmRNA a BERT based model with dual level and conditional masking for mRNA representation. Sci Rep 2024; 14:26043. [PMID: 39472486 PMCID: PMC11522565 DOI: 10.1038/s41598-024-77172-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2024] [Accepted: 10/21/2024] [Indexed: 11/02/2024] Open
Abstract
In this study, we introduce StructmRNA, a new BERT-based model that was designed for the detailed analysis of mRNA sequences and structures. The success of DNABERT in understanding the intricate language of non-coding DNA with bidirectional encoder representations is extended to mRNA with StructmRNA. This new model uses a special dual-level masking technique that covers both sequence and structure, along with conditional masking. This enables StructmRNA to adeptly generate meaningful embeddings for mRNA sequences, even in the absence of explicit structural data, by capitalizing on the intricate sequence-structure correlations learned during extensive pre-training on vast datasets. Compared to well-known models like those in the Stanford OpenVaccine project, StructmRNA performs better in important tasks such as predicting RNA degradation. Thus, StructmRNA can inform better RNA-based treatments by predicting the secondary structures and biological functions of unseen mRNA sequences. The proficiency of this model is further confirmed by rigorous evaluations, revealing its unprecedented ability to generalize across various organisms and conditions, thereby marking a significant advance in the predictive analysis of mRNA for therapeutic design. With this work, we aim to set a new standard for mRNA analysis, contributing to the broader field of genomics and therapeutic development.
Affiliation(s)
- Sepideh Nahali
- Information Retrieval and Knowledge Management Research Lab, York University, Toronto, Ontario, Canada
- Department of Computer Engineering, University of Zanjan, Zanjan, Iran
- Leila Safari
- Department of Computer Engineering, University of Zanjan, Zanjan, Iran
- Jimmy Huang
- Information Retrieval and Knowledge Management Research Lab, York University, Toronto, Ontario, Canada
4
Lokras AG, Bobak TR, Baghel SS, Sebastiani F, Foged C. Advances in the design and delivery of RNA vaccines for infectious diseases. Adv Drug Deliv Rev 2024; 213:115419. [PMID: 39111358 DOI: 10.1016/j.addr.2024.115419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 07/19/2024] [Accepted: 07/30/2024] [Indexed: 08/23/2024]
Abstract
RNA medicines represent a paradigm shift in the treatment and prevention of critical diseases of global significance, e.g., infectious diseases. The highly successful messenger RNA (mRNA) vaccines against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) were developed at record speed during the coronavirus disease 2019 pandemic. A consequence of this is exceptionally shortened vaccine development times, which in combination with adaptability makes the RNA vaccine technology highly attractive against infectious diseases and for pandemic preparedness. Here, we review the state of the art in the design and delivery of RNA vaccines for infectious diseases based on different RNA modalities, including linear mRNA, self-amplifying RNA, trans-amplifying RNA, and circular RNA. We provide an overview of the clinical pipeline of RNA vaccines for infectious diseases, present analytical procedures, which are paramount for characterizing quality attributes and guaranteeing their quality, and discuss future perspectives for using RNA vaccines to combat pathogens beyond SARS-CoV-2.
Affiliation(s)
- Abhijeet Girish Lokras
- Department of Pharmacy, Faculty of Health and Medical Sciences, University of Copenhagen, Universitetsparken 2, 2100 Copenhagen Ø, Denmark
- Thomas Rønnemoes Bobak
- Department of Pharmacy, Faculty of Health and Medical Sciences, University of Copenhagen, Universitetsparken 2, 2100 Copenhagen Ø, Denmark
- Saahil Sandeep Baghel
- Department of Pharmacy, Faculty of Health and Medical Sciences, University of Copenhagen, Universitetsparken 2, 2100 Copenhagen Ø, Denmark
- Federica Sebastiani
- Department of Pharmacy, Faculty of Health and Medical Sciences, University of Copenhagen, Universitetsparken 2, 2100 Copenhagen Ø, Denmark; Division of Physical Chemistry, Department of Chemistry, Lund University, 22100, Lund, Sweden
- Camilla Foged
- Department of Pharmacy, Faculty of Health and Medical Sciences, University of Copenhagen, Universitetsparken 2, 2100 Copenhagen Ø, Denmark
5
Arulsamy K, Xia B, Chen H, Zhang L, Chen K. Machine Learning Uncovers Vascular Endothelial Cell Identity Genes by Expression Regulation Features in Single Cells. bioRxiv 2024:2024.08.27.609808. [PMID: 39253493 PMCID: PMC11383289 DOI: 10.1101/2024.08.27.609808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/11/2024]
Abstract
Deciphering cell identity genes is pivotal to understanding cell differentiation, development, and many diseases involving cell identity dysregulation. Here, we introduce SCIG, a machine-learning method to uncover cell identity genes in single cells. In alignment with recent reports that cell identity genes are regulated with unique epigenetic signatures, we found cell identity genes exhibit distinctive genetic sequence signatures, e.g., unique enrichment patterns of cis-regulatory elements. Using these genetic sequence signatures, along with gene expression information from single-cell RNA-seq data, enables SCIG to uncover the identity genes of a cell without a need for comparison to other cells. Cell identity gene score defined by SCIG surpassed expression value in network analysis to uncover master transcription factors regulating cell identity. Applying SCIG to the human endothelial cell atlas revealed that the tissue microenvironment is a critical supplement to master transcription factors for cell identity refinement. SCIG is publicly available at https://github.com/kaifuchenlab/SCIG , offering a valuable tool for advancing cell differentiation, development, and regenerative medicine research.
6
Li S, Moayedpour S, Li R, Bailey M, Riahi S, Kogler-Anele L, Miladi M, Miner J, Pertuy F, Zheng D, Wang J, Balsubramani A, Tran K, Zacharia M, Wu M, Gu X, Clinton R, Asquith C, Skaleski J, Boeglin L, Chivukula S, Dias A, Strugnell T, Montoya FU, Agarwal V, Bar-Joseph Z, Jager S. CodonBERT large language model for mRNA vaccines. Genome Res 2024; 34:1027-1035. [PMID: 38951026 PMCID: PMC11368176 DOI: 10.1101/gr.278870.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Accepted: 06/25/2024] [Indexed: 07/03/2024]
Abstract
mRNA-based vaccines and therapeutics are gaining popularity and usage across a wide range of conditions. One of the critical issues when designing such mRNAs is sequence optimization. Even small proteins or peptides can be encoded by an enormously large number of mRNAs. The actual mRNA sequence can have a large impact on several properties, including expression, stability, immunogenicity, and more. To enable the selection of an optimal sequence, we developed CodonBERT, a large language model (LLM) for mRNAs. Unlike prior models, CodonBERT uses codons as inputs, which enables it to learn better representations. CodonBERT was trained using more than 10 million mRNA sequences from a diverse set of organisms. The resulting model captures important biological concepts. CodonBERT can also be extended to perform prediction tasks for various mRNA properties. CodonBERT outperforms previous mRNA prediction methods, including on a new flu vaccine data set.
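The two ideas in this abstract, codon-level inputs and the very large number of mRNAs that encode one peptide, can be illustrated with a minimal Python sketch. It is not CodonBERT's actual tokenizer or vocabulary; the function names `tokenize_codons` and `count_encodings` are hypothetical, and the degeneracy table assumes the standard genetic code with no stop codon counted.

```python
from typing import Dict, List

# Number of synonymous codons per amino acid in the standard genetic code
# (assumption: standard code; the 61 sense codons sum across this table).
DEGENERACY: Dict[str, int] = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2, "K": 2, "D": 2, "E": 2, "C": 2,
    "M": 1, "W": 1,
}


def tokenize_codons(cds: str) -> List[str]:
    """Split a coding sequence into non-overlapping 3-nucleotide tokens."""
    seq = cds.upper().replace("T", "U")  # accept DNA or RNA alphabet
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def count_encodings(peptide: str) -> int:
    """Count distinct mRNA coding sequences that encode the given peptide."""
    total = 1
    for residue in peptide.upper():
        total *= DEGENERACY[residue]  # KeyError for non-standard residues
    return total


print(tokenize_codons("ATGGCCAAGTAA"))
print(count_encodings("MKL"))
```

Because the count is a product over residues, it grows exponentially with peptide length, which is why exhaustive screening of synonymous sequences is impractical.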
Affiliation(s)
- Sizhen Li
- Digital R&D, Sanofi, Cambridge, Massachusetts 02141, USA
- Ruijiang Li
- Digital R&D, Sanofi, Cambridge, Massachusetts 02141, USA
- Michael Bailey
- Digital R&D, Sanofi, Cambridge, Massachusetts 02141, USA
- Saleh Riahi
- Digital R&D, Sanofi, Cambridge, Massachusetts 02141, USA
- Milad Miladi
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Jacob Miner
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Fabien Pertuy
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Dinghai Zheng
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Jun Wang
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Khang Tran
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Minnie Zacharia
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Monica Wu
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Xiaobo Gu
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Ryan Clinton
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Carla Asquith
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Joseph Skaleski
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Lianne Boeglin
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Sudha Chivukula
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Anusha Dias
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Tod Strugnell
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Vikram Agarwal
- mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, USA
- Ziv Bar-Joseph
- Digital R&D, Sanofi, Cambridge, Massachusetts 02141, USA
- Sven Jager
- Digital R&D, Sanofi, Cambridge, Massachusetts 02141, USA
7
He S, Huang R, Townley J, Kretsch RC, Karagianes TG, Cox DBT, Blair H, Penzar D, Vyaltsev V, Aristova E, Zinkevich A, Bakulin A, Sohn H, Krstevski D, Fukui T, Tatematsu F, Uchida Y, Jang D, Lee JS, Shieh R, Ma T, Martynov E, Shugaev MV, Bukhari HST, Fujikawa K, Onodera K, Henkel C, Ron S, Romano J, Nicol JJ, Nye GP, Wu Y, Choe C, Reade W, Das R. Ribonanza: deep learning of RNA structure through dual crowdsourcing. bioRxiv 2024:2024.02.24.581671. [PMID: 38464325 PMCID: PMC10925082 DOI: 10.1101/2024.02.24.581671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Prediction of RNA structure from sequence remains an unsolved problem, and progress has been slowed by a paucity of experimental data. Here, we present Ribonanza, a dataset of chemical mapping measurements on two million diverse RNA sequences collected through Eterna and other crowdsourced initiatives. Ribonanza measurements enabled solicitation, training, and prospective evaluation of diverse deep neural networks through a Kaggle challenge, followed by distillation into a single, self-contained model called RibonanzaNet. When fine-tuned on auxiliary datasets, RibonanzaNet achieves state-of-the-art performance in modeling experimental sequence dropout, RNA hydrolytic degradation, and RNA secondary structure, with implications for modeling RNA tertiary structure.
Affiliation(s)
- Shujun He
- Department of Chemical Engineering, Texas A&M University, TX, USA
- Rui Huang
- Department of Biochemistry, Stanford CA, USA
- David B T Cox
- Department of Biochemistry, Stanford CA, USA
- Department of Medicine, Division of Hematology, and Department of Biochemistry, Stanford CA, USA
- Dmitry Penzar
- AIRI, Moscow, Russia
- Vavilov Institute of General Genetics, Moscow 119991, Russia
- Institute of Translational Medicine, Pirogov Russian National Research Medical University, Moscow 117997, Russia
- Valeriy Vyaltsev
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Russian Federation
- Elizaveta Aristova
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Russian Federation
- Arsenii Zinkevich
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Russian Federation
- Artemy Bakulin
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Russian Federation
- Hoyeol Sohn
- Department of Chemical Engineering, Texas A&M University, TX, USA
- Department of Biochemistry, Stanford CA, USA
- Eterna Massive Open Laboratory
- Biophysics Program, Stanford CA, USA
- Department of Medicine, Division of Hematology, and Department of Biochemistry, Stanford CA, USA
- Department of Mathematics, Stanford CA, USA
- AIRI, Moscow, Russia
- Vavilov Institute of General Genetics, Moscow 119991, Russia
- Institute of Translational Medicine, Pirogov Russian National Research Medical University, Moscow 117997, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Russian Federation
- GO Inc., Tokyo, Japan
- Department of Electrical and Computer Engineering, Inha University, Incheon, Republic of Korea
- DeltaX, Seoul, Republic of Korea
- Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Russian Federation
- Department of Materials Science and Engineering, University of Virginia, Charlottesville, VA 22904-4745, USA
- Vergesense, CA
- DeNA, Tokyo, Japan
- NVIDIA, Tokyo, Japan
- NVIDIA, Munich
- Howard Hughes Medical Institute
- Department of Bioengineering, Stanford CA, USA
- Kaggle, San Francisco CA, USA
- Daniel Krstevski
- Department of Chemical Engineering, Texas A&M University, TX, USA
- Department of Biochemistry, Stanford CA, USA
- Eterna Massive Open Laboratory
- Biophysics Program, Stanford CA, USA
- Department of Medicine, Division of Hematology, and Department of Biochemistry, Stanford CA, USA
- Department of Mathematics, Stanford CA, USA
- AIRI, Moscow, Russia
- Vavilov Institute of General Genetics, Moscow 119991, Russia
- Institute of Translational Medicine, Pirogov Russian National Research Medical University, Moscow 117997, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Russian Federation
- GO Inc., Tokyo, Japan
- Department of Electrical and Computer Engineering, Inha University, Incheon, Republic of Korea
- DeltaX, Seoul, Republic of Korea
- Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Russian Federation
- Department of Materials Science and Engineering, University of Virginia, Charlottesville, VA 22904-4745, USA
- Vergesense, CA
- DeNA, Tokyo, Japan
- NVIDIA, Tokyo, Japan
- NVIDIA, Munich
- Howard Hughes Medical Institute
- Department of Bioengineering, Stanford CA, USA
- Kaggle, San Francisco CA, USA
- Donghoon Jang
- Department of Electrical and Computer Engineering, Inha University, Incheon, Republic of Korea
- Roger Shieh
- Department of Chemical Engineering, Texas A&M University, TX, USA
- Department of Biochemistry, Stanford CA, USA
- Eterna Massive Open Laboratory
- Biophysics Program, Stanford CA, USA
- Department of Medicine, Division of Hematology, and Department of Biochemistry, Stanford CA, USA
- Department of Mathematics, Stanford CA, USA
- AIRI, Moscow, Russia
- Vavilov Institute of General Genetics, Moscow 119991, Russia
- Institute of Translational Medicine, Pirogov Russian National Research Medical University, Moscow 117997, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Russian Federation
- GO Inc., Tokyo, Japan
- Department of Electrical and Computer Engineering, Inha University, Incheon, Republic of Korea
- DeltaX, Seoul, Republic of Korea
- Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Russian Federation
- Department of Materials Science and Engineering, University of Virginia, Charlottesville, VA 22904-4745, USA
- Vergesense, CA
- DeNA, Tokyo, Japan
- NVIDIA, Tokyo, Japan
- NVIDIA, Munich
- Howard Hughes Medical Institute
- Department of Bioengineering, Stanford CA, USA
- Kaggle, San Francisco CA, USA
- Tom Ma
- Department of Chemical Engineering, Texas A&M University, TX, USA
- Department of Biochemistry, Stanford CA, USA
- Eterna Massive Open Laboratory
- Biophysics Program, Stanford CA, USA
- Department of Medicine, Division of Hematology, and Department of Biochemistry, Stanford CA, USA
- Department of Mathematics, Stanford CA, USA
- AIRI, Moscow, Russia
- Vavilov Institute of General Genetics, Moscow 119991, Russia
- Institute of Translational Medicine, Pirogov Russian National Research Medical University, Moscow 117997, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Russian Federation
- GO Inc., Tokyo, Japan
- Department of Electrical and Computer Engineering, Inha University, Incheon, Republic of Korea
- DeltaX, Seoul, Republic of Korea
- Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Russian Federation
- Department of Materials Science and Engineering, University of Virginia, Charlottesville, VA 22904-4745, USA
- Vergesense, CA
- DeNA, Tokyo, Japan
- NVIDIA, Tokyo, Japan
- NVIDIA, Munich
- Howard Hughes Medical Institute
- Department of Bioengineering, Stanford CA, USA
- Kaggle, San Francisco CA, USA
- Eduard Martynov
- Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Russian Federation
- Maxim V Shugaev
- Department of Materials Science and Engineering, University of Virginia, Charlottesville, VA 22904-4745, USA
- Shlomo Ron
- Department of Chemical Engineering, Texas A&M University, TX, USA
- Department of Biochemistry, Stanford CA, USA
- Eterna Massive Open Laboratory
- Biophysics Program, Stanford CA, USA
- Department of Medicine, Division of Hematology, and Department of Biochemistry, Stanford CA, USA
- Department of Mathematics, Stanford CA, USA
- AIRI, Moscow, Russia
- Vavilov Institute of General Genetics, Moscow 119991, Russia
- Institute of Translational Medicine, Pirogov Russian National Research Medical University, Moscow 117997, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Russian Federation
- GO Inc., Tokyo, Japan
- Department of Electrical and Computer Engineering, Inha University, Incheon, Republic of Korea
- DeltaX, Seoul, Republic of Korea
- Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Russian Federation
- Department of Materials Science and Engineering, University of Virginia, Charlottesville, VA 22904-4745, USA
- Vergesense, CA
- DeNA, Tokyo, Japan
- NVIDIA, Tokyo, Japan
- NVIDIA, Munich
- Howard Hughes Medical Institute
- Department of Bioengineering, Stanford CA, USA
- Kaggle, San Francisco CA, USA
- Jonathan Romano
- Eterna Massive Open Laboratory
- Howard Hughes Medical Institute
- Grace P Nye
- Department of Biochemistry, Stanford CA, USA
- Yuan Wu
- Department of Biochemistry, Stanford CA, USA
- Howard Hughes Medical Institute
- Rhiju Das
- Department of Biochemistry, Stanford CA, USA
- Biophysics Program, Stanford CA, USA
- Howard Hughes Medical Institute
8
Hwang H, Jeon H, Yeo N, Baek D. Big data and deep learning for RNA biology. Exp Mol Med 2024; 56:1293-1321. [PMID: 38871816 PMCID: PMC11263376 DOI: 10.1038/s12276-024-01243-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/31/2024] [Revised: 02/27/2024] [Accepted: 03/05/2024] [Indexed: 06/15/2024] Open
Abstract
The exponential growth of big data in RNA biology (RB) has led to the development of deep learning (DL) models that have driven crucial discoveries. As constantly evidenced by DL studies in other fields, the successful implementation of DL in RB depends heavily on the effective utilization of large-scale datasets from public databases. In achieving this goal, data encoding methods, learning algorithms, and techniques that align well with biological domain knowledge have played pivotal roles. In this review, we provide guiding principles for applying these DL concepts to various problems in RB by demonstrating successful examples and associated methodologies. We also discuss the remaining challenges in developing DL models for RB and suggest strategies to overcome these challenges. Overall, this review aims to illuminate the compelling potential of DL for RB and ways to apply this powerful technology to investigate the intriguing biology of RNA more effectively.
Affiliation(s)
- Hyeonseo Hwang
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Hyeonseong Jeon
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Genome4me Inc., Seoul, Republic of Korea
- Nagyeong Yeo
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Daehyun Baek
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Genome4me Inc., Seoul, Republic of Korea
9
Emami N, Ferdousi R. HormoNet: a deep learning approach for hormone-drug interaction prediction. BMC Bioinformatics 2024; 25:87. [PMID: 38418979 PMCID: PMC10903040 DOI: 10.1186/s12859-024-05708-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Accepted: 02/16/2024] [Indexed: 03/02/2024] Open
Abstract
Several lines of experimental evidence have shown that human endogenous hormones can interact with drugs in many ways and affect drug efficacy. Hormone-drug interactions (HDI) are essential for drug treatment and precision medicine; therefore, it is essential to understand hormone-drug associations. Here, we present HormoNet to predict HDI pairs and their risk level by integrating features derived from hormone and drug target proteins. To the best of our knowledge, this is one of the first attempts to employ a deep learning approach for HDI prediction. Amino acid composition and pseudo amino acid composition were applied to represent target information using 30 physicochemical and conformational properties of the proteins. To handle the imbalance problem in the data, we applied the synthetic minority over-sampling technique. Additionally, we constructed novel datasets for HDI prediction and the risk level of their interaction. HormoNet achieved high performance on our constructed hormone-drug benchmark datasets. The results provide insights into the relationship between a hormone and a drug, and indicate the potential benefit of reducing risk levels of interactions in designing more effective therapies for patients in drug treatments. Our benchmark datasets and the source code for HormoNet are available at https://github.com/EmamiNeda/HormoNet.
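The two data-preparation steps named in this abstract, amino acid composition features and minority over-sampling, can be sketched in plain Python. This is an illustrative sketch, not HormoNet's code: the function names are hypothetical, the composition covers only the 20 standard residues (no pseudo amino acid terms), and the over-sampler uses a single nearest neighbour where the published SMOTE method draws from k neighbours.

```python
import random
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, alphabetical


def amino_acid_composition(protein: str) -> List[float]:
    """Return the fraction of each standard residue in the protein."""
    seq = protein.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("protein contains no standard residues")
    return [c / total for c in counts]


def smote_like_sample(minority: Sequence[Sequence[float]],
                      rng: random.Random) -> List[float]:
    """Create one synthetic minority point by interpolating between a
    randomly chosen minority sample and its nearest minority neighbour.
    Requires at least two minority samples."""
    i = rng.randrange(len(minority))
    x = minority[i]
    others = [p for j, p in enumerate(minority) if j != i]
    # Nearest neighbour by squared Euclidean distance.
    nn = min(others, key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
    u = rng.random()  # interpolation weight in [0, 1)
    return [a + u * (b - a) for a, b in zip(x, nn)]


features = amino_acid_composition("MKTAYIAKQR")
synthetic = smote_like_sample([[0.0, 0.0], [1.0, 1.0], [0.9, 1.1]],
                              random.Random(0))
print(len(features), synthetic)
```

Each synthetic point lies on the segment between two existing minority samples, so the minority class is enlarged without copying any sample exactly.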
Affiliation(s)
- Neda Emami
- Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
- Reza Ferdousi
- Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
10
Bravi B. Development and use of machine learning algorithms in vaccine target selection. NPJ Vaccines 2024; 9:15. [PMID: 38242890 PMCID: PMC10798987 DOI: 10.1038/s41541-023-00795-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Accepted: 12/07/2023] [Indexed: 01/21/2024] Open
Abstract
Computer-aided discovery of vaccine targets has become a cornerstone of rational vaccine design. In this article, I discuss how Machine Learning (ML) can inform and guide key computational steps in rational vaccine design concerned with the identification of B and T cell epitopes and correlates of protection. I provide examples of ML models, as well as types of data and predictions for which they are built. I argue that interpretable ML has the potential to improve the identification of immunogens also as a tool for scientific discovery, by helping elucidate the molecular processes underlying vaccine-induced immune responses. I outline the limitations and challenges in terms of data availability and method development that need to be addressed to bridge the gap between advances in ML predictions and their translational application to vaccine design.
Affiliation(s)
- Barbara Bravi
- Department of Mathematics, Imperial College London, London, SW7 2AZ, UK
11
Zhang H, Vandesompele J, Braeckmans K, De Smedt SC, Remaut K. Nucleic acid degradation as barrier to gene delivery: a guide to understand and overcome nuclease activity. Chem Soc Rev 2024; 53:317-360. [PMID: 38073448 DOI: 10.1039/d3cs00194f] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 01/03/2024]
Abstract
Gene therapy is on its way to revolutionize the treatment of both inherited and acquired diseases, by transferring nucleic acids to correct a disease-causing gene in the target cells of patients. In the fight against infectious diseases, mRNA-based therapeutics have proven to be a viable strategy in the recent Covid-19 pandemic. Although a growing number of gene therapies have been approved, the success rate is limited when compared to the large number of preclinical and clinical trials that have been/are being performed. In this review, we highlight some of the hurdles which gene therapies encounter after administration into the human body, with a focus on nucleic acid degradation by nucleases that are extremely abundant in mammalian organs, biological fluids as well as in subcellular compartments. We overview the available strategies to reduce the biodegradation of gene therapeutics after administration, including chemical modifications of the nucleic acids, encapsulation into vectors and co-administration with nuclease inhibitors and discuss which strategies are applied for clinically approved nucleic acid therapeutics. In the final part, we discuss the currently available methods and techniques to qualify and quantify the integrity of nucleic acids, with their own strengths and limitations.
Affiliation(s)
- Heyang Zhang
- Laboratory for General Biochemistry and Physical Pharmacy, Department of Pharmaceutical Sciences, Ghent University, 9000 Ghent, Belgium.
- Leiden Academic Centre for Drug Research, Leiden University, 2333 CC Leiden, The Netherlands
| | - Jo Vandesompele
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Cancer Research Institute Ghent (CRIG), Ghent, Belgium
| | - Kevin Braeckmans
- Laboratory for General Biochemistry and Physical Pharmacy, Department of Pharmaceutical Sciences, Ghent University, 9000 Ghent, Belgium.
- Centre for Nano- and Biophotonics, Ghent University, 9000 Ghent, Belgium
| | - Stefaan C De Smedt
- Laboratory for General Biochemistry and Physical Pharmacy, Department of Pharmaceutical Sciences, Ghent University, 9000 Ghent, Belgium.
- Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Centre for Nano- and Biophotonics, Ghent University, 9000 Ghent, Belgium
| | - Katrien Remaut
- Laboratory for General Biochemistry and Physical Pharmacy, Department of Pharmaceutical Sciences, Ghent University, 9000 Ghent, Belgium.
- Cancer Research Institute Ghent (CRIG), Ghent, Belgium
| |
|
12
|
Yoshinaga M, Takeuchi O. RNA Metabolism Governs Immune Function and Response. Adv Exp Med Biol 2024; 1444:145-161. [PMID: 38467978 DOI: 10.1007/978-981-99-9781-7_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/13/2024]
Abstract
Inflammation is a complex process that protects our body from various insults such as infection, injury, and stress. Proper inflammation is beneficial for eliminating the insults and maintaining organ homeostasis; however, it can become detrimental if uncontrolled. To tightly regulate inflammation, post-transcriptional mechanisms governing RNA metabolism play a crucial role in monitoring the expression of immune-related genes, such as tumor necrosis factor (TNF) and interleukin-6 (IL-6). These mechanisms involve the coordinated action of various RNA-binding proteins (RBPs), including the Regnase family, Roquin, and RNA methyltransferases, which are responsible for mRNA decay and/or translation regulation. The collaborative efforts of these RBPs are essential in preventing aberrant immune response activation and consequently safeguarding against inflammatory and autoimmune diseases. This review provides an overview of recent advancements in our understanding of post-transcriptional regulation within the immune system and explores the specific roles of individual RBPs in RNA metabolism and regulation.
Affiliation(s)
- Masanori Yoshinaga
- Department of Medical Chemistry, Graduate School of Medicine, Kyoto University, Kyoto, Japan
| | - Osamu Takeuchi
- Department of Medical Chemistry, Graduate School of Medicine, Kyoto University, Kyoto, Japan.
| |
|
13
|
Ye Z, Harmon J, Ni W, Li Y, Wich D, Xu Q. The mRNA Vaccine Revolution: COVID-19 Has Launched the Future of Vaccinology. ACS Nano 2023; 17:15231-15253. [PMID: 37535899 DOI: 10.1021/acsnano.2c12584] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Indexed: 08/05/2023]
Abstract
During the COVID-19 pandemic, messenger RNA (mRNA) vaccines emerged as leading vaccine candidates in record time. Nonreplicating mRNA (NRM) and self-amplifying mRNA (SAM) technologies have been developed into high-performing and clinically viable vaccines against a range of infectious agents, notably SARS-CoV-2. mRNA vaccines demonstrate efficient in vivo delivery, long-lasting stability, and nonexistent risk of infection. The stability and translational efficiency of in vitro transcription (IVT)-mRNA can be further increased by modulating its structural elements. In this review, we present a comprehensive overview of the recent advances, key applications, and future challenges in the field of mRNA-based vaccinology.
Affiliation(s)
- Zhongfeng Ye
- Department of Biomedical Engineering, Tufts University, Medford, Massachusetts 02155, United States
| | - Joseph Harmon
- Department of Biomedical Engineering, Tufts University, Medford, Massachusetts 02155, United States
| | - Wei Ni
- Department of Medical Oncology, Dana-Farber Cancer Institute at Harvard Medical School, Boston, Massachusetts 02215, United States
| | - Yamin Li
- Department of Pharmacology, State University of New York Upstate Medical University, Syracuse, New York 13210, United States
| | - Douglas Wich
- Department of Biomedical Engineering, Tufts University, Medford, Massachusetts 02155, United States
| | - Qiaobing Xu
- Department of Biomedical Engineering, Tufts University, Medford, Massachusetts 02155, United States
| |
|
14
|
Sato K, Hamada M. Recent trends in RNA informatics: a review of machine learning and deep learning for RNA secondary structure prediction and RNA drug discovery. Brief Bioinform 2023; 24:bbad186. [PMID: 37232359 PMCID: PMC10359090 DOI: 10.1093/bib/bbad186] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 01/16/2023] [Revised: 04/24/2023] [Accepted: 04/25/2023] [Indexed: 05/27/2023] Open
Abstract
Computational analysis of RNA sequences constitutes a crucial step in the field of RNA biology. As in other domains of the life sciences, the incorporation of artificial intelligence and machine learning techniques into RNA sequence analysis has gained significant traction in recent years. Historically, thermodynamics-based methods were widely employed for the prediction of RNA secondary structures; however, machine learning-based approaches have demonstrated remarkable advancements in recent years, enabling more accurate predictions. Consequently, the precision of sequence analysis pertaining to RNA secondary structures, such as RNA-protein interactions, has also been enhanced, making a substantial contribution to the field of RNA biology. Additionally, artificial intelligence and machine learning are also introducing technical innovations in the analysis of RNA-small molecule interactions for RNA-targeted drug discovery and in the design of RNA aptamers, where RNA serves as its own ligand. This review will highlight recent trends in the prediction of RNA secondary structure, RNA aptamers and RNA drug discovery using machine learning, deep learning and related technologies, and will also discuss potential future avenues in the field of RNA informatics.
Affiliation(s)
- Kengo Sato
- School of System Design and Technology, Tokyo Denki University, 5 Senju Asahi-cho, Adachi-ku, Tokyo 120-8551, Japan
| | - Michiaki Hamada
- Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Graduate School of Medicine, Nippon Medical School, 1-1-5, Sendagi, Bunkyo-ku, Tokyo 113-8602, Japan
| |
|
15
|
Riley AT, Robson JM, Green AA. Generative and predictive neural networks for the design of functional RNA molecules. bioRxiv 2023:2023.07.14.549043. [PMID: 37503279 PMCID: PMC10370010 DOI: 10.1101/2023.07.14.549043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
RNA is a remarkably versatile molecule that has been engineered for applications in therapeutics, diagnostics, and in vivo information-processing systems. However, the complex relationship between the sequence and structural properties of an RNA molecule and its ability to perform specific functions often necessitates extensive experimental screening of candidate sequences. Here we present a generalized neural network architecture that utilizes the sequence and structure of RNA molecules (SANDSTORM) to inform functional predictions. We demonstrate that this approach achieves state-of-the-art performance across several distinct RNA prediction tasks, while learning interpretable abstractions of RNA secondary structure. We paired these predictive models with generative adversarial RNA design networks (GARDN), allowing the generative modelling of novel mRNA 5' untranslated regions and toehold switch riboregulators exhibiting a predetermined fitness. This approach enabled the design of novel toehold switches with a 43-fold increase in experimentally characterized dynamic range compared to those designed using classic thermodynamic algorithms. SANDSTORM and GARDN thus represent powerful new predictive and generative tools for the development of diagnostic and therapeutic RNA molecules with improved function.
Affiliation(s)
- Aidan T. Riley
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- Biological Design Center, Boston University, Boston, MA 02215, USA
| | - James M. Robson
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- Biological Design Center, Boston University, Boston, MA 02215, USA
| | - Alexander A. Green
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- Biological Design Center, Boston University, Boston, MA 02215, USA
- Molecular Biology, Cell Biology & Biochemistry Program, Graduate School of Arts and Sciences, Boston University, Boston, MA 02215, USA
| |
|
16
|
Lin BC, Katneni U, Jankowska KI, Meyer D, Kimchi-Sarfaty C. In silico methods for predicting functional synonymous variants. Genome Biol 2023; 24:126. [PMID: 37217943 PMCID: PMC10204308 DOI: 10.1186/s13059-023-02966-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/03/2022] [Accepted: 05/10/2023] [Indexed: 05/24/2023] Open
Abstract
Single nucleotide variants (SNVs) contribute to human genomic diversity. Synonymous SNVs were previously considered to be "silent," but mounting evidence has revealed that these variants can cause RNA and protein changes and are implicated in over 85 human diseases and cancers. Recent improvements in computational platforms have led to the development of numerous machine-learning tools, which can be used to advance synonymous SNV research. In this review, we discuss tools that should be used to investigate synonymous variants. We provide supportive examples from seminal studies that demonstrate how these tools have driven new discoveries of functional synonymous SNVs.
Affiliation(s)
- Brian C Lin
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
| | - Upendra Katneni
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
| | - Katarzyna I Jankowska
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
| | - Douglas Meyer
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
| | - Chava Kimchi-Sarfaty
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA.
| |
|
17
|
Crowdsourcing to predict RNA degradation and secondary structure. Nat Mach Intell 2023. [DOI: 10.1038/s42256-023-00615-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2023]
|