1
Imani S, Li X, Chen K, Maghsoudloo M, Jabbarzadeh Kaboli P, Hashemi M, Khoushab S, Li X. Computational biology and artificial intelligence in mRNA vaccine design for cancer immunotherapy. Front Cell Infect Microbiol 2025; 14:1501010. [PMID: 39902185 PMCID: PMC11788159 DOI: 10.3389/fcimb.2024.1501010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2024] [Accepted: 12/16/2024] [Indexed: 02/05/2025]
Abstract
Messenger RNA (mRNA) vaccines offer an adaptable and scalable platform for cancer immunotherapy, requiring optimal design to elicit a robust and targeted immune response. Recent advancements in bioinformatics and artificial intelligence (AI) have significantly enhanced the design, prediction, and optimization of mRNA vaccines. This paper reviews technologies that streamline mRNA vaccine development, from genomic sequencing to lipid nanoparticle (LNP) formulation. We discuss how accurate predictions of neoantigen structures guide the design of mRNA sequences that effectively target immune and cancer cells. Furthermore, we examine AI-driven approaches that optimize mRNA-LNP formulations, enhancing delivery and stability. These technological innovations not only improve vaccine design but also enhance pharmacokinetics and pharmacodynamics, offering promising avenues for personalized cancer immunotherapy.
Affiliation(s)
- Saber Imani
- Shulan International Medical College, Zhejiang Shuren University, Hangzhou, Zhejiang, China
- Xiaoyan Li
- Shulan International Medical College, Zhejiang Shuren University, Hangzhou, Zhejiang, China
- Keyi Chen
- Key Laboratory of Artificial Organs and Computational Medicine in Zhejiang Province, Shulan International Medical College, Zhejiang Shuren University, Hangzhou, Zhejiang, China
- Mazaher Maghsoudloo
- Key Laboratory of Epigenetics and Oncology, the Research Center for Preclinical Medicine, Southwest Medical University, Luzhou, Sichuan, China
- Mehrdad Hashemi
- Department of Genetics, Faculty of Advanced Science and Technology, Tehran Medical Sciences, Islamic Azad University, Tehran, Iran
- Farhikhtegan Medical Convergence sciences Research Center, Farhikhtegan Hospital Tehran Medical sciences, Islamic Azad University, Tehran, Iran
- Saloomeh Khoushab
- Department of Genetics, Faculty of Advanced Science and Technology, Tehran Medical Sciences, Islamic Azad University, Tehran, Iran
- Farhikhtegan Medical Convergence sciences Research Center, Farhikhtegan Hospital Tehran Medical sciences, Islamic Azad University, Tehran, Iran
- Xiaoping Li
- Key Laboratory of Artificial Organs and Computational Medicine in Zhejiang Province, Shulan International Medical College, Zhejiang Shuren University, Hangzhou, Zhejiang, China
2
Taubert O, von der Lehr F, Bazarova A, Faber C, Knechtges P, Weiel M, Debus C, Coquelin D, Basermann A, Streit A, Kesselheim S, Götz M, Schug A. RNA contact prediction by data efficient deep learning. Commun Biol 2023; 6:913. [PMID: 37674020 PMCID: PMC10482910 DOI: 10.1038/s42003-023-05244-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Accepted: 08/14/2023] [Indexed: 09/08/2023]
Abstract
On the path to full understanding of the structure-function relationship or even design of RNA, structure prediction would offer an intriguing complement to experimental efforts. Any deep learning on RNA structure, however, is hampered by the sparsity of labeled training data. Utilizing the limited data available, we here focus on predicting spatial adjacencies ("contact maps") as a proxy for 3D structure. Our model, BARNACLE, combines the utilization of unlabeled data through self-supervised pre-training and efficient use of the sparse labeled data through an XGBoost classifier. BARNACLE shows a considerable improvement over both the established classical baseline and a deep neural network. In order to demonstrate that our approach can be applied to tasks with similar data constraints, we show that our findings generalize to the related setting of accessible surface area prediction.
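BARNACLE predicts contact maps as a proxy for 3D structure. To make that target concrete, here is a minimal sketch of how a binary contact map can be derived from known 3D coordinates. It assumes one representative point per nucleotide, an 8 Å distance cutoff, and a minimum sequence separation of 4; these values and the function name `contact_map` are illustrative assumptions, not the contact definition used by the BARNACLE authors.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_map(coords: Sequence[Coord], cutoff: float = 8.0,
                min_separation: int = 4) -> List[List[int]]:
    """Symmetric binary contact map.

    Entry [i][j] is 1 when nucleotides i and j are at least `min_separation`
    positions apart in sequence and their representative points lie within
    `cutoff` (same unit as the coordinates, e.g. angstroms). Cutoff and
    separation are hypothetical defaults chosen for illustration.
    """
    if min_separation < 1:
        raise ValueError("min_separation must be >= 1")
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        # Skip near-diagonal pairs, which are trivially close along the chain.
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


# A toy hairpin-like path: the two ends fold back near each other.
hairpin = [(0, 0, 0), (3, 0, 0), (6, 0, 0), (6, 5, 0), (3, 5, 0), (0, 5, 0)]
for row in contact_map(hairpin):
    print(row)
```

A contact predictor such as BARNACLE outputs per-pair scores for the same matrix layout; thresholding those scores yields a predicted map that can be compared entry by entry with one computed like this.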
Affiliation(s)
- Oskar Taubert
- Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany
- Fabrice von der Lehr
- Institute for Software Technology (SC), German Aerospace Centre (DLR), 51147, Köln, Germany
- Alina Bazarova
- Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428, Jülich, Germany
- Helmholtz AI, 81675, Munich, Germany
- Christian Faber
- Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428, Jülich, Germany
- Philipp Knechtges
- Institute for Software Technology (SC), German Aerospace Centre (DLR), 51147, Köln, Germany
- Helmholtz AI, 81675, Munich, Germany
- Marie Weiel
- Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany
- Helmholtz AI, 81675, Munich, Germany
- Charlotte Debus
- Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany
- Helmholtz AI, 81675, Munich, Germany
- Daniel Coquelin
- Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany
- Helmholtz AI, 81675, Munich, Germany
- Achim Basermann
- Institute for Software Technology (SC), German Aerospace Centre (DLR), 51147, Köln, Germany
- Achim Streit
- Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany
- Stefan Kesselheim
- Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428, Jülich, Germany
- Helmholtz AI, 81675, Munich, Germany
- Markus Götz
- Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany
- Helmholtz AI, 81675, Munich, Germany
- Alexander Schug
- Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428, Jülich, Germany
- Faculty of Biology, University of Duisburg-Essen, 45117, Essen, Germany
3
Lin BC, Katneni U, Jankowska KI, Meyer D, Kimchi-Sarfaty C. In silico methods for predicting functional synonymous variants. Genome Biol 2023; 24:126. [PMID: 37217943 PMCID: PMC10204308 DOI: 10.1186/s13059-023-02966-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/03/2022] [Accepted: 05/10/2023] [Indexed: 05/24/2023]
Abstract
Single nucleotide variants (SNVs) contribute to human genomic diversity. Synonymous SNVs were previously considered to be "silent," but mounting evidence has revealed that these variants can cause RNA and protein changes and are implicated in over 85 human diseases and cancers. Recent improvements in computational platforms have led to the development of numerous machine-learning tools, which can be used to advance synonymous SNV research. In this review, we discuss tools that should be used to investigate synonymous variants. We provide supportive examples from seminal studies that demonstrate how these tools have driven new discoveries of functional synonymous SNVs.
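The term "synonymous" refers only to the encoded amino acid. The sketch below classifies a single-base change within one codon using the standard genetic code (NCBI translation table 1); the function name `classify_snv` is a hypothetical helper for illustration. As the review stresses, a "synonymous" label at this level says nothing about effects on splicing, mRNA structure, or translation kinetics, which is exactly what the tools it surveys try to predict.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order with
# the third position varying fastest; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snv(codon: str, position: int, alt: str) -> str:
    """Classify a single-base substitution at `position` (0-2) of `codon`.

    Accepts DNA or RNA letters (U is treated as T). Returns "synonymous",
    "missense", or "nonsense" based only on the encoded amino acid.
    """
    codon = codon.upper().replace("U", "T")
    alt = alt.upper().replace("U", "T")
    if len(codon) != 3 or not 0 <= position <= 2:
        raise ValueError("expected a 3-base codon and a position in 0..2")
    if codon[position] == alt:
        raise ValueError("alternate allele equals the reference base")
    mutant = codon[:position] + alt + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


print(classify_snv("GCT", 2, "C"))  # Ala -> Ala
print(classify_snv("GAA", 2, "T"))  # Glu -> Asp
print(classify_snv("TGG", 2, "A"))  # Trp -> stop
```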
Affiliation(s)
- Brian C Lin
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
- Upendra Katneni
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
- Katarzyna I Jankowska
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
- Douglas Meyer
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
- Chava Kimchi-Sarfaty
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
4
Li J, Chen SJ. RNAJP: enhanced RNA 3D structure predictions with non-canonical interactions and global topology sampling. Nucleic Acids Res 2023; 51:3341-3356. [PMID: 36864729 PMCID: PMC10123122 DOI: 10.1093/nar/gkad122] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 07/21/2022] [Revised: 01/14/2023] [Accepted: 02/25/2023] [Indexed: 03/04/2023]
Abstract
RNA 3D structures are critical for understanding RNA function. However, only a limited number of RNA structures have been experimentally solved, so computational prediction methods are highly desirable. Nevertheless, accurate prediction of RNA 3D structures, especially those containing multiway junctions, remains a significant challenge, mainly due to the complicated non-canonical base pairing and stacking interactions in the junction loops and the possible long-range interactions between loop structures. Here we present RNAJP ('RNA Junction Prediction'), a nucleotide- and helix-level coarse-grained model for the prediction of RNA 3D structures, particularly junction structures, from a given 2D structure. Through global sampling of the 3D arrangements of the helices in junctions using molecular dynamics simulations, and with explicit consideration of non-canonical base pairing and base stacking interactions as well as long-range loop-loop interactions, the model can provide significantly better predictions for multibranched junction structures than existing methods. Moreover, when integrated with additional restraints from experiments, such as junction topology and long-range interactions, the model may serve as a useful structure generator for various applications.
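RNAJP starts from a given 2D structure and focuses on multiway junctions. The following sketch shows one conventional way to locate such junctions in a pseudoknot-free dot-bracket string: each loop closed by a base pair is classified by how many helices meet there (1 = hairpin, 2 = stack/bulge/interior loop, 3 or more = multiway junction). The function names are hypothetical, the exterior loop is ignored, and nothing here reflects RNAJP's own input handling.

```python
from typing import Dict, List, Tuple


def pair_table(dot_bracket: str) -> Dict[int, int]:
    """Map each paired position to its partner (pseudoknot-free input only)."""
    stack: List[int] = []
    pairs: Dict[int, int] = {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError(f"unsupported character {ch!r}")
    if stack:
        raise ValueError("unmatched '(' in structure")
    return pairs


def loop_branches(dot_bracket: str) -> List[Tuple[int, int, int]]:
    """For every loop closed by a pair (i, j), return (i, j, k), where k counts
    the helices meeting at that loop: the closing helix plus each enclosed one."""
    pairs = pair_table(dot_bracket)
    loops = []
    for i in sorted(p for p, q in pairs.items() if p < q):
        j = pairs[i]
        k = 1  # the closing helix itself
        pos = i + 1
        while pos < j:
            if pos in pairs and pairs[pos] > pos:
                k += 1                # an enclosed helix starts here
                pos = pairs[pos] + 1  # jump past it to stay on this loop
            else:
                pos += 1
        loops.append((i, j, k))
    return loops


def multiway_junctions(dot_bracket: str) -> List[Tuple[int, int, int]]:
    """Loops where three or more helices meet."""
    return [loop for loop in loop_branches(dot_bracket) if loop[2] >= 3]


print(multiway_junctions("((..((...))..((...))..))"))  # one three-way junction
```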
Affiliation(s)
- Jun Li
- Department of Physics, Department of Biochemistry and Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, USA
- Shi-Jie Chen
- Department of Physics, Department of Biochemistry and Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, USA
5
Roberts JM, Beck JD, Pollock TB, Bendixsen DP, Hayden EJ. RNA sequence to structure analysis from comprehensive pairwise mutagenesis of multiple self-cleaving ribozymes. eLife 2023; 12:80360. [PMID: 36655987 PMCID: PMC9901934 DOI: 10.7554/elife.80360] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/18/2022] [Accepted: 12/28/2022] [Indexed: 01/20/2023]
Abstract
Self-cleaving ribozymes are RNA molecules that catalyze the cleavage of their own phosphodiester backbones. These ribozymes are found in all domains of life and are also a tool for biotechnical and synthetic biology applications. Self-cleaving ribozymes are also an important model of sequence-to-function relationships for RNA because their small size simplifies synthesis of genetic variants and self-cleaving activity is an accessible readout of the functional consequence of the mutation. Here, we used a high-throughput experimental approach to determine the relative activity for every possible single and double mutant of five self-cleaving ribozymes. From this data, we comprehensively identified non-additive effects between pairs of mutations (epistasis) for all five ribozymes. We analyzed how changes in activity and trends in epistasis map to the ribozyme structures. The variety of structures studied provided opportunities to observe several examples of common structural elements, and the data was collected under identical experimental conditions to enable direct comparison. Heatmap-based visualization of the data revealed patterns indicating structural features of the ribozymes including paired regions, unpaired loops, non-canonical structures, and tertiary structural contacts. The data also revealed signatures of functionally critical nucleotides involved in catalysis. The results demonstrate that the data sets provide structural information similar to chemical or enzymatic probing experiments, but with additional quantitative functional information. The large-scale data sets can be used for models predicting structure and function and for efforts to engineer self-cleaving ribozymes.
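Epistasis here means that the effect of a double mutant differs from what the two single mutants predict. A minimal sketch, assuming a multiplicative null model (additive on a log scale) applied to relative activities; the function name `epistasis` is hypothetical, and the paper's exact normalization may differ.

```python
import math


def epistasis(wt: float, a: float, b: float, ab: float) -> float:
    """Log-scale epistasis for a double mutant.

    wt, a, b, ab are activities of the wild type, single mutants A and B, and
    the double mutant AB. Under a multiplicative null model the expected
    double-mutant effect is log(a/wt) + log(b/wt); the returned value is the
    deviation of the observed log(ab/wt) from that expectation.
    """
    for v in (wt, a, b, ab):
        if v <= 0:
            raise ValueError("activities must be positive for a log-scale model")
    return math.log(ab / wt) - math.log(a / wt) - math.log(b / wt)


# Independent effects: each single mutant halves activity, the double quarters it.
print(epistasis(1.0, 0.5, 0.5, 0.25))
# Each single mutant is impaired but the double mutant regains full activity.
print(epistasis(1.0, 0.5, 0.5, 1.0))
```

The second case, positive epistasis, is the kind of signal expected when two mutations jointly restore a base pair (a compensatory change), which is one reason pairwise epistasis heatmaps can reveal paired regions.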
Affiliation(s)
- Jessica M Roberts
- Biomolecular Sciences Graduate Programs, Boise State University, Boise, United States
- James D Beck
- Computing PhD Program, Boise State University, Boise, United States
- Tanner B Pollock
- Department of Biological Science, Boise State University, Boise, United States
- Devin P Bendixsen
- Biomolecular Sciences Graduate Programs, Boise State University, Boise, United States
- Eric J Hayden
- Biomolecular Sciences Graduate Programs, Boise State University, Boise, United States
- Computing PhD Program, Boise State University, Boise, United States
- Department of Biological Science, Boise State University, Boise, United States
6
Marklund E, Ke Y, Greenleaf WJ. High-throughput biochemistry in RNA sequence space: predicting structure and function. Nat Rev Genet 2023; 24:401-414. [PMID: 36635406 DOI: 10.1038/s41576-022-00567-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Accepted: 12/08/2022] [Indexed: 01/14/2023]
Abstract
RNAs are central to fundamental biological processes in all known organisms. The set of possible intramolecular interactions of RNA nucleotides defines the range of alternative structural conformations of a specific RNA that can coexist, and these structures enable functional catalytic properties of RNAs and/or their productive intermolecular interactions with other RNAs or proteins. However, the immense combinatorial space of potential RNA sequences has precluded predictive mapping between RNA sequence and molecular structure and function. Recent advances in high-throughput approaches in vitro have enabled quantitative thermodynamic and kinetic measurements of RNA-RNA and RNA-protein interactions, across hundreds of thousands of sequence variations. In this Review, we explore these techniques, how they can be used to understand RNA function and how they might form the foundations of an accurate model to predict the structure and function of an RNA directly from its nucleotide sequence. The experimental techniques and modelling frameworks discussed here are also highly relevant for the sampling of sequence-structure-function space of DNAs and proteins.
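Quantitative binding measurements of the kind this Review discusses are usually compared as free energies. As a small worked example, the sketch converts a dissociation constant into a standard binding free energy, ΔG° = RT ln(Kd / 1 M), and two constants into a ΔΔG between a variant and a reference. The 1 M standard state, 25 °C default, and kcal/mol units are assumptions chosen for illustration; the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy in kcal/mol from Kd (relative to 1 M).

    More negative values mean tighter binding.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


def delta_delta_g(kd_variant: float, kd_reference: float,
                  temperature_k: float = 298.15) -> float:
    """Change in binding free energy of a sequence variant vs. a reference.

    Positive values mean the variant binds more weakly.
    """
    if kd_variant <= 0 or kd_reference <= 0:
        raise ValueError("Kd values must be positive")
    return R_KCAL * temperature_k * math.log(kd_variant / kd_reference)


print(round(binding_free_energy(1e-9), 2))  # 1 nM binder at 25 C
print(round(delta_delta_g(1e-8, 1e-9), 2))  # tenfold weaker variant
```

A tenfold change in Kd corresponds to roughly 1.4 kcal/mol at room temperature, which is why high-throughput assays spanning hundreds of thousands of variants are often summarized on a ΔΔG scale.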
Affiliation(s)
- Emil Marklund
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Yuxi Ke
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- William J Greenleaf
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
7
Xiao H, Yang X, Zhang Y, Zhang Z, Zhang G, Zhang BT. RNA-targeted small-molecule drug discoveries: a machine-learning perspective. RNA Biol 2023; 20:384-397. [PMID: 37337437 PMCID: PMC10283424 DOI: 10.1080/15476286.2023.2223498] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Accepted: 06/06/2023] [Indexed: 06/21/2023]
Abstract
In the past two decades, machine learning (ML) has been extensively adopted in protein-targeted small-molecule (SM) discovery. Once trained, ML models can make predictions for large volumes of molecules within a short time. However, applying ML approaches to discover RNA-targeted SMs is still in its early stages, primarily because the intrinsic structural instability of RNA molecules impedes structure-based screening and design of RNA-targeted SMs. Recently, as more RNA structures have been revealed and a growing number of RNA-targeted ligands have been identified, interest in drugging RNA has increased. Intracellular RNA is much more abundant than protein and, if successfully targeted, would be a major alternative target for therapeutics. Therefore, given sufficient RNA-related research data, ML-based methods can help accelerate traditional experimental processes.
Affiliation(s)
- Huan Xiao
- School of Chinese Medicine, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, China
- Xin Yang
- School of Chinese Medicine, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, China
- Yihao Zhang
- School of Chinese Medicine, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, China
- Zongkang Zhang
- School of Chinese Medicine, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, China
- Ge Zhang
- Law Sau Fai Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, China
- Institute of Integrated Bioinformedicine and Translational Science, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, China
- Institute of Precision Medicine and Innovative Drug Discovery, HKBU Institute for Research and Continuing Education, Shenzhen, China
- Bao-Ting Zhang
- School of Chinese Medicine, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, China
8
Beck JD, Roberts JM, Kitzhaber JM, Trapp A, Serra E, Spezzano F, Hayden EJ. Predicting higher-order mutational effects in an RNA enzyme by machine learning of high-throughput experimental data. Front Mol Biosci 2022; 9:893864. [PMID: 36046603 PMCID: PMC9421044 DOI: 10.3389/fmolb.2022.893864] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/10/2022] [Accepted: 06/28/2022] [Indexed: 11/13/2022]
Abstract
Ribozymes are RNA molecules that catalyze biochemical reactions. Self-cleaving ribozymes are a common naturally occurring class of ribozymes that catalyze site-specific cleavage of their own phosphodiester backbone. In addition to their natural functions, self-cleaving ribozymes have been used to engineer control of gene expression because they can be designed to alter RNA processing and stability. However, the rational design of ribozyme activity remains challenging, and many ribozyme-based systems are engineered or improved by random mutagenesis and selection (in vitro evolution). Improving a ribozyme-based system often requires several mutations to achieve the desired function, but extensive pairwise and higher-order epistasis prevent a simple prediction of the effect of multiple mutations that is needed for rational design. Recently, high-throughput sequencing-based approaches have produced data sets on the effects of numerous mutations in different ribozymes (RNA fitness landscapes). Here we used such high-throughput experimental data from variants of the CPEB3 self-cleaving ribozyme to train a predictive model through machine learning approaches. We trained models using either a random forest or long short-term memory (LSTM) recurrent neural network approach. We found that models trained on a comprehensive set of pairwise mutant data could predict active sequences at higher mutational distances, but the correlation between predicted and experimentally observed self-cleavage activity decreased with increasing mutational distance. Adding sequences with increasingly higher numbers of mutations to the training data improved the correlation at increasing mutational distances. Systematically reducing the size of the training data set suggests that a wide distribution of ribozyme activity may be the key to accurate predictions. 
Because the model predictions are based only on sequence and activity data, the results demonstrate that this machine learning approach allows readily obtainable experimental data to be used for RNA design efforts even for RNA molecules with unknown structures. The accurate prediction of RNA functions will enable a more comprehensive understanding of RNA fitness landscapes for studying evolution and for guiding RNA-based engineering efforts.
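The evaluation described above groups variants by their mutational distance from the wild type and asks how well predicted and observed activities agree within each group. The sketch below shows that bookkeeping only: Hamming distance to the wild-type sequence, then a Pearson correlation per distance. It does not implement the random forest or LSTM models; the choice of Pearson correlation and the helper names are assumptions for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; NaN if either series has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def correlation_by_distance(
        wild_type: str,
        records: Sequence[Tuple[str, float, float]]) -> Dict[int, float]:
    """records: (variant_sequence, observed_activity, predicted_activity).

    Returns {mutational_distance: correlation} for distances with >= 2 variants.
    """
    groups: Dict[int, List[Tuple[float, float]]] = defaultdict(list)
    for seq, observed, predicted in records:
        groups[hamming(wild_type, seq)].append((observed, predicted))
    result = {}
    for distance, pairs in sorted(groups.items()):
        if len(pairs) >= 2:
            obs, pred = zip(*pairs)
            result[distance] = pearson(obs, pred)
    return result


toy = [("ACGA", 0.9, 0.8), ("ACCU", 0.5, 0.4), ("UCGU", 0.1, 0.0),
       ("UCGA", 0.2, 0.5), ("AGCU", 0.6, 0.1)]
print(correlation_by_distance("ACGU", toy))
```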
Affiliation(s)
- Jessica M. Roberts
- Biomolecular Sciences Graduate Programs, Boise State University, Boise, ID, United States
- Joey M. Kitzhaber
- Department of Computer Science, Boise State University, Boise, ID, United States
- Ashlyn Trapp
- Department of Biological Sciences, Boise State University, Boise, ID, United States
- Eric J. Hayden
- Biomolecular Sciences Graduate Programs, Boise State University, Boise, ID, United States
- Department of Computer Science, Boise State University, Boise, ID, United States
- *Correspondence: Eric J. Hayden,
9
Fröhlking T, Mlýnský V, Janeček M, Kührová P, Krepl M, Banáš P, Šponer J, Bussi G. Automatic Learning of Hydrogen-Bond Fixes in the AMBER RNA Force Field. J Chem Theory Comput 2022; 18:4490-4502. [PMID: 35699952 PMCID: PMC9281393 DOI: 10.1021/acs.jctc.2c00200] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 01/07/2023]
Abstract
The capability of current force fields to reproduce RNA structural dynamics is limited. Several methods have been developed to take advantage of experimental data in order to enforce agreement with experiments. Here, we extend an existing framework which allows arbitrarily chosen force-field correction terms to be fitted by quantification of the discrepancy between observables back-calculated from simulation and corresponding experiments. We apply a robust regularization protocol to avoid overfitting and additionally introduce and compare a number of different regularization strategies, namely, L1, L2, Kish size, relative Kish size, and relative entropy penalties. The training set includes a GACC tetramer as well as more challenging systems, namely, gcGAGAgc and gcUUCGgc RNA tetraloops. Specific intramolecular hydrogen bonds in the AMBER RNA force field are corrected with automatically determined parameters that we call gHBfix_opt. A validation involving a separate simulation of a system present in the training set (gcUUCGgc) and new systems not seen during training (CAAU and UUUU tetramers) displays improvements regarding the native population of the tetraloop as well as good agreement with NMR experiments for tetramers when using the new parameters. Then, we simulate folded RNAs (a kink-turn and L1 stalk rRNA) including hydrogen bond types not sufficiently present in the training set. This allows a final modification of the parameter set which is named gHBfix21 and is suggested to be applicable to a wider range of RNA systems.
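Several of the regularizers listed above are functions of the reweighting factors assigned to simulation frames when force-field corrections are fitted. The sketch computes three standard quantities on such weights: the Kish effective sample size (Σw)² / Σw², its relative form (divided by the number of frames), and the relative entropy (KL divergence, in nats) of the normalized weights with respect to uniform weights. These are common textbook definitions; how the paper turns them into penalty terms, and its sign conventions, are not reproduced here, and the function names are hypothetical.

```python
import math
from typing import Sequence


def _validate(weights: Sequence[float]) -> None:
    if not weights or any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")


def kish_size(weights: Sequence[float]) -> float:
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    _validate(weights)
    s = sum(weights)
    return s * s / sum(w * w for w in weights)


def relative_kish_size(weights: Sequence[float]) -> float:
    """Kish size as a fraction of the number of frames (1.0 = uniform)."""
    return kish_size(weights) / len(weights)


def relative_entropy(weights: Sequence[float]) -> float:
    """KL divergence D(p || uniform) of normalized weights p, in nats.

    0 for uniform weights; grows as the reweighted ensemble concentrates on
    fewer frames.
    """
    _validate(weights)
    total, n = sum(weights), len(weights)
    return sum((w / total) * math.log((w / total) * n) for w in weights if w > 0)


uniform = [1.0, 1.0, 1.0, 1.0]
skewed = [5.0, 1.0, 1.0, 1.0]
print(kish_size(uniform), relative_kish_size(skewed), relative_entropy(skewed))
```

A small relative Kish size or a large relative entropy signals that the corrected ensemble is supported by only a few original frames, which is the overfitting risk that such penalties are meant to control.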
Affiliation(s)
- Thorben Fröhlking
- Scuola Internazionale Superiore di Studi Avanzati, via Bonomea 265, Trieste 34136, Italy
- Vojtěch Mlýnský
- Institute of Biophysics of the Czech Academy of Sciences, Kralovopolska 135, Brno 612 65, Czech Republic
- Michal Janeček
- Department of Physical Chemistry, Faculty of Science, Palacky University, tr. 17 listopadu 12, Olomouc 771 46, Czech Republic
- Petra Kührová
- Regional Centre of Advanced Technologies and Materials, Czech Advanced Technology and Research Institute (CATRIN), Palacky University Olomouc, Slechtitelu 27, 779 00 Olomouc, Czech Republic
- Miroslav Krepl
- Institute of Biophysics of the Czech Academy of Sciences, Kralovopolska 135, Brno 612 65, Czech Republic
- Regional Centre of Advanced Technologies and Materials, Czech Advanced Technology and Research Institute (CATRIN), Palacky University Olomouc, Slechtitelu 27, 779 00 Olomouc, Czech Republic
- Pavel Banáš
- Regional Centre of Advanced Technologies and Materials, Czech Advanced Technology and Research Institute (CATRIN), Palacky University Olomouc, Slechtitelu 27, 779 00 Olomouc, Czech Republic
- Jiří Šponer
- Institute of Biophysics of the Czech Academy of Sciences, Kralovopolska 135, Brno 612 65, Czech Republic
- Regional Centre of Advanced Technologies and Materials, Czech Advanced Technology and Research Institute (CATRIN), Palacky University Olomouc, Slechtitelu 27, 779 00 Olomouc, Czech Republic
- Giovanni Bussi
- Scuola Internazionale Superiore di Studi Avanzati, via Bonomea 265, Trieste 34136, Italy
10
Bugnon LA, Edera AA, Prochetto S, Gerard M, Raad J, Fenoy E, Rubiolo M, Chorostecki U, Gabaldón T, Ariel F, Di Persia LE, Milone DH, Stegmayer G. Secondary structure prediction of long noncoding RNA: review and experimental comparison of existing approaches. Brief Bioinform 2022; 23:6606044. [PMID: 35692094 DOI: 10.1093/bib/bbac205] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 03/01/2022] [Revised: 05/02/2022] [Accepted: 05/04/2022] [Indexed: 11/12/2022]
Abstract
MOTIVATION In contrast to messenger RNAs, the function of the wide range of existing long noncoding RNAs (lncRNAs) largely depends on their structure, which determines interactions with partner molecules. Thus, the determination or prediction of the secondary structure of lncRNAs is critical to uncover their function. Classical approaches for predicting RNA secondary structure have been based on dynamic programming and thermodynamic calculations. In the last 4 years, a growing number of machine learning (ML)-based models, including deep learning (DL), have achieved breakthrough performance in structure prediction of biomolecules such as proteins and have outperformed classical methods in short transcripts folding. Nevertheless, the accurate prediction for lncRNA still remains far from being effectively solved. Notably, the myriad of new proposals has not been systematically and experimentally evaluated. RESULTS In this work, we compare the performance of the classical methods as well as the most recently proposed approaches for secondary structure prediction of RNA sequences using a unified and consistent experimental setup. We use the publicly available structural profiles for 3023 yeast RNA sequences, and a novel benchmark of well-characterized lncRNA structures from different species. Moreover, we propose a novel metric to assess the predictive performance of methods, exclusively based on the chemical probing data commonly used for profiling RNA structures, avoiding any potential bias incorporated by computational predictions when using dot-bracket references. Our results provide a comprehensive comparative assessment of existing methodologies, and a novel and public benchmark resource to aid in the development and comparison of future approaches. AVAILABILITY Full source code and benchmark datasets are available at: https://github.com/sinc-lab/lncRNA-folding. CONTACT lbugnon@sinc.unl.edu.ar.
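For context on the dot-bracket references mentioned above, the sketch shows the conventional way a predicted secondary structure is scored against a reference: parse both dot-bracket strings into base-pair sets and compute the base-pair F1 score. This is the standard comparison the authors' probing-based metric is designed to avoid relying on; it is not that new metric. Pseudoknot brackets are not supported, and the function names are hypothetical.

```python
from typing import Set, Tuple


def base_pairs(dot_bracket: str) -> Set[Tuple[int, int]]:
    """Set of (i, j) pairs, i < j, from a pseudoknot-free dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.add((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unsupported character {ch!r}")
    if stack:
        raise ValueError("unmatched '(' in structure")
    return pairs


def pair_f1(predicted: str, reference: str) -> float:
    """Harmonic mean of base-pair precision and recall (0.0 if no pair matches)."""
    if len(predicted) != len(reference):
        raise ValueError("structures must have equal length")
    pred, ref = base_pairs(predicted), base_pairs(reference)
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


print(pair_f1("((..))", "((..))"))
print(pair_f1("(....)", "((..))"))  # one of two reference pairs recovered
```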
Affiliation(s)
- L A Bugnon
- Research Institute for Signals, Systems and Computational Intelligence sinc(i) (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- A A Edera
- Research Institute for Signals, Systems and Computational Intelligence sinc(i) (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- S Prochetto
- Research Institute for Signals, Systems and Computational Intelligence sinc(i) (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- IAL, CONICET, Ciudad Universitaria UNL, (3000) Santa Fe, Argentina
- M Gerard
- Research Institute for Signals, Systems and Computational Intelligence sinc(i) (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- J Raad
- Research Institute for Signals, Systems and Computational Intelligence sinc(i) (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- E Fenoy
- Research Institute for Signals, Systems and Computational Intelligence sinc(i) (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- M Rubiolo
- Research Institute for Signals, Systems and Computational Intelligence sinc(i) (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- U Chorostecki
- Barcelona Supercomputing Center (BSC-CNS), Institute of Research in Biomedicine (IRB), Spain
- T Gabaldón
- Barcelona Supercomputing Center (BSC-CNS), Institute of Research in Biomedicine (IRB), Spain
- Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
- Centro de Investigación Biomédica En Red de Enfermedades Infecciosas (CIBERINFEC), Barcelona, Spain
- F Ariel
- IAL, CONICET, Ciudad Universitaria UNL, (3000) Santa Fe, Argentina
- L E Di Persia
- Research Institute for Signals, Systems and Computational Intelligence sinc(i) (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- D H Milone
- Research Institute for Signals, Systems and Computational Intelligence sinc(i) (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- G Stegmayer
- Research Institute for Signals, Systems and Computational Intelligence sinc(i) (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
11
Yu H, Qi Y, Ding Y. Deep Learning in RNA Structure Studies. Front Mol Biosci 2022; 9:869601. [PMID: 35677883 PMCID: PMC9168262 DOI: 10.3389/fmolb.2022.869601] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/04/2022] [Accepted: 05/04/2022] [Indexed: 01/27/2023]
Abstract
Deep learning, or artificial neural networks, is a type of machine learning algorithm that can decipher underlying relationships from large volumes of data and has been successfully applied to solve structural biology questions, such as RNA structure. RNA can fold into complex RNA structures by forming hydrogen bonds, thereby playing an essential role in biological processes. While experimental effort has enabled resolving RNA structure at the genome-wide scale, deep learning has been more recently introduced for studying RNA structure and its functionality. Here, we discuss successful applications of deep learning to solve RNA problems, including predictions of RNA structures, non-canonical G-quadruplex, RNA-protein interactions and RNA switches. Following these cases, we give a general guide to deep learning for solving RNA structure problems.
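Most deep learning models for RNA structure first turn the nucleotide sequence into a numeric matrix. A minimal sketch of the most common choice, one-hot encoding over A, C, G, U, is below; mapping T to U and encoding ambiguous bases (such as N) as all-zero rows are assumptions for illustration, and individual tools surveyed in the review may use different encodings or additional features.

```python
from typing import List

ALPHABET = "ACGU"


def one_hot(sequence: str) -> List[List[int]]:
    """Encode an RNA/DNA sequence as an L x 4 one-hot matrix (rows: positions).

    T is treated as U; any symbol outside ACGU (e.g. N) becomes an all-zero row.
    """
    seq = sequence.upper().replace("T", "U")
    encoded = []
    for base in seq:
        row = [0] * len(ALPHABET)
        idx = ALPHABET.find(base)
        if idx >= 0:
            row[idx] = 1
        encoded.append(row)
    return encoded


for row in one_hot("GAUN"):
    print(row)
```

Models that predict base pairing often go one step further and combine the per-position rows of positions i and j into a pairwise L x L feature map, but the per-position encoding above is the usual starting point.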
Affiliation(s)
- Haopeng Yu
- Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich, United Kingdom
- Yiliang Ding
- Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich, United Kingdom
12
Gianti E, Percec S. Machine Learning at the Interface of Polymer Science and Biology: How Far Can We Go? Biomacromolecules 2022; 23:576-591. [PMID: 35133143 DOI: 10.1021/acs.biomac.1c01436] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/12/2022]
Abstract
This Perspective outlines recent progress and future directions for using machine learning (ML), a data-driven method, to address critical questions in the design, synthesis, processing, and characterization of biomacromolecules. Achieving these tasks requires navigating vast and complex chemical and biological spaces, which is difficult to do at reasonable speed. Using modern algorithms and supercomputers, quantum physics methods are able to examine systems containing a few hundred interacting species and determine the probability of finding them in a particular region of phase space, thereby anticipating their properties. Likewise, modern approaches in chemistry and biomolecular simulation, supported by high-performance computing, have produced data sets of escalating size and intrinsically high complexity. Hence, using ML to extract relevant information from these fields is of paramount importance to advance our understanding of chemical and biomolecular systems. At the heart of ML approaches lie statistical algorithms, which, by evaluating a portion of a given data set, identify, learn, and manipulate the underlying rules that govern the whole data set. The key steps in ML are assembling a quality model to represent the data, making predictions, and eliminating sources of error. In addition to a growing infrastructure of ML tools to address complex problems, an increasing number of aspects related to our understanding of the fundamental properties of biomacromolecules are being explored with ML. These fields, including those residing at the interface of polymer science and biology (i.e., structure determination, de novo design, folding, and dynamics), strive to adopt and take advantage of the transformative power offered by approaches in the ML domain, which clearly has the potential to accelerate research in the field of biomacromolecules.
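The abstract describes learning rules from one portion of a data set and then checking predictions. A holdout split can illustrate this. The sketch below is generic and rests on stated assumptions. The data are synthetic. The model is ordinary least squares on a single descriptor, not any method from the Perspective. The helper names `train_test_split`, `fit_line`, and `mse` are hypothetical.

```python
import random


def train_test_split(xs, ys, test_fraction=0.25, seed=0):
    """Shuffle indices reproducibly and hold out a fraction of the data for testing."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(xs) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    return ([xs[i] for i in train], [ys[i] for i in train],
            [xs[i] for i in test], [ys[i] for i in test])


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mse(xs, ys, a, b):
    """Mean squared error of the fitted line on (xs, ys)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic toy data: a noisy linear "property" versus one descriptor (illustration only).
rng = random.Random(42)
xs = [i / 10 for i in range(40)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 0.1) for x in xs]

x_tr, y_tr, x_te, y_te = train_test_split(xs, ys)
a, b = fit_line(x_tr, y_tr)  # learn only from the training portion
print(f"a={a:.2f} b={b:.2f} train MSE={mse(x_tr, y_tr, a, b):.4f} "
      f"test MSE={mse(x_te, y_te, a, b):.4f}")
```

Comparing training and held-out error is the simplest check for overfitting. With richer models, the same split (or cross-validation) guards against a model that merely memorizes the portion it was trained on.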
Affiliation(s)
- Eleonora Gianti
- Institute for Computational Molecular Science (ICMS), Temple University, Philadelphia, Pennsylvania 19122, United States; Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122, United States
- Simona Percec
- Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122, United States
13
Liu Z, Yang Y, Li D, Lv X, Chen X, Dai Q. Prediction of the RNA Tertiary Structure Based on a Random Sampling Strategy and Parallel Mechanism. Front Genet 2022; 12:813604. [PMID: 35069706 PMCID: PMC8769045 DOI: 10.3389/fgene.2021.813604] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/12/2021] [Accepted: 11/19/2021] [Indexed: 12/14/2022]
Abstract
Background: Macromolecule structure prediction remains a fundamental challenge of bioinformatics. Over the past several decades, the Rosetta framework has provided solutions to diverse challenges in computational biology. However, it is challenging to model RNA tertiary structures effectively when the de novo modeling of RNA involves solving a well-defined small puzzle. Methods: In this study, we introduce a stepwise Monte Carlo parallelization (SMCP) algorithm for RNA tertiary structure prediction. Millions of conformations were randomly searched using the Monte Carlo algorithm and the stepwise ansatz hypothesis, and SMCP uses a parallel mechanism for efficient sampling. Moreover, to achieve better prediction accuracy and completeness, we judged and processed the modeling results. Results: A benchmark of nine single-stranded RNA loops drawn from riboswitches establishes the general ability of the algorithm to model RNA with high accuracy and integrity, including six motifs that cannot be solved by knowledge-mining-based modeling algorithms. Experimental results show that the modeling accuracy of the SMCP algorithm is up to 0.14 Å, and the modeling integrity on this benchmark is extremely high. Conclusion: SMCP is an ab initio modeling algorithm that substantially outperforms previous algorithms in the Rosetta framework, especially in improving the accuracy and completeness of the model. This work is expected to provide new research ideas for future macromolecular structure prediction. In addition, it will provide a theoretical basis for the development of the biomedical field.
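The abstract combines random Monte Carlo conformational sampling with a parallel mechanism. The sketch below illustrates that general idea only. It is not the SMCP algorithm and does not use Rosetta. A "conformation" is reduced to six torsion angles. The energy function and the `TARGET` angles are hypothetical. Several independent Metropolis chains run concurrently, and the lowest-energy model is kept. Threads are used for portability. For CPU-bound pure-Python code, a process pool would be needed for a real speed-up.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "native" torsion angles (radians) for a toy 6-torsion chain.
# They are multiples of 2*pi/3, so the rugged 3-fold term below is also zero there.
TARGET = [0.0, 2 * math.pi / 3, -2 * math.pi / 3, 0.0, 2 * math.pi / 3, 0.0]


def energy(torsions):
    """Toy energy: a funnel toward TARGET plus a 3-fold rugged term (minimum 0 at TARGET)."""
    return sum((1 - math.cos(t - t0)) + 0.5 * (1 - math.cos(3 * t))
               for t, t0 in zip(torsions, TARGET))


def metropolis_chain(seed, steps=10000, step_size=0.5, kT=0.3):
    """One independent chain: perturb a random torsion, accept by the Metropolis rule."""
    rng = random.Random(seed)  # per-chain RNG keeps results reproducible
    current = [rng.uniform(-math.pi, math.pi) for _ in TARGET]  # random starting model
    e_cur = energy(current)
    best, e_best = list(current), e_cur
    for _ in range(steps):
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] += rng.gauss(0.0, step_size)
        e_trial = energy(trial)
        # Always accept downhill moves; accept uphill moves with probability exp(-dE/kT).
        if e_trial <= e_cur or rng.random() < math.exp(-(e_trial - e_cur) / kT):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = list(current), e_cur
    return e_best, best


def parallel_search(n_chains=8, **kwargs):
    """Run independent chains concurrently and return the lowest-energy (energy, model)."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda s: metropolis_chain(s, **kwargs), range(n_chains)))
    return min(results, key=lambda r: r[0])


e_best, model = parallel_search()
print(f"best toy energy: {e_best:.4f}")
```

Each chain has its own seeded generator, so the result does not depend on thread scheduling. To switch to `ProcessPoolExecutor`, replace the lambda with a module-level function (for example via `functools.partial`) and run the search under an `if __name__ == "__main__":` guard.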
Affiliation(s)
- Zhendong Liu
- School of Computer Science and Technology, Shandong Jianzhu University, Jinan, China
- Yurong Yang
- School of Computer Science and Technology, Shandong Jianzhu University, Jinan, China
- Dongyan Li
- School of Computer Science and Technology, Shandong Jianzhu University, Jinan, China
- Xinrong Lv
- School of Computer Science and Technology, Shandong Jianzhu University, Jinan, China
- Xi Chen
- School of Computer Science and Technology, Shandong Jianzhu University, Jinan, China
- Qionghai Dai
- Department of Automation, Tsinghua University, Beijing, China
14
Romero-López C, Ramos-Lorente SE, Berzal-Herranz A. In Vitro Methods to Decipher the Structure of Viral RNA Genomes. Pharmaceuticals (Basel) 2021; 14:1192. [PMID: 34832974 PMCID: PMC8620418 DOI: 10.3390/ph14111192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Revised: 11/17/2021] [Accepted: 11/18/2021] [Indexed: 02/05/2023]
Abstract
RNA viruses encode essential information in their genomes as conserved structural elements that are involved in efficient viral protein synthesis, replication, and encapsidation. These elements can also establish complex networks of RNA-RNA interactions, the so-called RNA interactome, to shape the viral genome and control different events during intracellular infection. In recent years, targeting these conserved structural elements has become a promising strategy for the development of new antiviral tools due to their sequence and structural conservation. In this context, RNA-based specific therapeutic strategies, such as the use of siRNAs, have been extensively pursued to target the genomes of different viruses. Importantly, siRNA-mediated targeting is not a straightforward approach, and its efficiency is highly dependent on the structure of the target region. Therefore, knowledge of the viral RNA structure is critical for identifying potentially good target sites. Here, we describe detailed protocols used in our laboratory for the in vitro study of the structure of viral RNA genomes. These protocols include DMS (dimethylsulfate) probing, SHAPE (selective 2'-hydroxyl acylation analyzed by primer extension) analysis, and HMX (2'-hydroxyl molecular interference). These methodologies involve high-throughput analysis techniques that provide extensive information about the 3D folding of the RNA under study and the structural tuning derived from the interactome activity. They are therefore good tools for the development of new RNA-based antiviral compounds.
Affiliation(s)
- Cristina Romero-López
- Instituto de Parasitología y Biomedicina López-Neyra (IPBLN-CSIC), Av. del Conocimiento 17, 18016 Armilla, Granada, Spain
- Alfredo Berzal-Herranz
- Instituto de Parasitología y Biomedicina López-Neyra (IPBLN-CSIC), Av. del Conocimiento 17, 18016 Armilla, Granada, Spain
15
Zhao Q, Zhao Z, Fan X, Yuan Z, Mao Q, Yao Y. Review of machine learning methods for RNA secondary structure prediction. PLoS Comput Biol 2021; 17:e1009291. [PMID: 34437528 PMCID: PMC8389396 DOI: 10.1371/journal.pcbi.1009291] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Indexed: 12/05/2022]
Abstract
Secondary structure plays an important role in determining the function of noncoding RNAs. Hence, identifying RNA secondary structures is of great value to research. Computational prediction is a mainstream approach for predicting RNA secondary structure. Unfortunately, even though new methods have been proposed over the past 40 years, the performance of computational prediction methods has stagnated in the last decade. Recently, with the increasing availability of RNA structure data, new methods based on machine learning (ML) technologies, especially deep learning, have alleviated the issue. In this review, we provide a comprehensive overview of RNA secondary structure prediction methods based on ML technologies and a tabularized summary of the most important methods in this field. The current pending challenges in the field of RNA secondary structure prediction and future trends are also discussed.
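Secondary structure predictors, including the ML-based methods surveyed in this review, are commonly compared by how many reference base pairs they recover. Below is a minimal scoring sketch. It is not code from the review, and `pairs_from_dot_bracket` and `base_pair_scores` are hypothetical names. It parses dot-bracket notation and computes base-pair precision, recall, and F1. It assumes nested structures written only with `(`, `)` and `.` (no pseudoknot brackets) and counts only exact pair matches. Some evaluations also credit pairs shifted by one position.

```python
def pairs_from_dot_bracket(structure: str) -> set[tuple[int, int]]:
    """Parse a dot-bracket string ('(', ')', '.') into a set of 0-based (i, j) base pairs."""
    stack, pairs = [], set()
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)  # remember the opening position
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.add((stack.pop(), j))  # close the most recent open bracket
        elif ch != ".":
            raise ValueError(f"unsupported symbol {ch!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def base_pair_scores(predicted: str, reference: str) -> tuple[float, float, float]:
    """Precision, recall and F1 computed over exactly matching base pairs."""
    if len(predicted) != len(reference):
        raise ValueError("structures must have the same length")
    pred = pairs_from_dot_bracket(predicted)
    ref = pairs_from_dot_bracket(reference)
    tp = len(pred & ref)  # true positives: pairs present in both structures
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


reference_db = "((((....))))"
predicted_db = "(((......)))"
print(base_pair_scores(predicted_db, reference_db))
```

In this example the prediction recovers three of the four reference pairs and adds no false pairs, so precision is 1.0, recall is 0.75, and F1 is 6/7.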
Affiliation(s)
- Qi Zhao
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, China
- Zheng Zhao
- School of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning, China
- Xiaoya Fan
- School of Software, Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian University of Technology, Dalian, Liaoning, China
- Zhengwei Yuan
- Key Laboratory of Health Ministry for Congenital Malformation, Shengjing Hospital of China Medical University, Shenyang, Liaoning, China
- Qian Mao
- College of Light Industry, Liaoning University, Shenyang, Liaoning, China
- Key Laboratory of Agroproducts Processing Technology, Changchun University, Changchun, Jilin, China
- Yudong Yao
- Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, New Jersey, United States of America