1
|
Rinaldi S, Moroni E, Rozza R, Magistrato A. Frontiers and Challenges of Computing ncRNAs Biogenesis, Function and Modulation. J Chem Theory Comput 2024; 20:993-1018. [PMID: 38287883 DOI: 10.1021/acs.jctc.3c01239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/31/2024]
Abstract
Non-coding RNAs (ncRNAs), generated from nonprotein coding DNA sequences, constitute 98-99% of the human genome. Non-coding RNAs encompass diverse functional classes, including microRNAs, small interfering RNAs, PIWI-interacting RNAs, small nuclear RNAs, small nucleolar RNAs, and long non-coding RNAs. With critical involvement in gene expression and regulation across various biological and physiopathological contexts, such as neuronal disorders, immune responses, cardiovascular diseases, and cancer, non-coding RNAs are emerging as disease biomarkers and therapeutic targets. In this review, after providing an overview of non-coding RNAs' role in cell homeostasis, we illustrate the potential and the challenges of state-of-the-art computational methods exploited to study non-coding RNAs biogenesis, function, and modulation. This can be done by directly targeting them with small molecules or by altering their expression by targeting the cellular engines underlying their biosynthesis. Drawing from applications, also taken from our work, we showcase the significance and role of computer simulations in uncovering fundamental facets of ncRNA mechanisms and modulation. This information may set the basis to advance gene modulation tools and therapeutic strategies to address unmet medical needs.
Collapse
Affiliation(s)
- Silvia Rinaldi
- National Research Council of Italy (CNR) - Institute of Chemistry of OrganoMetallic Compounds (ICCOM), c/o Area di Ricerca CNR di Firenze Via Madonna del Piano 10, 50019 Sesto Fiorentino, Florence, Italy
| | - Elisabetta Moroni
- National Research Council of Italy (CNR) - Institute of Chemical Sciences and Technologies (SCITEC), via Mario Bianco 9, 20131 Milano, Italy
| | - Riccardo Rozza
- National Research Council of Italy (CNR) - Institute of Material Foundry (IOM) c/o International School for Advanced Studies (SISSA), Via Bonomea, 265, 34136 Trieste, Italy
| | - Alessandra Magistrato
- National Research Council of Italy (CNR) - Institute of Material Foundry (IOM) c/o International School for Advanced Studies (SISSA), Via Bonomea, 265, 34136 Trieste, Italy
| |
Collapse
|
2
|
Lin BC, Katneni U, Jankowska KI, Meyer D, Kimchi-Sarfaty C. In silico methods for predicting functional synonymous variants. Genome Biol 2023; 24:126. [PMID: 37217943 PMCID: PMC10204308 DOI: 10.1186/s13059-023-02966-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2022] [Accepted: 05/10/2023] [Indexed: 05/24/2023] Open
Abstract
Single nucleotide variants (SNVs) contribute to human genomic diversity. Synonymous SNVs are previously considered to be "silent," but mounting evidence has revealed that these variants can cause RNA and protein changes and are implicated in over 85 human diseases and cancers. Recent improvements in computational platforms have led to the development of numerous machine-learning tools, which can be used to advance synonymous SNV research. In this review, we discuss tools that should be used to investigate synonymous variants. We provide supportive examples from seminal studies that demonstrate how these tools have driven new discoveries of functional synonymous SNVs.
Collapse
Affiliation(s)
- Brian C Lin
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
| | - Upendra Katneni
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
| | - Katarzyna I Jankowska
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
| | - Douglas Meyer
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
| | - Chava Kimchi-Sarfaty
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA.
| |
Collapse
|
3
|
Zhang D, Gong L, Weng J, Li Y, Wang A, Li G. RNA Folding Based on 5 Beads Model and Multiscale Simulation. Interdiscip Sci 2023:10.1007/s12539-023-00561-3. [PMID: 37115389 DOI: 10.1007/s12539-023-00561-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2023] [Revised: 03/08/2023] [Accepted: 03/10/2023] [Indexed: 04/29/2023]
Abstract
RNA folding prediction is very meaningful and challenging. The molecular dynamics simulation (MDS) of all atoms (AA) is limited to the folding of small RNA molecules. At present, most of the practical models are coarse grained (CG) model, and the coarse-grained force field (CGFF) parameters usually depend on known RNA structures. However, the limitation of the CGFF is obvious that it is difficult to study the modified RNA. Based on the 3 beads model (AIMS_RNA_B3), we proposed the AIMS_RNA_B5 model with three beads representing a base and two beads representing the main chain (sugar group and phosphate group). We first run the all atom molecular dynamic simulation (AAMDS), and fit the CGFF parameter with the AA trajectory. Then perform the coarse-grained molecular dynamic simulation (CGMDS). AAMDS is the foundation of CGMDS. CGMDS is mainly to carry out the conformation sampling based on the current AAMDS state and improve the folding speed. We simulated the folding of three RNAs, which belong to hairpin, pseudoknot and tRNA respectively. Compared to the AIMS_RNA_B3 model, the AIMS_RNA_B5 model is more reasonable and performs better.
Collapse
Affiliation(s)
- Dinglin Zhang
- Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China
- Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Lidong Gong
- School of Chemistry and Chemical Engineering, Liaoning Normal University, Dalian, 116029, China
| | - Junben Weng
- Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China
- Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Yan Li
- Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China
| | - Anhui Wang
- Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China
| | - Guohui Li
- Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China.
| |
Collapse
|
4
|
Zhang D, Li Y, Zhong Q, Wang A, Weng J, Gong L, Li G. Ribonucleic Acid Folding Prediction Based on Iterative Multiscale Simulation. J Phys Chem Lett 2022; 13:9957-9966. [PMID: 36260782 DOI: 10.1021/acs.jpclett.2c01342] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/14/2023]
Abstract
RNA folding prediction is a challenge. Currently, many RNA folding models are coarse-grained (CG) with the potential derived from the known RNA structures. However, this potential is not suitable for modified and entirely new RNA. It is also not suitable for the folding simulation of RNA in the real cellular environment, including many kinds of molecular interactions. In contrast, our proposed model has the potential to address these issues, which is a multiscale simulation scheme based on all-atom (AA) force fields. We fit the CG force field using the trajectories generated by the AA force field and then iteratively perform molecular dynamics (MD) simulations of the two scales. The all-atom molecular dynamics (AAMD) simulation is mainly responsible for the correction of RNA structure, and the CGMD simulation is mainly responsible for efficient conformational sampling. On the basis of this scheme, we can successfully fold three RNAs belonging to a hairpin, a pseudoknot, and a four-way junction.
Collapse
Affiliation(s)
- Dinglin Zhang
- Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian116023, P. R. China
- Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing100049, P. R. China
| | - Yan Li
- Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian116023, P. R. China
| | - Qinglu Zhong
- Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian116023, P. R. China
| | - Anhui Wang
- Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian116023, P. R. China
| | - Junben Weng
- Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian116023, P. R. China
- Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing100049, P. R. China
| | - Lidong Gong
- School of Chemistry and Chemical Engineering, Liaoning Normal University, Dalian116029, China
| | - Guohui Li
- Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian116023, P. R. China
| |
Collapse
|
5
|
RNA secondary structure packages evaluated and improved by high-throughput experiments. Nat Methods 2022; 19:1234-1242. [PMID: 36192461 PMCID: PMC9839360 DOI: 10.1038/s41592-022-01605-0] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2020] [Accepted: 08/10/2022] [Indexed: 01/17/2023]
Abstract
Despite the popularity of computer-aided study and design of RNA molecules, little is known about the accuracy of commonly used structure modeling packages in tasks sensitive to ensemble properties of RNA. Here, we demonstrate that the EternaBench dataset, a set of more than 20,000 synthetic RNA constructs designed on the RNA design platform Eterna, provides incisive discriminative power in evaluating current packages in ensemble-oriented structure prediction tasks. We find that CONTRAfold and RNAsoft, packages with parameters derived through statistical learning, achieve consistently higher accuracy than more widely used packages in their standard settings, which derive parameters primarily from thermodynamic experiments. We hypothesized that training a multitask model with the varied data types in EternaBench might improve inference on ensemble-based prediction tasks. Indeed, the resulting model, named EternaFold, demonstrated improved performance that generalizes to diverse external datasets including complete messenger RNAs, viral genomes probed in human cells and synthetic designs modeling mRNA vaccines.
Collapse
|
6
|
Fei Y, Zhang H, Wang Y, Liu Z, Liu Y. LTPConstraint: a transfer learning based end-to-end method for RNA secondary structure prediction. BMC Bioinformatics 2022; 23:354. [PMID: 35999499 PMCID: PMC9396797 DOI: 10.1186/s12859-022-04847-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2022] [Accepted: 07/18/2022] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND RNA secondary structure is very important for deciphering cell's activity and disease occurrence. The first method which was used by the academics to predict this structure is biological experiment, But this method is too expensive, causing the promotion to be affected. Then, computing methods emerged, which has good efficiency and low cost. However, the accuracy of computing methods are not satisfactory. Many machine learning methods have also been applied to this area, but the accuracy has not improved significantly. Deep learning has matured and achieves great success in many areas such as computer vision and natural language processing. It uses neural network which is a kind of structure that has good functionality and versatility, but its effect is highly correlated with the quantity and quality of the data. At present, there is no model with high accuracy, low data dependence and high convenience in predicting RNA secondary structure. RESULTS This paper designs a neural network called LTPConstraint to predict RNA secondary structure. The network is based on many network structure such as Bidirectional LSTM, Transformer and generator. It also uses transfer learning to train modelso that the data dependence can be reduced. CONCLUSIONS LTPConstraint has achieved high accuracy in RNA secondary structure prediction. Compared with the previous methods, the accuracy improves obviously both in predicting the structure with pseudoknot and the structure without pseudoknot. At the same time, LTPConstraint is easy to operate and can achieve result very quickly.
Collapse
Affiliation(s)
- Yinchao Fei
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, China
| | - Hao Zhang
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, China
| | - Yili Wang
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, China
| | - Zhen Liu
- Graduate School of Engineering, Nagasaki Institute of Applied Science, Nagasaki, Japan
| | - Yuanning Liu
- College of Computer Science and Technology, Jilin University, Changchun, China.
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, China.
| |
Collapse
|
7
|
Aviran S, Incarnato D. Computational approaches for RNA structure ensemble deconvolution from structure probing data. J Mol Biol 2022; 434:167635. [PMID: 35595163 DOI: 10.1016/j.jmb.2022.167635] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2022] [Revised: 04/29/2022] [Accepted: 05/05/2022] [Indexed: 12/15/2022]
Abstract
RNA structure probing experiments have emerged over the last decade as a straightforward way to determine the structure of RNA molecules in a number of different contexts. Although powerful, the ability of RNA to dynamically interconvert between, and to simultaneously populate, alternative structural configurations, poses a nontrivial challenge to the interpretation of data derived from these experiments. Recent efforts aimed at developing computational methods for the reconstruction of coexisting alternative RNA conformations from structure probing data are paving the way to the study of RNA structure ensembles, even in the context of living cells. In this review, we critically discuss these methods, their limitations and possible future improvements.
Collapse
Affiliation(s)
- Sharon Aviran
- Biomedical Engineering Department and Genome Center, University of California, Davis, CA, USA.
| | - Danny Incarnato
- Department of Molecular Genetics, Groningen Biomolecular Sciences and Biotechnology Institute (GBB), University of Groningen, Groningen, the Netherlands.
| |
Collapse
|
8
|
Guo ZH, Yuan L, Tan YL, Zhang BG, Shi YZ. RNAStat: An Integrated Tool for Statistical Analysis of RNA 3D Structures. FRONTIERS IN BIOINFORMATICS 2022; 1:809082. [PMID: 36303785 PMCID: PMC9580920 DOI: 10.3389/fbinf.2021.809082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2021] [Accepted: 12/17/2021] [Indexed: 11/13/2022] Open
Abstract
The 3D architectures of RNAs are essential for understanding their cellular functions. While an accurate scoring function based on the statistics of known RNA structures is a key component for successful RNA structure prediction or evaluation, there are few tools or web servers that can be directly used to make comprehensive statistical analysis for RNA 3D structures. In this work, we developed RNAStat, an integrated tool for making statistics on RNA 3D structures. For given RNA structures, RNAStat automatically calculates RNA structural properties such as size and shape, and shows their distributions. Based on the RNA structure annotation from DSSR, RNAStat provides statistical information of RNA secondary structure motifs including canonical/non-canonical base pairs, stems, and various loops. In particular, the geometry of base-pairing/stacking can be calculated in RNAStat by constructing a local coordinate system for each base. In addition, RNAStat also supplies the distribution of distance between any atoms to the users to help build distance-based RNA statistical potentials. To test the usability of the tool, we established a non-redundant RNA 3D structure dataset, and based on the dataset, we made a comprehensive statistical analysis on RNA structures, which could have the guiding significance for RNA structure modeling. The python code of RNAStat, the dataset used in this work, and corresponding statistical data files are freely available at GitHub (https://github.com/RNA-folding-lab/RNAStat).
Collapse
Affiliation(s)
- Zhi-Hao Guo
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, China
- School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, China
| | - Li Yuan
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, China
- School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, China
| | - Ya-Lan Tan
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, China
| | - Ben-Gong Zhang
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, China
| | - Ya-Zhou Shi
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, China
- *Correspondence: Ya-Zhou Shi,
| |
Collapse
|
9
|
De Bisschop G, Allouche D, Frezza E, Masquida B, Ponty Y, Will S, Sargueil B. Progress toward SHAPE Constrained Computational Prediction of Tertiary Interactions in RNA Structure. Noncoding RNA 2021; 7:71. [PMID: 34842779 PMCID: PMC8628965 DOI: 10.3390/ncrna7040071] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2021] [Revised: 10/29/2021] [Accepted: 11/02/2021] [Indexed: 01/04/2023] Open
Abstract
As more sequencing data accumulate and novel puzzling genetic regulations are discovered, the need for accurate automated modeling of RNA structure increases. RNA structure modeling from chemical probing experiments has made tremendous progress, however accurately predicting large RNA structures is still challenging for several reasons: RNA are inherently flexible and often adopt many energetically similar structures, which are not reliably distinguished by the available, incomplete thermodynamic model. Moreover, computationally, the problem is aggravated by the relevance of pseudoknots and non-canonical base pairs, which are hardly predicted efficiently. To identify nucleotides involved in pseudoknots and non-canonical interactions, we scrutinized the SHAPE reactivity of each nucleotide of the 188 nt long lariat-capping ribozyme under multiple conditions. Reactivities analyzed in the light of the X-ray structure were shown to report accurately the nucleotide status. Those that seemed paradoxical were rationalized by the nucleotide behavior along molecular dynamic simulations. We show that valuable information on intricate interactions can be deduced from probing with different reagents, and in the presence or absence of Mg2+. Furthermore, probing at increasing temperature was remarkably efficient at pointing to non-canonical interactions and pseudoknot pairings. The possibilities of following such strategies to inform structure modeling software are discussed.
Collapse
Affiliation(s)
- Grégoire De Bisschop
- Université de Paris, CNRS, UMR 8038/CiTCoM, F-75006 Paris, France; (G.D.B.); (D.A.); (E.F.)
- Institut de Recherches Cliniques de Montréal (IRCM), Montréal, QC H2W 1R7, Canada
| | - Delphine Allouche
- Université de Paris, CNRS, UMR 8038/CiTCoM, F-75006 Paris, France; (G.D.B.); (D.A.); (E.F.)
- Institut Necker-Enfants Malades (INEM), Inserm U1151, 156 rue de Vaugirard, CEDEX 15, 75015 Paris, France
| | - Elisa Frezza
- Université de Paris, CNRS, UMR 8038/CiTCoM, F-75006 Paris, France; (G.D.B.); (D.A.); (E.F.)
| | - Benoît Masquida
- Université de Strasbourg, CNRS UMR7156 GMGM, 67084 Strasbourg, France;
| | - Yann Ponty
- Ecole Polytechnique, CNRS UMR 7161, LIX, 91120 Palaiseau, France; (Y.P.); (S.W.)
| | - Sebastian Will
- Ecole Polytechnique, CNRS UMR 7161, LIX, 91120 Palaiseau, France; (Y.P.); (S.W.)
| | - Bruno Sargueil
- Université de Paris, CNRS, UMR 8038/CiTCoM, F-75006 Paris, France; (G.D.B.); (D.A.); (E.F.)
| |
Collapse
|
10
|
Zhao Q, Zhao Z, Fan X, Yuan Z, Mao Q, Yao Y. Review of machine learning methods for RNA secondary structure prediction. PLoS Comput Biol 2021; 17:e1009291. [PMID: 34437528 PMCID: PMC8389396 DOI: 10.1371/journal.pcbi.1009291] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/05/2022] Open
Abstract
Secondary structure plays an important role in determining the function of noncoding RNAs. Hence, identifying RNA secondary structures is of great value to research. Computational prediction is a mainstream approach for predicting RNA secondary structure. Unfortunately, even though new methods have been proposed over the past 40 years, the performance of computational prediction methods has stagnated in the last decade. Recently, with the increasing availability of RNA structure data, new methods based on machine learning (ML) technologies, especially deep learning, have alleviated the issue. In this review, we provide a comprehensive overview of RNA secondary structure prediction methods based on ML technologies and a tabularized summary of the most important methods in this field. The current pending challenges in the field of RNA secondary structure prediction and future trends are also discussed.
Collapse
Affiliation(s)
- Qi Zhao
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, China
| | - Zheng Zhao
- School of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning, China
| | - Xiaoya Fan
- School of Software, Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian University of Technology, Dalian, Liaoning, China
| | - Zhengwei Yuan
- Key Laboratory of Health Ministry for Congenital Malformation, Shengjing Hospital of China Medical University, Shenyang, Liaoning, China
| | - Qian Mao
- College of Light Industry, Liaoning University, Shenyang, Liaoning, China
- Key Laboratory of Agroproducts Processing Technology, Changchun University, Changchun, Jilin, China
| | - Yudong Yao
- Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, New Jersey, United States of America
| |
Collapse
|
11
|
Singh J, Paliwal K, Zhang T, Singh J, Litfin T, Zhou Y. Improved RNA Secondary Structure and Tertiary Base-pairing Prediction Using Evolutionary Profile, Mutational Coupling and Two-dimensional Transfer Learning. Bioinformatics 2021; 37:2589-2600. [PMID: 33704363 DOI: 10.1093/bioinformatics/btab165] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2020] [Revised: 02/05/2021] [Accepted: 03/08/2021] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION The recent discovery of numerous non-coding RNAs (long non-coding RNAs, in particular) has transformed our perception about the roles of RNAs in living organisms. Our ability to understand them, however, is hampered by our inability to solve their secondary and tertiary structures in high resolution efficiently by existing experimental techniques. Computational prediction of RNA secondary structure, on the other hand, has received much-needed improvement, recently, through deep learning of a large approximate data, followed by transfer learning with gold-standard base-pairing structures from high-resolution 3-D structures. Here, we expand this single-sequence-based learning to the use of evolutionary profiles and mutational coupling. RESULTS The new method allows large improvement not only in canonical base-pairs (RNA secondary structures) but more so in base-pairing associated with tertiary interactions such as pseudoknots, noncanonical and lone base-pairs. In particular, it is highly accurate for those RNAs of more than 1000 homologous sequences by achieving >0.8 F1-score (harmonic mean of sensitivity and precision) for 14/16 RNAs tested. The method can also significantly improve base-pairing prediction by incorporating artificial but functional homologous sequences generated from deep mutational scanning without any modification. The fully automatic method (publicly available as server and standalone software) should provide the scientific community a new powerful tool to capture not only the secondary structure but also tertiary base-pairing information for building three-dimensional models. It also highlights the future of accurately solving the base-pairing structure by using a large number of natural and/or artificial homologous sequences. AVAILABILITY Standalone-version of SPOT-RNA2 is available at https://github.com/jaswindersingh2/SPOT-RNA2. Direct prediction can also be made at https://sparks-lab.org/server/spot-rna2/. The datasets used in this research can also be downloaded from the GITHUB and the webserver mentioned above.
Collapse
Affiliation(s)
- Jaswinder Singh
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
| | - Kuldip Paliwal
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
| | - Tongchuan Zhang
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
| | - Jaspreet Singh
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
| | - Thomas Litfin
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
| | - Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
| |
Collapse
|
12
|
Chillón I, Marcia M. The molecular structure of long non-coding RNAs: emerging patterns and functional implications. Crit Rev Biochem Mol Biol 2020; 55:662-690. [PMID: 33043695 DOI: 10.1080/10409238.2020.1828259] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023]
Abstract
Long non-coding RNAs (lncRNAs) are recently-discovered transcripts that regulate vital cellular processes and are crucially connected to diseases. Despite their unprecedented molecular complexity, it is emerging that lncRNAs possess distinct structural motifs. Remarkably, the 3D shape and topology of full-length, native lncRNAs have been visualized for the first time in the last year. These studies reveal that lncRNA structures dictate lncRNA functions. Here, we review experimentally determined lncRNA structures and emphasize that lncRNA structural characterization requires synergistic integration of computational, biochemical and biophysical approaches. Based on these emerging paradigms, we discuss how to overcome the challenges posed by the complex molecular architecture of lncRNAs, with the goal of obtaining a detailed understanding of lncRNA functions and molecular mechanisms in the future.
Collapse
Affiliation(s)
- Isabel Chillón
- European Molecular Biology Laboratory (EMBL) Grenoble, Grenoble, France
| | - Marco Marcia
- European Molecular Biology Laboratory (EMBL) Grenoble, Grenoble, France
| |
Collapse
|
13
|
Singh J, Hanson J, Paliwal K, Zhou Y. RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning. Nat Commun 2019; 10:5407. [PMID: 31776342 PMCID: PMC6881452 DOI: 10.1038/s41467-019-13395-9] [Citation(s) in RCA: 181] [Impact Index Per Article: 30.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2019] [Accepted: 11/01/2019] [Indexed: 01/03/2023] Open
Abstract
The majority of our human genome transcribes into noncoding RNAs with unknown structures and functions. Obtaining functional clues for noncoding RNAs requires accurate base-pairing or secondary-structure prediction. However, the performance of such predictions by current folding-based algorithms has been stagnated for more than a decade. Here, we propose the use of deep contextual learning for base-pair prediction including those noncanonical and non-nested (pseudoknot) base pairs stabilized by tertiary interactions. Since only [Formula: see text]250 nonredundant, high-resolution RNA structures are available for model training, we utilize transfer learning from a model initially trained with a recent high-quality bpRNA dataset of [Formula: see text]10,000 nonredundant RNAs made available through comparative analysis. The resulting method achieves large, statistically significant improvement in predicting all base pairs, noncanonical and non-nested base pairs in particular. The proposed method (SPOT-RNA), with a freely available server and standalone software, should be useful for improving RNA structure modeling, sequence alignment, and functional annotations.
Collapse
Affiliation(s)
- Jaswinder Singh
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia
| | - Jack Hanson
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia
| | - Kuldip Paliwal
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia.
| | - Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr., Southport, QLD, 4222, Australia.
| |
Collapse
|
14
|
Antczak M, Zablocki M, Zok T, Rybarczyk A, Blazewicz J, Szachniuk M. RNAvista: a webserver to assess RNA secondary structures with non-canonical base pairs. Bioinformatics 2019; 35:152-155. [PMID: 29985979 PMCID: PMC6298044 DOI: 10.1093/bioinformatics/bty609] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2018] [Accepted: 07/06/2018] [Indexed: 01/23/2023] Open
Abstract
Motivation In the study of 3D RNA structure, information about non-canonical interactions between nucleobases is increasingly important. Specialized databases support investigation of this issue based on experimental data, and several programs can annotate non-canonical base pairs in the RNA 3D structure. However, predicting the extended RNA secondary structure which describes both canonical and non-canonical interactions remains difficult. Results Here, we present RNAvista that allows predicting an extended RNA secondary structure from sequence or from the list enumerating canonical base pairs only. RNAvista is implemented as a publicly available webserver with user-friendly interface. It runs on all major web browsers. Availability and implementation http://rnavista.cs.put.poznan.pl
Collapse
Affiliation(s)
- Maciej Antczak
- Institute of Computing Science, Poznan University of Technology, Poznan, Poland.,Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
| | - Marcin Zablocki
- Institute of Computing Science, Poznan University of Technology, Poznan, Poland
| | - Tomasz Zok
- Institute of Computing Science, Poznan University of Technology, Poznan, Poland
| | - Agnieszka Rybarczyk
- Institute of Computing Science, Poznan University of Technology, Poznan, Poland.,Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
| | - Jacek Blazewicz
- Institute of Computing Science, Poznan University of Technology, Poznan, Poland.,Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
| | - Marta Szachniuk
- Institute of Computing Science, Poznan University of Technology, Poznan, Poland.,Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
| |
Collapse
|
15
|
Abstract
RNA performs and regulates a diverse range of cellular processes, with new functional roles being uncovered at a rapid pace. Interest is growing in how these functions are linked to RNA structures that form in the complex cellular environment. A growing suite of technologies that use advances in RNA structural probes, high-throughput sequencing and new computational approaches to interrogate RNA structure at unprecedented throughput are beginning to provide insights into RNA structures at new spatial, temporal and cellular scales.
Collapse
Affiliation(s)
- Eric J Strobel
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
| | - Angela M Yu
- Tri-Institutional Training Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, NY, USA
| | - Julius B Lucks
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA.
| |
Collapse
|
16
|
Braun J, Fischer S, Xu ZZ, Sun H, Ghoneim DH, Gimbel AT, Plessmann U, Urlaub H, Mathews DH, Weigand JE. Identification of new high affinity targets for Roquin based on structural conservation. Nucleic Acids Res 2019; 46:12109-12125. [PMID: 30295819 PMCID: PMC6294493 DOI: 10.1093/nar/gky908] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2018] [Accepted: 10/05/2018] [Indexed: 12/13/2022] Open
Abstract
Post-transcriptional gene regulation controls the amount of protein produced from a specific mRNA by altering both its decay and translation rates. Such regulation is primarily achieved by the interaction of trans-acting factors with cis-regulatory elements in the untranslated regions (UTRs) of mRNAs. These interactions are guided either by sequence- or structure-based recognition. Similar to sequence conservation, the evolutionary conservation of a UTR’s structure thus reflects its functional importance. We used such structural conservation to identify previously unknown cis-regulatory elements. Using the RNA folding program Dynalign, we scanned all UTRs of humans and mice for conserved structures. Characterizing a subset of putative conserved structures revealed a binding site of the RNA-binding protein Roquin. Detailed functional characterization in vivo enabled us to redefine the binding preferences of Roquin and identify new target genes. Many of these new targets are unrelated to the established role of Roquin in inflammation and immune responses and thus highlight additional, unstudied cellular functions of this important repressor. Moreover, the expression of several Roquin targets is highly cell-type-specific. In consequence, these targets are difficult to detect using methods dependent on mRNA abundance, yet easily detectable with our unbiased strategy.
Collapse
Affiliation(s)
- Johannes Braun
- Department of Biology, Technische Universität Darmstadt, Darmstadt 64287, Germany
| | - Sandra Fischer
- Department of Biology, Technische Universität Darmstadt, Darmstadt 64287, Germany
| | - Zhenjiang Z Xu
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
| | - Hongying Sun
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
| | - Dalia H Ghoneim
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
| | - Anna T Gimbel
- Department of Biology, Technische Universität Darmstadt, Darmstadt 64287, Germany
| | - Uwe Plessmann
- Biophysical Mass Spectrometry Group, Max Planck Institute for Biophysical Chemistry, Göttingen 37077, Germany
| | - Henning Urlaub
- Biophysical Mass Spectrometry Group, Max Planck Institute for Biophysical Chemistry, Göttingen 37077, Germany.,Bioanalytics, Institute for Clinical Chemistry, University Medical Center, 37073 Göttingen, Germany
| | - David H Mathews
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
| | - Julia E Weigand
- Department of Biology, Technische Universität Darmstadt, Darmstadt 64287, Germany
| |
Collapse
|
17
|
Mathews DH. How to benchmark RNA secondary structure prediction accuracy. Methods 2019; 162-163:60-67. [PMID: 30951834 DOI: 10.1016/j.ymeth.2019.04.003] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2018] [Revised: 03/24/2019] [Accepted: 04/01/2019] [Indexed: 11/18/2022] Open
Abstract
RNA secondary structure prediction is widely used. As new methods are developed, these are often benchmarked for accuracy against existing methods. This review discusses good practices for performing these benchmarks, including the choice of benchmarking structures, metrics to quantify accuracy, the importance of allowing flexibility for pairs in the accepted structure, and the importance of statistical testing for significance.
Collapse
Affiliation(s)
- David H Mathews
- Center for RNA Biology, Department of Biochemistry & Biophysics, and Department of Biostatistics & Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, United States.
| |
Collapse
|
18
|
Bellaousov S, Kayedkhordeh M, Peterson RJ, Mathews DH. Accelerated RNA secondary structure design using preselected sequences for helices and loops. RNA (NEW YORK, N.Y.) 2018; 24:1555-1567. [PMID: 30097542 PMCID: PMC6191713 DOI: 10.1261/rna.066324.118] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/07/2018] [Accepted: 08/06/2018] [Indexed: 06/08/2023]
Abstract
Nucleic acids can be designed to be nano-machines, pharmaceuticals, or probes. RNA secondary structures can form the basis of self-assembling nanostructures. There are only four natural RNA bases, therefore it can be difficult to design sequences that fold to a single, specified structure because many other structures are often possible for a given sequence. One approach taken by state-of-the-art sequence design methods is to select sequences that fold to the specified structure using stochastic, iterative refinement. The goal of this work is to accelerate design. Many existing iterative methods select and refine sequences one base pair and one unpaired nucleotide at a time. Here, the hypothesis that sequences can be preselected in order to accelerate design was tested. To this aim, a database was built of helix sequences that demonstrate thermodynamic features found in natural sequences and that also have little tendency to cross-hybridize. Additionally, a database was assembled of RNA loop sequences with low helix-formation propensity and little tendency to cross-hybridize with either the helices or other loops. These databases of preselected sequences accelerate the selection of sequences that fold with minimal ensemble defect by replacing some of the trial and error of current refinement approaches. When using the database of preselected sequences as compared to randomly chosen sequences, sequences for natural structures are designed 36 times faster, and random structures are designed six times faster. The sequences selected with the aid of the database have similar ensemble defect as those sequences selected at random. The sequence database is part of RNAstructure package at http://rna.urmc.rochester.edu/RNAstructure.html.
Collapse
Affiliation(s)
- Stanislav Bellaousov
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, USA
| | - Mohammad Kayedkhordeh
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, USA
| | | | - David H Mathews
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, USA
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, New York 14642, USA
| |
Collapse
|
19
|
Xiong Y, McQuistan TJ, Stanek JW, Summerton JE, Mata JE, Squier TC. Detection of unique Ebola virus oligonucleotides using fluorescently-labeled phosphorodiamidate morpholino oligonucleotide probe pairs. Anal Biochem 2018; 557:84-90. [PMID: 30030994 DOI: 10.1016/j.ab.2018.07.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2018] [Accepted: 07/12/2018] [Indexed: 12/22/2022]
Abstract
Here we identify a low-cost diagnostic platform using fluorescently-labeled phosphorodiamidate morpholino oligonucleotide (PMO) probe pairs, which upon binding target oligonucleotides undergo fluorescence resonance energy transfer (FRET). Using a target oligonucleotide derived from the Ebola virus (EBOV), we have derivatized PMO probes with either Alexa Fluor488 (donor) or tetramethylrhodamine (acceptor). Upon EBOV target oligonulceotide binding, observed changes in FRET between PMO probe pairs permit a 25 pM lower limit of detection; there is no off-target binding within a complex mixture of nucleic acids and other biomolecules present in human saliva. Equivalent levels of FRET occur using PMO probe pairs for single or double stranded oligonucleotide targets. High-affinity binding is retained under low-ionic strength conditions that disrupt oligonucleotide secondary structures (e.g., stem-loop structures), ensuring reliable target detection. Under these low-ionic strength conditions, rates of PMO probe binding to target oligonucleotides are increased 3-fold relative to conventional high-ionic strength conditions used for nucleic acid hybridization, with half-maximal binding occurring within 10 min. Our results indicate an ability to use PMO probe pairs to detect clinically relevant levels of EBOV and other oligonucleotide targets in complex biological samples without the need for nucleic acid amplification, and open the possibility of population screening that includes assays for the genomic integration of DNA based copies of viral RNA.
Collapse
Affiliation(s)
- Yijia Xiong
- Department of Basic Medical Sciences, Western University of Health Sciences, 200 Mullins Drive, Lebanon, OR, 97355, United States
| | - Tammie J McQuistan
- Department of Basic Medical Sciences, Western University of Health Sciences, 200 Mullins Drive, Lebanon, OR, 97355, United States
| | - James W Stanek
- Department of Basic Medical Sciences, Western University of Health Sciences, 200 Mullins Drive, Lebanon, OR, 97355, United States
| | - James E Summerton
- Gene Tools, LLC, One Summerton Way, Philomath, OR, 97370, United States
| | - John E Mata
- Department of Basic Medical Sciences, Western University of Health Sciences, 200 Mullins Drive, Lebanon, OR, 97355, United States; Takena Technologies Inc, 405 West First Street, Albany, OR, 97321, United States
| | - Thomas C Squier
- Department of Basic Medical Sciences, Western University of Health Sciences, 200 Mullins Drive, Lebanon, OR, 97355, United States.
| |
Collapse
|