1
Gray M, Trinity L, Stege U, Ponty Y, Will S, Jabbari H. CParty: hierarchically constrained partition function of RNA pseudoknots. Bioinformatics 2024; 41:btae748. [PMID: 39700413 PMCID: PMC11709253 DOI: 10.1093/bioinformatics/btae748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Revised: 11/28/2024] [Accepted: 12/17/2024] [Indexed: 12/21/2024]
Abstract
MOTIVATION: Biologically relevant RNA secondary structures are routinely predicted by efficient dynamic programming algorithms that minimize their free energy. Starting from such algorithms, one can devise partition function algorithms, which enable stochastic perspectives on RNA structure ensembles. As the most prominent example, McCaskill's partition function algorithm is derived from pseudoknot-free energy minimization. While this algorithm became hugely successful for the analysis of pseudoknot-free RNA structure ensembles, as of yet there exists only one pseudoknotted partition function implementation, which covers only simple pseudoknots and comes with a borderline-prohibitive complexity of O(n^5) in the RNA length n.
RESULTS: Here, we develop a partition function algorithm corresponding to the hierarchical pseudoknot prediction of HFold, which performs exact optimization in a realistic pseudoknot energy model. In consequence, our algorithm CParty carries over HFold's advantages over classical pseudoknot prediction in characterizing the Boltzmann ensemble at equilibrium. Given an RNA sequence S and a pseudoknot-free structure G, CParty computes the partition function over all possibly pseudoknotted density-2 structures G∪G' of S that extend the fixed G by a disjoint pseudoknot-free structure G'. Thus, CParty follows the common hypothesis of hierarchical pseudoknot formation, where pseudoknots form as tertiary contacts only after a first pseudoknot-free "core" G has formed, and we call the computed partition function hierarchically constrained (by G). Like HFold, the dynamic programming algorithm CParty is very efficient, achieving the low complexity of the pseudoknot-free algorithm, i.e. cubic time and quadratic space. Finally, by computing pseudoknotted ensemble energies, we unveil kinetics features of a therapeutic target in SARS-CoV-2.
AVAILABILITY AND IMPLEMENTATION: CParty is available at https://github.com/HosnaJabbari/CParty.
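To make the notion of a partition function over a structure ensemble concrete, the following is a minimal sketch of a pseudoknot-free, McCaskill-style recursion under a toy energy model. It is not CParty or HFold: the uniform `pair_energy`, the `kT` value, the `min_loop` constraint, and the function name are all assumptions made for illustration. It does show the cubic-time, quadratic-space shape of the pseudoknot-free computation that the abstract refers to.

```python
import math

# Canonical and wobble pairs allowed in this toy model (an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def partition_function(seq, pair_energy=-1.0, kT=0.6, min_loop=3):
    """Sum of Boltzmann weights over all pseudoknot-free structures of seq.

    Toy model: every base pair contributes pair_energy; loops are free.
    Z[i][j] is the partition function of the half-open interval seq[i:j].
    """
    n = len(seq)
    w = math.exp(-pair_energy / kT)  # Boltzmann weight of a single pair
    Z = [[1.0] * (n + 1) for _ in range(n + 1)]  # empty intervals weigh 1
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            total = Z[i][j - 1]  # case 1: last base j-1 is unpaired
            # case 2: last base pairs with k, enclosing >= min_loop bases
            for k in range(i, j - 1 - min_loop):
                if (seq[k], seq[j - 1]) in PAIRS:
                    total += Z[i][k] * w * Z[k + 1][j - 1]
            Z[i][j] = total
    return Z[0][n]
```

Setting `pair_energy=0.0` turns every weight into 1, so the function simply counts the admissible structures, which is a convenient sanity check for the recursion.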
Affiliation(s)
- Mateo Gray
  - Department of Biomedical Engineering, University of Alberta, Edmonton, Alberta T6G 1H9, Canada
- Luke Trinity
  - Department of Computer Science, University of Victoria, Victoria, British Columbia V8P 5C2, Canada
- Ulrike Stege
  - Department of Computer Science, University of Victoria, Victoria, British Columbia V8P 5C2, Canada
- Yann Ponty
  - Institut Polytechnique de Paris, 91120 Palaiseau, Paris, France
- Sebastian Will
  - Institut Polytechnique de Paris, 91120 Palaiseau, Paris, France
- Hosna Jabbari
  - Department of Biomedical Engineering, University of Alberta, Edmonton, Alberta T6G 1H9, Canada

2
Qi F, Chen J, Chen Y, Sun J, Lin Y, Chen Z, Kapranov P. Evaluating Performance of Different RNA Secondary Structure Prediction Programs Using Self-cleaving Ribozymes. Genomics, Proteomics & Bioinformatics 2024; 22:qzae043. [PMID: 39317944 PMCID: PMC12016570 DOI: 10.1093/gpbjnl/qzae043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 03/02/2024] [Accepted: 06/05/2024] [Indexed: 09/26/2024]
Abstract
Accurate identification of the correct, biologically relevant RNA structures is critical to understanding various aspects of RNA biology, since proper folding represents the key to the functionality of all types of RNA molecules and plays pivotal roles in many essential biological processes. Thus, a plethora of approaches have been developed to predict, identify, or solve RNA structures based on various computational, molecular, genetic, chemical, or physicochemical strategies. Purely computational approaches hold distinct advantages over all other strategies in terms of ease of implementation, speed, cost, and throughput, but they strongly underperform in terms of accuracy, which significantly limits their broader application. Nonetheless, the advantages of these methods have led to a steady development of multiple in silico RNA secondary structure prediction approaches, including recent deep learning-based programs. Here, we compared the accuracy of predictions of biologically relevant secondary structures of dozens of self-cleaving ribozyme sequences using seven in silico RNA folding prediction tools with tasks of varying complexity. We found that while many programs performed well in relatively simple tasks, their performance varied significantly in more complex RNA folding problems. However, in general, a modern deep learning method outperformed the other programs in the complex tasks in predicting the RNA secondary structures, at least based on the specific class of sequences tested, suggesting that it may represent the future of RNA structure prediction algorithms.
Affiliation(s)
- Fei Qi
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen 361102, China
  - Institute of Genomics, School of Medicine, Huaqiao University, Xiamen 361021, China
- Junjie Chen
  - Institute of Genomics, School of Medicine, Huaqiao University, Xiamen 361021, China
- Yue Chen
  - Institute of Genomics, School of Medicine, Huaqiao University, Xiamen 361021, China
- Jianfeng Sun
  - Botnar Research Centre, University of Oxford, Oxford, OX3 7LD, United Kingdom
- Yiting Lin
  - Institute of Genomics, School of Medicine, Huaqiao University, Xiamen 361021, China
- Zipeng Chen
  - Institute of Genomics, School of Medicine, Huaqiao University, Xiamen 361021, China
- Philipp Kapranov
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen 361102, China

3
Kolaitis A, Makris E, Karagiannis AA, Tsanakas P, Pavlatos C. Knotify_V2.0: Deciphering RNA Secondary Structures with H-Type Pseudoknots and Hairpin Loops. Genes (Basel) 2024; 15:670. [PMID: 38927606 PMCID: PMC11203014 DOI: 10.3390/genes15060670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Revised: 05/19/2024] [Accepted: 05/22/2024] [Indexed: 06/28/2024]
Abstract
Accurately predicting the pairing order of bases in RNA molecules is essential for anticipating RNA secondary structures. Consequently, this task holds significant importance in unveiling previously unknown biological processes. The urgent need to comprehend RNA structures has been accentuated by the unprecedented impact of the widespread COVID-19 pandemic. This paper presents a framework, Knotify_V2.0, which makes use of syntactic pattern recognition techniques in order to predict RNA structures, with a specific emphasis on tackling the demanding task of predicting H-type pseudoknots that encompass bulges and hairpins. By leveraging the expressive capabilities of a Context-Free Grammar (CFG), the suggested framework integrates the inherent benefits of CFG and makes use of minimum free energy and maximum base pairing criteria. This integration enables the effective management of this inherently ambiguous task. The main contribution of Knotify_V2.0 compared to earlier versions lies in its capacity to identify additional motifs like bulges and hairpins within the internal loops of the pseudoknot. Notably, the proposed methodology, Knotify_V2.0, demonstrates superior accuracy in predicting core stems compared to state-of-the-art frameworks. Knotify_V2.0 exhibited exceptional performance by accurately identifying both core base pairings that form the ground truth pseudoknot in 70% of the examined sequences. Furthermore, Knotify_V2.0 narrowed the performance gap with Knotty, which had demonstrated better performance than Knotify and even surpassed it in Recall and F1-score metrics. Knotify_V2.0 achieved a higher count of true positives (tp) and a significantly lower count of false negatives (fn) compared to Knotify, highlighting improvements in Precision and Recall metrics, respectively. Consequently, Knotify_V2.0 achieved a higher F1-score than any other platform. The source code and comprehensive implementation details of Knotify_V2.0 are publicly available on GitHub.
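The abstract reports true positives, false negatives, Recall and F1-score without defining them. As a reminder of how these quantities relate, here is a small sketch that scores a predicted set of base pairs against a reference set. The function name and the representation of a structure as a set of `(i, j)` index pairs are assumptions for illustration, not the evaluation code used by the Knotify_V2.0 authors.

```python
def score_base_pairs(predicted, reference):
    """Return (precision, recall, f1) for two collections of (i, j) base pairs."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)   # pairs predicted and present in the reference
    fp = len(predicted - reference)   # pairs predicted but absent from the reference
    fn = len(reference - predicted)   # reference pairs that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Fewer false negatives raise recall, fewer false positives raise precision, and F1 is the harmonic mean of the two, so it only improves when neither is sacrificed for the other.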
Affiliation(s)
- Angelos Kolaitis
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Evangelos Makris
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Alexandros Anastasios Karagiannis
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Panayiotis Tsanakas
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Christos Pavlatos
  - Hellenic Air Force Academy, Dekelia Air Base, Acharnes, 13671 Athens, Greece

4
Newman T, Chang HFK, Jabbari H. DinoKnot: Duplex Interaction of Nucleic Acids With PseudoKnots. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:348-359. [PMID: 38345958 DOI: 10.1109/tcbb.2024.3362308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2024]
Abstract
Interaction of nucleic acid molecules is essential for their functional roles in the cell and their applications in biotechnology. While simple duplex interactions have been studied before, the problem of efficiently predicting the minimum free energy structure of more complex interactions with possibly pseudoknotted structures remains a challenge. In this work, we introduce DinoKnot, a novel and efficient algorithm for prediction of Duplex Interaction of Nucleic acids with pseudoKnots. DinoKnot follows the hierarchical folding hypothesis to predict the secondary structure of two interacting nucleic acid strands (both homo- and hetero-dimers). DinoKnot utilizes the structure of molecules before interaction as a guide to find their duplex structure, allowing for possible base pair competitions. To showcase DinoKnot's capabilities, we evaluated its predicted structures against (1) experimental results for the SARS-CoV-2 genome and nine primer-probe sets, (2) a clinically verified example of a mutation affecting detection, and (3) a known nucleic acid interaction involving a pseudoknot. In addition, we compared our results against our closest competition, RNAcofold, further highlighting DinoKnot's strengths. We believe DinoKnot can be utilized for various applications, including screening new variants for potential detection issues and supporting existing applications involving DNA/RNA interactions, adding structural considerations to the interaction to elicit functional information.
5
Yao HT, Marchand B, Berkemer SJ, Ponty Y, Will S. Infrared: a declarative tree decomposition-powered framework for bioinformatics. Algorithms Mol Biol 2024; 19:13. [PMID: 38493130 PMCID: PMC10943887 DOI: 10.1186/s13015-024-00258-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Accepted: 02/13/2024] [Indexed: 03/18/2024]
Abstract
MOTIVATION: Many bioinformatics problems can be approached as optimization or controlled sampling tasks, and solved exactly and efficiently using Dynamic Programming (DP). However, such exact methods are typically tailored towards specific settings, complex to develop, and hard to implement and adapt to problem variations.
METHODS: We introduce the Infrared framework to overcome such hindrances for a large class of problems. Its underlying paradigm is tailored toward problems that can be declaratively formalized as sparse feature networks, a generalization of constraint networks. Classic Boolean constraints specify a search space, consisting of putative solutions whose evaluation is performed through a combination of features. Problems are then solved using generic cluster tree elimination algorithms over a tree decomposition of the feature network. Their overall complexities are linear in the number of variables, and only exponential in the treewidth of the feature network. For sparse feature networks, associated with low to moderate treewidths, these algorithms make it possible to find optimal solutions, or to generate controlled samples, with practical empirical efficiency.
RESULTS: Implementing these methods, the Infrared software allows Python programmers to rapidly develop exact optimization and sampling applications based on efficient tree decomposition-based processing. Instead of directly coding specialized algorithms, problems are declaratively modeled as sets of variables over finite domains, whose dependencies are captured by constraints and functions. Such models are then automatically solved by generic DP algorithms. To illustrate the applicability of Infrared in bioinformatics and guide new users, we model and discuss variants of bioinformatics applications. We provide reimplementations and extensions of methods for RNA design, RNA sequence-structure alignment, parsimony-driven inference of ancestral traits in phylogenetic trees/networks, and design of coding sequences. Moreover, we demonstrate multidimensional Boltzmann sampling. These applications of the framework, together with our novel results, underline the practical relevance of Infrared. Remarkably, the achieved complexities are typically equivalent to the ones of specialized algorithms and implementations.
AVAILABILITY: Infrared is available at https://amibio.gitlabpages.inria.fr/Infrared with extensive documentation, including various usage examples and API reference; it can be installed using Conda or from source.
Affiliation(s)
- Hua-Ting Yao
  - LIX, CNRS UMR 7161, Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
  - Department of Theoretical Chemistry, University of Vienna, Vienna, Austria
  - School of Computer Science, McGill University, Montreal, Canada
- Bertrand Marchand
  - LIX, CNRS UMR 7161, Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
- Sarah J Berkemer
  - LIX, CNRS UMR 7161, Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
  - Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo, Japan
- Yann Ponty
  - LIX, CNRS UMR 7161, Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
- Sebastian Will
  - LIX, CNRS UMR 7161, Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France

6
Gong T, Ju F, Bu D. Accurate prediction of RNA secondary structure including pseudoknots through solving minimum-cost flow with learned potentials. Commun Biol 2024; 7:297. [PMID: 38461362 PMCID: PMC10924946 DOI: 10.1038/s42003-024-05952-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 02/21/2024] [Indexed: 03/11/2024]
Abstract
Pseudoknots are key structure motifs of RNA and pseudoknotted RNAs play important roles in a variety of biological processes. Here, we present KnotFold, an accurate approach to the prediction of RNA secondary structure including pseudoknots. The key elements of KnotFold include a learned potential function and a minimum-cost flow algorithm to find the secondary structure with the lowest potential. KnotFold learns the potential from the RNAs with known structures using an attention-based neural network, thus avoiding the inaccuracy of hand-crafted energy functions. The specially designed minimum-cost flow algorithm used by KnotFold considers all possible combinations of base pairs and selects from them the optimal combination. The algorithm breaks the restriction of nested base pairs required by the widely used dynamic programming algorithms, thus enabling the identification of pseudoknots. Using 1,009 pseudoknotted RNAs as representatives, we demonstrate the successful application of KnotFold in predicting RNA secondary structures including pseudoknots with accuracy higher than the state-of-the-art approaches. We anticipate that KnotFold, with its superior accuracy, will greatly facilitate the understanding of RNA structures and functionalities.
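The "restriction of nested base pairs" mentioned above can be stated precisely: a structure is pseudoknot-free when no two base pairs (i, j) and (k, l) cross, that is, when i < k < j < l never holds. The sketch below checks that condition for a list of index pairs. It only illustrates the definition; the function name is hypothetical and the code has nothing to do with KnotFold's minimum-cost flow algorithm.

```python
def has_pseudoknot(pairs):
    """Return True if any two base pairs cross (i < k < j < l)."""
    # Normalize each pair to (smaller, larger) and sort by opening position,
    # so every later pair opens at or after the current one.
    norm = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for a, (i, j) in enumerate(norm):
        for k, l in norm[a + 1:]:
            if i < k < j < l:  # (k, l) opens inside (i, j) but closes outside
                return True
    return False
```

For example, the pairs (0, 10) and (2, 8) are nested and therefore pseudoknot-free, while (0, 5) and (3, 9) cross and form the simplest (H-type) pseudoknot pattern.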
Affiliation(s)
- Tiansu Gong
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 100190, Beijing, China
  - University of Chinese Academy of Sciences, 100190, Beijing, China
- Fusong Ju
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 100190, Beijing, China
  - University of Chinese Academy of Sciences, 100190, Beijing, China
- Dongbo Bu
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 100190, Beijing, China
  - University of Chinese Academy of Sciences, 100190, Beijing, China
  - Central China Artificial Intelligence Research Institute, Henan Academy of Sciences, Zhengzhou, 450046, Henan, China

7
Gray M, Will S, Jabbari H. SparseRNAfolD: optimized sparse RNA pseudoknot-free folding with dangle consideration. Algorithms Mol Biol 2024; 19:9. [PMID: 38433200 PMCID: PMC11289965 DOI: 10.1186/s13015-024-00256-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Accepted: 02/13/2024] [Indexed: 03/05/2024]
Abstract
MOTIVATION: Computational RNA secondary structure prediction by free energy minimization is indispensable for analyzing structural RNAs and their interactions. These methods find the structure with the minimum free energy (MFE) among exponentially many possible structures and have a restrictive time and space complexity (O(n^3) time and O(n^2) space for pseudoknot-free structures) for longer RNA sequences. Furthermore, accurate free energy calculations, including dangle contributions, can be difficult and costly to implement, particularly when optimizing for time and space requirements.
RESULTS: Here we introduce a fast and efficient sparsified MFE pseudoknot-free structure prediction algorithm, SparseRNAFolD, that utilizes an accurate energy model that accounts for dangle contributions. While the sparsification technique was previously employed to improve the time and space complexity of a pseudoknot-free structure prediction method with a realistic energy model, SparseMFEFold, it was not extended to include dangle contributions due to the complexity of computation. This may come at the cost of prediction accuracy. In this work, we compare three different sparsified implementations for dangle contributions and provide pros and cons of each method. As well, we compare our algorithm to LinearFold, a linear time and space algorithm, where we find that in practice, SparseRNAFolD has lower memory consumption across all lengths of sequence and a faster time for lengths up to 1000 bases.
CONCLUSION: Our SparseRNAFolD algorithm is an MFE-based algorithm that guarantees optimality of result and employs the most general energy model, including dangle contributions. We provide a basis for applying dangles to sparsified recursion in a pseudoknot-free model that has the potential to be extended to pseudoknots.
Affiliation(s)
- Mateo Gray
  - Department of Biomedical Engineering, University of Alberta, Street, Edmonton, T6G2R3, AB, Canada
- Sebastian Will
  - Department of Computer Science CNRS/LIX (UMR 7161), Institut Polytechnique de Paris, Street, Paris, 10587, France
- Hosna Jabbari
  - Department of Biomedical Engineering, University of Alberta, Street, Edmonton, T6G2R3, AB, Canada

8
Loyer G, Reinharz V. Concurrent prediction of RNA secondary structures with pseudoknots and local 3D motifs in an integer programming framework. Bioinformatics 2024; 40:btae022. [PMID: 38230755 PMCID: PMC10868335 DOI: 10.1093/bioinformatics/btae022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 11/30/2023] [Accepted: 01/12/2024] [Indexed: 01/18/2024]
Abstract
MOTIVATION: The prediction of RNA structure canonical base pairs from a single sequence, especially pseudoknotted ones, remains challenging in thermodynamic models that approximate the energy of the local 3D motifs joining canonical stems. It has become more and more apparent in recent years that the structural motifs in the loops, composed of noncanonical interactions, are essential for the final shape of the molecule enabling its multiple functions. Our capacity to predict accurate 3D structures is also limited when it comes to the organization of the large intricate network of interactions that form inside those loops.
RESULTS: We previously developed the integer programming framework RNA Motifs over Integer Programming (RNAMoIP) to reconcile RNA secondary structure and local 3D motif information available in databases. We further develop our model to now simultaneously predict the canonical base pairs (with pseudoknots) from base pair probability matrices with or without alignment. We benchmarked our new method over all nonredundant RNAs below 150 nucleotides. We show that the joined prediction of canonical base pairs structure and local conserved motifs (i) improves the ratio of well-predicted interactions in the secondary structure, (ii) predicts well canonical and Wobble pairs at the location where motifs are inserted, (iii) is greatly improved with evolutionary information, and (iv) noncanonical motifs at kink-turn locations.
AVAILABILITY AND IMPLEMENTATION: The source code of the framework is available at https://gitlab.info.uqam.ca/cbe/RNAMoIP and an interactive web server at https://rnamoip.cbe.uqam.ca/.
Affiliation(s)
- Gabriel Loyer
  - Department of Computer Science, Université du Québec à Montréal, Montréal, QC H2X 3Y7, Canada
- Vladimir Reinharz
  - Department of Computer Science, Université du Québec à Montréal, Montréal, QC H2X 3Y7, Canada

9
Peterson JM, O'Leary CA, Coppenbarger EC, Tompkins VS, Moss WN. Discovery of RNA secondary structural motifs using sequence-ordered thermodynamic stability and comparative sequence analysis. MethodsX 2023; 11:102275. [PMID: 37448951 PMCID: PMC10336498 DOI: 10.1016/j.mex.2023.102275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Accepted: 06/28/2023] [Indexed: 07/18/2023]
Abstract
Major advances in RNA secondary structural motif prediction have been achieved in the last few years; however, few methods harness the predictive power of multiple approaches to deliver in-depth characterizations of local RNA motifs and their potential functionality. Additionally, most available methods do not predict RNA pseudoknots. This work combines complementary bioinformatic systems into one robust discovery pipeline where:
- RNA sequences are folded to search for thermodynamically favorable motifs utilizing ScanFold.
- Motifs are expanded and refolded into alternate pseudoknot conformations by Knotty/Iterative HFold.
- All conformations are evaluated for covariance via the cm-builder pipeline (Infernal and R-scape).
10
Nasaev SS, Mukanov AR, Kuznetsov II, Veselovsky AV. AliNA - a deep learning program for RNA secondary structure prediction. Mol Inform 2023; 42:e202300113. [PMID: 37710142 DOI: 10.1002/minf.202300113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2023] [Revised: 09/13/2023] [Accepted: 09/14/2023] [Indexed: 09/16/2023]
Abstract
Numerous natural RNA variants that participate in different cellular processes have now been discovered, alongside artificial RNAs, e.g., aptamers and riboswitches. One of the required tasks in the investigation of their functions, their mechanism of influence on cells, and their interaction with targets is the prediction of RNA secondary structures. The classic thermodynamics-based prediction algorithms do not consider the specificity of biological folding, and the deep learning methods that were designed to resolve this issue suffer from the problems of homology-based methods. Herein, we present a method for RNA secondary structure prediction based on deep learning: AliNA (ALIgned Nucleic Acids). Our method successfully predicts secondary structures for RNA families that are non-homologous to the training data, thanks to the use of data augmentation techniques. Augmentation extends existing datasets with easily accessible simulated data. The proposed method shows a high quality of prediction across different benchmarks, including pseudoknots. The method is freely available on GitHub (https://github.com/Arty40m/AliNA).
Affiliation(s)
- Shamsudin S Nasaev
  - Institute of Biomedical Chemistry, 10, Pogodinskaya str., 119121, Moscow, Russia
- Artem R Mukanov
  - A.M. Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
- Ivan I Kuznetsov
  - Moscow University of Finance and Law, 10 block 1, Serpuhovsky val str., 115191, Moscow, Russia

11
Marchand B, Will S, Berkemer SJ, Ponty Y, Bulteau L. Automated design of dynamic programming schemes for RNA folding with pseudoknots. Algorithms Mol Biol 2023; 18:18. [PMID: 38041153 PMCID: PMC10691146 DOI: 10.1186/s13015-023-00229-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2023] [Accepted: 06/10/2023] [Indexed: 12/03/2023]
Abstract
Although RNA secondary structure prediction is a textbook application of dynamic programming (DP) and a routine task in RNA structure analysis, it remains challenging whenever pseudoknots come into play. Since the prediction of pseudoknotted structures by minimizing (realistically modelled) energy is NP-hard, specialized algorithms have been proposed for restricted conformation classes that capture the most frequently observed configurations. To achieve good performance, these methods rely on specific and carefully hand-crafted DP schemes. In contrast, we generalize and fully automate the design of DP pseudoknot prediction algorithms. For this purpose, we formalize the problem of designing DP algorithms for an (infinite) class of conformations, modeled by (a finite number of) fatgraphs, and automatically build DP schemes minimizing their algorithmic complexity. We propose an algorithm for the problem, based on the tree-decomposition of a well-chosen representative structure, which we simplify and reinterpret as a DP scheme. The algorithm is fixed-parameter tractable for the treewidth tw of the fatgraph, and its output represents a [Formula: see text] algorithm (and even possibly [Formula: see text] in simple energy models) for predicting the MFE folding of an RNA of length n. We demonstrate, for the most common pseudoknot classes, that our automatically generated algorithms achieve the same complexities as reported in the literature for hand-crafted schemes. Our framework supports general energy models, partition function computations, recursive substructures and partial folding, and could pave the way for algebraic dynamic programming beyond the context-free case.
Affiliation(s)
- Bertrand Marchand
  - LIX (UMR 7161), Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
  - LIGM, CNRS, University Gustave Eiffel, F77454, Marne-la-Vallée, France
- Sebastian Will
  - LIX (UMR 7161), Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
- Sarah J Berkemer
  - LIX (UMR 7161), Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
  - Earth-Life Science Institute, Tokyo Institute of Technology 2-12-1-I7E-318, Ookayama, Tokyo, 152-8550, Japan
- Yann Ponty
  - LIX (UMR 7161), Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
- Laurent Bulteau
  - LIGM, CNRS, University Gustave Eiffel, F77454, Marne-la-Vallée, France

12
Lin BC, Katneni U, Jankowska KI, Meyer D, Kimchi-Sarfaty C. In silico methods for predicting functional synonymous variants. Genome Biol 2023; 24:126. [PMID: 37217943 PMCID: PMC10204308 DOI: 10.1186/s13059-023-02966-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/03/2022] [Accepted: 05/10/2023] [Indexed: 05/24/2023]
Abstract
Single nucleotide variants (SNVs) contribute to human genomic diversity. Synonymous SNVs were previously considered to be "silent," but mounting evidence has revealed that these variants can cause RNA and protein changes and are implicated in over 85 human diseases and cancers. Recent improvements in computational platforms have led to the development of numerous machine-learning tools, which can be used to advance synonymous SNV research. In this review, we discuss tools that should be used to investigate synonymous variants. We provide supportive examples from seminal studies that demonstrate how these tools have driven new discoveries of functional synonymous SNVs.
Affiliation(s)
- Brian C Lin
  - Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
- Upendra Katneni
  - Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
- Katarzyna I Jankowska
  - Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
- Douglas Meyer
  - Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
- Chava Kimchi-Sarfaty
  - Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA

13
Makris E, Kolaitis A, Andrikos C, Moulos V, Tsanakas P, Pavlatos C. Knotify+: Toward the Prediction of RNA H-Type Pseudoknots, Including Bulges and Internal Loops. Biomolecules 2023; 13:biom13020308. [PMID: 36830677 PMCID: PMC9953189 DOI: 10.3390/biom13020308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2022] [Revised: 01/25/2023] [Accepted: 02/01/2023] [Indexed: 02/09/2023]
Abstract
Accurate "base pairing" in RNA molecules, which leads to the prediction of RNA secondary structures, is crucial in order to explain unknown biological operations. Recently, COVID-19, a widespread disease, has caused many deaths, affecting humanity in an unprecedented way. SARS-CoV-2, a single-stranded RNA virus, has shown the significance of analyzing these molecules and their structures. This paper aims to create a pioneering framework in the direction of predicting specific RNA structures, leveraging syntactic pattern recognition. The proposed framework, Knotify+, addresses the problem of predicting H-type pseudoknots, including bulges and internal loops, by harnessing the power of a context-free grammar (CFG). We combine the grammar's advantages with maximum base pairing and minimum free energy to tackle this ambiguous task in a performant way. Specifically, our proposed methodology, Knotify+, outperforms state-of-the-art frameworks with regard to its accuracy in core stem prediction. Additionally, it performs more accurately on small sequences and presents a comparable accuracy rate on larger ones, while requiring a shorter execution time than well-known platforms. The Knotify+ source code and implementation details are available as a public repository on GitHub.
Affiliation(s)
- Evangelos Makris
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Angelos Kolaitis
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Christos Andrikos
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Vrettos Moulos
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Panayiotis Tsanakas
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Christos Pavlatos
- Hellenic Air Force Academy, Dekelia Air Base, Acharnes, 13671 Athens, Greece
- Correspondence: Tel.: +30-210-7722541
14
Hollar A, Bursey H, Jabbari H. Pseudoknots in RNA Structure Prediction. Curr Protoc 2023; 3:e661. [PMID: 36779804 DOI: 10.1002/cpz1.661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/14/2023]
Abstract
RNA molecules play active roles in the cell and are important for numerous applications in biotechnology and medicine. The function of an RNA molecule stems from its structure. RNA structure determination is time consuming, challenging, and expensive using experimental methods. Thus, much research has been directed at RNA structure prediction through computational means. Many of these methods focus primarily on the secondary structure of the molecule, ignoring the possibility of pseudoknotted structures. However, pseudoknots are known to play functional roles in many RNA molecules or in their method of interaction with other molecules. Improving the accuracy and efficiency of computational methods that predict pseudoknots is an ongoing challenge for single RNA molecules, RNA-RNA interactions, and RNA-protein interactions. To improve the accuracy of prediction, many methods focus on specific applications while restricting the length and the class of the pseudoknotted structures they can identify. In recent years, computational methods for structure prediction have begun to catch up with the impressive developments seen in biotechnology. Here, we provide a non-comprehensive overview of available pseudoknot prediction methods and their best-use cases. © 2023 Wiley Periodicals LLC.
Affiliation(s)
- Andrew Hollar
- Department of Computer Science, University of Victoria, Victoria, Canada
- Hunter Bursey
- Department of Computer Science, University of Victoria, Victoria, Canada
- Hosna Jabbari
- Department of Computer Science, University of Victoria, Victoria, Canada
15
Fei Y, Zhang H, Wang Y, Liu Z, Liu Y. LTPConstraint: a transfer learning based end-to-end method for RNA secondary structure prediction. BMC Bioinformatics 2022; 23:354. [PMID: 35999499 PMCID: PMC9396797 DOI: 10.1186/s12859-022-04847-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2022] [Accepted: 07/18/2022] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND RNA secondary structure is very important for deciphering cellular activity and disease occurrence. The first method used by researchers to determine this structure was biological experiment, but this method is too expensive, which has limited its adoption. Computational methods then emerged, offering good efficiency and low cost. However, the accuracy of computational methods is not satisfactory. Many machine learning methods have also been applied to this area, but the accuracy has not improved significantly. Deep learning has matured and achieved great success in many areas such as computer vision and natural language processing. It uses neural networks, a kind of structure that has good functionality and versatility, but its effectiveness is highly correlated with the quantity and quality of the data. At present, there is no model with high accuracy, low data dependence and high convenience for predicting RNA secondary structure. RESULTS This paper designs a neural network called LTPConstraint to predict RNA secondary structure. The network is based on several network structures such as Bidirectional LSTM, Transformer and generator. It also uses transfer learning to train the model so that the data dependence can be reduced. CONCLUSIONS LTPConstraint has achieved high accuracy in RNA secondary structure prediction. Compared with previous methods, the accuracy improves markedly both in predicting structures with pseudoknots and structures without pseudoknots. At the same time, LTPConstraint is easy to operate and can produce results very quickly.
Affiliation(s)
- Yinchao Fei
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, China
- Hao Zhang
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, China
- Yili Wang
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, China
- Zhen Liu
- Graduate School of Engineering, Nagasaki Institute of Applied Science, Nagasaki, Japan
- Yuanning Liu
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, China
16
Gray M, Chester S, Jabbari H. KnotAli: informed energy minimization through the use of evolutionary information. BMC Bioinformatics 2022; 23:159. [PMID: 35505276 PMCID: PMC9063079 DOI: 10.1186/s12859-022-04673-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/08/2021] [Accepted: 04/05/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Improving the prediction of structures, especially those containing pseudoknots (structures with crossing base pairs) is an ongoing challenge. Homology-based methods utilize structural similarities within a family to predict the structure. However, their prediction is limited to the consensus structure, and by the quality of the alignment. Minimum free energy (MFE) based methods, on the other hand, do not rely on familial information and can predict structures of novel RNA molecules. Their prediction normally suffers from inaccuracies due to their underlying energy parameters. RESULTS We present a new method for prediction of RNA pseudoknotted secondary structures that combines the strengths of MFE prediction and alignment-based methods. KnotAli takes a multiple RNA sequence alignment as input and uses covariation and thermodynamic energy minimization to predict possibly pseudoknotted secondary structures for each individual sequence in the alignment. We compared KnotAli's performance to that of three other alignment-based programs, two that can handle pseudoknotted structures and one control, on a large data set of 3034 RNA sequences with varying lengths and levels of sequence conservation from 10 families with pseudoknotted and pseudoknot-free reference structures. We produced sequence alignments for each family using two well-known sequence aligners (MUSCLE and MAFFT). CONCLUSIONS We found KnotAli's performance to be superior in 6 of the 10 families for MUSCLE and 7 of the 10 for MAFFT. While both KnotAli and Cacofold use background noise correction strategies, we found KnotAli's predictions to be less dependent on the alignment quality. KnotAli can be found online at the Zenodo image: https://doi.org/10.5281/zenodo.5794719.
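The abstract above defines pseudoknots as structures with crossing base pairs. As a minimal illustrative sketch (not taken from KnotAli), the crossing condition can be checked directly on a list of base pairs; the function name `has_pseudoknot` and the representation of a structure as index pairs are assumptions made only for this example.

```python
from itertools import combinations


def has_pseudoknot(pairs):
    """Return True if any two base pairs cross.

    `pairs` is an iterable of (i, j) sequence positions (hypothetical input
    format). Two pairs (i, j) and (k, l) with i < j and k < l cross when
    i < k < j < l, which is the defining feature of a pseudoknot.
    """
    # Normalise each pair so the smaller index comes first, then sort so
    # that the first pair of any combination starts no later than the second.
    normalised = sorted(tuple(sorted(p)) for p in pairs)
    for (i, j), (k, l) in combinations(normalised, 2):
        if i < k < j < l:
            return True
    return False


if __name__ == "__main__":
    nested = [(0, 9), (1, 8), (2, 7)]   # pseudoknot-free (nested) helix
    h_type = [(0, 5), (1, 4), (3, 8)]   # (1, 4) is nested, (0, 5) and (3, 8) cross
    print(has_pseudoknot(nested), has_pseudoknot(h_type))
```

The quadratic pairwise check is chosen for clarity; it is only meant to make the definition concrete, not to reflect how any of the listed tools detect pseudoknots.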
Affiliation(s)
- Mateo Gray
- Department of Computer Science, University of Victoria, Victoria, Canada
- Sean Chester
- Department of Computer Science, University of Victoria, Victoria, Canada
- Hosna Jabbari
- Department of Computer Science, University of Victoria, Victoria, Canada; Institute on Aging and Lifelong Health, University of Victoria, Victoria, Canada
17
Andrikos C, Makris E, Kolaitis A, Rassias G, Pavlatos C, Tsanakas P. Knotify: An Efficient Parallel Platform for RNA Pseudoknot Prediction Using Syntactic Pattern Recognition. Methods Protoc 2022; 5:mps5010014. [PMID: 35200530 PMCID: PMC8876629 DOI: 10.3390/mps5010014] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/03/2022] [Revised: 01/27/2022] [Accepted: 01/30/2022] [Indexed: 11/16/2022] Open
Abstract
Obtaining valuable clues for noncoding RNA (ribonucleic acid) subsequences remains a significant challenge, acknowledging that most of the human genome transcribes into noncoding RNA parts related to unknown biological operations. Capturing these clues relies on accurate “base pairing” prediction, also known as “RNA secondary structure prediction”. As COVID-19 is considered a severe global threat, the single-stranded SARS-CoV-2 virus reveals the importance of establishing an efficient RNA analysis toolkit. This work aimed to contribute to that by introducing a novel system committed to predicting RNA secondary structure patterns (i.e., RNA’s pseudoknots) that leverages syntactic pattern-recognition strategies. Having focused on the pseudoknot predictions, we formalized the secondary structure prediction of the RNA to be primarily a parsing and, secondly, an optimization problem. The proposed methodology addresses the problem of predicting pseudoknots of the first order (H-type). We introduce a context-free grammar (CFG) that affords enough expressive power to recognize potential pseudoknot patterns. In addition, an alternative methodology for detecting possible pseudoknots is also implemented, using a brute-force algorithm. Any input sequence may highlight multiple potential folding patterns, requiring a strict methodology to determine the single biologically realistic one. We devised a novel heuristic over the widely accepted notion of free-energy minimization to tackle such ambiguity in a performant way by utilizing each pattern’s context to unveil the most prominent pseudoknot pattern. The overall process features polynomial-time complexity, while its parallel implementation enhances the end performance in proportion to the deployed hardware. The proposed methodology succeeds in predicting the core stems of any RNA pseudoknot of the test dataset with a 76.4% recall ratio. The methodology achieved an F1-score of 0.774 and an MCC of 0.543 in discovering all the stems of an RNA sequence, outperforming the particular task. Measurements were taken using a dataset of 262 RNA sequences, establishing a performance speed of 1.31, 3.45, and 7.76 compared to three well-known platforms. The implementation source code is publicly available in the knotify GitHub repository.
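The abstract reports an F1-score and a Matthews correlation coefficient (MCC) without spelling out the formulas. The sketch below gives the standard textbook definitions in terms of confusion-matrix counts; how the authors counted true and false positives for stems is not stated here, so the function names and the example counts are purely illustrative assumptions.

```python
import math


def f1_score(tp, fp, fn):
    """Standard F1: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Standard Matthews correlation coefficient from confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Made-up counts for illustration only; they do not come from the paper.
    tp, tn, fp, fn = 8, 88, 2, 2
    print(f"F1  = {f1_score(tp, fp, fn):.3f}")
    print(f"MCC = {mcc(tp, tn, fp, fn):.3f}")
```

Unlike F1, MCC also uses the true-negative count, which is one reason the two scores can differ noticeably on the same predictions.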
Affiliation(s)
- Christos Andrikos
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece; (C.A.); (E.M.); (A.K.); (G.R.); (P.T.)
- Evangelos Makris
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece; (C.A.); (E.M.); (A.K.); (G.R.); (P.T.)
- Angelos Kolaitis
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece; (C.A.); (E.M.); (A.K.); (G.R.); (P.T.)
- Georgios Rassias
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece; (C.A.); (E.M.); (A.K.); (G.R.); (P.T.)
- Christos Pavlatos
- Hellenic Air Force Academy, Dekelia Air Base, Acharnes, 13671 Athens, Greece
- Correspondence: Tel.: +30-210-7722541
- Panayiotis Tsanakas
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece; (C.A.); (E.M.); (A.K.); (G.R.); (P.T.)
18
Winkler J, Urgese G, Ficarra E, Reinert K. LaRA 2: parallel and vectorized program for sequence-structure alignment of RNA sequences. BMC Bioinformatics 2022; 23:18. [PMID: 34991448 PMCID: PMC8734264 DOI: 10.1186/s12859-021-04532-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/08/2020] [Accepted: 12/13/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The function of non-coding RNA sequences is largely determined by their spatial conformation, namely the secondary structure of the molecule, formed by Watson-Crick interactions between nucleotides. Hence, modern RNA alignment algorithms routinely take structural information into account. In order to discover yet unknown RNA families and infer their possible functions, the structural alignment of RNAs is an essential task. This task demands a lot of computational resources, especially for aligning many long sequences, and it therefore requires efficient algorithms that utilize modern hardware when available. A subset of the secondary structures contains overlapping interactions (called pseudoknots), which add additional complexity to the problem and are often ignored in available software. RESULTS We present the SeqAn-based software LaRA 2 that is significantly faster than comparable software for accurate pairwise and multiple alignments of structured RNA sequences. In contrast to other programs our approach can handle arbitrary pseudoknots. As an improved re-implementation of the LaRA tool for structural alignments, LaRA 2 uses multi-threading and vectorization for parallel execution and a new heuristic for computing a lower boundary of the solution. Our algorithmic improvements yield a program that is up to 130 times faster than the previous version. CONCLUSIONS With LaRA 2 we provide a tool to analyse large sets of RNA secondary structures in relatively short time, based on structural alignment. The produced alignments can be used to derive structural motifs for the search in genomic databases.
Affiliation(s)
- Jörg Winkler
- Department of Mathematics and Computer Science, Free University Berlin, Takustraße 9, 14195 Berlin, Germany
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
- Gianvito Urgese
- Interuniversity Department of Regional and Urban Studies and Planning, Politecnico di Torino, C.so Duca degli Abruzzi 24, 10129 Turin, Italy
- Elisa Ficarra
- Department of Control and Computer Science, Politecnico di Torino, C.so Duca degli Abruzzi 24, 10129 Turin, Italy
- Knut Reinert
- Department of Mathematics and Computer Science, Free University Berlin, Takustraße 9, 14195 Berlin, Germany
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
19
Sato K, Kato Y. Prediction of RNA secondary structure including pseudoknots for long sequences. Brief Bioinform 2021; 23:6380459. [PMID: 34601552 PMCID: PMC8769711 DOI: 10.1093/bib/bbab395] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 06/16/2021] [Revised: 08/13/2021] [Accepted: 08/30/2021] [Indexed: 12/28/2022] Open
Abstract
RNA structural elements called pseudoknots are involved in various biological phenomena including ribosomal frameshifts. Because it is infeasible to construct an efficiently computable secondary structure model including pseudoknots, secondary structure prediction methods considering pseudoknots are not yet widely available. We developed IPknot, which uses heuristics to speed up computations, but it has remained difficult to apply it to long sequences, such as messenger RNA and viral RNA, because it requires cubic computational time with respect to sequence length and has threshold parameters that need to be manually adjusted. Here, we propose an improvement of IPknot that enables calculation in linear time by employing the LinearPartition model and automatically selects the optimal threshold parameters based on the pseudo-expected accuracy. In addition, IPknot showed favorable prediction accuracy across a wide range of conditions in our exhaustive benchmarking, not only for single sequences but also for multiple alignments.
Affiliation(s)
- Kengo Sato
- Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama 223-8522, Japan
- Yuki Kato
- Department of RNA Biology and Neuroscience, Graduate School of Medicine, Osaka University, Suita, Osaka 565-0871, Japan; Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Suita, Osaka 565-0871, Japan
20
Islam MR, Islam MS, Sakeef N. RNA Secondary Structure Prediction with Pseudoknots Using Chemical Reaction Optimization Algorithm. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1195-1207. [PMID: 31443047 DOI: 10.1109/tcbb.2019.2936570] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
RNA molecules, especially those including pseudoknots, play a significant role in cell function. In past decades, several methods have been developed to predict RNA secondary structure with pseudoknots, and the most popular one uses minimum free energy. It is a nondeterministic polynomial-time hard (NP-hard) problem. We have proposed an approach based on a metaheuristic algorithm named Chemical Reaction Optimization (CRO) to solve the RNA pseudoknotted structure prediction problem. The reaction operators of the CRO algorithm have been redesigned and used on the generated population to find the structure with the minimum free energy. In addition, we have developed an additional operator called the Repair operator, which has a great influence on increasing the accuracy of our algorithm. It helps to increase the true positive base pairs while decreasing the false positive and false negative base pairs. Four energy models have been applied to calculate the energy. To evaluate the performance, we have used four datasets containing RNA pseudoknotted sequences taken from the RNA STRAND and Pseudobase++ databases. We have compared the proposed approach with some existing algorithms and shown that our CRO-based model is a better prediction method in terms of accuracy and speed.
21
Pinkney HR, Wright BM, Diermeier SD. The lncRNA Toolkit: Databases and In Silico Tools for lncRNA Analysis. Noncoding RNA 2020; 6:E49. [PMID: 33339309 PMCID: PMC7768357 DOI: 10.3390/ncrna6040049] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 11/29/2020] [Revised: 12/14/2020] [Accepted: 12/15/2020] [Indexed: 02/07/2023] Open
Abstract
Long non-coding RNAs (lncRNAs) are a rapidly expanding field of research, with many new transcripts identified each year. However, only a small subset of lncRNAs has been characterized functionally thus far. To aid investigating the mechanisms of action by which new lncRNAs act, bioinformatic tools and databases are invaluable. Here, we review a selection of computational tools and databases for the in silico analysis of lncRNAs, including tissue-specific expression, protein coding potential, subcellular localization, structural conformation, and interaction partners. The assembled lncRNA toolkit is aimed primarily at experimental researchers as a useful starting point to guide wet-lab experiments, mainly containing multi-functional, user-friendly interfaces. With more and more new lncRNA analysis tools available, it will be essential to provide continuous updates and maintain the availability of key software in the future.
Affiliation(s)
- Sarah D. Diermeier
- Department of Biochemistry, University of Otago, Dunedin 9016, New Zealand; (H.R.P.); (B.M.W.)
22
Simmonds P, Cuypers L, Irving WL, McLauchlan J, Cooke GS, Barnes E, Ansari MA. Impact of virus subtype and host IFNL4 genotype on large-scale RNA structure formation in the genome of hepatitis C virus. RNA (New York, N.Y.) 2020; 26:1541-1556. [PMID: 32747607 PMCID: PMC7566573 DOI: 10.1261/rna.075465.120] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/28/2020] [Accepted: 07/29/2020] [Indexed: 05/03/2023]
Abstract
Mechanisms underlying the ability of hepatitis C virus (HCV) to establish persistent infections and induce progressive liver disease remain poorly understood. HCV is one of several positive-stranded RNA viruses capable of establishing persistence in their immunocompetent vertebrate hosts, an attribute previously associated with formation of large-scale RNA structure in their genomic RNA. We developed novel methods to analyze and visualize genome-scale ordered RNA structure (GORS) predicted from the increasingly large data sets of complete genome sequences of HCV. Structurally conserved RNA secondary structure in coding regions of HCV localized exclusively to polyprotein ends (core, NS5B). Coding regions elsewhere were also intensely structured based on elevated minimum folding energy difference (MFED) values, but the actual stem-loop elements involved in genome folding were structurally poorly conserved, even between subtypes 1a and 1b. Dynamic remodeling was further evident from comparison of HCV strains in different host genetic backgrounds. Significantly higher MFED values, greater suppression of UpA dinucleotide frequencies, and restricted diversification were found in subjects with the TT genotype of the rs12979860 SNP in the IFNL4 gene compared to the CC (nonexpressing) allele. These structural and compositional associations with expression of interferon-λ4 were recapitulated on a larger scale by higher MFED values and greater UpA suppression of genotype 1 compared to genotype 3a, associated with previously reported HCV genotype-associated differences in hepatic interferon-stimulated gene induction. Associations between innate cellular responses with HCV structure and further evolutionary constraints represent an important new element in RNA virus evolution and the adaptive interplay between virus and host.
Affiliation(s)
- Peter Simmonds
- Nuffield Department of Medicine, Peter Medawar Building for Pathogen Research, University of Oxford, OX1 3SY, Oxford, United Kingdom
- Lize Cuypers
- University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Research, BE 3000, Leuven, Belgium
- Will L Irving
- Faculty of Medicine and Health Sciences, University of Nottingham and Nottingham University Hospitals NHS Trust, Nottingham, NG7 2UH, United Kingdom
- John McLauchlan
- MRC-University of Glasgow Centre for Virus Research, Glasgow, G61 1QH, United Kingdom
- Ellie Barnes
- Nuffield Department of Medicine, Peter Medawar Building for Pathogen Research, University of Oxford, OX1 3SY, Oxford, United Kingdom
- M Azim Ansari
- Nuffield Department of Medicine, Peter Medawar Building for Pathogen Research, University of Oxford, OX1 3SY, Oxford, United Kingdom
23
Chillón I, Marcia M. The molecular structure of long non-coding RNAs: emerging patterns and functional implications. Crit Rev Biochem Mol Biol 2020; 55:662-690. [PMID: 33043695 DOI: 10.1080/10409238.2020.1828259] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Indexed: 01/04/2023]
Abstract
Long non-coding RNAs (lncRNAs) are recently-discovered transcripts that regulate vital cellular processes and are crucially connected to diseases. Despite their unprecedented molecular complexity, it is emerging that lncRNAs possess distinct structural motifs. Remarkably, the 3D shape and topology of full-length, native lncRNAs have been visualized for the first time in the last year. These studies reveal that lncRNA structures dictate lncRNA functions. Here, we review experimentally determined lncRNA structures and emphasize that lncRNA structural characterization requires synergistic integration of computational, biochemical and biophysical approaches. Based on these emerging paradigms, we discuss how to overcome the challenges posed by the complex molecular architecture of lncRNAs, with the goal of obtaining a detailed understanding of lncRNA functions and molecular mechanisms in the future.
Affiliation(s)
- Isabel Chillón
- European Molecular Biology Laboratory (EMBL) Grenoble, Grenoble, France
- Marco Marcia
- European Molecular Biology Laboratory (EMBL) Grenoble, Grenoble, France
24
Singh J, Hanson J, Paliwal K, Zhou Y. RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning. Nat Commun 2019; 10:5407. [PMID: 31776342 PMCID: PMC6881452 DOI: 10.1038/s41467-019-13395-9] [Citation(s) in RCA: 181] [Impact Index Per Article: 30.2] [Received: 06/12/2019] [Accepted: 11/01/2019] [Indexed: 01/03/2023] Open
Abstract
The majority of our human genome transcribes into noncoding RNAs with unknown structures and functions. Obtaining functional clues for noncoding RNAs requires accurate base-pairing or secondary-structure prediction. However, the performance of such predictions by current folding-based algorithms has stagnated for more than a decade. Here, we propose the use of deep contextual learning for base-pair prediction, including those noncanonical and non-nested (pseudoknot) base pairs stabilized by tertiary interactions. Since only <250 nonredundant, high-resolution RNA structures are available for model training, we utilize transfer learning from a model initially trained with a recent high-quality bpRNA dataset of >10,000 nonredundant RNAs made available through comparative analysis. The resulting method achieves large, statistically significant improvement in predicting all base pairs, noncanonical and non-nested base pairs in particular. The proposed method (SPOT-RNA), with a freely available server and standalone software, should be useful for improving RNA structure modeling, sequence alignment, and functional annotations.
Affiliation(s)
- Jaswinder Singh
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia
- Jack Hanson
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia
- Kuldip Paliwal
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia
- Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr., Southport, QLD, 4222, Australia