1
Omnes L, Angel E, Bartet P, Radvanyi F, Tahi F. A divide-and-conquer approach based on deep learning for long RNA secondary structure prediction: Focus on pseudoknots identification. PLoS One 2025; 20:e0314837. [PMID: 40279361 PMCID: PMC12026937 DOI: 10.1371/journal.pone.0314837] [Received: 11/18/2024] [Accepted: 03/04/2025] [Indexed: 04/27/2025] Open
Abstract
The accurate prediction of RNA secondary structure, and pseudoknots in particular, is of great importance in understanding the functions of RNAs since they give insights into their folding in three-dimensional space. However, existing approaches often face computational challenges or lack precision when dealing with long RNA sequences and/or pseudoknots. To address this, we propose a divide-and-conquer method based on deep learning, called DivideFold, for predicting the secondary structures including pseudoknots of long RNAs. Our approach is able to scale to long RNAs by recursively partitioning sequences into smaller fragments until they can be managed by an existing model able to predict RNA secondary structure including pseudoknots. We show that our approach exhibits superior performance compared to state-of-the-art methods for pseudoknot prediction and secondary structure prediction including pseudoknots for long RNAs. The source code of DivideFold, along with all the datasets used in this study, is accessible at https://evryrna.ibisc.univ-evry.fr/evryrna/dividefold/home.
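The recursive partitioning described in the abstract can be illustrated with a minimal sketch. It is not the DivideFold implementation: the midpoint cut, the `max_len` threshold, and the `predict_fragment` callback are all assumptions made for illustration, and base pairs that would span two fragments are simply ignored here.

```python
from typing import Callable, List


def fold_long_rna(seq: str,
                  predict_fragment: Callable[[str], str],
                  max_len: int = 8) -> str:
    """Recursively split seq until every fragment fits max_len,
    then predict each fragment and concatenate the dot-bracket strings."""
    if len(seq) <= max_len:
        return predict_fragment(seq)
    mid = len(seq) // 2  # assumption: naive midpoint cut, not a learned cut point
    left = fold_long_rna(seq[:mid], predict_fragment, max_len)
    right = fold_long_rna(seq[mid:], predict_fragment, max_len)
    return left + right


# Hypothetical stand-in for a real short-sequence predictor:
# it records the fragment length and marks every base as unpaired.
fragment_lengths: List[int] = []


def unpaired_stub(fragment: str) -> str:
    fragment_lengths.append(len(fragment))
    return "." * len(fragment)


structure = fold_long_rna("GGGAAACCCUUUGGGAAACC", unpaired_stub)
print(structure, fragment_lengths)
```

Any predictor that maps a short sequence to a structure string of the same length could be passed in place of `unpaired_stub`.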
Affiliation(s)
- Loïc Omnes
- Université Paris-Saclay, Univ Evry, IBISC, 91020 Evry-Courcouronnes, France
- ADLIN Science, 91037 Evry-Courcouronnes, France
- Eric Angel
- Université Paris-Saclay, Univ Evry, IBISC, 91020 Evry-Courcouronnes, France
- François Radvanyi
- Molecular Oncology UMR144, CNRS - Institut Curie, 75005 Paris, France
- Fariza Tahi
- Université Paris-Saclay, Univ Evry, IBISC, 91020 Evry-Courcouronnes, France
2
Yang J, Sato K, Loza M, Park SJ, Nakai K. RNA secondary structure prediction by conducting multi-class classifications. Comput Struct Biotechnol J 2025; 27:1449-1459. [PMID: 40256169 PMCID: PMC12008525 DOI: 10.1016/j.csbj.2025.04.001] [Received: 10/07/2024] [Revised: 03/29/2025] [Accepted: 04/01/2025] [Indexed: 04/22/2025] Open
Abstract
Generating valid predictions of RNA secondary structures is challenging. Several deep learning methods have been developed for predicting RNA secondary structures. However, they commonly adopt post-processing steps to adjust the model output to produce valid predictions, which are complicated and could limit the performance. In this study, we propose a simple method by considering RNA secondary structure prediction as multiple multi-class classifications, which eliminates the need for those complicated post-processing steps. Then, we use this method to train and evaluate our model based on the attention mechanism and the convolutional neural network. Besides, we introduce two additional methods, including data augmentation to further improve the within-RNA-family performance and a method to alleviate the performance drop in the cross-RNA-family evaluation. In summary, we could produce valid predictions and achieve better performance without complex post-processing steps, and we show our additional methods are beneficial to the performance in within-RNA-family and cross-RNA-family evaluations.
Affiliation(s)
- Jiyuan Yang
- Department of Computer Science, the Graduate School of Information Science and Technology, the University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, 113-8656, Tokyo, Japan
- Kengo Sato
- School of Life Science and Technology, Institute of Science Tokyo, 2-12-1-M6-12, Ookayama, Meguro-ku, 152-8550, Tokyo, Japan
- Martin Loza
- Institute of Medical Science, the University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, 108-8639, Tokyo, Japan
- Sung-Joon Park
- Department of Computer Science, the Graduate School of Information Science and Technology, the University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, 113-8656, Tokyo, Japan
- Institute of Medical Science, the University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, 108-8639, Tokyo, Japan
- Kenta Nakai
- Department of Computer Science, the Graduate School of Information Science and Technology, the University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, 113-8656, Tokyo, Japan
- Institute of Medical Science, the University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, 108-8639, Tokyo, Japan
3
La Rosa M, Fiannaca A, Mendolia I, La Paglia L, Urso A. GL4SDA: Predicting snoRNA-disease associations using GNNs and LLM embeddings. Comput Struct Biotechnol J 2025; 27:1023-1033. [PMID: 40160859 PMCID: PMC11952811 DOI: 10.1016/j.csbj.2025.03.014] [Received: 12/23/2024] [Revised: 03/04/2025] [Accepted: 03/08/2025] [Indexed: 04/02/2025] Open
Abstract
Small nucleolar RNAs (snoRNAs) play essential roles in various cellular processes, and their associations with diseases are increasingly recognized. Identifying these snoRNA-disease relationships is critical for advancing our understanding of their functional roles and potential therapeutic implications. This work presents a novel approach, called GL4SDA, to predict snoRNA-disease associations using Graph Neural Networks (GNN) and Large Language Models. Our methodology leverages the unique strengths of heterogeneous graph structures to model complex biological interactions. Differently from existing methods, we define a set of features able to capture deeper information content related to the inner attributes of both snoRNAs and diseases and design a GNN model based on highly performing layers, which can maximize results on this representation. We consider snoRNA secondary structures and disease embeddings derived from large language models to obtain snoRNAs and disease node features, respectively. By combining structural features of snoRNAs with rich semantic embeddings of diseases, we construct a feature-rich graph representation that improves the predictive performance of our model. We evaluate our approach using different architectures that exploit the capabilities of many graph convolutional layers and compare the results with three other state-of-the-art graph-based predictors. GL4SDA demonstrates improved scores in link prediction tasks and demonstrates its potential implication as a tool for exploring snoRNA-disease relationships. We also validate our findings through biological case studies about cancer diseases, highlighting the practical application of our method in real-world scenarios and obtaining the most important snoRNA features using explainable artificial intelligence methods.
Affiliation(s)
- Isabella Mendolia
- CNR-ICAR, National Research Council of Italy, via Ugo La Malfa 153, Palermo, 90146, Italy
4
Kagaya Y, Zhang Z, Ibtehaz N, Wang X, Nakamura T, Punuru PD, Kihara D. NuFold: end-to-end approach for RNA tertiary structure prediction with flexible nucleobase center representation. Nat Commun 2025; 16:881. [PMID: 39837861 PMCID: PMC11751094 DOI: 10.1038/s41467-025-56261-7] [Received: 05/22/2024] [Accepted: 01/13/2025] [Indexed: 01/23/2025] Open
Abstract
RNA plays a crucial role not only in information transfer as messenger RNA during gene expression but also in various biological functions as non-coding RNAs. Understanding the mechanisms of function requires tertiary structure information; however, experimental determination of three-dimensional RNA structures is costly and time-consuming, leading to a substantial gap between RNA sequence and structural data. To address this challenge, we developed NuFold, a novel computational approach that leverages state-of-the-art deep learning architecture to accurately predict RNA tertiary structures. NuFold is a deep neural network trained end-to-end to produce the output structure from the input sequence. NuFold incorporates a nucleobase center representation, which enables flexible conformation of ribose rings. A benchmark study showed that NuFold clearly outperformed energy-based methods and demonstrated comparable results with existing state-of-the-art deep-learning-based methods. NuFold exhibited a particular advantage in building correct local geometries of RNA. Analyses of individual components in the NuFold pipeline indicated that the performance improved by utilizing metagenome sequences for multiple sequence alignment and increasing the number of recycling iterations. NuFold is also capable of predicting multimer complex structures of RNA by linking the input sequences.
Affiliation(s)
- Yuki Kagaya
- Department of Biological Sciences, Purdue University, West Lafayette, 47907, Indiana, USA
- Zicong Zhang
- Department of Computer Science, Purdue University, West Lafayette, 47907, Indiana, USA
- Nabil Ibtehaz
- Department of Computer Science, Purdue University, West Lafayette, 47907, Indiana, USA
- Xiao Wang
- Department of Computer Science, Purdue University, West Lafayette, 47907, Indiana, USA
- Tsukasa Nakamura
- Department of Biological Sciences, Purdue University, West Lafayette, 47907, Indiana, USA
- Pranav Deep Punuru
- Department of Biological Sciences, Purdue University, West Lafayette, 47907, Indiana, USA
- Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, 47907, Indiana, USA
- Department of Computer Science, Purdue University, West Lafayette, 47907, Indiana, USA
5
Maghraby A, Alzalaty M. Genome-wide identification, characterization, and functional analysis of the CHX, SOS, and RLK genes in Solanum lycopersicum under salt stress. Sci Rep 2025; 15:1142. [PMID: 39774029 PMCID: PMC11707246 DOI: 10.1038/s41598-024-83221-w] [Received: 04/04/2024] [Accepted: 12/12/2024] [Indexed: 01/11/2025] Open
Abstract
The cation/proton exchanger (CHX), salt overly sensitive (SOS), and receptor-like kinase (RLK) genes play significant roles in the response to salt stress in plants. This study is the first to identify the SOS gene in Solanum lycopersicum (tomato) through genome-wide analysis under salt stress conditions. Quantitative reverse transcription PCR (qRT-PCR) results indicated that the expression levels of CHX, SOS, and RLK genes were upregulated, with fold changes of 1.83, 1.49, and 1.55, respectively, after 12 h of exposure to salt stress. Genome-wide analysis revealed 21 CHX, 5 SOS, and 86 RLK genes in S. lycopersicum. CHX genes were found on chromosomes 2, 3, 4, 5, 6, 7, 8, 9, 11, and 12 of S. lycopersicum. SOS genes were found on chromosomes 1, 4, 6, and 10. RLK genes were found on all chromosomes of S. lycopersicum. The Ka/Ks ratios indicate that the CHX, SOS, and RLK genes have been primarily influenced by purifying selection. This suggests that these genes have faced strong environmental pressures throughout their evolution. Purifying selection typically results in a decrease in genetic diversity. The estimated duplication time for CHX paralogous gene pairs ranged from approximately 26.965 to 245.413 million years ago (Mya), while the duplication time for SOS paralogous gene pairs ranged from around 116.682 to 275.631 Mya. For RLK paralogous gene pairs, the duplication time varied from approximately 27.689 to 239.376 Mya. Synteny analysis of the CHX, SOS, and RLK genes demonstrated collinear relationships with orthologous genes in Arabidopsis thaliana, but no collinearity orthologous relationships in Oryza sativa (rice). Furthermore, the analysis revealed that there were 6 orthologous SlCHX genes, 2 orthologous SlSOS genes, and 44 orthologous SlRLK genes paired with those in A. thaliana. The results of the present study may help to elucidate the role of the CHX, SOS, and RLK genes in salt stress in S. lycopersicum.
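The Ka/Ks reasoning and the duplication-time estimates in this abstract can be made concrete with a small sketch. It assumes the conventional reading (Ka/Ks below 1 suggests purifying selection, above 1 positive selection) and the commonly used estimate T = Ks / (2λ). The tolerance `tol` and the substitution rate used in the example are illustrative assumptions; the abstract does not state which rate the authors used.

```python
def selection_mode(ka: float, ks: float, tol: float = 0.05) -> str:
    """Classify a paralogous pair by its Ka/Ks ratio.
    tol is an arbitrary band around 1 treated as neutral (assumption)."""
    if ks <= 0:
        raise ValueError("Ks must be positive")
    ratio = ka / ks
    if ratio < 1 - tol:
        return "purifying"
    if ratio > 1 + tol:
        return "positive"
    return "neutral"


def duplication_time_mya(ks: float, rate: float) -> float:
    """Estimate duplication time in million years as Ks / (2 * rate),
    where rate is synonymous substitutions per site per year."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return ks / (2 * rate) / 1e6


# Hypothetical values, not taken from the study.
print(selection_mode(ka=0.12, ks=0.60))
print(duplication_time_mya(ks=0.13, rate=6.5e-9))
```

A different clock rate rescales every estimated duplication time proportionally, so reported Mya values are only comparable between studies that use the same rate.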
Affiliation(s)
- Amaal Maghraby
- Botany and Microbiology Department, Faculty of Science, Cairo University, Cairo, Egypt
- Mohamed Alzalaty
- Department of Plant Genetic Transformation, Agricultural Genetic Engineering Research Institute (AGERI), Agricultural Research Center (ARC), Cairo, Egypt
6
Oleynikov M, Jaffrey SR. RNA tertiary structure and conformational dynamics revealed by BASH MaP. eLife 2024; 13:RP98540. [PMID: 39625751 PMCID: PMC11614387 DOI: 10.7554/elife.98540] [Indexed: 12/06/2024] Open
Abstract
The functional effects of an RNA can arise from complex three-dimensional folds known as tertiary structures. However, predicting the tertiary structure of an RNA and whether an RNA adopts distinct tertiary conformations remains challenging. To address this, we developed BASH MaP, a single-molecule dimethyl sulfate (DMS) footprinting method and DAGGER, a computational pipeline, to identify alternative tertiary structures adopted by different molecules of RNA. BASH MaP utilizes potassium borohydride to reveal the chemical accessibility of the N7 position of guanosine, a key mediator of tertiary structures. We used BASH MaP to identify diverse conformational states and dynamics of RNA G-quadruplexes, an important RNA tertiary motif, in vitro and in cells. BASH MaP and DAGGER analysis of the fluorogenic aptamer Spinach reveals that it adopts alternative tertiary conformations which determine its fluorescence states. BASH MaP thus provides an approach for structural analysis of RNA by revealing previously undetectable tertiary structures.
Affiliation(s)
- Maxim Oleynikov
- Department of Pharmacology, Weill Medical College, Cornell University, New York, United States
- Samie R Jaffrey
- Department of Pharmacology, Weill Medical College, Cornell University, New York, United States
7
Boon WX, Sia BZ, Ng CH. Prediction of the effects of the top 10 synonymous mutations from 26645 SARS-CoV-2 genomes of early pandemic phase. F1000Res 2024; 10:1053. [PMID: 39268187 PMCID: PMC11391198 DOI: 10.12688/f1000research.72896.3] [Accepted: 09/11/2024] [Indexed: 09/15/2024] Open
Abstract
Background: The emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has led to a global pandemic since December 2019. SARS-CoV-2 is a single-stranded RNA virus, which mutates at a high rate. Many studies have examined nonsynonymous mutations, which change protein sequences. However, the effects of SARS-CoV-2 synonymous mutations, which may affect viral fitness, have been little studied. This study aims to predict the effect of synonymous mutations on the SARS-CoV-2 genome. Methods: A total of 26645 SARS-CoV-2 genomic sequences retrieved from the Global Initiative on Sharing all Influenza Data (GISAID) database were aligned using MAFFT. Then, the mutations and their respective frequencies were identified. Multiple RNA secondary structure prediction tools, namely RNAfold, IPknot++ and MXfold2, were applied to predict the effect of the mutations on RNA secondary structure, and their base pair probabilities were estimated using MutaRNA. Relative synonymous codon usage (RSCU) analysis was also performed to measure the codon usage bias (CUB) of SARS-CoV-2. Results: A total of 150 synonymous mutations were identified. The synonymous mutation with the highest frequency is the C3037U mutation in the nsp3 of ORF1a. Of the top 10 highest-frequency synonymous mutations, the C913U, C3037U, U16176C and C18877U mutants show pronounced changes between wild type and mutant in all 3 RNA secondary structure prediction tools, suggesting these mutations may have some biological impact on viral fitness. These four mutations show changes in base pair probabilities. All mutations except U16176C change the codon to a more preferred codon, which may result in higher translation efficiency. Conclusion: Synonymous mutations in the SARS-CoV-2 genome may affect RNA secondary structure, changing base pair probabilities and possibly resulting in a higher translation rate. However, lab experiments are required to validate the results obtained from prediction analysis.
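The RSCU analysis mentioned in the Methods follows a standard definition: for each amino acid, the observed count of a codon divided by the mean count across its synonymous codons. A minimal sketch under that definition is shown below; the codon counts are hypothetical and are not taken from the study.

```python
from typing import Dict


def rscu(codon_counts: Dict[str, int]) -> Dict[str, float]:
    """Relative synonymous codon usage for the codons of ONE amino acid.
    A value above 1 means the codon is used more often than expected
    under uniform usage of the synonymous codons."""
    total = sum(codon_counts.values())
    if total == 0:
        raise ValueError("no codons observed for this amino acid")
    expected = total / len(codon_counts)  # mean count per synonymous codon
    return {codon: count / expected for codon, count in codon_counts.items()}


# Hypothetical counts for the four alanine codons.
alanine_counts = {"GCU": 30, "GCC": 10, "GCA": 40, "GCG": 0}
print(rscu(alanine_counts))
```

Under this reading, a synonymous mutation that moves a codon from a low-RSCU to a high-RSCU form is a change toward a "more preferred" codon.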
Affiliation(s)
- Wan Xin Boon
- Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, Melaka, 75450, Malaysia
- Boon Zhan Sia
- Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, Melaka, 75450, Malaysia
- Chong Han Ng
- Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, Melaka, 75450, Malaysia
8
Fallah A, Havaei SA, Sedighian H, Kachuei R, Fooladi AAI. Prediction of aptamer affinity using an artificial intelligence approach. J Mater Chem B 2024; 12:8825-8842. [PMID: 39158322 DOI: 10.1039/d4tb00909f] [Indexed: 08/20/2024]
Abstract
Aptamers are oligonucleotide sequences that can connect to particular target molecules, similar to monoclonal antibodies. They can be chosen by systematic evolution of ligands by exponential enrichment (SELEX), and are modifiable and can be synthesized. Even if the SELEX approach has been improved a lot, it is frequently challenging and time-consuming to identify aptamers experimentally. In particular, structure-based methods are the most used in computer-aided design and development of aptamers. For this purpose, numerous web-based platforms have been suggested for the purpose of forecasting the secondary structure and 3D configurations of RNAs and DNAs. Also, molecular docking and molecular dynamics (MD), which are commonly utilized in protein compound selection by structural information, are suitable for aptamer selection. On the other hand, from a large number of sequences, artificial intelligence (AI) may be able to quickly discover the possible aptamer candidates. Conversely, sophisticated machine and deep-learning (DL) models have demonstrated efficacy in forecasting the binding properties between ligands and targets during drug discovery; as such, they may provide a reliable and precise method for forecasting the binding of aptamers to targets. This research looks at advancements in AI pipelines and strategies for aptamer binding ability prediction, such as machine and deep learning, as well as structure-based approaches, molecular dynamics and molecular docking simulation methods.
Affiliation(s)
- Arezoo Fallah
- Department of Bacteriology and Virology, Faculty of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
- Seyed Asghar Havaei
- Department of Microbiology, School of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
- Hamid Sedighian
- Applied Microbiology Research Center, Biomedicine Technologies Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Reza Kachuei
- Molecular Biology Research Center, Biomedicine Technologies Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Abbas Ali Imani Fooladi
- Applied Microbiology Research Center, Biomedicine Technologies Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
9
Qi F, Chen J, Chen Y, Sun J, Lin Y, Chen Z, Kapranov P. Evaluating Performance of Different RNA Secondary Structure Prediction Programs Using Self-cleaving Ribozymes. Genomics, Proteomics & Bioinformatics 2024; 22:qzae043. [PMID: 39317944 PMCID: PMC12016570 DOI: 10.1093/gpbjnl/qzae043] [Received: 09/23/2022] [Revised: 03/02/2024] [Accepted: 06/05/2024] [Indexed: 09/26/2024]
Abstract
Accurate identification of the correct, biologically relevant RNA structures is critical to understanding various aspects of RNA biology since proper folding represents the key to the functionality of all types of RNA molecules and plays pivotal roles in many essential biological processes. Thus, a plethora of approaches have been developed to predict, identify, or solve RNA structures based on various computational, molecular, genetic, chemical, or physicochemical strategies. Purely computational approaches hold distinct advantages over all other strategies in terms of the ease of implementation, time, speed, cost, and throughput, but they strongly underperform in terms of accuracy that significantly limits their broader application. Nonetheless, the advantages of these methods led to a steady development of multiple in silico RNA secondary structure prediction approaches including recent deep learning-based programs. Here, we compared the accuracy of predictions of biologically relevant secondary structures of dozens of self-cleaving ribozyme sequences using seven in silico RNA folding prediction tools with tasks of varying complexity. We found that while many programs performed well in relatively simple tasks, their performance varied significantly in more complex RNA folding problems. However, in general, a modern deep learning method outperformed the other programs in the complex tasks in predicting the RNA secondary structures, at least based on the specific class of sequences tested, suggesting that it may represent the future of RNA structure prediction algorithms.
Affiliation(s)
- Fei Qi
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen 361102, China
- Institute of Genomics, School of Medicine, Huaqiao University, Xiamen 361021, China
- Junjie Chen
- Institute of Genomics, School of Medicine, Huaqiao University, Xiamen 361021, China
- Yue Chen
- Institute of Genomics, School of Medicine, Huaqiao University, Xiamen 361021, China
- Jianfeng Sun
- Botnar Research Centre, University of Oxford, Oxford, OX3 7LD, United Kingdom
- Yiting Lin
- Institute of Genomics, School of Medicine, Huaqiao University, Xiamen 361021, China
- Zipeng Chen
- Institute of Genomics, School of Medicine, Huaqiao University, Xiamen 361021, China
- Philipp Kapranov
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen 361102, China
10
Oleynikov M, Jaffrey SR. RNA tertiary structure and conformational dynamics revealed by BASH MaP. bioRxiv 2024:2024.04.11.589009. [PMID: 38645201 PMCID: PMC11030352 DOI: 10.1101/2024.04.11.589009] [Indexed: 04/23/2024]
Abstract
The functional effects of an RNA can arise from complex three-dimensional folds known as tertiary structures. However, predicting the tertiary structure of an RNA and whether an RNA adopts distinct tertiary conformations remains challenging. To address this, we developed BASH MaP, a single-molecule dimethyl sulfate (DMS) footprinting method and DAGGER, a computational pipeline, to identify alternative tertiary structures adopted by different molecules of RNA. BASH MaP utilizes potassium borohydride to reveal the chemical accessibility of the N7 position of guanosine, a key mediator of tertiary structures. We used BASH MaP to identify diverse conformational states and dynamics of RNA G-quadruplexes, an important RNA tertiary motif, in vitro and in cells. BASH MaP and DAGGER analysis of the fluorogenic aptamer Spinach reveals that it adopts alternative tertiary conformations which determine its fluorescence states. BASH MaP thus provides an approach for structural analysis of RNA by revealing previously undetectable tertiary structures.
Affiliation(s)
- Maxim Oleynikov
- Department of Pharmacology, Weill Medical College, Cornell University, New York, NY, USA
- Samie R. Jaffrey
- Department of Pharmacology, Weill Medical College, Cornell University, New York, NY, USA
11
Huang X, Du Z. Possible involvement of three-stemmed pseudoknots in regulating translational initiation in human mRNAs. PLoS One 2024; 19:e0307541. [PMID: 39038036 PMCID: PMC11262651 DOI: 10.1371/journal.pone.0307541] [Received: 03/13/2024] [Accepted: 07/08/2024] [Indexed: 07/24/2024] Open
Abstract
RNA pseudoknots play a crucial role in various cellular functions. Established pseudoknots show significant variation in both size and structural complexity. Specifically, three-stemmed pseudoknots are characterized by an additional stem-loop embedded in their structure. Recent findings highlight these pseudoknots as bacterial riboswitches and potent stimulators for programmed ribosomal frameshifting in RNA viruses like SARS-CoV2. To investigate the possible presence of functional three-stemmed pseudoknots in human mRNAs, we employed in-house developed computational methods to detect such structures within a dataset comprising 21,780 full-length human mRNA sequences. Numerous three-stemmed pseudoknots were identified. A selected set of 14 potential instances are presented, in which the start codon of the mRNA is found in close proximity either upstream, downstream, or within the identified three-stemmed pseudoknot. These pseudoknots likely play a role in translational initiation regulation. The probability of their existence gains support from their ranking as the most stable pseudoknot identified in the entire mRNA sequence, structural conservation across homologous mRNAs, stereochemical feasibility as demonstrated by structural modeling, and classification as members of the CPK-1 pseudoknot family, which includes many well-established pseudoknots. Furthermore, in four of the mRNAs, two or three closely spaced or tandem three-stemmed pseudoknots were identified. These findings suggest the frequent occurrence of three-stemmed pseudoknots in human mRNAs. A stepwise co-transcriptional folding mechanism is proposed for the formation of a three-stemmed pseudoknot structure. Our results not only provide fresh insights into the structures and functions of pseudoknots but also unveil the potential to target pseudoknots for treating human diseases.
Affiliation(s)
- Xiaolan Huang
- School of Computing, Southern Illinois University at Carbondale, IL, United States of America
- Zhihua Du
- School of Chemical and Biomolecular Sciences, Southern Illinois University at Carbondale, IL, United States of America
12
Kolaitis A, Makris E, Karagiannis AA, Tsanakas P, Pavlatos C. Knotify_V2.0: Deciphering RNA Secondary Structures with H-Type Pseudoknots and Hairpin Loops. Genes (Basel) 2024; 15:670. [PMID: 38927606 PMCID: PMC11203014 DOI: 10.3390/genes15060670] [Received: 03/26/2024] [Revised: 05/19/2024] [Accepted: 05/22/2024] [Indexed: 06/28/2024] Open
Abstract
Accurately predicting the pairing order of bases in RNA molecules is essential for anticipating RNA secondary structures. Consequently, this task holds significant importance in unveiling previously unknown biological processes. The urgent need to comprehend RNA structures has been accentuated by the unprecedented impact of the widespread COVID-19 pandemic. This paper presents a framework, Knotify_V2.0, which makes use of syntactic pattern recognition techniques in order to predict RNA structures, with a specific emphasis on tackling the demanding task of predicting H-type pseudoknots that encompass bulges and hairpins. By leveraging the expressive capabilities of a Context-Free Grammar (CFG), the suggested framework integrates the inherent benefits of CFG and makes use of minimum free energy and maximum base pairing criteria. This integration enables the effective management of this inherently ambiguous task. The main contribution of Knotify_V2.0 compared to earlier versions lies in its capacity to identify additional motifs like bulges and hairpins within the internal loops of the pseudoknot. Notably, the proposed methodology, Knotify_V2.0, demonstrates superior accuracy in predicting core stems compared to state-of-the-art frameworks. Knotify_V2.0 exhibited exceptional performance by accurately identifying both core base pairings that form the ground truth pseudoknot in 70% of the examined sequences. Furthermore, Knotify_V2.0 narrowed the performance gap with Knotty, which had demonstrated better performance than Knotify and even surpassed it in Recall and F1-score metrics. Knotify_V2.0 achieved a higher count of true positives (tp) and a significantly lower count of false negatives (fn) compared to Knotify, highlighting improvements in Precision and Recall metrics, respectively. Consequently, Knotify_V2.0 achieved a higher F1-score than any other platform. The source code and comprehensive implementation details of Knotify_V2.0 are publicly available on GitHub.
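The relationship between the true positive and false negative counts and the reported metrics follows from the standard definitions of precision, recall, and F1-score. The sketch below assumes those standard definitions applied to predicted base pairs; the counts in the example are hypothetical and do not come from the paper.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Standard metrics over predicted base pairs.
    tp: predicted pairs present in the reference structure
    fp: predicted pairs absent from the reference structure
    fn: reference pairs that were not predicted"""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: fewer false negatives raise recall, and with it F1.
print(precision_recall_f1(tp=8, fp=2, fn=6))
print(precision_recall_f1(tp=8, fp=2, fn=2))
```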
Affiliation(s)
- Angelos Kolaitis
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Evangelos Makris
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Alexandros Anastasios Karagiannis
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Panayiotis Tsanakas
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Christos Pavlatos
- Hellenic Air Force Academy, Dekelia Air Base, Acharnes, 13671 Athens, Greece
13
Maghraby A, Alzalaty M. Genome-wide identification and evolutionary analysis of the AP2/EREBP, COX and LTP genes in Zea mays L. under drought stress. Sci Rep 2024; 14:7610. [PMID: 38556556 PMCID: PMC10982304 DOI: 10.1038/s41598-024-57376-5] [Received: 01/20/2024] [Accepted: 03/18/2024] [Indexed: 04/02/2024] Open
Abstract
AP2 (APETALA2)/EREBP (ethylene-responsive element-binding protein), cytochrome c oxidase (COX) and nonspecific lipid transfer proteins (LTP) play important roles in the response to drought stress. This is the first study to identify the COX gene in Zea mays L. via genome-wide analysis. The qRT‒PCR results indicated that AP2/EREBP, COX and LTP were downregulated, with fold changes of 0.84, 0.53 and 0.31, respectively, after 12 h of drought stress. Genome-wide analysis identified 78 AP2/EREBP, 6 COX and 10 LTP genes in Z. mays L. Domain analysis confirmed the presence of the AP2 domain, Cyt_c_Oxidase_Vb domain and nsLTP1 in the AP2/EREBP, COX and LTP proteins, respectively. The AP2/EREBP protein family (AP2) includes five different domain types: the AP2/ERF domain, the EREBP-like factor (EREBP), the ethylene responsive factor (ERF), the dehydration responsive element binding protein (DREB) and the SHN SHINE. Synteny analysis of the AP2/EREBP, COX and LTP genes revealed collinearity orthologous relationships in O. sativa, H. vulgare and A. thaliana. AP2/EREBP genes were found on the 10 chromosomes of Z. mays L. COX genes were found on chromosomes 1, 3, 4, 5, 7 and 8. LTP genes were found on chromosomes 1, 3, 6, 8, 9 and 10. In the present study, the Ka/Ks ratios of the AP2/EREBP paralogous pairs indicated that the AP2/EREBP genes were influenced primarily by purifying selection, which indicated that the AP2/EREBP genes received strong environmental pressure during evolution. The Ka/Ks ratios of the COX-3/COX-4 paralogous pairs indicate that the COX-3/COX-4 genes were influenced primarily by Darwinian selection (driving change). For the LTP genes, the Ka/Ks ratios of the LTP-1/LTP-10, LTP-5/LTP-3 and LTP-4/LTP-8 paralogous pairs indicate that these genes were influenced primarily by purifying selection, while the Ka/Ks ratios of the LTP-2/LTP-6 paralogous pairs indicate that these genes were influenced primarily by Darwinian selection. 
The duplication time of the AP2/EREBP paralogous gene pairs in Z. mays L. ranged from approximately 9.364 to 100.935 Mya. The duplication time of the COX-3/COX-4 paralogous gene pair was approximately 5.217 Mya. The duplication time of the LTP paralogous gene pairs ranged from approximately 19.064 to 96.477 Mya. The major focus of this research is to identify the genes responsible for drought stress tolerance in order to improve drought tolerance in maize. The results of the present study will improve the understanding of the functions of the AP2/EREBP, COX and LTP genes in response to drought stress.
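The way Ka/Ks ratios and duplication times are usually interpreted can be shown with a short sketch. It assumes the conventional reading (Ka/Ks below 1 indicates purifying selection, above 1 indicates Darwinian, i.e. positive, selection) and the commonly used estimate T = Ks / (2λ). The substitution rate λ is not given in the abstract; the value used below is only an illustrative assumption, and the function names are hypothetical.

```python
def selection_mode(ka, ks):
    """Classify a paralogous pair from its Ka and Ks values."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form a Ka/Ks ratio")
    ratio = ka / ks
    if ratio < 1:
        return "purifying selection"
    if ratio > 1:
        return "Darwinian (positive) selection"
    return "neutral evolution"


def duplication_time_mya(ks, rate_per_site_per_year):
    """Estimate divergence time in million years ago as T = Ks / (2 * lambda).

    rate_per_site_per_year (lambda) is an assumed synonymous substitution
    rate; the study's own value is not stated in the abstract.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    return ks / (2 * rate_per_site_per_year) * 1e-6


if __name__ == "__main__":
    ASSUMED_RATE = 6.5e-9  # illustrative assumption, not taken from the paper
    print(selection_mode(ka=0.05, ks=0.40))
    print(selection_mode(ka=0.30, ks=0.20))
    print(duplication_time_mya(0.13, ASSUMED_RATE))
```

Because the estimate scales inversely with λ, a different assumed rate shifts every duplication time proportionally without changing their relative order.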
Affiliation(s)
- Amaal Maghraby
- Botany and Microbiology Department, Faculty of Science, Cairo University, Giza, Egypt.
| | - Mohamed Alzalaty
- Department of Plant Genetic Transformation, Agricultural Genetic Engineering Research Institute (AGERI), Agricultural Research Center (ARC), Giza, Egypt
| |
|
14
|
Gong T, Ju F, Bu D. Accurate prediction of RNA secondary structure including pseudoknots through solving minimum-cost flow with learned potentials. Commun Biol 2024; 7:297. [PMID: 38461362 PMCID: PMC10924946 DOI: 10.1038/s42003-024-05952-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 02/21/2024] [Indexed: 03/11/2024] Open
Abstract
Pseudoknots are key structure motifs of RNA and pseudoknotted RNAs play important roles in a variety of biological processes. Here, we present KnotFold, an accurate approach to the prediction of RNA secondary structure including pseudoknots. The key elements of KnotFold include a learned potential function and a minimum-cost flow algorithm to find the secondary structure with the lowest potential. KnotFold learns the potential from the RNAs with known structures using an attention-based neural network, thus avoiding the inaccuracy of hand-crafted energy functions. The specially designed minimum-cost flow algorithm used by KnotFold considers all possible combinations of base pairs and selects from them the optimal combination. The algorithm breaks the restriction of nested base pairs required by the widely used dynamic programming algorithms, thus enabling the identification of pseudoknots. Using 1,009 pseudoknotted RNAs as representatives, we demonstrate the successful application of KnotFold in predicting RNA secondary structures including pseudoknots with accuracy higher than the state-of-the-art approaches. We anticipate that KnotFold, with its superior accuracy, will greatly facilitate the understanding of RNA structures and functionalities.
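The "restriction of nested base pairs" mentioned above can be made concrete with a small sketch. Two base pairs (i, j) and (k, l) with i < k are nested or disjoint unless i < k < j < l, in which case they cross and the structure contains a pseudoknot. The code below only checks a given list of pairs for such crossings; it is not KnotFold's minimum-cost flow algorithm, and the function names are hypothetical.

```python
from itertools import combinations


def pairs_cross(p, q):
    """Return True if base pairs p and q cross (i < k < j < l)."""
    # Normalise each pair so the 5' index comes first, then order the pairs.
    (i, j), (k, l) = sorted([tuple(sorted(p)), tuple(sorted(q))])
    return i < k < j < l


def has_pseudoknot(pairs):
    """Return True if any two base pairs in the structure cross."""
    return any(pairs_cross(p, q) for p, q in combinations(pairs, 2))


if __name__ == "__main__":
    nested = [(1, 10), (2, 9), (3, 8)]          # a plain hairpin stem
    h_type = [(1, 6), (2, 5), (4, 9), (3, 10)]  # second stem crosses the first
    print(has_pseudoknot(nested), has_pseudoknot(h_type))
```

The pairwise check is quadratic in the number of base pairs, which is adequate for illustration; dynamic programming methods that assume nesting cannot represent the second structure at all.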
Affiliation(s)
- Tiansu Gong
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 100190, Beijing, China
- University of Chinese Academy of Sciences, 100190, Beijing, China
| | - Fusong Ju
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 100190, Beijing, China
- University of Chinese Academy of Sciences, 100190, Beijing, China
| | - Dongbo Bu
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 100190, Beijing, China.
- University of Chinese Academy of Sciences, 100190, Beijing, China.
- Central China Artificial Intelligence Research Institute, Henan Academy of Sciences, Zhengzhou, 450046, Henan, China.
| |
|
15
|
Loyer G, Reinharz V. Concurrent prediction of RNA secondary structures with pseudoknots and local 3D motifs in an integer programming framework. Bioinformatics 2024; 40:btae022. [PMID: 38230755 PMCID: PMC10868335 DOI: 10.1093/bioinformatics/btae022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 11/30/2023] [Accepted: 01/12/2024] [Indexed: 01/18/2024] Open
Abstract
MOTIVATION The prediction of RNA structure canonical base pairs from a single sequence, especially pseudoknotted ones, remains challenging in thermodynamic models that approximate the energy of the local 3D motifs joining canonical stems. It has become more and more apparent in recent years that the structural motifs in the loops, composed of noncanonical interactions, are essential for the final shape of the molecule enabling its multiple functions. Our capacity to predict accurate 3D structures is also limited when it comes to the organization of the large intricate network of interactions that form inside those loops. RESULTS We previously developed the integer programming framework RNA Motifs over Integer Programming (RNAMoIP) to reconcile RNA secondary structure and local 3D motif information available in databases. We further develop our model to now simultaneously predict the canonical base pairs (with pseudoknots) from base pair probability matrices with or without alignment. We benchmarked our new method over all nonredundant RNAs below 150 nucleotides. We show that the joint prediction of canonical base pairs structure and local conserved motifs (i) improves the ratio of well-predicted interactions in the secondary structure, (ii) predicts canonical and wobble pairs well at the location where motifs are inserted, (iii) is greatly improved with evolutionary information, and (iv) noncanonical motifs at kink-turn locations. AVAILABILITY AND IMPLEMENTATION The source code of the framework is available at https://gitlab.info.uqam.ca/cbe/RNAMoIP and an interactive web server at https://rnamoip.cbe.uqam.ca/.
Affiliation(s)
- Gabriel Loyer
- Department of Computer Science, Université du Québec à Montréal, Montréal, QC H2X 3Y7, Canada
| | - Vladimir Reinharz
- Department of Computer Science, Université du Québec à Montréal, Montréal, QC H2X 3Y7, Canada
| |
|
16
|
Rocca R, Grillone K, Citriniti EL, Gualtieri G, Artese A, Tagliaferri P, Tassone P, Alcaro S. Targeting non-coding RNAs: Perspectives and challenges of in-silico approaches. Eur J Med Chem 2023; 261:115850. [PMID: 37839343 DOI: 10.1016/j.ejmech.2023.115850] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/25/2023] [Revised: 09/08/2023] [Accepted: 09/29/2023] [Indexed: 10/17/2023]
Abstract
The growing information currently available on the central role of non-coding RNAs (ncRNAs) including microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) for chronic and degenerative human diseases makes them attractive therapeutic targets. RNAs carry out different functional roles in human biology and are deeply deregulated in several diseases. So far, different attempts to therapeutically target the 3D RNA structures with small molecules have been reported. In this scenario, the development of computational tools suitable for describing RNA structures and their potential interactions with small molecules is gaining more and more interest. Here, we describe the most suitable strategies to study ncRNAs through computational tools. We focus on methods capable of predicting 2D and 3D ncRNA structures. Furthermore, we describe computational tools to identify, design and optimize small molecule ncRNA binders. This review aims to outline the state of the art and perspectives of computational methods for ncRNAs over the past decade.
Affiliation(s)
- Roberta Rocca
- Department of Health Science, Magna Graecia University, Catanzaro, Italy; Net4Science srl, Academic Spinoff, Magna Græcia University, Catanzaro, Italy
| | - Katia Grillone
- Department of Experimental and Clinical Medicine, Magna Græcia University, Catanzaro, Italy
| | | | | | - Anna Artese
- Department of Health Science, Magna Graecia University, Catanzaro, Italy; Net4Science srl, Academic Spinoff, Magna Græcia University, Catanzaro, Italy.
| | | | - Pierfrancesco Tassone
- Department of Experimental and Clinical Medicine, Magna Græcia University, Catanzaro, Italy
| | - Stefano Alcaro
- Department of Health Science, Magna Graecia University, Catanzaro, Italy; Net4Science srl, Academic Spinoff, Magna Græcia University, Catanzaro, Italy
| |
|
17
|
Ballarino M, Pepe G, Helmer-Citterich M, Palma A. Exploring the landscape of tools and resources for the analysis of long non-coding RNAs. Comput Struct Biotechnol J 2023; 21:4706-4716. [PMID: 37841333 PMCID: PMC10568309 DOI: 10.1016/j.csbj.2023.09.041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Revised: 09/28/2023] [Accepted: 09/28/2023] [Indexed: 10/17/2023] Open
Abstract
In recent years, research on long non-coding RNAs (lncRNAs) has gained considerable attention due to the increasing number of newly identified transcripts. Several characteristics make their functional evaluation challenging, which called for the urgent need to combine molecular biology with other disciplines, including bioinformatics. Indeed, the recent development of computational pipelines and resources has greatly facilitated both the discovery and the mechanisms of action of lncRNAs. In this review, we present a curated collection of the most recent computational resources, which have been categorized into distinct groups: databases and annotation, identification and classification, interaction prediction, and structure prediction. As the repertoire of lncRNAs and their analysis tools continues to expand over the years, standardizing the computational pipelines and improving the existing annotation of lncRNAs will be crucial to facilitate functional genomics studies.
Affiliation(s)
- Monica Ballarino
- Department of Biology and Biotechnologies “Charles Darwin”, Sapienza University of Rome, Piazzale Aldo Moro 5, 00161 Rome, Italy
| | - Gerardo Pepe
- Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica, 1, 00133 Rome, Italy
| | - Manuela Helmer-Citterich
- Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica, 1, 00133 Rome, Italy
| | - Alessandro Palma
- Department of Biology and Biotechnologies “Charles Darwin”, Sapienza University of Rome, Piazzale Aldo Moro 5, 00161 Rome, Italy
| |
|
18
|
Sato K, Hamada M. Recent trends in RNA informatics: a review of machine learning and deep learning for RNA secondary structure prediction and RNA drug discovery. Brief Bioinform 2023; 24:bbad186. [PMID: 37232359 PMCID: PMC10359090 DOI: 10.1093/bib/bbad186] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 01/16/2023] [Revised: 04/24/2023] [Accepted: 04/25/2023] [Indexed: 05/27/2023] Open
Abstract
Computational analysis of RNA sequences constitutes a crucial step in the field of RNA biology. As in other domains of the life sciences, the incorporation of artificial intelligence and machine learning techniques into RNA sequence analysis has gained significant traction in recent years. Historically, thermodynamics-based methods were widely employed for the prediction of RNA secondary structures; however, machine learning-based approaches have demonstrated remarkable advancements in recent years, enabling more accurate predictions. Consequently, the precision of sequence analysis pertaining to RNA secondary structures, such as RNA-protein interactions, has also been enhanced, making a substantial contribution to the field of RNA biology. Additionally, artificial intelligence and machine learning are also introducing technical innovations in the analysis of RNA-small molecule interactions for RNA-targeted drug discovery and in the design of RNA aptamers, where RNA serves as its own ligand. This review will highlight recent trends in the prediction of RNA secondary structure, RNA aptamers and RNA drug discovery using machine learning, deep learning and related technologies, and will also discuss potential future avenues in the field of RNA informatics.
Affiliation(s)
- Kengo Sato
- School of System Design and Technology, Tokyo Denki University, 5 Senju Asahi-cho, Adachi-ku, Tokyo 120-8551, Japan
| | - Michiaki Hamada
- Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL) , National Institute of Advanced Industrial Science and Technology (AIST), 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Graduate School of Medicine, Nippon Medical School, 1-1-5, Sendagi, Bunkyo-ku, Tokyo 113-8602, Japan
| |
|
19
|
Lin BC, Katneni U, Jankowska KI, Meyer D, Kimchi-Sarfaty C. In silico methods for predicting functional synonymous variants. Genome Biol 2023; 24:126. [PMID: 37217943 PMCID: PMC10204308 DOI: 10.1186/s13059-023-02966-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/03/2022] [Accepted: 05/10/2023] [Indexed: 05/24/2023] Open
Abstract
Single nucleotide variants (SNVs) contribute to human genomic diversity. Synonymous SNVs were previously considered to be "silent," but mounting evidence has revealed that these variants can cause RNA and protein changes and are implicated in over 85 human diseases and cancers. Recent improvements in computational platforms have led to the development of numerous machine-learning tools, which can be used to advance synonymous SNV research. In this review, we discuss tools that should be used to investigate synonymous variants. We provide supportive examples from seminal studies that demonstrate how these tools have driven new discoveries of functional synonymous SNVs.
Affiliation(s)
- Brian C Lin
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
| | - Upendra Katneni
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
| | - Katarzyna I Jankowska
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
| | - Douglas Meyer
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA
| | - Chava Kimchi-Sarfaty
- Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, US FDA, Silver Spring, MD, USA.
| |
|
20
|
Makris E, Kolaitis A, Andrikos C, Moulos V, Tsanakas P, Pavlatos C. Knotify+: Toward the Prediction of RNA H-Type Pseudoknots, Including Bulges and Internal Loops. Biomolecules 2023; 13:biom13020308. [PMID: 36830677 PMCID: PMC9953189 DOI: 10.3390/biom13020308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2022] [Revised: 01/25/2023] [Accepted: 02/01/2023] [Indexed: 02/09/2023] Open
Abstract
The accurate "base pairing" in RNA molecules, which leads to the prediction of RNA secondary structures, is crucial in order to explain unknown biological operations. Recently, COVID-19, a widespread disease, has caused many deaths, affecting humanity in an unprecedented way. SARS-CoV-2, a single-stranded RNA virus, has shown the significance of analyzing these molecules and their structures. This paper aims to create a pioneering framework in the direction of predicting specific RNA structures, leveraging syntactic pattern recognition. The proposed framework, Knotify+, addresses the problem of predicting H-type pseudoknots, including bulges and internal loops, by harnessing the power of context-free grammar (CFG). We combine the grammar's advantages with maximum base pairing and minimum free energy to tackle this ambiguous task in a performant way. Specifically, our proposed methodology, Knotify+, outperforms state-of-the-art frameworks with regard to its accuracy in core stems prediction. Additionally, it performs more accurately in small sequences and presents a comparable accuracy rate in larger ones, while it requires a smaller execution time compared to well-known platforms. The Knotify+ source code and implementation details are available as a public repository on GitHub.
Affiliation(s)
- Evangelos Makris
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
| | - Angelos Kolaitis
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
| | - Christos Andrikos
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
| | - Vrettos Moulos
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
| | - Panayiotis Tsanakas
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
| | - Christos Pavlatos
- Hellenic Air Force Academy, Dekelia Air Base, Acharnes, 13671 Athens, Greece
- Correspondence: Tel.: +30-210-7722541
| |
|
21
|
Fukunaga T, Hamada M. LinAliFold and CentroidLinAliFold: fast RNA consensus secondary structure prediction for aligned sequences using beam search methods. Bioinformatics Advances 2022; 2:vbac078. [PMID: 36699418 PMCID: PMC9710674 DOI: 10.1093/bioadv/vbac078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 10/13/2022] [Accepted: 10/21/2022] [Indexed: 11/05/2022]
Abstract
Motivation RNA consensus secondary structure prediction from aligned sequences is a powerful approach for improving the secondary structure prediction accuracy. However, because the computational complexities of conventional prediction tools scale with the cube of the alignment lengths, their application to long RNA sequences, such as viral RNAs or long non-coding RNAs, requires significant computational time. Results In this study, we developed LinAliFold and CentroidLinAliFold, fast RNA consensus secondary structure prediction tools based on minimum free energy and maximum expected accuracy principles, respectively. We achieved software acceleration using beam search methods that were successfully used for fast secondary structure prediction from a single RNA sequence. Benchmark analyses showed that LinAliFold and CentroidLinAliFold were much faster than the existing methods while preserving the prediction accuracy. As an empirical application, we predicted the consensus secondary structure of coronaviruses with approximately 30 000 nt in 5 and 79 min by LinAliFold and CentroidLinAliFold, respectively. We confirmed that the predicted consensus secondary structure of coronaviruses was consistent with the experimental results. Availability and implementation The source codes of LinAliFold and CentroidLinAliFold are freely available at https://github.com/fukunagatsu/LinAliFold-CentroidLinAliFold. Supplementary information Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Tsukasa Fukunaga
- Waseda Institute for Advanced Study, Waseda University, Tokyo 1690051, Japan
| | - Michiaki Hamada
- Department of Electrical Engineering and Bioscience, Graduate School of Advanced Science and Engineering, Waseda University, Tokyo 1698555, Japan
- Computational Bio Big-Data Open Innovation Laboratory, AIST-Waseda University, Tokyo 1698555, Japan
| |
|
22
|
Bugnon LA, Edera AA, Prochetto S, Gerard M, Raad J, Fenoy E, Rubiolo M, Chorostecki U, Gabaldón T, Ariel F, Di Persia LE, Milone DH, Stegmayer G. Secondary structure prediction of long noncoding RNA: review and experimental comparison of existing approaches. Brief Bioinform 2022; 23:6606044. [PMID: 35692094 DOI: 10.1093/bib/bbac205] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 03/01/2022] [Revised: 05/02/2022] [Accepted: 05/04/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION In contrast to messenger RNAs, the function of the wide range of existing long noncoding RNAs (lncRNAs) largely depends on their structure, which determines interactions with partner molecules. Thus, the determination or prediction of the secondary structure of lncRNAs is critical to uncover their function. Classical approaches for predicting RNA secondary structure have been based on dynamic programming and thermodynamic calculations. In the last 4 years, a growing number of machine learning (ML)-based models, including deep learning (DL), have achieved breakthrough performance in structure prediction of biomolecules such as proteins and have outperformed classical methods in short transcripts folding. Nevertheless, the accurate prediction for lncRNA still remains far from being effectively solved. Notably, the myriad of new proposals has not been systematically and experimentally evaluated. RESULTS In this work, we compare the performance of the classical methods as well as the most recently proposed approaches for secondary structure prediction of RNA sequences using a unified and consistent experimental setup. We use the publicly available structural profiles for 3023 yeast RNA sequences, and a novel benchmark of well-characterized lncRNA structures from different species. Moreover, we propose a novel metric to assess the predictive performance of methods, exclusively based on the chemical probing data commonly used for profiling RNA structures, avoiding any potential bias incorporated by computational predictions when using dot-bracket references. Our results provide a comprehensive comparative assessment of existing methodologies, and a novel and public benchmark resource to aid in the development and comparison of future approaches. AVAILABILITY Full source code and benchmark datasets are available at: https://github.com/sinc-lab/lncRNA-folding. CONTACT lbugnon@sinc.unl.edu.ar.
Affiliation(s)
- L A Bugnon
- Research Institute for Signals, Systems and Computational Intelligence sinc(i) (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
| | - A A Edera
- Research Institute for Signals, Systems and Computational Intelligence sinc(i) (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
| | - S Prochetto
- Research Institute for Signals, Systems and Computational Intelligence sinc(i) (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina; IAL, CONICET, Ciudad Universitaria UNL, (3000) Santa Fe, Argentina
| | - M Gerard
- Research Institute for Signals, Systems and Computational Intelligence sinc(i) (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
| | - J Raad
- Research Institute for Signals, Systems and Computational Intelligence sinc(i) (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
| | - E Fenoy
- Research Institute for Signals, Systems and Computational Intelligence sinc(i) (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
| | - M Rubiolo
- Research Institute for Signals, Systems and Computational Intelligence sinc(i) (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
| | - U Chorostecki
- Barcelona Supercomputing Center (BSC-CNS), Institute of Research in Biomedicine (IRB), Spain
| | - T Gabaldón
- Barcelona Supercomputing Center (BSC-CNS), Institute of Research in Biomedicine (IRB), Spain; Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain; Centro de Investigación Biomédica En Red de Enfermedades Infecciosas (CIBERINFEC), Barcelona, Spain
| | - F Ariel
- IAL, CONICET, Ciudad Universitaria UNL, (3000) Santa Fe, Argentina
| | - L E Di Persia
- Research Institute for Signals, Systems and Computational Intelligence sinc(i) (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
| | - D H Milone
- Research Institute for Signals, Systems and Computational Intelligence sinc(i) (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
| | - G Stegmayer
- Research Institute for Signals, Systems and Computational Intelligence sinc(i) (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
| |
|
23
|
Moudgal N, Arhin G, Frank AT. Using Unassigned NMR Chemical Shifts to Model RNA Secondary Structure. J Phys Chem A 2022; 126:2739-2745. [PMID: 35470661 DOI: 10.1021/acs.jpca.2c00456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
NMR-derived chemical shifts are sensitive probes of RNA structure. However, the need to assign NMR spectra hampers their utility as a direct source of structural information. In this report, we describe a simple method that uses unassigned 2D NMR spectra to model the secondary structure of RNAs. As in the case of assigned chemical shifts, we could use unassigned chemical shift data to reweight conformational libraries such that the highest weighted structure closely resembles their reference NMR structure. Furthermore, the application of our approach to the 3'- and 5'-UTR of the SARS-CoV-2 genome yields structures that are, for the most part, consistent with the secondary structure models derived from chemical probing data. Therefore, we expect the framework we describe here will be useful as a general strategy for rapidly generating preliminary structural RNA models directly from unassigned 2D NMR spectra. As we demonstrated for the 337-nt and 472-nt UTRs of SARS-CoV-2, our approach could be especially valuable for modeling the secondary structures of large RNA.
Affiliation(s)
- Neel Moudgal
- Saline High School, 1300 Campus Pkwy, Saline, Michigan 48176, United States
| | - Grace Arhin
- Biophysics Program, University of Michigan, 930 North University Avenue, Ann Arbor, Michigan 48109, United States
| | - Aaron T Frank
- Biophysics Program, University of Michigan, 930 North University Avenue, Ann Arbor, Michigan 48109, United States; Chemistry Department, University of Michigan, 930 North University Avenue, Ann Arbor, Michigan 48109, United States
| |
|
24
|
Computer-aided comprehensive explorations of RNA structural polymorphism through complementary simulation methods. QRB Discovery 2022. [PMID: 37529277 PMCID: PMC10392686 DOI: 10.1017/qrd.2022.19] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022] Open
Abstract
While RNA folding was originally seen as a simple problem to solve, it has been shown that the promiscuous interactions of the nucleobases result in structural polymorphism, with several competing structures generally observed for non-coding RNA. This inherent complexity limits our understanding of these molecules from experiments alone, and computational methods are commonly used to study RNA. Here, we discuss three advanced sampling schemes, namely Hamiltonian-replica exchange molecular dynamics (MD), ratchet-and-pawl MD and discrete path sampling, as well as the HiRE-RNA coarse-graining scheme, and highlight how these approaches are complementary with reference to recent case studies. While all computational methods have their shortcomings, the plurality of simulation methods leads to a better understanding of experimental findings and can inform and guide experimental work on RNA polymorphism.
|