1. Yang TH. DEBFold: Computational Identification of RNA Secondary Structures for Sequences across Structural Families Using Deep Learning. J Chem Inf Model 2024; 64:3756-3766. [PMID: 38648189 PMCID: PMC11094721 DOI: 10.1021/acs.jcim.4c00458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Revised: 04/09/2024] [Accepted: 04/09/2024] [Indexed: 04/25/2024]
Abstract
It is now known that RNAs play more active roles in cellular pathways beyond simply serving as transcription templates. These biological mechanisms might be mediated by higher RNA stereo conformations, triggering the need to understand RNA secondary structures first. However, experimental protocols for solving RNA structures are unavailable for large-scale investigation due to their high costs and time-consuming nature. Various computational tools were thus developed to predict the RNA secondary structures from sequences. Recently, deep networks have been investigated to help predict RNA structures directly from their sequences. However, existing deep-learning-based tools are more or less suffering from model overfitting due to their complicated problem formulation and defective model training processes, limiting their applications across sequences from different structural families. In this research, we designed a two-stage RNA structure prediction strategy called DEBFold (deep ensemble boosting and folding) based on convolution encoding/decoding and self-attention mechanisms to enhance the existing thermodynamic structure models. Moreover, the model training process followed rigorous steps to achieve an acceptable prediction generalization. On the family-wise reserved test sets and the PDB-derived test set, DEBFold achieves better structure prediction performance over traditional tools and existing deep-learning methods. In summary, we obtained a cutting-edge deep-learning-based structure prediction tool with supreme across-family generalization performance. The DEBFold tool can be accessed at https://cobis.bme.ncku.edu.tw/DEBFold/.
Affiliation(s)
- Tzu-Hsien Yang
  - Department of Biomedical Engineering, National Cheng Kung University, No.1, University Road, Tainan 701, Taiwan
  - Medical Device Innovation Center, National Cheng Kung University, No.1, University Road, Tainan 701, Taiwan
2. Shen C, Chen Y, Xiao F, Yang T, Wang X, Chen S, Tang J, Liao Z. BAT-Net: An enhanced RNA Secondary Structure prediction via bidirectional GRU-based network with attention mechanism. Comput Biol Chem 2022; 101:107765. [DOI: 10.1016/j.compbiolchem.2022.107765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Accepted: 08/24/2022] [Indexed: 11/03/2022]
3. Yang TH, Lin YC, Hsia M, Liao ZY. SSRTool: a web tool for evaluating RNA secondary structure predictions based on species-specific functional interpretability. Comput Struct Biotechnol J 2022; 20:2473-2483. [PMID: 35664227 PMCID: PMC9136272 DOI: 10.1016/j.csbj.2022.05.028] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/19/2022] [Revised: 05/13/2022] [Accepted: 05/13/2022] [Indexed: 01/02/2023] Open
Abstract
RNA secondary structures can carry out essential cellular functions alone or interact with one another to form the hierarchical tertiary structures. Experimental structure identification approaches can show the in vitro structures of RNA molecules. However, they usually have limits in the resolution and are costly. In silico structure prediction tools are thus primarily relied on for pre-experiment analysis. Various structure prediction models have been developed over the decades. Since these tools are usually used before knowing the actual RNA structures, evaluating and ranking the pile of secondary structure predictions of a given sequence is essential in computational analysis. In this research, we implemented a web service called SSRTool (RNA Secondary Structure prediction Ranking Tool) to assist in the ranking and evaluation of the generated predicted structures of a given sequence. Based on the computed species-specific interpretability significance in four common RNA structure–function aspects, SSRTool provides three functions along with visualization interfaces: (1) Rank user-generated predictions. (2) Provide an automated streamline of structure prediction and ranking for a given sequence. (3) Infer the functional aspects of a given structure. We demonstrated the applicability of SSRTool via real case studies and reported the similar trends between computed species-specific rankings and the corresponding prediction F1 values. The SSRTool web service is available online at https://cobisHSS0.im.nuk.edu.tw/SSRTool/, http://cosbi3.ee.ncku.edu.tw/SSRTool/, or the redirecting site https://github.com/cobisLab/SSRTool/.
Affiliation(s)
- Tzu-Hsien Yang
  - Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan
  - Corresponding author.
- Yu-Cian Lin
  - Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan
- Min Hsia
  - Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan
- Zhan-Yi Liao
  - Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan
4. Tagashira M, Asai K. ConsAlifold: considering RNA structural alignments improves prediction accuracy of RNA consensus secondary structures. Bioinformatics 2022; 38:710-719. [PMID: 34694364 DOI: 10.1093/bioinformatics/btab738] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/22/2020] [Revised: 08/24/2021] [Accepted: 10/20/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION By detecting homology among RNAs, the probabilistic consideration of RNA structural alignments has improved the prediction accuracy of significant RNA prediction problems. Predicting an RNA consensus secondary structure from an RNA sequence alignment is a fundamental research objective because in the detection of conserved base-pairings among RNA homologs, predicting an RNA consensus secondary structure is more convenient than predicting an RNA structural alignment. RESULTS We developed and implemented ConsAlifold, a dynamic programming-based method that predicts the consensus secondary structure of an RNA sequence alignment. ConsAlifold considers RNA structural alignments. ConsAlifold achieves moderate running time and the best prediction accuracy of RNA consensus secondary structures among available prediction methods. AVAILABILITY AND IMPLEMENTATION ConsAlifold, data and Python scripts for generating both figures and tables are freely available at https://github.com/heartsh/consalifold. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Masaki Tagashira
  - Department of Computational Biology and Medical Sciences, University of Tokyo, Chiba 277-8561, Japan
  - Artificial Intelligence Research Center, AIST, Tokyo 135-0064, Japan
- Kiyoshi Asai
  - Department of Computational Biology and Medical Sciences, University of Tokyo, Chiba 277-8561, Japan
  - Artificial Intelligence Research Center, AIST, Tokyo 135-0064, Japan
5. Yang TH. An Aggregation Method to Identify the RNA Meta-Stable Secondary Structure and its Functionally Interpretable Structure Ensemble. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:75-86. [PMID: 34014829 DOI: 10.1109/tcbb.2021.3082396] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 06/12/2023]
Abstract
RNA can provide vital cellular functions through its secondary or tertiary structure. Due to the low-throughput nature of experimental approaches, studies on RNA structures mainly resort to computational methods. However, current existing tools fail to consider RNA structure ensembles and do not provide ways to decipher functional hypotheses for the new predictions. In this research, a novel method was proposed to identify the functionally interpretable structure ensemble of a given RNA sequence and provide the meta-stable structure, or the most frequently observed functional RNA cellular conformation, based on the ensemble. In the prediction of meta-stable structures, the proposed method outperformed existing tools on a yeast test set. The inferred functional aspects were then manually checked and demonstrated a micro-averaging F1 value of 0.92. Further, a biological example of the yeast ASH1-E1 element was discussed to articulate that these functional aspects can also suggest testable hypotheses. Then the proposed method was verified to be well applicable to other species through a human test set. Finally, the proposed method was demonstrated to show resistance to sequence length-dependent performance deterioration.
6. Lu W, Cao Y, Wu H, Ding Y, Song Z, Zhang Y, Fu Q, Li H. Research on RNA secondary structure predicting via bidirectional recurrent neural network. BMC Bioinformatics 2021; 22:431. [PMID: 34496763 PMCID: PMC8427827 DOI: 10.1186/s12859-021-04332-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/08/2021] [Accepted: 08/23/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND RNA secondary structure prediction is an important research content in the field of biological information. Predicting RNA secondary structure with pseudoknots has been proved to be an NP-hard problem. Traditional machine learning methods cannot effectively apply protein sequence information with different sequence lengths to the prediction process due to the constraint of the self model when predicting the RNA secondary structure. In addition, there is a large difference between the number of paired bases and the number of unpaired bases in the RNA sequences, which means the imbalance between positive and negative samples can easily trap the model in a local optimum. To solve the above problems, this paper proposes a variable-length dynamic bidirectional Gated Recurrent Unit (VLDB GRU) model. The model can accept sequences with different lengths through the introduction of a flag vector. The model can also make full use of the base information before and after the predicted base and can avoid losing part of the information due to truncation. Introducing a weight vector that dynamically adjusts the loss function of each base during training addresses the sample imbalance problem. RESULTS The algorithm proposed in this paper is compared with the existing algorithms on five representative subsets of the data set RNA STRAND. The experimental results show that the accuracy and Matthews correlation coefficient of the method are improved by 4.7% and 11.4%, respectively. CONCLUSIONS The flag vector introduced allows the model to effectively use the information before and after the protein sequence; the introduced weight vector addresses the sample imbalance problem. Compared with other algorithms, the VLDB GRU algorithm proposed in this paper has the best detection results.
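The abstract above describes a weight vector that rescales each base's loss to counter the imbalance between paired and unpaired bases. As a rough illustration of that general idea only, the following Python sketch applies inverse-frequency class weights to a binary cross-entropy loss. The weighting scheme and the function names `inverse_frequency_weights` and `weighted_bce` are assumptions made for this example; the abstract does not specify how the paper's weights are actually computed.

```python
import math

def inverse_frequency_weights(labels):
    """Return (w_unpaired, w_paired) so both classes contribute equally overall.

    Assumption for illustration: weights are inversely proportional to class
    frequency. This is a common heuristic, not the scheme from the paper.
    """
    n = len(labels)
    n_paired = sum(labels)
    n_unpaired = n - n_paired
    if n_paired == 0 or n_unpaired == 0:
        return 1.0, 1.0  # only one class present, nothing to rebalance
    return n / (2 * n_unpaired), n / (2 * n_paired)

def weighted_bce(labels, probs, eps=1e-12):
    """Mean binary cross-entropy with a per-base weight chosen by class.

    labels: 1 for a paired base, 0 for an unpaired base.
    probs:  predicted probability that each base is paired.
    """
    w_unpaired, w_paired = inverse_frequency_weights(labels)
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clamp to keep log() finite
        if y == 1:
            total += -w_paired * math.log(p)
        else:
            total += -w_unpaired * math.log(1 - p)
    return total / len(labels)
```

With one paired base among four, the paired base receives three times the weight of each unpaired base, so an error on the rare class is not drowned out by the majority class.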
Affiliation(s)
- Weizhong Lu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
  - Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, 215009, China
- Yan Cao
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
- Hongjie Wu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
  - Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, 215009, China
- Yijie Ding
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
  - Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, 215009, China
- Zhengwei Song
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
- Yu Zhang
  - Suzhou Industrial Park Institute of Services Outsourcing, Suzhou, 215123, China
- Qiming Fu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
  - Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, 215009, China
- Haiou Li
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
7. Yang TH, Wang CY, Tsai HC, Liu CT. Human IRES Atlas: an integrative platform for studying IRES-driven translational regulation in humans. Database: The Journal of Biological Databases and Curation 2021; 2021:6263636. [PMID: 33942874 PMCID: PMC8094437 DOI: 10.1093/database/baab025] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 11/18/2020] [Revised: 04/16/2021] [Accepted: 04/23/2021] [Indexed: 11/13/2022]
Abstract
It is now known that cap-independent translation initiation facilitated by internal ribosome entry sites (IRESs) is vital in selective cellular protein synthesis under stress and different physiological conditions. However, three problems make it hard to understand transcriptome-wide cellular IRES-mediated translation initiation mechanisms: (i) complex interplay between IRESs and other translation initiation–related information, (ii) reliability issue of in silico cellular IRES investigation and (iii) labor-intensive in vivo IRES identification. In this research, we constructed the Human IRES Atlas database for a comprehensive understanding of cellular IRESs in humans. First, currently available and suitable IRES prediction tools (IRESfinder, PatSearch and IRESpy) were used to obtain transcriptome-wide human IRESs. Then, we collected eight genres of translation initiation–related features to help study the potential molecular mechanisms of each of the putative IRESs. Three functional tests (conservation, structural RNA–protein scores and conditional translation efficiency) were devised to evaluate the functionality of the identified putative IRESs. Moreover, an easy-to-use interface and an IRES–translation initiation interaction map for each gene transcript were implemented to help understand the interactions between IRESs and translation initiation–related features. Researchers can easily search/browse an IRES of interest using the web interface and deduce testable mechanism hypotheses of human IRES-driven translation initiation based on the integrated results. In summary, Human IRES Atlas integrates putative IRES elements and translation initiation–related experiments for better usage of these data and deduction of mechanism hypotheses. Database URL: http://cobishss0.im.nuk.edu.tw/Human_IRES_Atlas/
Affiliation(s)
- Tzu-Hsien Yang
  - Department of Information Management, National University of Kaohsiung, 700, Kaohsiung University Rd., Nanzih District, Kaohsiung, Taiwan 811, Republic of China
- Chung-Yu Wang
  - Department of Information Management, National University of Kaohsiung, 700, Kaohsiung University Rd., Nanzih District, Kaohsiung, Taiwan 811, Republic of China
- Hsiu-Chun Tsai
  - Department of Information Management, National University of Kaohsiung, 700, Kaohsiung University Rd., Nanzih District, Kaohsiung, Taiwan 811, Republic of China
- Cheng-Tse Liu
  - Department of Information Management, National University of Kaohsiung, 700, Kaohsiung University Rd., Nanzih District, Kaohsiung, Taiwan 811, Republic of China
8. Singh J, Paliwal K, Zhang T, Singh J, Litfin T, Zhou Y. Improved RNA Secondary Structure and Tertiary Base-pairing Prediction Using Evolutionary Profile, Mutational Coupling and Two-dimensional Transfer Learning. Bioinformatics 2021; 37:2589-2600. [PMID: 33704363 DOI: 10.1093/bioinformatics/btab165] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 12/01/2020] [Revised: 02/05/2021] [Accepted: 03/08/2021] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION The recent discovery of numerous non-coding RNAs (long non-coding RNAs, in particular) has transformed our perception about the roles of RNAs in living organisms. Our ability to understand them, however, is hampered by our inability to solve their secondary and tertiary structures in high resolution efficiently by existing experimental techniques. Computational prediction of RNA secondary structure, on the other hand, has received much-needed improvement, recently, through deep learning of a large approximate data, followed by transfer learning with gold-standard base-pairing structures from high-resolution 3-D structures. Here, we expand this single-sequence-based learning to the use of evolutionary profiles and mutational coupling. RESULTS The new method allows large improvement not only in canonical base-pairs (RNA secondary structures) but more so in base-pairing associated with tertiary interactions such as pseudoknots, noncanonical and lone base-pairs. In particular, it is highly accurate for those RNAs of more than 1000 homologous sequences by achieving >0.8 F1-score (harmonic mean of sensitivity and precision) for 14/16 RNAs tested. The method can also significantly improve base-pairing prediction by incorporating artificial but functional homologous sequences generated from deep mutational scanning without any modification. The fully automatic method (publicly available as server and standalone software) should provide the scientific community a new powerful tool to capture not only the secondary structure but also tertiary base-pairing information for building three-dimensional models. It also highlights the future of accurately solving the base-pairing structure by using a large number of natural and/or artificial homologous sequences. AVAILABILITY Standalone-version of SPOT-RNA2 is available at https://github.com/jaswindersingh2/SPOT-RNA2. Direct prediction can also be made at https://sparks-lab.org/server/spot-rna2/. 
The datasets used in this research can also be downloaded from the GITHUB and the webserver mentioned above.
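The abstract above reports accuracy as an F1-score, the harmonic mean of sensitivity and precision, computed over predicted base pairs. The Python sketch below shows one way to compute that score for a single sequence. It assumes structures are given in plain dot-bracket notation, which is a choice made for this example rather than something stated in the abstract, and the names `parse_dot_bracket` and `base_pair_f1` are hypothetical. Plain dot-bracket strings cannot encode pseudoknots, so this sketch covers nested pairs only.

```python
def parse_dot_bracket(structure):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs

def base_pair_f1(predicted, reference):
    """F1-score (harmonic mean of precision and sensitivity) over base pairs."""
    pred = parse_dot_bracket(predicted)
    ref = parse_dot_bracket(reference)
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0  # also covers an empty prediction or an empty reference
    precision = true_positives / len(pred)
    sensitivity = true_positives / len(ref)
    return 2 * precision * sensitivity / (precision + sensitivity)
```

For example, predicting `(....)` against the reference `((..))` recovers one of the two reference pairs with no false positives, giving a precision of 1, a sensitivity of 0.5, and an F1-score of 2/3.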
Affiliation(s)
- Jaswinder Singh
  - Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
- Kuldip Paliwal
  - Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
- Tongchuan Zhang
  - Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
- Jaspreet Singh
  - Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
- Thomas Litfin
  - Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
- Yaoqi Zhou
  - Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
9. Bossanyi MA, Carpentier V, Glouzon JPS, Ouangraoua A, Anselmetti Y. aliFreeFoldMulti: alignment-free method to predict secondary structures of multiple RNA homologs. NAR Genom Bioinform 2020; 2:lqaa086. [PMID: 33575631 PMCID: PMC7671329 DOI: 10.1093/nargab/lqaa086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2020] [Accepted: 10/19/2020] [Indexed: 11/18/2022] Open
Abstract
Predicting RNA structure is crucial for understanding RNA’s mechanism of action. Comparative approaches for the prediction of RNA structures can be classified into four main strategies. The first three (align-and-fold, align-then-fold and fold-then-align) exploit multiple sequence alignments to improve the accuracy of conserved RNA-structure prediction. Align-and-fold methods generally perform better, but are also typically slower than the other alignment-based methods. The fourth strategy, alignment-free, consists of predicting the conserved RNA structure without relying on sequence alignment. This strategy has the advantage of being the fastest, while predicting accurate structures through the use of latent representations of the candidate structures for each sequence. This paper presents aliFreeFoldMulti, an extension of the aliFreeFold algorithm. This algorithm predicts a representative secondary structure of multiple RNA homologs by using a vector representation of their suboptimal structures. aliFreeFoldMulti improves on aliFreeFold by additionally computing the conserved structure for each sequence. aliFreeFoldMulti is assessed by comparing its prediction performance and time efficiency with a set of leading RNA-structure prediction methods. aliFreeFoldMulti has the lowest computing times and the highest maximum accuracy scores. Its average structure prediction accuracy is comparable to that of other methods, except TurboFoldII, which is the best in terms of average accuracy but has the highest computing times. We present aliFreeFoldMulti as an illustration of the potential of alignment-free approaches to provide fast and accurate RNA-structure prediction methods.
Affiliation(s)
- Marc-André Bossanyi
  - CoBIUS lab, Department of Computer Science, University of Sherbrooke, 2500 Boulevard de l’Université, Sherbrooke, QC J1K 2R1, Canada
- Valentin Carpentier
  - CoBIUS lab, Department of Computer Science, University of Sherbrooke, 2500 Boulevard de l’Université, Sherbrooke, QC J1K 2R1, Canada
- Jean-Pierre S Glouzon
  - CoBIUS lab, Department of Computer Science, University of Sherbrooke, 2500 Boulevard de l’Université, Sherbrooke, QC J1K 2R1, Canada
- Aïda Ouangraoua
  - CoBIUS lab, Department of Computer Science, University of Sherbrooke, 2500 Boulevard de l’Université, Sherbrooke, QC J1K 2R1, Canada
- Yoann Anselmetti
  - CoBIUS lab, Department of Computer Science, University of Sherbrooke, 2500 Boulevard de l’Université, Sherbrooke, QC J1K 2R1, Canada
10. Zhang W, Tian W, Gao Z, Wang G, Zhao H. Phylogenetic Utility of rRNA ITS2 Sequence-Structure under Functional Constraint. Int J Mol Sci 2020; 21:6395. [PMID: 32899108 PMCID: PMC7504139 DOI: 10.3390/ijms21176395] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 08/12/2020] [Revised: 08/27/2020] [Accepted: 08/28/2020] [Indexed: 02/07/2023] Open
Abstract
The crucial function of the internal transcribed spacer 2 (ITS2) region in ribosome biogenesis depends on its secondary and tertiary structures. Despite rapidly evolving, ITS2 is under evolutionary constraints to maintain the specific secondary structures that provide functionality. A link between function, structure and evolution could contribute an understanding to each other and recently has created a growing point of sequence-structure phylogeny of ITS2. Here we briefly review the current knowledge of ITS2 processing in ribosome biogenesis, focusing on the conservative characteristics of ITS2 secondary structure, including structure form, structural motifs, cleavage sites, and base-pair interactions. We then review the phylogenetic implications and applications of this structure information, including structure-guiding sequence alignment, base-pair mutation model, and species distinguishing. We give the rationale for why incorporating structure information into tree construction could improve reliability and accuracy, and some perspectives of bioinformatics coding that allow for a meaningful evolutionary character to be extracted. In sum, this review of the integration of function, structure and evolution of ITS2 will expand the traditional sequence-based ITS2 phylogeny and thus contributes to the tree of life. The generality of ITS2 characteristics may also inspire phylogenetic use of other similar structural regions.
Affiliation(s)
- Wei Zhang
  - Marine College, Shandong University, Weihai 264209, China
  - Correspondence: Tel.: +86-631-5688-303
- Wen Tian
  - State Key Laboratory of Ballast Water Research, Comprehensive Technical Service Center of Jiangyin Customs, Jiangyin 214440, China
- Zhipeng Gao
  - Marine College, Shandong University, Weihai 264209, China
- Guoli Wang
  - Marine College, Shandong University, Weihai 264209, China
- Hong Zhao
  - Marine College, Shandong University, Weihai 264209, China