1. Qiu X. Robust RNA secondary structure prediction with a mixture of deep learning and physics-based experts. Biol Methods Protoc 2025; 10:bpae097. [PMID: 39811444 PMCID: PMC11729747 DOI: 10.1093/biomethods/bpae097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2024] [Revised: 12/01/2024] [Accepted: 12/25/2024] [Indexed: 01/16/2025] Open
Abstract
A mixture-of-experts (MoE) approach has been developed to mitigate the poor out-of-distribution (OOD) generalization of deep learning (DL) models for single-sequence-based prediction of RNA secondary structure. The main idea behind this approach is to use DL models for in-distribution (ID) test sequences to leverage their superior ID performances, while relying on physics-based models for OOD sequences to ensure robust predictions. One key ingredient of the pipeline, named MoEFold2D, is automated ID/OOD detection via consensus analysis of an ensemble of DL model predictions without requiring access to training data during inference. Specifically, motivated by the clustered distribution of known RNA structures, a collection of distinct DL models is trained by iteratively leaving one cluster out. Each DL model hence serves as an expert on all but one cluster in the training data. Consequently, for an ID sequence, all but one DL model makes accurate predictions consistent with one another, while an OOD sequence yields highly inconsistent predictions among all DL models. Through consensus analysis of DL predictions, test sequences are categorized as ID or OOD. ID sequences are subsequently predicted by averaging the DL models in consensus, and OOD sequences are predicted using physics-based models. Instead of remediating generalization gaps with alternative approaches such as transfer learning and sequence alignment, MoEFold2D circumvents unpredictable ID-OOD gaps and combines the strengths of DL and physics-based models to achieve accurate ID and robust OOD predictions.
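The consensus step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the agreement measure, the threshold, or the grouping rule, so everything below is an assumption for illustration: predictions are sets of base pairs, agreement is an F1-style overlap, and a sequence is called ID when all but one model agree pairwise. The names `pair_f1`, `classify`, `threshold`, and `min_agree` are hypothetical and are not taken from MoEFold2D.

```python
from itertools import combinations


def pair_f1(a, b):
    """F1-style overlap between two sets of base pairs (i, j)."""
    if not a and not b:
        return 1.0  # two empty structures are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


def classify(predictions, threshold=0.8, min_agree=None):
    """Label a sequence "ID" or "OOD" from ensemble agreement.

    predictions: list of sets of base pairs, one set per DL model.
    threshold:   assumed minimum overlap for two models to "agree".
    min_agree:   assumed consensus size; defaults to all but one model.
    """
    n = len(predictions)
    if min_agree is None:
        min_agree = n - 1
    # Count, for each model, how many other models it agrees with.
    agree = [0] * n
    for i, j in combinations(range(n), 2):
        if pair_f1(predictions[i], predictions[j]) >= threshold:
            agree[i] += 1
            agree[j] += 1
    # A consensus member agrees with every other member of the group.
    members = [k for k in range(n) if agree[k] >= min_agree - 1]
    label = "ID" if len(members) >= min_agree else "OOD"
    return label, members


# Toy example: three models predict the same helix, one disagrees.
helix = {(0, 9), (1, 8), (2, 7)}
id_case = classify([helix, helix, helix, {(0, 5)}])
# Toy example: four models predict four unrelated structures.
ood_case = classify([{(0, 9)}, {(1, 8)}, {(2, 7)}, {(3, 6)}])
```

In this sketch an ID sequence would then be predicted by averaging the models listed in `members`, while an OOD sequence would be handed to a physics-based model, as the abstract describes.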
Affiliation(s)
- Xiangyun Qiu
  - Department of Physics, George Washington University, Washington, DC 20052, United States
2. Chen CC, Chan YM, Jeong H. REDalign: accurate RNA structural alignment using residual encoder-decoder network. BMC Bioinformatics 2024; 25:346. [PMID: 39501155 PMCID: PMC11539752 DOI: 10.1186/s12859-024-05956-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2024] [Accepted: 10/11/2024] [Indexed: 11/08/2024] Open
Abstract
BACKGROUND: RNA secondary structural alignment serves as a foundational procedure in identifying conserved structural motifs among RNA sequences, crucially advancing our understanding of novel RNAs via comparative genomic analysis. While various computational strategies for RNA structural alignment exist, they often come with high computational complexity. Specifically, when addressing a set of RNAs with unknown structures, the task of simultaneously predicting their consensus secondary structure and determining the optimal sequence alignment requires an overwhelming computational effort of $O(L^6)$ for each RNA pair. Such an extremely high computational complexity makes these methods impractical for large-scale analysis despite their accurate alignment capabilities. RESULTS: In this paper, we introduce REDalign, an innovative approach based on deep learning for RNA secondary structural alignment. By utilizing a residual encoder-decoder network, REDalign can efficiently capture consensus structures and optimize structural alignments. In this learning model, the encoder network leverages a hierarchical pyramid to assimilate high-level structural features. Concurrently, the decoder network, enhanced with residual skip connections, integrates multi-level encoded features to learn detailed feature hierarchies with fewer parameter sets. REDalign significantly reduces computational complexity compared to Sankoff-style algorithms and effectively handles non-nested structures, including pseudoknots, which are challenging for traditional alignment methods. Extensive evaluations demonstrate that REDalign provides superior accuracy and substantial computational efficiency. CONCLUSION: REDalign presents a significant advancement in RNA secondary structural alignment, balancing high alignment accuracy with lower computational demands. Its ability to handle complex RNA structures, including pseudoknots, makes it an effective tool for large-scale RNA analysis, with potential implications for accelerating discoveries in RNA research and comparative genomics.
Affiliation(s)
- Chun-Chi Chen
  - Department of Electrical Engineering, National Chiayi University, No.300 Xuefu Rd, Chiayi City, 600355, Taiwan
- Yi-Ming Chan
  - MindtronicAI Co., 7 F., No. 218, Sec. 6, Roosevelt Road, Taipei, 11674, Taiwan
- Hyundoo Jeong
  - Biomedical and Robotics Engineering, Incheon National University, 119 Academy-ro, Incheon, 22012, Yeonsu-gu, South Korea
3. Gadekar V, Munk AW, Miladi M, Junge A, Backofen R, Seemann S, Gorodkin J. Clusters of mammalian conserved RNA structures in UTRs associate with RBP binding sites. NAR Genom Bioinform 2024; 6:lqae089. [PMID: 39131818 PMCID: PMC11310781 DOI: 10.1093/nargab/lqae089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 06/26/2024] [Accepted: 07/16/2024] [Indexed: 08/13/2024] Open
Abstract
RNA secondary structures play essential roles in the formation of the tertiary structure and function of a transcript. Recent genome-wide studies highlight significant potential for RNA structures in the mammalian genome. However, a major challenge is assigning functional roles to these structured RNAs. In this study, we conduct a guilt-by-association analysis of clusters of computationally predicted conserved RNA structures (CRSs) in human untranslated regions (UTRs) to associate them with gene functions. We filtered a broad pool of ∼500 000 human CRSs for UTR overlap, resulting in 4734 and 24 754 CRSs from the 5' and 3' UTR of protein-coding genes, respectively. We separately clustered these CRSs for both sets using RNAscClust, obtaining 793 and 2403 clusters, each containing an average of five CRSs per cluster. We identified overrepresented binding sites for 60 and 43 RNA-binding proteins co-localizing with the clustered CRSs. Furthermore, 104 and 441 clusters from the 5' and 3' UTRs, respectively, showed enrichment for various Gene Ontologies, including biological processes such as 'signal transduction' and 'nervous system development', molecular functions like 'transferase activity', and cellular components such as 'synapse', among others. Our study shows that significant functional insights can be gained by clustering RNA structures based on their structural characteristics.
Affiliation(s)
- Veerendra P Gadekar
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870 Frederiksberg, Denmark
  - Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, 1870 Frederiksberg, Denmark
  - Centre for Integrative Biology and Systems Medicine (IBSE), IIT Madras, Chennai, India
  - Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai, India
- Alexander Welford Munk
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870 Frederiksberg, Denmark
  - Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, 1870 Frederiksberg, Denmark
- Milad Miladi
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg im Breisgau, Germany
- Alexander Junge
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870 Frederiksberg, Denmark
  - Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, 1870 Frederiksberg, Denmark
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg im Breisgau, Germany
- Stefan E Seemann
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870 Frederiksberg, Denmark
  - Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, 1870 Frederiksberg, Denmark
- Jan Gorodkin
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870 Frederiksberg, Denmark
  - Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, 1870 Frederiksberg, Denmark
4. Du Z, Peng Z, Yang J. RNA threading with secondary structure and sequence profile. Bioinformatics 2024; 40:btae080. [PMID: 38341662 PMCID: PMC10893584 DOI: 10.1093/bioinformatics/btae080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 01/05/2024] [Accepted: 02/09/2024] [Indexed: 02/12/2024] Open
Abstract
MOTIVATION: RNA threading aims to identify remote homologies for template-based modeling of RNA 3D structure. Existing RNA alignment methods primarily rely on secondary structure alignment. They are often time- and memory-consuming, limiting large-scale applications. In addition, the accuracy is far from satisfactory. RESULTS: Using RNA secondary structure and sequence profile, we developed a novel RNA threading algorithm, named RNAthreader. To enhance the alignment process and minimize memory usage, a novel approach has been introduced to simplify RNA secondary structures into compact diagrams. RNAthreader employs a two-step methodology. Initially, integer programming and dynamic programming are combined to create an initial alignment for the simplified diagram. Subsequently, the final alignment is obtained using dynamic programming, taking into account the initial alignment derived from the previous step. The benchmark test on 80 RNAs illustrates that RNAthreader generates more accurate alignments than other methods, especially for RNAs with pseudoknots. Another benchmark, involving 30 RNAs from the RNA-Puzzles experiments, shows that the models constructed using RNAthreader templates have a lower average RMSD than those created by alternative methods. Remarkably, RNAthreader takes less than two hours to complete alignments with ∼5000 RNAs, which is 3-40 times faster than other methods. These compelling results suggest that RNAthreader is a promising algorithm for RNA template detection. AVAILABILITY AND IMPLEMENTATION: https://yanglab.qd.sdu.edu.cn/RNAthreader.
Affiliation(s)
- Zongyang Du
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Zhenling Peng
  - MOE Frontiers Science Center for Nonlinear Expectations, Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
- Jianyi Yang
  - MOE Frontiers Science Center for Nonlinear Expectations, Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
5. Backofen R, Gorodkin J, Hofacker IL, Stadler PF. Comparative RNA Genomics. Methods Mol Biol 2024; 2802:347-393. [PMID: 38819565 DOI: 10.1007/978-1-0716-3838-5_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Over the last quarter of a century it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible non-coding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a (necessarily incomplete) overview of the current state of comparative analysis of non-coding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Affiliation(s)
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
  - Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Jan Gorodkin
  - Center for Non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
- Ivo L Hofacker
  - Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
  - Bioinformatics and Computational Biology research group, University of Vienna, Vienna, Austria
  - Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Peter F Stadler
  - Bioinformatics Group, Department of Computer Science, University of Leipzig, Leipzig, Germany
  - Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
  - Universidad Nacional de Colombia, Bogotá, Colombia
  - Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
  - Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
  - Santa Fe Institute, Santa Fe, NM, USA
6. Eggenhofer F, Höner Zu Siederdissen C. Evolutionary Structure Conservation and Covariance Scores. Methods Mol Biol 2024; 2726:255-284. [PMID: 38780735 DOI: 10.1007/978-1-0716-3519-3_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]
Abstract
Effective homology search for non-coding RNAs is frequently not possible via sequence similarity alone. Current methods leverage evolutionary information like structure conservation or covariance scores to identify homologs in organisms that are phylogenetically more distant. In this chapter, we introduce the theoretical background of evolutionary structure conservation and covariance score, and we show hands-on how current methods in the field are applied on example datasets.
Affiliation(s)
- Florian Eggenhofer
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Christian Höner Zu Siederdissen
  - Bioinformatics Group, Department of Computer Science, University of Leipzig, Leipzig, Germany
  - Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
  - Bioinformatics/High-Throughput Analysis, Faculty of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany
7. Tieng FYF, Abdullah-Zawawi MR, Md Shahri NAA, Mohamed-Hussein ZA, Lee LH, Mutalib NSA. A Hitchhiker's guide to RNA-RNA structure and interaction prediction tools. Brief Bioinform 2023; 25:bbad421. [PMID: 38040490 PMCID: PMC10753535 DOI: 10.1093/bib/bbad421] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/31/2023] [Revised: 10/16/2023] [Accepted: 10/26/2023] [Indexed: 12/03/2023] Open
Abstract
RNA biology has risen to prominence after the remarkable discovery of diverse functions of noncoding RNA (ncRNA). Most untranslated transcripts exert their regulatory functions by forming RNA-RNA complexes via base pairing with complementary sequences in other RNAs. Interplay between RNAs is essential, as it serves various functional roles in human cells, including genetic translation, RNA splicing, editing, ribosomal RNA maturation, RNA degradation and the regulation of metabolic pathways/riboswitches. Moreover, the pervasive transcription of the human genome allows for the discovery of novel genomic functions via RNA interactome investigation. The advancement of experimental procedures has resulted in an explosion of documented data, necessitating the development of efficient and precise computational tools and algorithms. This review provides an extensive update on RNA-RNA interaction (RRI) analysis via thermodynamic- and comparative-based RNA secondary structure prediction (RSP) and RNA-RNA interaction prediction (RIP) tools and their general functions. We also highlight the current knowledge of RRIs and the limitations of RNA interactome mapping via experimental data. Then, the gap between RSP and RIP, the importance of RNA homologues, the relationship between pseudoknots, and RNA folding thermodynamics are discussed. It is hoped that these emerging prediction tools will deepen the understanding of RNA-associated interactions in human diseases and hasten treatment processes.
Affiliation(s)
- Francis Yew Fu Tieng
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur 56000, Malaysia
- Nur Alyaa Afifah Md Shahri
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur 56000, Malaysia
- Zeti-Azura Mohamed-Hussein
  - Institute of Systems Biology (INBIOSIS), UKM, Selangor 43600, Malaysia
  - Department of Applied Physics, Faculty of Science and Technology, UKM, Selangor 43600, Malaysia
- Learn-Han Lee
  - Sunway Microbiomics Centre, School of Medical and Life Sciences, Sunway University, Sunway City 47500, Malaysia
  - Novel Bacteria and Drug Discovery Research Group, Microbiome and Bioresource Research Strength, Jeffrey Cheah School of Medicine and Health Sciences, Monash University of Malaysia, Selangor 47500, Malaysia
- Nurul-Syakima Ab Mutalib
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur 56000, Malaysia
  - Novel Bacteria and Drug Discovery Research Group, Microbiome and Bioresource Research Strength, Jeffrey Cheah School of Medicine and Health Sciences, Monash University of Malaysia, Selangor 47500, Malaysia
  - Faculty of Health Sciences, UKM, Kuala Lumpur 50300, Malaysia
8. Qiu X. Sequence similarity governs generalizability of de novo deep learning models for RNA secondary structure prediction. PLoS Comput Biol 2023; 19:e1011047. [PMID: 37068100 PMCID: PMC10138783 DOI: 10.1371/journal.pcbi.1011047] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/02/2023] [Revised: 04/27/2023] [Accepted: 03/25/2023] [Indexed: 04/18/2023] Open
Abstract
Making no use of physical laws or co-evolutionary information, de novo deep learning (DL) models for RNA secondary structure prediction have achieved far better performance than traditional algorithms. However, their statistical underpinning raises the crucial question of generalizability. We present a quantitative study of the performance and generalizability of a series of de novo DL models, with a minimal two-module architecture and no post-processing, under varied similarities between seen and unseen sequences. Our models demonstrate excellent expressive capacities and outperform existing methods on common benchmark datasets. However, model generalizability, i.e., the performance gap between the seen and unseen sets, degrades rapidly as the sequence similarity decreases. The same trends are observed from several recent DL and machine learning models. Moreover, an inverse correlation between performance and generalizability is revealed collectively across all learning-based models with wide-ranging architectures and sizes. We further quantitate how generalizability depends on sequence and structure identity scores via pairwise alignment, providing unique quantitative insights into the limitations of statistical learning. Generalizability thus poses a major hurdle for deploying de novo DL models in practice, and various pathways for future advances are discussed.
Affiliation(s)
- Xiangyun Qiu
  - Department of Physics, George Washington University, Washington DC, United States of America
9. Hollar A, Bursey H, Jabbari H. Pseudoknots in RNA Structure Prediction. Curr Protoc 2023; 3:e661. [PMID: 36779804 DOI: 10.1002/cpz1.661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/14/2023]
Abstract
RNA molecules play active roles in the cell and are important for numerous applications in biotechnology and medicine. The function of an RNA molecule stems from its structure. RNA structure determination is time consuming, challenging, and expensive using experimental methods. Thus, much research has been directed at RNA structure prediction through computational means. Many of these methods focus primarily on the secondary structure of the molecule, ignoring the possibility of pseudoknotted structures. However, pseudoknots are known to play functional roles in many RNA molecules or in their method of interaction with other molecules. Improving the accuracy and efficiency of computational methods that predict pseudoknots is an ongoing challenge for single RNA molecules, RNA-RNA interactions, and RNA-protein interactions. To improve the accuracy of prediction, many methods focus on specific applications while restricting the length and the class of the pseudoknotted structures they can identify. In recent years, computational methods for structure prediction have begun to catch up with the impressive developments seen in biotechnology. Here, we provide a non-comprehensive overview of available pseudoknot prediction methods and their best-use cases. © 2023 Wiley Periodicals LLC.
Affiliation(s)
- Andrew Hollar
  - Department of Computer Science, University of Victoria, Victoria, Canada
- Hunter Bursey
  - Department of Computer Science, University of Victoria, Victoria, Canada
- Hosna Jabbari
  - Department of Computer Science, University of Victoria, Victoria, Canada
10. Network-Based Structural Alignment of RNA Sequences Using TOPAS. Methods Mol Biol 2023; 2586:147-162. [PMID: 36705903 DOI: 10.1007/978-1-0716-2768-6_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023]
Abstract
TOPAS (TOPological network-based Alignment of Structural RNAs) is a network-based alignment algorithm that predicts structurally sound pairwise alignment of RNAs. In order to take advantage of recent advances in comparative network analysis for efficient structurally sound RNA alignment, TOPAS constructs topological network representations for RNAs, which consist of sequential edges connecting nucleotide bases as well as structural edges reflecting the underlying folding structure. Structural edges are weighted by the estimated base-pairing probabilities. Next, the constructed networks are aligned using probabilistic network alignment techniques, which yield a structurally sound RNA alignment that considers both the sequence similarity and the structural similarity between the given RNAs. Compared to traditional Sankoff-style algorithms, this network-based alignment scheme leads to a significant reduction in the overall computational cost while yielding favorable alignment results. Another important benefit is its capability to handle arbitrary folding structures, which can potentially lead to more accurate alignment for RNAs with pseudoknots.
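The network representation described in this abstract can be sketched as a simple edge list. This is only an illustration under stated assumptions, not TOPAS code: the function name `build_rna_network`, the fixed weight of 1.0 on sequential edges, and the `min_prob` cutoff are hypothetical, and the base-pairing probabilities are toy values rather than output from a folding program.

```python
def build_rna_network(seq, bpp, min_prob=0.1):
    """Build an edge list for one RNA.

    seq:      RNA sequence; node k is the nucleotide at index k.
    bpp:      dict mapping (i, j) with i < j to an estimated
              base-pairing probability.
    min_prob: assumed cutoff below which structural edges are dropped.
    Returns a list of (i, j, kind, weight) tuples.
    """
    edges = []
    # Sequential edges connect neighbouring nucleotides on the backbone.
    for i in range(len(seq) - 1):
        edges.append((i, i + 1, "sequential", 1.0))
    # Structural edges are weighted by the base-pairing probability.
    for (i, j), p in sorted(bpp.items()):
        if p >= min_prob:
            edges.append((i, j, "structural", p))
    return edges


# Toy hairpin with made-up pairing probabilities.
toy_bpp = {(0, 8): 0.9, (1, 7): 0.8, (2, 6): 0.05}
toy_edges = build_rna_network("GGGAAACCC", toy_bpp)
```

Two such networks would then be passed to a network alignment method; that step is outside the scope of this sketch.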
11. González-Tortuero E, Anthon C, Havgaard JH, Geissler AS, Breüner A, Hjort C, Gorodkin J, Seemann SE. The Bacillaceae-1 RNA motif comprises two distinct classes. Gene 2022; 841:146756. [PMID: 35905857 DOI: 10.1016/j.gene.2022.146756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2022] [Revised: 06/10/2022] [Accepted: 07/24/2022] [Indexed: 11/04/2022]
Abstract
Non-coding RNAs are key regulatory players in bacteria. Many computationally predicted non-coding RNAs, however, lack functional associations. An example is the Bacillaceae-1 RNA motif, whose Rfam model consists of two hairpin loops. We find the motif conserved in nine of 13 non-pathogenic strains of the genus Bacillus but only in one pathogenic strain. To elucidate functional characteristics, we studied 118 hits of the Rfam model in 11 Bacillus spp. and found two distinct classes based on the ensemble diversity of their RNA secondary structure and the genomic context concerning the ribosomal RNA (rRNA) cluster. Forty hits are associated with the rRNA cluster, of which all 19 hits upstream flanking of 16S rRNA have a reverse complementary structure of low structural diversity. Fifty-two hits have large ensemble diversity, of which 38 are located between two coding genes. For eight hits in Bacillus subtilis, we investigated public expression data under various conditions and observed either the forward or the reverse complementary motif expressed. Five hits are associated with the rRNA cluster. Four of them are located upstream of the 16S rRNA and are not transcriptionally active, but instead, their reverse complements with low structural diversity are expressed together with the rRNA cluster. The three other hits are located between two coding genes in non-conserved genomic loci. Two of them are independently expressed from their surrounding genes and are structurally diverse. In summary, we found that Bacillaceae-1 RNA motifs upstream flanking of ribosomal RNA clusters tend to have one stable structure with the reverse complementary motif expressed in B. subtilis. In contrast, a subgroup of intergenic motifs has the thermodynamic potential for structural switches.
Affiliation(s)
- Enrique González-Tortuero
  - Center for non-coding RNA in Technology and Health (RTH), Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark
- Christian Anthon
  - Center for non-coding RNA in Technology and Health (RTH), Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark
- Jakob H Havgaard
  - Center for non-coding RNA in Technology and Health (RTH), Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark
- Adrian S Geissler
  - Center for non-coding RNA in Technology and Health (RTH), Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark
- Jan Gorodkin
  - Center for non-coding RNA in Technology and Health (RTH), Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark
- Stefan E Seemann
  - Center for non-coding RNA in Technology and Health (RTH), Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark
12. Gray M, Chester S, Jabbari H. KnotAli: informed energy minimization through the use of evolutionary information. BMC Bioinformatics 2022; 23:159. [PMID: 35505276 PMCID: PMC9063079 DOI: 10.1186/s12859-022-04673-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/08/2021] [Accepted: 04/05/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Improving the prediction of structures, especially those containing pseudoknots (structures with crossing base pairs), is an ongoing challenge. Homology-based methods utilize structural similarities within a family to predict the structure. However, their prediction is limited to the consensus structure, and by the quality of the alignment. Minimum free energy (MFE) based methods, on the other hand, do not rely on familial information and can predict structures of novel RNA molecules. Their prediction normally suffers from inaccuracies due to their underlying energy parameters. RESULTS: We present a new method for prediction of RNA pseudoknotted secondary structures that combines the strengths of MFE prediction and alignment-based methods. KnotAli takes a multiple RNA sequence alignment as input and uses covariation and thermodynamic energy minimization to predict possibly pseudoknotted secondary structures for each individual sequence in the alignment. We compared KnotAli's performance to that of three other alignment-based programs, two that can handle pseudoknotted structures and one control, on a large data set of 3034 RNA sequences with varying lengths and levels of sequence conservation from 10 families with pseudoknotted and pseudoknot-free reference structures. We produced sequence alignments for each family using two well-known sequence aligners (MUSCLE and MAFFT). CONCLUSIONS: We found KnotAli's performance to be superior in 6 of the 10 families for MUSCLE and 7 of the 10 for MAFFT. While both KnotAli and Cacofold use background noise correction strategies, we found KnotAli's predictions to be less dependent on the alignment quality. KnotAli can be found online at the Zenodo image: https://doi.org/10.5281/zenodo.5794719.
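The abstract defines pseudoknots as structures with crossing base pairs. A minimal sketch of that definition follows, assuming a structure is given as a list of index pairs; the function name `has_pseudoknot` is hypothetical and the check is not part of KnotAli itself.

```python
def has_pseudoknot(pairs):
    """Return True if any two base pairs cross.

    pairs: iterable of (i, j) index pairs. Two pairs (i, j) and (k, l)
    cross when i < k < j < l, i.e. they are neither nested nor disjoint.
    """
    # Normalise each pair so the smaller index comes first, then sort
    # by start position so every later pair starts at or after i.
    ordered = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for a, (i, j) in enumerate(ordered):
        for k, l in ordered[a + 1:]:
            if i < k < j < l:
                return True
    return False


nested = [(0, 10), (2, 8)]      # second pair lies inside the first
crossing = [(0, 10), (5, 15)]   # second pair starts inside, ends outside
```

The pairwise scan is quadratic in the number of base pairs, which is adequate for an illustration of the definition.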
Affiliation(s)
- Mateo Gray
  - Department of Computer Science, University of Victoria, Victoria, Canada
- Sean Chester
  - Department of Computer Science, University of Victoria, Victoria, Canada
- Hosna Jabbari
  - Department of Computer Science, University of Victoria, Victoria, Canada
  - Institute on Aging and Lifelong Health, University of Victoria, Victoria, Canada
13. Akiyama M, Sakakibara Y. Informative RNA base embedding for RNA structural alignment and clustering by deep representation learning. NAR Genom Bioinform 2022; 4:lqac012. [PMID: 35211670 PMCID: PMC8862729 DOI: 10.1093/nargab/lqac012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/03/2021] [Revised: 01/08/2022] [Accepted: 02/05/2022] [Indexed: 01/17/2023] Open
Abstract
Effective embedding is actively conducted by applying deep learning to biomolecular information. Obtaining better embeddings enhances the quality of downstream analyses, such as DNA sequence motif detection and protein function prediction. In this study, we adopt a pre-training algorithm for the effective embedding of RNA bases to acquire semantically rich representations and apply this algorithm to two fundamental RNA sequence problems: structural alignment and clustering. By using the pre-training algorithm to embed the four bases of RNA in a position-dependent manner using a large number of RNA sequences from various RNA families, a context-sensitive embedding representation is obtained. As a result, not only base information but also secondary structure and context information of RNA sequences are embedded for each base. We call this ‘informative base embedding’ and use it to achieve accuracies superior to those of existing state-of-the-art methods on RNA structural alignment and RNA family clustering tasks. Furthermore, upon performing RNA sequence alignment by combining this informative base embedding with a simple Needleman–Wunsch alignment algorithm, we succeed in calculating structural alignments with a time complexity of $O(n^2)$ instead of the $O(n^6)$ time complexity of the naive implementation of the Sankoff-style algorithm for input RNA sequences of length $n$.
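The quadratic-time alignment mentioned at the end of this abstract can be sketched as a standard Needleman–Wunsch recursion whose match score comes from per-base embedding vectors. The sketch makes several assumptions that are not in the abstract: cosine similarity as the match score, a linear gap penalty of -0.5, and toy two-dimensional vectors standing in for the pre-trained embeddings. The names `cosine` and `needleman_wunsch` are hypothetical.

```python
def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0


def needleman_wunsch(emb_a, emb_b, gap=-0.5):
    """Global alignment of two embedded sequences in O(n * m) time.

    emb_a, emb_b: lists of embedding vectors, one per base.
    gap:          assumed linear gap penalty.
    Returns (score, aligned index pairs).
    """
    n, m = len(emb_a), len(emb_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = score[i - 1][j - 1] + cosine(emb_a[i - 1], emb_b[j - 1])
            score[i][j] = max(match, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover matched positions.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        match = score[i - 1][j - 1] + cosine(emb_a[i - 1], emb_b[j - 1])
        if score[i][j] == match:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Toy embeddings: the middle base of A has no good partner in B.
toy_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_b = [[1.0, 0.0], [1.0, 1.0]]
toy_score, toy_pairs = needleman_wunsch(toy_a, toy_b)
```

The table has (n + 1) by (m + 1) cells and each is filled in constant time, which is where the quadratic cost comes from; the structural signal would have to be carried entirely by the embeddings.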
Affiliation(s)
- Manato Akiyama
- Department of Biosciences and Informatics, Keio University, 223-8522, Japan

14
Woźniak T, Sajek M, Jaruzelska J, Sajek MP. RNAlign2D: a rapid method for combined RNA structure and sequence-based alignment using a pseudo-amino acid substitution matrix. BMC Bioinformatics 2021; 22:504. [PMID: 34656080 PMCID: PMC8520625 DOI: 10.1186/s12859-021-04426-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/03/2021] [Accepted: 10/05/2021] [Indexed: 11/15/2022]
Abstract
Background: The functions of RNA molecules are mainly determined by their secondary structures. These functions can also be predicted using bioinformatic tools that enable the alignment of multiple RNAs to determine functional domains and/or classify RNA molecules into RNA families. However, the existing multiple RNA alignment tools, which use structural information, are slow in aligning long molecules and/or a large number of molecules. Therefore, a more rapid tool for multiple RNA alignment may improve the classification of known RNAs and help to reveal the functions of newly discovered RNAs. Results: Here, we introduce an extremely fast Python-based tool called RNAlign2D. It converts RNA sequences to pseudo-amino acid sequences, which incorporate structural information, and uses a customizable scoring matrix to align these RNA molecules via the multiple protein sequence alignment tool MUSCLE. Conclusions: RNAlign2D produces accurate RNA alignments in a very short time. The pseudo-amino acid substitution matrix approach utilized in RNAlign2D is applicable for virtually all protein aligners. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04426-8.
Affiliation(s)
- Tomasz Woźniak
- Institute of Human Genetics, Polish Academy of Sciences, Strzeszyńska 32, 60-479, Poznań, Poland
- Małgorzata Sajek
- Department of Human Molecular Genetics, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, Uniwersytetu Poznańskiego 6, 61-614, Poznań, Poland
- Jadwiga Jaruzelska
- Institute of Human Genetics, Polish Academy of Sciences, Strzeszyńska 32, 60-479, Poznań, Poland
- Marcin Piotr Sajek
- Institute of Human Genetics, Polish Academy of Sciences, Strzeszyńska 32, 60-479, Poznań, Poland; RNA Bioscience Initiative, University of Colorado School of Medicine, Aurora, CO, 80045, USA

15
Bossanyi MA, Carpentier V, Glouzon JPS, Ouangraoua A, Anselmetti Y. aliFreeFoldMulti: alignment-free method to predict secondary structures of multiple RNA homologs. NAR Genom Bioinform 2020; 2:lqaa086. [PMID: 33575631 PMCID: PMC7671329 DOI: 10.1093/nargab/lqaa086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2020] [Accepted: 10/19/2020] [Indexed: 11/18/2022]
Abstract
Predicting RNA structure is crucial for understanding RNA’s mechanism of action. Comparative approaches for the prediction of RNA structures can be classified into four main strategies. The first three (align-and-fold, align-then-fold and fold-then-align) exploit multiple sequence alignments to improve the accuracy of conserved RNA-structure prediction. Align-and-fold methods generally perform better, but are also typically slower than the other alignment-based methods. The fourth strategy, alignment-free, consists of predicting the conserved RNA structure without relying on sequence alignment. This strategy has the advantage of being the fastest, while predicting accurate structures through the use of latent representations of the candidate structures for each sequence. This paper presents aliFreeFoldMulti, an extension of the aliFreeFold algorithm. This algorithm predicts a representative secondary structure of multiple RNA homologs by using a vector representation of their suboptimal structures. aliFreeFoldMulti improves on aliFreeFold by additionally computing the conserved structure for each sequence. aliFreeFoldMulti is assessed by comparing its prediction performance and time efficiency with a set of leading RNA-structure prediction methods. aliFreeFoldMulti has the lowest computing times and the highest maximum accuracy scores. Its average structure prediction accuracy is comparable to that of other methods, except TurboFoldII, which is the best in terms of average accuracy but has the highest computing times. We present aliFreeFoldMulti as an illustration of the potential of alignment-free approaches to provide fast and accurate RNA-structure prediction methods.
Affiliation(s)
- Marc-André Bossanyi
- CoBIUS lab, Department of Computer Science, University of Sherbrooke, 2500 Boulevard de l’Université, Sherbrooke, QC J1K 2R1, Canada
- Valentin Carpentier
- CoBIUS lab, Department of Computer Science, University of Sherbrooke, 2500 Boulevard de l’Université, Sherbrooke, QC J1K 2R1, Canada
- Jean-Pierre S Glouzon
- CoBIUS lab, Department of Computer Science, University of Sherbrooke, 2500 Boulevard de l’Université, Sherbrooke, QC J1K 2R1, Canada
- Aïda Ouangraoua
- CoBIUS lab, Department of Computer Science, University of Sherbrooke, 2500 Boulevard de l’Université, Sherbrooke, QC J1K 2R1, Canada
- Yoann Anselmetti
- CoBIUS lab, Department of Computer Science, University of Sherbrooke, 2500 Boulevard de l’Université, Sherbrooke, QC J1K 2R1, Canada

16
Chen CC, Jeong H, Qian X, Yoon BJ. TOPAS: network-based structural alignment of RNA sequences. Bioinformatics 2020; 35:2941-2948. [PMID: 30629122 DOI: 10.1093/bioinformatics/btz001] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/10/2017] [Revised: 12/07/2018] [Accepted: 01/04/2019] [Indexed: 11/14/2022]
Abstract
Motivation: For many RNA families, the secondary structure is known to be better conserved among the member RNAs compared to the primary sequence. For this reason, it is important to consider the underlying folding structures when aligning RNA sequences, especially for those with relatively low sequence identity. Given a set of RNAs with unknown structures, simultaneous RNA alignment and folding algorithms aim to accurately align the RNAs by jointly predicting their consensus secondary structure and the optimal sequence alignment. Despite the improved accuracy of the resulting alignment, the computational complexity of simultaneous alignment and folding for a pair of RNAs is O(N^6), which is too costly to be used for large-scale analysis. Results: In order to address this shortcoming, in this work, we propose a novel network-based scheme for pairwise structural alignment of RNAs. The proposed algorithm, TOPAS, builds on the concept of topological networks that provide structural maps of the RNAs to be aligned. For each RNA sequence, TOPAS first constructs a topological network based on the predicted folding structure, which consists of sequential edges and structural edges weighted by the base-pairing probabilities. The obtained networks can then be efficiently aligned by using probabilistic network alignment techniques, thereby yielding the structural alignment of the RNAs. The computational complexity of our proposed method is significantly lower than that of the Sankoff-style dynamic programming approach, while yielding favorable alignment results. Furthermore, another important advantage of the proposed algorithm is its capability of handling RNAs with pseudoknots while predicting the RNA structural alignment. We demonstrate that TOPAS generally outperforms previous RNA structural alignment methods on RNA benchmarks in terms of both speed and accuracy.
Availability and implementation: Source code of TOPAS and the benchmark data used in this paper are available at https://github.com/bjyoontamu/TOPAS.
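The network construction described in the abstract (sequential edges plus structural edges weighted by base-pairing probabilities) can be sketched as follows. This is an illustrative data structure only, not code from the TOPAS repository: the function name, the unit weight on sequential edges, and the `min_prob` cut-off are assumptions, and the base-pairing probabilities are taken as given rather than computed.

```python
def build_topological_network(length, pair_probs, min_prob=0.0):
    """Build a simple edge list for one RNA of the given length.

    pair_probs maps (i, j) index pairs (0-based) to base-pairing
    probabilities obtained elsewhere, e.g. from a folding program.
    """
    # Sequential edges connect backbone neighbours; the weight 1.0 is assumed.
    sequential = [(i, i + 1, 1.0) for i in range(length - 1)]

    # Structural edges connect positions that may pair, weighted by probability.
    structural = sorted(
        (min(i, j), max(i, j), p)
        for (i, j), p in pair_probs.items()
        if i != j and p > min_prob
    )
    return {"sequential": sequential, "structural": structural}
```

Calling `build_topological_network(4, {(0, 3): 0.9, (1, 2): 0.05}, min_prob=0.1)` keeps the three backbone edges and the single structural edge between positions 0 and 3. Because structural edges are stored as arbitrary index pairs, crossing pairs (pseudoknots) need no special handling in this representation.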
Affiliation(s)
- Chun-Chi Chen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA; TEES-AgriLife Center for Bioinformatics & Genomic Systems Engineering, Texas A&M University, College Station, TX, USA
- Hyundoo Jeong
- Department of Electronic Engineering, Chosun University, Gwangju, Republic of Korea
- Xiaoning Qian
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA; TEES-AgriLife Center for Bioinformatics & Genomic Systems Engineering, Texas A&M University, College Station, TX, USA
- Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA; TEES-AgriLife Center for Bioinformatics & Genomic Systems Engineering, Texas A&M University, College Station, TX, USA

17
Tourasse NJ, Darfeuille F. Structural Alignment and Covariation Analysis of RNA Sequences. Bio Protoc 2020; 10:e3511. [PMID: 33654736 PMCID: PMC7842705 DOI: 10.21769/bioprotoc.3511] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/27/2019] [Revised: 12/29/2019] [Accepted: 12/29/2019] [Indexed: 11/02/2022]
Abstract
RNA molecules adopt defined structural conformations that are essential to exert their function. During the course of evolution, the structure of a given RNA can be maintained via compensatory base-pair changes that occur among covarying nucleotides in paired regions. Therefore, for comparative, structural, and evolutionary studies of RNA molecules, numerous computational tools have been developed to incorporate structural information into sequence alignments and a number of tools have been developed to study covariation. The bioinformatic protocol presented here explains how to use some of these tools to generate a secondary-structure-aware multiple alignment of RNA sequences and to annotate the alignment to examine the conservation and covariation of structural elements among the sequences.
Affiliation(s)
- Nicolas J. Tourasse
- ARNA Laboratory, INSERM U1212, CNRS UMR5320, University of Bordeaux, Bordeaux, France
- Fabien Darfeuille
- ARNA Laboratory, INSERM U1212, CNRS UMR5320, University of Bordeaux, Bordeaux, France

18
Bayegan AH, Clote P. RNAmountAlign: Efficient software for local, global, semiglobal pairwise and multiple RNA sequence/structure alignment. PLoS One 2020; 15:e0227177. [PMID: 31978147 PMCID: PMC6980424 DOI: 10.1371/journal.pone.0227177] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/10/2018] [Accepted: 12/13/2019] [Indexed: 11/19/2022]
Abstract
Alignment of structural RNAs is an important problem with a wide range of applications. Since function is often determined by molecular structure, RNA alignment programs should take into account both sequence and base-pairing information for structural homology identification. This paper describes C++ software, RNAmountAlign, for RNA sequence/structure alignment that runs in O(n^3) time and O(n^2) space for two sequences of length n; moreover, our software returns a p-value (transformable to expect value E) based on Karlin-Altschul statistics for local alignment, as well as parameter fitting for local and global alignment. Using incremental mountain height, a representation of structural information computable in cubic time, RNAmountAlign implements quadratic time pairwise local, global and global/semiglobal (query search) alignment using a weighted combination of sequence and structural similarity. RNAmountAlign is capable of performing progressive multiple alignment as well. Benchmarking of RNAmountAlign against LocARNA, LARA, FOLDALIGN, DYNALIGN, STRAL, MXSCARNA, and MUSCLE shows that RNAmountAlign has reasonably good accuracy and faster run time supporting all alignment types. Additionally, our extension of RNAmountAlign, called RNAmountAlignScan, which scans a target genome sequence to find hits having high sequence and structural similarity to a given query sequence, outperforms RSEARCH and sequence-only query scans and runs faster than FOLDALIGN query scan.
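The abstract does not define incremental mountain height, so the sketch below assumes the common mountain representation of a single dot-bracket structure: each position contributes +1 for an opening bracket, -1 for a closing bracket and 0 when unpaired, and the running sum gives the mountain height. The function names are hypothetical, and the software itself may instead work with values derived from a structure ensemble, which this example does not attempt.

```python
def incremental_mountain_height(dot_bracket):
    """Per-position increment: +1 for '(', -1 for ')', 0 for '.'."""
    step = {"(": 1, ")": -1, ".": 0}
    return [step[symbol] for symbol in dot_bracket]


def mountain_height(dot_bracket):
    """Running sum of the increments, i.e. the mountain plot of the structure."""
    heights = []
    current = 0
    for increment in incremental_mountain_height(dot_bracket):
        current += increment
        heights.append(current)
    return heights
```

For the hairpin `((..))`, the increments are `[1, 1, 0, 0, -1, -1]` and the heights are `[1, 2, 2, 2, 1, 0]`. A per-position numeric profile of this kind can be compared position by position, which is what allows a structural term to be combined with a sequence score inside an ordinary quadratic-time alignment.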
Affiliation(s)
- Amir H. Bayegan
- Biology Department, Boston College, Chestnut Hill, MA, United States of America
- Peter Clote
- Biology Department, Boston College, Chestnut Hill, MA, United States of America

19
Pisapia L, Hamilton RS, Farina F, D’Agostino V, Barba P, Strazzullo M, Provenzani A, Gianfrani C, Del Pozzo G. Tristetraprolin/ZFP36 Regulates the Turnover of Autoimmune-Associated HLA-DQ mRNAs. Cells 2019; 8:cells8121570. [PMID: 31817224 PMCID: PMC6953012 DOI: 10.3390/cells8121570] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 10/24/2019] [Revised: 11/27/2019] [Accepted: 11/28/2019] [Indexed: 12/20/2022]
Abstract
HLA class II genes encode highly polymorphic heterodimeric proteins functioning to present antigens to T cells and stimulate a specific immune response. Many HLA genes are strongly associated with autoimmune diseases as they stimulate self-antigen specific CD4+ T cells driving pathogenic responses against host tissues or organs. High expression of HLA class II risk genes is associated with autoimmune diseases, influencing the strength of the CD4+ T-mediated autoimmune response. The expression of HLA class II genes is regulated at both transcriptional and post-transcriptional levels. Protein components of the RNP complex binding the 3'UTR and affecting mRNA processing have previously been identified. Following on from this, the regulation of HLA-DQ2.5 risk genes, the main susceptibility genetic factor for celiac disease (CD), was investigated. The DQ2.5 molecule, encoded by HLA-DQA1*05 and HLA-DQB1*02 alleles, presents the antigenic gluten peptides to CD4+ T lymphocytes, activating the autoimmune response. The zinc-finger protein Tristetraprolin (TTP) or ZFP36 was identified to be a component of the RNP complex and has been described as a factor modulating mRNA stability. The 3'UTRs of CD-associated HLA-DQA1*05 and HLA-DQB1*02 mRNAs do not contain canonical TTP binding consensus sequences; therefore, an in silico approach focusing on mRNA secondary structure accessibility and stability was undertaken. Key structural differences specific to the CD-associated mRNAs were uncovered, allowing them to strongly interact with TTP through their 3'UTR, conferring a rapid turnover, in contrast to lower affinity binding to HLA non-CD associated mRNA.
Affiliation(s)
- Laura Pisapia
- Institute of Genetics and Biophysics “Adriano Buzzati Traverso” CNR, Via Pietro Castellino, 111, 80131 Naples, Italy
- Russell S. Hamilton
- Centre for Trophoblast Research, Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Site, Cambridge CB2 3DY, UK
- Federica Farina
- Institute of Genetics and Biophysics “Adriano Buzzati Traverso” CNR, Via Pietro Castellino, 111, 80131 Naples, Italy
- Vito D’Agostino
- Centre for Cellular, Computational and Integrative Biology-CIBIO, University of Trento, via Sommarive 9, 38123 Trento, Italy
- Pasquale Barba
- Institute of Genetics and Biophysics “Adriano Buzzati Traverso” CNR, Via Pietro Castellino, 111, 80131 Naples, Italy
- Maria Strazzullo
- Institute of Genetics and Biophysics “Adriano Buzzati Traverso” CNR, Via Pietro Castellino, 111, 80131 Naples, Italy
- Alessandro Provenzani
- Centre for Cellular, Computational and Integrative Biology-CIBIO, University of Trento, via Sommarive 9, 38123 Trento, Italy
- Carmen Gianfrani
- Institute of Biochemistry and Cell Biology-CNR, Via Pietro Castellino, 111, 80131 Naples, Italy
- Giovanna Del Pozzo
- Institute of Genetics and Biophysics “Adriano Buzzati Traverso” CNR, Via Pietro Castellino, 111, 80131 Naples, Italy

20
Crum M, Ram-Mohan N, Meyer MM. Regulatory context drives conservation of glycine riboswitch aptamers. PLoS Comput Biol 2019; 15:e1007564. [PMID: 31860665 PMCID: PMC6944388 DOI: 10.1371/journal.pcbi.1007564] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 09/04/2019] [Revised: 01/06/2020] [Accepted: 11/25/2019] [Indexed: 12/13/2022]
Abstract
In comparison to protein coding sequences, the impact of mutation and natural selection on the sequence and function of non-coding (ncRNA) genes is not well understood. Many ncRNA genes are narrowly distributed to only a few organisms, and appear to be rapidly evolving. Compared to protein coding sequences, there are many challenges associated with assessment of ncRNAs that are not well addressed by conventional phylogenetic approaches, including: short sequence length, lack of primary sequence conservation, and the importance of secondary structure for biological function. Riboswitches are structured ncRNAs that directly interact with small molecules to regulate gene expression in bacteria. They typically consist of a ligand-binding domain (aptamer) whose folding changes drive changes in gene expression. The glycine riboswitch is among the most well-studied due to the widespread occurrence of a tandem aptamer arrangement (tandem), wherein two homologous aptamers interact with glycine and each other to regulate gene expression. However, a significant proportion of glycine riboswitches are comprised of single aptamers (singleton). Here we use graph clustering to circumvent the limitations of traditional phylogenetic analysis when studying the relationship between the tandem and singleton glycine aptamers. Graph clustering enables a broader range of pairwise comparison measures to be used to assess aptamer similarity. Using this approach, we show that one aptamer of the tandem glycine riboswitch pair is typically much more highly conserved, and that which aptamer is conserved depends on the regulated gene. Furthermore, our analysis also reveals that singleton aptamers are more similar to either the first or second tandem aptamer, again based on the regulated gene. Taken together, our findings suggest that tandem glycine riboswitches degrade into functional singletons, with the regulated gene(s) dictating which glycine-binding aptamer is conserved.
Affiliation(s)
- Matt Crum
- Department of Biology, Boston College, Chestnut Hill, Massachusetts, United States of America
- Nikhil Ram-Mohan
- Department of Biology, Boston College, Chestnut Hill, Massachusetts, United States of America
- Michelle M. Meyer
- Department of Biology, Boston College, Chestnut Hill, Massachusetts, United States of America

21
Francisco-Velilla R, Fernandez-Chamorro J, Dotu I, Martinez-Salas E. The landscape of the non-canonical RNA-binding site of Gemin5 unveils a feedback loop counteracting the negative effect on translation. Nucleic Acids Res 2019; 46:7339-7353. [PMID: 29771365 PMCID: PMC6101553 DOI: 10.1093/nar/gky361] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 11/10/2017] [Accepted: 05/08/2018] [Indexed: 01/01/2023]
Abstract
Gemin5 is a predominantly cytoplasmic protein that downregulates translation, beyond controlling snRNPs assembly. The C-terminal region harbors a non-canonical RNA-binding site consisting of two domains, RBS1 and RBS2, which differ in RNA-binding capacity and the ability to modulate translation. Here, we show that these domains recognize distinct RNA targets in living cells. Interestingly, the most abundant and exclusive RNA target of the RBS1 domain was Gemin5 mRNA. Biochemical and functional characterization of this target demonstrated that RBS1 polypeptide physically interacts with a predicted thermodynamically stable stem–loop upregulating mRNA translation, thereby counteracting the negative effect of Gemin5 protein on global protein synthesis. In support of this result, destabilization of the stem–loop impairs the stimulatory effect on translation. Moreover, RBS1 stimulates translation of the endogenous Gemin5 mRNA. Hence, although the RBS1 domain downregulates global translation, it positively enhances translation of RNA targets carrying thermodynamically stable secondary structure motifs. This mechanism allows fine-tuning the availability of Gemin5 to play its multiple roles in gene expression control.
Affiliation(s)
- Ivan Dotu
- Pompeu Fabra University (UPF), 08003 Barcelona, Spain; IMIM - Hospital del Mar Medical Research Institute, 08003 Barcelona, Spain

22
Abstract
Over the last two decades it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible noncoding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a necessarily incomplete overview of the current state of comparative analysis of noncoding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Affiliation(s)
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, D-79110 Freiburg, Germany; Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
- Jan Gorodkin
- Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
- Ivo L Hofacker
- Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark; Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria; Bioinformatics and Computational Biology Research Group, University of Vienna, Währingerstraße 17, A-1090 Vienna, Austria
- Peter F Stadler
- Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark; Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria; Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany; Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany; Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, D-04103 Leipzig, Germany; Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM 87501, USA

23
Smith MA, Seemann SE, Quek XC, Mattick JS. DotAligner: identification and clustering of RNA structure motifs. Genome Biol 2017; 18:244. [PMID: 29284541 PMCID: PMC5747123 DOI: 10.1186/s13059-017-1371-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 06/01/2017] [Accepted: 12/05/2017] [Indexed: 01/01/2023]
Abstract
The diversity of processed transcripts in eukaryotic genomes poses a challenge for the classification of their biological functions. Sparse sequence conservation in non-coding sequences and the unreliable nature of RNA structure predictions further exacerbate this conundrum. Here, we describe a computational method, DotAligner, for the unsupervised discovery and classification of homologous RNA structure motifs from a set of sequences of interest. Our approach outperforms comparable algorithms at clustering known RNA structure families, both in speed and accuracy. It identifies clusters of known and novel structure motifs from ENCODE immunoprecipitation data for 44 RNA-binding proteins.
Affiliation(s)
- Martin A Smith
- RNA Biology and Plasticity Group, Garvan Institute of Medical Research, 384 Victoria Street, Sydney, NSW 2010, Australia; St Vincent's Clinical School, Faculty of Medicine, UNSW Australia, Sydney, NSW 2010, Australia
- Stefan E Seemann
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Groennegaardsvej 3, Frederiksberg, 1870, Denmark; Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, DK-1870, Frederiksberg, Denmark
- Xiu Cheng Quek
- RNA Biology and Plasticity Group, Garvan Institute of Medical Research, 384 Victoria Street, Sydney, NSW 2010, Australia; St Vincent's Clinical School, Faculty of Medicine, UNSW Australia, Sydney, NSW 2010, Australia
- John S Mattick
- RNA Biology and Plasticity Group, Garvan Institute of Medical Research, 384 Victoria Street, Sydney, NSW 2010, Australia; St Vincent's Clinical School, Faculty of Medicine, UNSW Australia, Sydney, NSW 2010, Australia

24
Kato Y, Gorodkin J, Havgaard JH. Alignment-free comparative genomic screen for structured RNAs using coarse-grained secondary structure dot plots. BMC Genomics 2017; 18:935. [PMID: 29197323 PMCID: PMC5712110 DOI: 10.1186/s12864-017-4309-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/30/2017] [Accepted: 11/15/2017] [Indexed: 01/01/2023]
Abstract
Background: Structured non-coding RNAs play many different roles in the cells, but the annotation of these RNAs is lacking even within the human genome. The currently available computational tools are either too computationally heavy for use in full genomic screens or rely on pre-aligned sequences. Methods: Here we present a fast and efficient method, DotcodeR, for detecting structurally similar RNAs in genomic sequences by comparing their corresponding coarse-grained secondary structure dot plots at string level. This allows us to perform an all-against-all scan of all window pairs from two genomes without alignment. Results: Our computational experiments with simulated data and real chromosomes demonstrate that the presented method has good sensitivity. Conclusions: DotcodeR can be useful as a pre-filter in a genomic comparative scan for structured RNAs. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-017-4309-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Yuki Kato
- Department of RNA Biology and Neuroscience, Graduate School of Medicine, Osaka University, 2-2 Yamadaoka, Suita, 565-0871, Japan; Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Groennegaardsvej 3, Frederiksberg, 1870, Denmark
- Jan Gorodkin
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Groennegaardsvej 3, Frederiksberg, 1870, Denmark
- Jakob Hull Havgaard
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Groennegaardsvej 3, Frederiksberg, 1870, Denmark

25
Elucidating the Role of Host Long Non-Coding RNA during Viral Infection: Challenges and Paths Forward. Vaccines (Basel) 2017; 5:vaccines5040037. [PMID: 29053596 PMCID: PMC5748604 DOI: 10.3390/vaccines5040037] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 09/26/2017] [Revised: 10/12/2017] [Accepted: 10/17/2017] [Indexed: 12/31/2022]
Abstract
Research over the past decade has clearly shown that long non-coding RNAs (lncRNAs) are functional. Many lncRNAs can be related to immunity and the host response to viral infection, but their specific functions remain largely elusive. The vast majority of lncRNAs are annotated with extremely limited knowledge and tend to be expressed at low levels, making ad hoc experimentation difficult. Changes to lncRNA expression during infection can be systematically profiled using deep sequencing; however, this often produces an intractable number of candidate lncRNAs, leaving no clear path forward. For these reasons, it is especially important to prioritize lncRNAs into high-confidence “hits” by utilizing multiple methodologies. Large scale perturbation studies may be used to screen lncRNAs involved in phenotypes of interest, such as resistance to viral infection. Single cell transcriptome sequencing quantifies cell-type specific lncRNAs that are less abundant in a mixture. When coupled with iterative experimental validations, new computational strategies for efficiently integrating orthogonal high-throughput data will likely be the driver for elucidating the functional role of lncRNAs during viral infection. This review highlights new high-throughput technologies and discusses the potential for integrative computational analysis to streamline the identification of infection-related lncRNAs and unveil novel targets for antiviral therapeutics.

26
Seemann SE, Mirza AH, Hansen C, Bang-Berthelsen CH, Garde C, Christensen-Dalsgaard M, Torarinsson E, Yao Z, Workman CT, Pociot F, Nielsen H, Tommerup N, Ruzzo WL, Gorodkin J. The identification and functional annotation of RNA structures conserved in vertebrates. Genome Res 2017; 27:1371-1383. [PMID: 28487280 PMCID: PMC5538553 DOI: 10.1101/gr.208652.116] [Citation(s) in RCA: 62] [Impact Index Per Article: 7.8] [Received: 04/19/2016] [Accepted: 05/04/2017] [Indexed: 01/15/2023]
Abstract
Structured elements of RNA molecules are essential in, e.g., RNA stabilization, localization, and protein interaction, and their conservation across species suggests a common functional role. We computationally screened vertebrate genomes for conserved RNA structures (CRSs), leveraging structure-based, rather than sequence-based, alignments. After careful correction for sequence identity and GC content, we predict ∼516,000 human genomic regions containing CRSs. We find that a substantial fraction of human–mouse CRS regions (1) colocalize consistently with binding sites of the same RNA binding proteins (RBPs) or (2) are transcribed in corresponding tissues. Additionally, a CaptureSeq experiment revealed expression of many of our CRS regions in human fetal brain, including 662 novel ones. For selected human and mouse candidate pairs, qRT-PCR and in vitro RNA structure probing supported both shared expression and shared structure despite low abundance and low sequence identity. About 30,000 CRS regions are located near coding or long noncoding RNA genes or within enhancers. Structured (CRS overlapping) enhancer RNAs and extended 3′ ends have significantly increased expression levels over their nonstructured counterparts. Our findings of transcribed uncharacterized regulatory regions that contain CRSs support their RNA-mediated functionality.
Affiliation(s)
- Stefan E Seemann
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark; Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, DK-1870 Frederiksberg, Denmark
- Aashiq H Mirza
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark; Copenhagen Diabetes Research Center (CPH-DIRECT), Herlev University Hospital, DK-2730 Herlev, Denmark
- Claus Hansen
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark; Department of Cellular and Molecular Medicine (ICMM), Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark
- Claus H Bang-Berthelsen
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark; Department of Obesity Biology and Department of Molecular Genetics, Novo Nordisk A/S, DK-2880 Bagsværd, Denmark
- Christian Garde
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark; Department of Biotechnology and Biomedicine, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark
- Mikkel Christensen-Dalsgaard
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark; Department of Cellular and Molecular Medicine (ICMM), Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark
- Elfar Torarinsson
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark
- Zizhen Yao
  - Allen Institute for Brain Science, Seattle, Washington 98109, USA
- Christopher T Workman
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark; Department of Biotechnology and Biomedicine, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark
- Flemming Pociot
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark; Copenhagen Diabetes Research Center (CPH-DIRECT), Herlev University Hospital, DK-2730 Herlev, Denmark
- Henrik Nielsen
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark; Department of Cellular and Molecular Medicine (ICMM), Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark
- Niels Tommerup
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark; Department of Cellular and Molecular Medicine (ICMM), Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark
- Walter L Ruzzo
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark; School of Computer Science and Engineering and Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA; Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA
- Jan Gorodkin
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark; Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, DK-1870 Frederiksberg, Denmark
|
27
|
Kumagai Y, Vandenbon A, Teraguchi S, Akira S, Suzuki Y. Genome-wide map of RNA degradation kinetics patterns in dendritic cells after LPS stimulation facilitates identification of primary sequence and secondary structure motifs in mRNAs. BMC Genomics 2016; 17:1032. [PMID: 28155712 PMCID: PMC5259865 DOI: 10.1186/s12864-016-3325-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND: Immune cells have to change their gene expression patterns dynamically in response to external stimuli such as lipopolysaccharide (LPS). Gene expression is regulated at multiple steps in eukaryotic cells, in which control of RNA levels at both the transcriptional and the post-transcriptional level plays an important role. Impairment of this control leads to aberrant immune responses such as excessive or impaired production of cytokines. However, genome-wide studies focusing on post-transcriptional control were relatively rare until recently. Moreover, several RNA cis elements and RNA-binding proteins have been found to be involved in the process, but our general understanding remains poor, partly because identification of regulatory RNA motifs is very challenging in spite of its importance. We took advantage of genome-wide measurement of RNA degradation in combination with estimation of degradation kinetics by a qualitative approach, and performed de novo prediction of RNA sequence and structure motifs. METHODS: To classify genes by their RNA degradation kinetics, we first measured the RNA degradation time course in mouse dendritic cells after LPS stimulation, and the time courses were clustered to estimate degradation kinetics and to find patterns in the kinetics. Genes were then clustered by their similarity in degradation kinetics patterns. The 3' UTR sequences of a cluster were subjected to de novo sequence or structure motif prediction. RESULTS: Quick degradation kinetics was found to be strongly associated with lower gene expression level, immediate regulation (both induction and repression) of gene expression level, and longer 3' UTR length. De novo sequence motif prediction found AU-rich element-like and TTP-binding sequence-like motifs that are enriched in quickly degrading genes. De novo structure motif prediction found a known functional motif, namely a stem-loop structure containing a sequence bound by the RNA-binding proteins Roquin and Regnase-1, as well as unknown motifs. CONCLUSIONS: The current study indicated that degradation kinetics patterns lead to a classification different from that by gene expression, and that this differential classification facilitates identification of functional motifs. Identification of novel motif candidates implied post-transcriptional controls different from those mediated by known pairs of RNA-binding protein and RNA motif.
Affiliation(s)
- Yutaro Kumagai
  - Quantitative Immunology Research Unit, WPI Immunology Frontier Research Center, Osaka University, 3-1 Yamada-oka, Suita, Osaka, 565-0871, Japan
- Alexis Vandenbon
  - Immuno-Genomics Research Unit, WPI Immunology Frontier Research Center, Osaka University, 3-1 Yamada-oka, Suita, Osaka, 565-0871, Japan
- Shunsuke Teraguchi
  - Quantitative Immunology Research Unit, WPI Immunology Frontier Research Center, Osaka University, 3-1 Yamada-oka, Suita, Osaka, 565-0871, Japan
- Shizuo Akira
  - Laboratory of Host Defense, WPI Immunology Frontier Research Center, Osaka University, 3-1 Yamada-oka, Suita, Osaka, 565-0871, Japan
- Yutaka Suzuki
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, 277-8561, Japan
|