1
Penchovsky R, Georgieva AV, Dyakova V, Traykovska M, Pavlova N. Antisense and Functional Nucleic Acids in Rational Drug Development. Antibiotics (Basel) 2024; 13:221. [PMID: 38534656 DOI: 10.3390/antibiotics13030221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Revised: 02/25/2024] [Accepted: 02/26/2024] [Indexed: 03/28/2024] [Open Access]
Abstract
This review focuses on antisense and functional nucleic acids used for fully rational drug design and drug target assessment, with the aim of reducing the time and money spent and increasing the success rate of drug development. Nucleic acids have unique properties that allow them to play two essential roles in drug development: as drug targets and as drugs. Drug targets can be messenger, ribosomal, and non-coding RNAs, ribozymes, riboswitches, and other RNAs. Furthermore, various antisense and functional nucleic acids can be valuable tools in drug discovery. The many mechanisms for RNA-based control of gene expression in both prokaryotes and eukaryotes, together with engineering approaches, open new avenues for drug discovery. This review discusses the design principles, applications, and prospects of antisense and functional nucleic acids in drug delivery and design. Such nucleic acids include antisense oligonucleotides, synthetic ribozymes, and siRNAs, which can be employed for efficient rational antibacterial drug development. An important feature of antisense and functional nucleic acids is the possibility of using rational design methods for drug development. This review aims to popularize these novel approaches to benefit the drug industry and patients.
Affiliation(s)
- Robert Penchovsky
- Laboratory of Synthetic Biology and Bioinformatics, Faculty of Biology, Sofia University, "St. Kliment Ohridski", 8 Dragan Tzankov Blvd., 1164 Sofia, Bulgaria
- Antoniya V Georgieva
- Laboratory of Synthetic Biology and Bioinformatics, Faculty of Biology, Sofia University, "St. Kliment Ohridski", 8 Dragan Tzankov Blvd., 1164 Sofia, Bulgaria
- Vanya Dyakova
- Laboratory of Synthetic Biology and Bioinformatics, Faculty of Biology, Sofia University, "St. Kliment Ohridski", 8 Dragan Tzankov Blvd., 1164 Sofia, Bulgaria
- Martina Traykovska
- Laboratory of Synthetic Biology and Bioinformatics, Faculty of Biology, Sofia University, "St. Kliment Ohridski", 8 Dragan Tzankov Blvd., 1164 Sofia, Bulgaria
- Nikolet Pavlova
- Laboratory of Synthetic Biology and Bioinformatics, Faculty of Biology, Sofia University, "St. Kliment Ohridski", 8 Dragan Tzankov Blvd., 1164 Sofia, Bulgaria
2
Teragawa S, Wang L. ConF: A Deep Learning Model Based on BiLSTM, CNN, and Cross Multi-Head Attention Mechanism for Noncoding RNA Family Prediction. Biomolecules 2023; 13:1643. [PMID: 38002325 PMCID: PMC10669714 DOI: 10.3390/biom13111643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Revised: 10/21/2023] [Accepted: 10/24/2023] [Indexed: 11/26/2023] [Open Access]
Abstract
This paper presents ConF, a novel deep learning model designed for accurate and efficient prediction of noncoding RNA families. NcRNAs are essential functional RNA molecules involved in various cellular processes, including replication, transcription, and gene expression. Identifying ncRNA families is crucial for comprehensive RNA research, as ncRNAs within the same family often exhibit similar functionalities. Traditional experimental methods for identifying ncRNA families are time-consuming and labor-intensive. Computational approaches relying on annotated secondary structure data face limitations in handling complex structures like pseudoknots and have restricted applicability, resulting in suboptimal prediction performance. To overcome these challenges, ConF integrates mainstream techniques such as residual networks with dilated convolutions and cross multi-head attention mechanisms. By employing a combination of dual-layer convolutional networks and BiLSTM, ConF effectively captures intricate features embedded within RNA sequences. This feature extraction process leads to significantly improved prediction accuracy compared to existing methods. Experimental evaluations conducted using a single, publicly available dataset and applying ten-fold cross-validation demonstrate the superiority of ConF in terms of accuracy, sensitivity, and other performance metrics. Overall, ConF represents a promising solution for accurate and efficient ncRNA family prediction, addressing the limitations of traditional experimental and computational methods.
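As a rough illustration of the ten-fold cross-validation protocol mentioned in the abstract, the sketch below partitions sample indices into k disjoint folds and yields one train/test split per fold. It is not taken from the ConF paper: the function name `k_fold_indices`, the seeded shuffle, and the way the remainder is distributed over folds are all assumptions made for this example.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds.

    Hypothetical helper for illustration; the shuffle seed and the
    fold-size rule are assumptions, not details from the paper.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold sizes differ by at most one; the first (n_samples % k) folds get the extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size
```

Each sample lands in exactly one test fold, so metrics averaged over the ten folds use every sequence once for evaluation. For class-imbalanced ncRNA families, a stratified split would usually be preferable to the plain shuffle shown here.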
Affiliation(s)
- Shoryu Teragawa
- School of Software, Dalian University of Technology, Dalian 116024, China
3
Orro A, Trombetti GA. High-Accuracy ncRNA Function Prediction via Deep Learning Using Global and Local Sequence Information. Biomedicines 2023; 11:1631. [PMID: 37371726 DOI: 10.3390/biomedicines11061631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 06/01/2023] [Accepted: 06/02/2023] [Indexed: 06/29/2023] [Open Access]
Abstract
The prediction of the biological function of non-coding ribonucleic acid (ncRNA) is an important step towards understanding the regulatory mechanisms underlying many diseases. Since non-coding RNAs are present in great abundance in human cells and are functionally diverse, developing functional prediction tools is necessary. With recent advances in non-coding RNA biology and the availability of complete genome sequences for a large number of species, we now have a window of opportunity for studying non-coding RNA biology. However, the computational methods used to predict the non-coding RNA functions are mostly either scarcely accurate, when based on sequence information alone, or prohibitively expensive in terms of computational burden when a secondary structure prediction is needed. We propose a novel computational method to predict the biological function of non-coding RNA genes that is based on a collection of deep network architectures utilizing solely ncRNA sequence information and which does not rely on or require expensive secondary ncRNA structure information. The approach presented in this work exhibits comparable or superior accuracy to methods that employ both sequence and structural features, at a much lower computational cost.
Affiliation(s)
- Alessandro Orro
- Institute for Biomedical Technologies, National Research Council (ITB-CNR), 20054 Segrate, Italy
- Gabriele A Trombetti
- Institute for Biomedical Technologies, National Research Council (ITB-CNR), 20054 Segrate, Italy
4
Dunkel H, Wehrmann H, Jensen LR, Kuss AW, Simm S. MncR: Late Integration Machine Learning Model for Classification of ncRNA Classes Using Sequence and Structural Encoding. Int J Mol Sci 2023; 24:8884. [PMID: 37240230 PMCID: PMC10218863 DOI: 10.3390/ijms24108884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 05/11/2023] [Accepted: 05/13/2023] [Indexed: 05/28/2023] [Open Access]
Abstract
Non-coding RNA (ncRNA) classes perform important housekeeping and regulatory functions and are quite heterogeneous in terms of length, sequence conservation and secondary structure. High-throughput sequencing reveals novel expressed ncRNAs, and their classification is important for understanding cell regulation and identifying potential diagnostic and therapeutic biomarkers. To improve the classification of ncRNAs, we investigated different approaches of utilizing primary sequences and secondary structures as well as the late integration of both using machine learning models, including different neural network architectures. As input, we used the newest version of RNAcentral, focusing on six ncRNA classes: lncRNA, rRNA, tRNA, miRNA, snRNA and snoRNA. The late integration of graph-encoded structural features and primary sequences in our MncR classifier achieved an overall accuracy of >97%, which could not be increased by more fine-grained subclassification. In comparison to the currently best-performing tool, ncRDense, we had a minimal increase of 0.5% in all four overlapping ncRNA classes on a similar test set of sequences. In summary, MncR is not only more accurate than current ncRNA prediction tools but also allows the prediction of long ncRNA classes (lncRNAs, certain rRNAs) of up to 12,000 nt and is trained on a more diverse ncRNA dataset retrieved from RNAcentral.
Affiliation(s)
- Heiko Dunkel
- Institute of Bioinformatics, University Medicine Greifswald, Walther-Rathenau Str. 48, 17489 Greifswald, Germany
- Henning Wehrmann
- Department of Biosciences, Molecular Cell Biology of Plants, Goethe University, 60438 Frankfurt am Main, Germany
- Lars R. Jensen
- Human Molecular Genetics Group, Department of Functional Genomics, Interfaculty Institute of Genetics and Functional Genomics, University Medicine Greifswald, 17475 Greifswald, Germany
- Andreas W. Kuss
- Human Molecular Genetics Group, Department of Functional Genomics, Interfaculty Institute of Genetics and Functional Genomics, University Medicine Greifswald, 17475 Greifswald, Germany
- Stefan Simm
- Institute of Bioinformatics, University Medicine Greifswald, Walther-Rathenau Str. 48, 17489 Greifswald, Germany
5
Chen K, Zhu X, Wang J, Hao L, Liu Z, Liu Y. ncDENSE: a novel computational method based on a deep learning framework for non-coding RNAs family prediction. BMC Bioinformatics 2023; 24:68. [PMID: 36849908 PMCID: PMC9972773 DOI: 10.1186/s12859-023-05191-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Accepted: 02/16/2023] [Indexed: 03/01/2023] [Open Access]
Abstract
BACKGROUND Although research on non-coding RNAs (ncRNAs) is a hot topic in the life sciences, the functions of numerous ncRNAs remain unclear. In recent years, researchers have found that ncRNAs of the same family have similar functions; it is therefore important to predict ncRNA families accurately in order to identify their functions. Several methods are available for ncRNA family prediction, and they fall into two categories: prediction based on the secondary structure features of ncRNAs, and prediction based on their sequence features. Methods of the first type require a complicated process and suffer from the low accuracy with which the secondary structure is obtained, while methods of the second type have a simple prediction process and high accuracy but still leave room for improvement. Because existing methods for ncRNA family prediction are associated with complicated prediction processes and low accuracy, a new method is needed to predict ncRNA families more accurately. RESULTS A deep learning-based method, ncDENSE, is proposed in this study; it predicts ncRNA families by extracting ncRNA sequence features. The bases in ncRNA sequences are one-hot encoded and then fed into an ensemble deep learning model comprising a dynamic bi-directional gated recurrent unit (Bi-GRU), a dense convolutional network (DenseNet), and an attention mechanism (AM). Specifically, the dynamic Bi-GRU extracts contextual feature information and captures long-term dependencies in ncRNA sequences. The AM assigns different weights to the features extracted by the Bi-GRU and focuses attention on the information with greater weights. DenseNet extracts local feature information from ncRNA sequences and classifies them through a fully connected layer. According to our results, the ncDENSE method improved the Accuracy, Sensitivity, Precision, F-score, and MCC by 2.08%, 2.33%, 2.14%, 2.16%, and 2.39%, respectively, compared with the suboptimal method. CONCLUSIONS Overall, the ncDENSE method proposed in this paper extracts sequence features of ncRNAs with a dynamic Bi-GRU and DenseNet and improves accuracy in predicting ncRNA families and other data.
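The one-hot encoding step named in the abstract can be sketched as follows. This is a minimal illustration, not the ncDENSE implementation: the column order `ACGU`, the mapping of `T` to `U`, and the all-zero row for ambiguous symbols such as `N` are assumptions chosen for the example, and `one_hot_encode` is a hypothetical helper name.

```python
from typing import List

# Assumed column order; the abstract does not state the order used by ncDENSE.
ALPHABET = "ACGU"


def one_hot_encode(seq: str) -> List[List[int]]:
    """Encode an RNA sequence as one 4-element row per base.

    Assumptions for this sketch: input is case-insensitive, DNA-style T is
    treated as U, and any other symbol (e.g. N) becomes an all-zero row.
    """
    rows = []
    for base in seq.upper().replace("T", "U"):
        row = [0] * len(ALPHABET)
        idx = ALPHABET.find(base)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows
```

The resulting matrix has one row per nucleotide and could be padded or truncated to a fixed length before being passed to a recurrent or convolutional layer; how ncDENSE handles variable sequence length is not described in the abstract.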
Affiliation(s)
- Kai Chen
- College of Software, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Xiaodong Zhu
- College of Software, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China; College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Jiahao Wang
- College of Software, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Lei Hao
- College of Software, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Zhen Liu
- College of Computer Science and Technology, Jilin University, Changchun 130012, China; Graduate School of Engineering, Nagasaki Institute of Applied Science, 536 Aba-machi, Nagasaki 851-0193, Japan
- Yuanning Liu
- College of Software, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China; College of Computer Science and Technology, Jilin University, Changchun 130012, China
6
Wang H, Lu X, Zheng H, Wang W, Zhang G, Wang S, Lin P, Zhuang Y, Chen C, Chen Q, Qu J, Xu L. RNAsmc: A integrated tool for comparing RNA secondary structure and evaluating allosteric effects. Comput Struct Biotechnol J 2023; 21:965-973. [PMID: 36733704 PMCID: PMC9876829 DOI: 10.1016/j.csbj.2023.01.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Revised: 01/06/2023] [Accepted: 01/07/2023] [Indexed: 01/11/2023] [Open Access]
Abstract
RNA structure plays a crucial role in gene regulation, RNA stability and essential biological processes. RNA secondary structure (RSS) motifs are the basic building blocks for investigating the biological mechanisms of structure. Here, we present a strategy for structural motif-based dynamic alignment, namely, RNA secondary-structural motif-comparing (RNAsmc), to identify structural motifs and quantitatively evaluate their underlying molecular functions. RNAsmc is also robust to sequence length, folding protocol and RNA structural profiles obtained by chemical probing. Notably, it is also applicable to quantifying structural variation in special RNA editing events (SNVs or SNPs, fragment insertion or deletion, etc.). The findings indicate that RNAsmc can uncover the heterogeneity of RNA secondary structure and score similarities among components, which provides an impetus to cluster RNA families and evaluate allosteric effects. We find that RNAsmc exhibits remarkable detection efficiency for experimentally derived RiboSNitches. Finally, the pipeline was assembled into an R software package to serve as an automated toolkit to explore, align, and cluster RSS. It is freely available for download at https://CRAN.R-project.org/package=RNAsmc.
Affiliation(s)
- Hong Wang
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Center of Optometry International Innovation of Wenzhou, Eye Valley, Wenzhou 325027, China
- Xiaoyan Lu
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Hewei Zheng
- Wekemo Tech Group Co., Ltd., Shenzhen 518000, China
- Wencan Wang
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Wenzhou Realdata Medical Research Co., Ltd, Wenzhou 325027, China
- Guosi Zhang
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Siyu Wang
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Peng Lin
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Youyuan Zhuang
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Chong Chen
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Qi Chen
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Jia Qu
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Center of Optometry International Innovation of Wenzhou, Eye Valley, Wenzhou 325027, China
- Corresponding authors at: National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Liangde Xu
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Center of Optometry International Innovation of Wenzhou, Eye Valley, Wenzhou 325027, China
- Corresponding authors at: National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
7
Lima DDS, Amichi LJA, Fernandez MA, Constantino AA, Seixas FAV. NCYPred: A Bidirectional LSTM Network With Attention for Y RNA and Short Non-Coding RNA Classification. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:557-565. [PMID: 34826297 DOI: 10.1109/tcbb.2021.3131136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Short non-coding RNAs (sncRNAs) are involved in multiple cellular processes and can be divided into dozens of classes. Among such classes, Y RNAs have been gaining attention, being essential factors for the initiation of DNA replication in vertebrates, as well as potential tumor biomarkers. Homologs have also been described in nematodes and insects, as well as related sequences in bacteria. Methods capable of accurately predicting Y RNA transcripts are lacking. In this work, we developed an attention-based LSTM network and built a classification model able to classify sncRNAs (including Y RNA) directly from nucleotide sequences. A dataset consisting of 45,447 sncRNA sequences from a wide range of organisms, obtained from Rfam 14.3, was built. Performance evaluation demonstrated that our proposed method, NCYPred (Non-Coding/Y RNA Prediction), can accurately predict Y RNA sequences and their homologs, as well as 11 additional classes, achieving results comparable with state-of-the-art methods. We also demonstrate that applying t-SNE to learned sequence representations could be useful for sequence analysis. Our model is freely available as a web server (https://www.gpea.uem.br/ncypred/).
8
Affiliation(s)
- Xing Chen
- Artificial Intelligence Research Institute, China University of Mining and Technology, Xuzhou, 221116, China
- Li Huang
- The Future Laboratory, Tsinghua University, Beijing, 100084, China
9
Wang L, Zhong X, Wang S, Liu Y. ncDLRES: a novel method for non-coding RNAs family prediction based on dynamic LSTM and ResNet. BMC Bioinformatics 2021; 22:447. [PMID: 34544356 PMCID: PMC8451086 DOI: 10.1186/s12859-021-04365-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/09/2021] [Accepted: 09/01/2021] [Indexed: 12/20/2022] [Open Access]
Abstract
Background Studies have shown that non-coding RNAs (ncRNAs) of the same family have similar functions, so predicting the ncRNA family is helpful for research on ncRNA functions. Existing computational methods fall mainly into two categories: the first type predicts the ncRNA family by learning features of the sequence or secondary structure, and the second type predicts it by alignment among homologous sequences. In the first type, some methods predict the ncRNA family by learning predicted secondary structure features; inaccuracy in the predicted secondary structure may cause the low accuracy of those methods. In contrast, ncRFP directly learns the features of ncRNA sequences to predict the ncRNA family. Although ncRFP simplifies the prediction process and improves performance, there is still room for improvement because the features of its input data are incomplete. In the second type, the homologous sequence alignment method currently achieves the highest performance. However, because it requires consensus secondary structure annotation of ncRNA sequences and cannot model pseudoknots, its use is limited. Results In this paper, a novel method, “ncDLRES”, which learns sequence features, is proposed to predict the family of ncRNAs based on Dynamic LSTM (Long Short-Term Memory) and ResNet (Residual Neural Network). Conclusions ncDLRES extracts the features of ncRNA sequences with Dynamic LSTM and then classifies them with ResNet. Compared with the homologous sequence alignment method, ncDLRES reduces the data requirement and expands the application scope. Compared with methods of the first type, ncDLRES greatly improves performance.
Affiliation(s)
- Linyu Wang
- College of Computer Science and Technology, Jilin University, Changchun, China
- Xiaodan Zhong
- College of Computer Science and Technology, Jilin University, Changchun, China
- Shuo Wang
- College of Computer Science and Technology, Jilin University, Changchun, China
- Yuanning Liu
- College of Computer Science and Technology, Jilin University, Changchun, China.
10
Chantsalnyam T, Siraj A, Tayara H, Chong KT. ncRDense: A novel computational approach for classification of non-coding RNA family by deep learning. Genomics 2021; 113:3030-3038. [PMID: 34242708 DOI: 10.1016/j.ygeno.2021.07.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/29/2020] [Revised: 06/29/2021] [Accepted: 07/03/2021] [Indexed: 12/14/2022]
Abstract
With the rapidly growing importance of biological research, non-coding RNAs (ncRNA) attract more attention in biology and bioinformatics. They play vital roles in biological processes such as transcription and translation. Classification of ncRNAs is essential to our understanding of disease mechanisms and treatment design. Many approaches to ncRNA classification have been developed, several of which use machine learning and deep learning. In this paper, we construct a novel deep learning-based architecture, ncRDense, to effectively classify and distinguish ncRNA families. In a comparative study, our model produces comparable results with existing state-of-the-art methods. Finally, we built a freely accessible web server for the ncRDense tool, which is available at http://nsclbio.jbnu.ac.kr/tools/ncRDense/.
Affiliation(s)
- Tuvshinbayar Chantsalnyam
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Arslan Siraj
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea.
- Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea; Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, South Korea.
11
Binzel DW, Li X, Burns N, Khan E, Lee WJ, Chen LC, Ellipilli S, Miles W, Ho YS, Guo P. Thermostability, Tunability, and Tenacity of RNA as Rubbery Anionic Polymeric Materials in Nanotechnology and Nanomedicine-Specific Cancer Targeting with Undetectable Toxicity. Chem Rev 2021; 121:7398-7467. [PMID: 34038115 DOI: 10.1021/acs.chemrev.1c00009] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Indexed: 02/07/2023]
Abstract
RNA nanotechnology is the bottom-up self-assembly of nanometer-scale architectures, resembling LEGOs, composed mainly of RNA. The ideal building material should be (1) versatile and controllable in shape and stoichiometry, (2) able to self-assemble spontaneously, and (3) thermodynamically, chemically, and enzymatically stable with a long shelf life. RNA building blocks exhibit each of the above. RNA is a polynucleic acid, making it a polymer, and its negative charge prevents nonspecific binding to negatively charged cell membranes. Its thermostability makes it suitable for logic gates, resistive memory, sensor set-ups, and NEM devices. RNA can be designed and manipulated with the simplicity of DNA while displaying the versatile structure and enzymatic activity of proteins. RNA can fold into single-stranded loops or bulges to serve as mounting dovetails for intermolecular or domain interactions without external linking dowels. RNA nanoparticles display rubber- and amoeba-like properties and are stretchable and shrinkable through multiple repeats, leading to enhanced tumor targeting and fast renal excretion to reduce toxicities. It was predicted in 2014 that RNA would be the third milestone in pharmaceutical drug development. The recent approval of several RNA drugs and COVID-19 mRNA vaccines by the FDA suggests that this milestone is being realized. Here, we review the unique properties of RNA nanotechnology, summarize its recent advancements, describe its distinct attributes inside or outside the body and discuss potential applications in nanotechnology, medicine, and material science.
Affiliation(s)
- Daniel W Binzel
- Center for RNA Nanobiotechnology and Nanomedicine, College of Pharmacy, Dorothy M. Davis Heart and Lung Research Institute, James Comprehensive Cancer Center, College of Medicine, The Ohio State University, Columbus, Ohio 43210, United States
- Xin Li
- Center for RNA Nanobiotechnology and Nanomedicine, College of Pharmacy, Dorothy M. Davis Heart and Lung Research Institute, James Comprehensive Cancer Center, College of Medicine, The Ohio State University, Columbus, Ohio 43210, United States
- Nicolas Burns
- Center for RNA Nanobiotechnology and Nanomedicine, College of Pharmacy, Dorothy M. Davis Heart and Lung Research Institute, James Comprehensive Cancer Center, College of Medicine, The Ohio State University, Columbus, Ohio 43210, United States
- Eshan Khan
- Department of Cancer Biology and Genetics, The Ohio State University Comprehensive Cancer Center, College of Medicine, Center for RNA Biology, The Ohio State University, Columbus, Ohio 43210, United States
- Wen-Jui Lee
- TMU Research Center of Cancer Translational Medicine, School of Medical Laboratory Science and Biotechnology, College of Medical Science and Technology, Graduate Institute of Medical Sciences, College of Medicine, Taipei Medical University, Department of Laboratory Medicine, Taipei Medical University Hospital, Taipei 110, Taiwan
- Li-Ching Chen
- TMU Research Center of Cancer Translational Medicine, School of Medical Laboratory Science and Biotechnology, College of Medical Science and Technology, Graduate Institute of Medical Sciences, College of Medicine, Taipei Medical University, Department of Laboratory Medicine, Taipei Medical University Hospital, Taipei 110, Taiwan
- Satheesh Ellipilli
- Center for RNA Nanobiotechnology and Nanomedicine, College of Pharmacy, Dorothy M. Davis Heart and Lung Research Institute, James Comprehensive Cancer Center, College of Medicine, The Ohio State University, Columbus, Ohio 43210, United States
- Wayne Miles
- Department of Cancer Biology and Genetics, The Ohio State University Comprehensive Cancer Center, College of Medicine, Center for RNA Biology, The Ohio State University, Columbus, Ohio 43210, United States
- Yuan Soon Ho
- TMU Research Center of Cancer Translational Medicine, School of Medical Laboratory Science and Biotechnology, College of Medical Science and Technology, Graduate Institute of Medical Sciences, College of Medicine, Taipei Medical University, Department of Laboratory Medicine, Taipei Medical University Hospital, Taipei 110, Taiwan
- Peixuan Guo
- Center for RNA Nanobiotechnology and Nanomedicine, College of Pharmacy, Dorothy M. Davis Heart and Lung Research Institute, James Comprehensive Cancer Center, College of Medicine, The Ohio State University, Columbus, Ohio 43210, United States
|
12
|
Singh D, Madhawan A, Roy J. Identification of multiple RNAs using feature fusion. Brief Bioinform 2021; 22:6272794. [PMID: 33971667 DOI: 10.1093/bib/bbab178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2021] [Revised: 04/08/2021] [Indexed: 11/13/2022] Open
Abstract
Detection of novel transcripts with deep sequencing has increased the demand for computational algorithms, as identifying and validating them with in vivo techniques is time-consuming, costly and unreliable. Most of the discovered transcripts are non-coding RNAs (ncRNAs), a large group known for diverse functional roles but lacking a common taxonomy. Thus, once a transcript is found to lack coding potential, it is crucial to recognize its prime functional category. To address this heterogeneity, we divide the ncRNAs into three classes and present RNA classifier (RNAC), which categorizes RNAs into coding, housekeeping, small non-coding and long non-coding classes. RNAC uses alignment-based genomic descriptors to extract statistical, local binary pattern and histogram features and fuses them to construct classification models with extreme gradient boosting. The experiments are performed on four species, and performance is assessed on multiclass and conventional binary (coding versus non-coding) classification problems. The proposed approach achieved >93% accuracy on both classification problems and also outperformed other well-known existing methods in coding potential prediction. This validates the usefulness of feature fusion for improved performance on both types of classification problems. Hence, RNAC is a valuable tool for the accurate identification of multiple RNAs.
Affiliation(s)
- Dalwinder Singh
- National Agri-Food Biotechnology Institute, Sector 81, SAS Nagar, 140306, Punjab, India
| | - Akansha Madhawan
- National Agri-Food Biotechnology Institute, Sector 81, SAS Nagar, 140306, Punjab, India
| | - Joy Roy
- National Agri-Food Biotechnology Institute, Sector 81, SAS Nagar, 140306, Punjab, India
|
13
|
Wang L, Zheng S, Zhang H, Qiu Z, Zhong X, Liuliu H, Liu Y. ncRFP: A Novel end-to-end Method for Non-Coding RNAs Family Prediction Based on Deep Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:784-789. [PMID: 32224462 DOI: 10.1109/tcbb.2020.2982873] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/10/2023]
Abstract
Enough evidence has accumulated to show that non-coding RNAs (ncRNAs) play important roles in cellular biological processes and disease pathogenesis. High-throughput techniques have produced a large number of ncRNAs whose function remains unknown. Since accurate identification of an ncRNA's family helps in studying its function, predicting the family of each ncRNA is both necessary and urgent. Although several established methods can predict ncRNA families, their complex procedures or inaccurate performance remain major problems. The main idea of those methods is first to predict the secondary structure and then to identify the ncRNA family from properties of that structure. Unfortunately, the accumulation of errors across multiple steps, especially the imperfection of RNA secondary structure prediction tools, may be the cause of low accuracy. In this paper, a novel end-to-end method, 'ncRFP', is proposed to complete the prediction task based on deep learning. Instead of predicting the secondary structure, ncRFP predicts the ncRNA family by automatically extracting features from ncRNA sequences. Compared with other methods, ncRFP not only simplifies the process but also improves accuracy. The source code of ncRFP is available at https://github.com/linyuwangPHD/ncRFP.
|
14
|
Li Y, Zhang Q, Liu Z, Wang C, Han S, Ma Q, Du W. Deep forest ensemble learning for classification of alignments of non-coding RNA sequences based on multi-view structure representations. Brief Bioinform 2020; 22:6046058. [PMID: 33367506 PMCID: PMC8294561 DOI: 10.1093/bib/bbaa354] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/30/2020] [Revised: 11/02/2020] [Indexed: 11/13/2022] Open
Abstract
Non-coding RNAs (ncRNAs) play crucial roles in multiple biological processes. However, the functions of only a few ncRNAs have been well studied. Given the significance of ncRNA classification for understanding ncRNA functions, more and more computational methods have been introduced to make the classification automatic and more accurate. In this paper, based on a convolutional neural network and a deep forest algorithm, multi-grained cascade forest (GcForest), we propose a novel deep fusion learning framework, GcForest fusion method (GCFM), to classify alignments of ncRNA sequences for accurate clustering of ncRNAs. GCFM integrates a multi-view structure feature representation including sequence-structure alignment encoding, structure image representation and shape alignment encoding of structural subunits, enabling us to capture the potential specificity between ncRNAs. For the classification of pairwise alignments of two ncRNA sequences, the F-value of GCFM is 6% higher than that of an existing alignment-based method. Furthermore, the clustering of ncRNA families is carried out based on the classification matrix generated from GCFM. Results suggest better performance (a 20% improvement in accuracy) than existing ncRNA clustering methods (RNAclust, Ensembleclust and CNNclust). Additionally, we apply GCFM to construct a phylogenetic tree of ncRNAs and predict the probability of interactions between RNAs. Most ncRNAs are located correctly in the phylogenetic tree, and the prediction accuracy of RNA interaction is 90.63%. A web server (http://bmbl.sdstate.edu/gcfm/) is developed to maximize its availability, and the source code and related data are available at the same URL.
Affiliation(s)
- Ying Li
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
| | - Qi Zhang
- College of Computer Science and Technology, Jilin University, Changchun, China
| | - Zhaoqian Liu
- School of Mathematics, Shandong University, and now she is a visiting scholar at Ohio State University
| | | | - Siyu Han
- Department of Computer Science, Faculty of Engineering, University of Bristol
| | - Qin Ma
- Department of Biomedical Informatics, Ohio State University
| | - Wei Du
- College of Computer Science and Technology, Jilin University, Changchun, China
|
15
|
Li J, Zhang X, Liu C. The computational approaches of lncRNA identification based on coding potential: Status quo and challenges. Comput Struct Biotechnol J 2020; 18:3666-3677. [PMID: 33304463 PMCID: PMC7710504 DOI: 10.1016/j.csbj.2020.11.030] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 04/30/2020] [Revised: 11/15/2020] [Accepted: 11/16/2020] [Indexed: 12/13/2022] Open
Abstract
Long noncoding RNAs (lncRNAs) make up a large proportion of the transcriptome in eukaryotes and have been shown to have many regulatory functions in various biological processes. When studying lncRNAs, the first step is to accurately and specifically distinguish them within colossal transcriptome data of complicated composition, which contain mRNAs, lncRNAs, small RNAs and their primary transcripts. Faced with such huge and progressively expanding transcriptome data, in silico approaches based on machine learning and probability statistics provide a practicable scheme for effectively and rapidly filtering out lncRNA targets. In this review, we mainly discuss the characteristics of the algorithms and features of currently developed approaches. We also outline the traits of some state-of-the-art tools for ease of operation. Finally, we point out the underlying challenges in lncRNA identification with the advent of new experimental data.
Affiliation(s)
- Jing Li
- CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
| | - Xuan Zhang
- CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
| | - Changning Liu
- CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
- The Innovative Academy of Seed Design, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
|
16
|
Noviello TMR, Ceccarelli F, Ceccarelli M, Cerulo L. Deep learning predicts short non-coding RNA functions from only raw sequence data. PLoS Comput Biol 2020; 16:e1008415. [PMID: 33175836 PMCID: PMC7682815 DOI: 10.1371/journal.pcbi.1008415] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/28/2020] [Revised: 11/23/2020] [Accepted: 09/28/2020] [Indexed: 12/31/2022] Open
Abstract
Small non-coding RNAs (ncRNAs) are short non-coding sequences involved in gene regulation in many biological processes and diseases. The lack of a complete understanding of their biological functionality, especially in a genome-wide scenario, has demanded new computational approaches to annotate their roles. It is widely held that secondary structure is a determinant of RNA function, and machine learning based approaches have successfully predicted RNA function from secondary structure information. Here we show that RNA function can be predicted with good accuracy from a lightweight representation of sequence information, without the need to compute secondary structure features, which is computationally expensive. This finding appears to go against the dogma of secondary structure being a key determinant of function in RNA. Compared to recent secondary structure based methods, the proposed solution is more robust to sequence boundary noise and drastically reduces the computational cost, allowing for large data volume annotations. Scripts and datasets to reproduce the results of the experiments in this study are available at: https://github.com/bioinformatics-sannio/ncrna-deep.
Affiliation(s)
- Teresa Maria Rosaria Noviello
- Department of Electrical Engineering and Information Technology, University of Naples “Federico II”, Napoli, Italy
- Biogem Scarl, Istituto di Ricerche Genetiche “Gaetano Salvatore”, Ariano Irpino, Italy
| | - Francesco Ceccarelli
- CaReBios srl, Ariano Irpino, Italy
- Computer Laboratory, University of Cambridge, Cambridge, UK
| | - Michele Ceccarelli
- Department of Electrical Engineering and Information Technology, University of Naples “Federico II”, Napoli, Italy
- CaReBios srl, Ariano Irpino, Italy
| | - Luigi Cerulo
- Biogem Scarl, Istituto di Ricerche Genetiche “Gaetano Salvatore”, Ariano Irpino, Italy
- Department of Science and Technology, University of Sannio, Benevento, Italy
|
17
|
Chantsalnyam T, Lim DY, Tayara H, Chong KT. ncRDeep: Non-coding RNA classification with convolutional neural network. Comput Biol Chem 2020; 88:107364. [DOI: 10.1016/j.compbiolchem.2020.107364] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/27/2020] [Revised: 08/04/2020] [Accepted: 08/18/2020] [Indexed: 12/21/2022]
|
18
|
Amin N, McGrath A, Chen YPP. Evaluation of deep learning in non-coding RNA classification. Nat Mach Intell 2019. [DOI: 10.1038/s42256-019-0051-2] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Indexed: 01/30/2023]
|
19
|
Ma Y, Yu Z, Han G, Li J, Anh V. Identification of pre-microRNAs by characterizing their sequence order evolution information and secondary structure graphs. BMC Bioinformatics 2018; 19:521. [PMID: 30598066 PMCID: PMC6311913 DOI: 10.1186/s12859-018-2518-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 02/01/2023] Open
Abstract
BACKGROUND: Distinguishing pre-microRNAs (precursor microRNAs) from length-similar pseudo pre-microRNAs can reveal more about the regulatory mechanism of RNA biological processes. Machine learning techniques have been widely applied to this challenging problem. However, most of them focus mainly on secondary structure information of pre-microRNAs, while ignoring sequence-order information and sequence evolution information.
RESULTS: We use new features for the machine learning algorithms to improve the classification performance by characterizing both sequence order evolution information and secondary structure graphs. We developed three steps to extract these features of pre-microRNAs. We first extract features from PSI-BLAST profiles and Hilbert-Huang transforms, which contain rich sequence evolution information and sequence-order information respectively. We then obtain properties of small molecular networks of pre-microRNAs, which contain refined secondary structure information. These structural features are carefully generated so that they can depict both global and local characteristics of pre-microRNAs. In total, our feature space covers 591 features. The maximum relevance and minimum redundancy (mRMR) feature selection method is applied before a support vector machine (SVM) is used as our classifier. The constructed classification model is named MicroRNA-NHPred. The performance of MicroRNA-NHPred is high and stable and better than that of state-of-the-art methods, achieving an accuracy of up to 94.83% on the same benchmark datasets.
CONCLUSIONS: The high prediction accuracy achieved by our proposed method is attributed to the design of a comprehensive feature set on the sequences and secondary structures, which is capable of characterizing the sequence evolution information and sequence-order information, and the global and local information of pre-microRNA secondary structures. MicroRNA-NHPred is a valuable method for pre-microRNA identification. The source code of our method can be downloaded from https://github.com/myl446/MicroRNA-NHPred.
Affiliation(s)
- Yuanlin Ma
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Hunan, 411105 China
| | - Zuguo Yu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Hunan, 411105 China
- School of Electrical Engineering and Computer Science, Queensland University of Technology, GPO Box 2434, Brisbane, Q4001 Australia
| | - Guosheng Han
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Hunan, 411105 China
| | - Jinyan Li
- Advanced Analytics Institute, Faculty of Engineering & IT, University of Technology Sydney, P.O Box 123, Broadway, NSW 2007 Australia
| | - Vo Anh
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Hunan, 411105 China
- School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane, Q4001 Australia
|
20
|
Classification of riboswitch sequences using k-mer frequencies. Biosystems 2018; 174:63-76. [PMID: 30205141 DOI: 10.1016/j.biosystems.2018.09.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 05/31/2018] [Revised: 08/16/2018] [Accepted: 09/05/2018] [Indexed: 12/24/2022]
Abstract
Riboswitches are non-coding RNAs that regulate gene expression by altering the structural conformation of mRNA transcripts. Their regulation mechanism might be exploited for interesting biomedical applications such as drug targets and biosensors. A major challenge consists in accurately identifying metabolite-binding RNA switches which are structurally complex and diverse. In this regard, we investigated the classification of 16 riboswitch families using supervised learning algorithms trained solely with sequence-based features. We generated a reduced feature set and proposed a visual representation to explore its components. We induced Support Vector Machine, Random Forest, Naive Bayes, J48, and HyperPipes classifiers with our proposed feature set and tested their performance over independent data. Our best multi-class classifier achieved F-measure values of 0.996 and 0.966 in the training and test phases, respectively, outperforming those of a previous approach. When compared against BLAST, our best classifiers yielded competitive results. This work shows that the classifiers trained with our sequence-based feature set accurately discriminate riboswitches.
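As a rough illustration of what "sequence-based features" built from k-mer frequencies can look like, the following Python sketch turns an RNA sequence into a fixed-length vector of relative k-mer frequencies. The abstract does not state which values of k were used or how the reduced feature set was selected, so the function name `kmer_frequencies`, the default `k=3`, and the `ACGU` alphabet are assumptions made for this example only.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Return the relative frequency of every k-mer over `alphabet`.

    Hypothetical helper for illustration; not the authors' feature extractor.
    """
    # Normalise case and treat DNA-style T as RNA U.
    seq = seq.upper().replace("T", "U")
    # Enumerate all possible k-mers so every sequence maps to the same keys.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Count overlapping windows of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing symbols outside the alphabet (e.g. N) are ignored.
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}
```

A vector like this (one per sequence, with keys in a fixed order) could then be passed to any standard classifier; the choice of classifier and of k would need to be tuned on real data.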
|
21
|
Navarin N, Costa F. An efficient graph kernel method for non-coding RNA functional prediction. Bioinformatics 2018; 33:2642-2650. [PMID: 28475710 DOI: 10.1093/bioinformatics/btx295] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 09/13/2016] [Accepted: 05/04/2017] [Indexed: 11/13/2022] Open
Abstract
Motivation: The importance of RNA protein-coding gene regulation is by now well appreciated. Non-coding RNAs (ncRNAs) are known to regulate gene expression at practically every stage, ranging from chromatin packaging to mRNA translation. However, the functional characterization of specific instances remains a challenging task in genome-scale settings. For this reason, automatic annotation approaches are of interest. Existing computational methods are either efficient but inaccurate, or they offer increased precision but present scalability problems.
Results: In this article, we present a predictive system based on kernel methods, a type of machine learning algorithm grounded in statistical learning theory. We employ a flexible graph encoding to preserve multiple structural hypotheses and exploit recent advances in representation and model induction to scale to large data volumes. Experimental results on tens of thousands of ncRNA sequences available from the Rfam database indicate that we can not only improve upon state-of-the-art predictors, but also achieve speedups of several orders of magnitude.
Availability and implementation: The code is available from http://www.bioinf.uni-freiburg.de/~costa/EDeN.tgz.
Contact: f.costa@exeter.ac.uk.
Affiliation(s)
- Nicolò Navarin
- Department of Mathematics, University of Padova, Padova 35121, Italy
| | - Fabrizio Costa
- Department of Computer Science, University of Freiburg, D-79110 Freiburg, Germany.,Department of Computer Science, University of Exeter, Exeter EX4 4QF, UK
|
22
|
Simopoulos CMA, Weretilnyk EA, Golding GB. Prediction of plant lncRNA by ensemble machine learning classifiers. BMC Genomics 2018; 19:316. [PMID: 29720103 PMCID: PMC5930664 DOI: 10.1186/s12864-018-4665-2] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 11/23/2017] [Accepted: 04/12/2018] [Indexed: 02/06/2023] Open
Abstract
Background: In plants, long non-protein coding RNAs are believed to have essential roles in development and stress responses. However, relative to advances in discerning biological roles for long non-protein coding RNAs in animal systems, this RNA class is largely understudied in plants. With comparatively few validated plant long non-coding RNAs, research on this potentially critical class of RNA is hindered by a lack of appropriate prediction tools and databases. Supervised learning models trained on data sets of mostly non-validated, non-coding transcripts have previously been used to identify this enigmatic RNA class, with applications largely focused on animal systems. Our approach uses a training set composed only of empirically validated long non-protein coding RNAs from plant, animal, and viral sources to predict and rank candidate long non-protein coding gene products for future functional validation.
Results: Individual stochastic gradient boosting and random forest classifiers trained only on empirically validated long non-protein coding RNAs were constructed. To use the strengths of multiple classifiers, we combined multiple models into a single stacking meta-learner. This ensemble approach benefits from the diversity of several learners to effectively identify putative plant long non-coding RNAs from transcript sequence features. When the predicted genes identified by the ensemble classifier were compared to those listed in GreeNC, an established plant long non-coding RNA database, overlap for predicted genes from Arabidopsis thaliana, Oryza sativa and Eutrema salsugineum ranged from 51 to 83%, with the highest agreement in Eutrema salsugineum. Most of the highest-ranking predictions from Arabidopsis thaliana were annotated as potential natural antisense genes, pseudogenes, transposable elements, or simply computationally predicted hypothetical proteins. Due to the nature of this tool, the model can be updated as new long non-protein coding transcripts are identified and functionally verified.
Conclusions: This ensemble classifier is an accurate tool that can be used to rank long non-protein coding RNA predictions for use in conjunction with gene expression studies. Selection of plant transcripts with a high potential for regulatory roles as long non-protein coding RNAs will advance research in the elucidation of long non-protein coding RNA function.
Affiliation(s)
| | | | - G Brian Golding
- Department of Biology, McMaster University, 1280 Main Street West, Hamilton, Canada.
|
23
|
Fiannaca A, La Rosa M, La Paglia L, Rizzo R, Urso A. nRC: non-coding RNA Classifier based on structural features. BioData Min 2017; 10:27. [PMID: 28785313 PMCID: PMC5540506 DOI: 10.1186/s13040-017-0148-2] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 04/05/2017] [Accepted: 07/24/2017] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: Non-coding RNAs (ncRNAs) are small non-coding sequences involved in gene expression regulation of many biological processes and diseases. The recent discovery of a large set of different ncRNAs with biologically relevant roles has opened the way to developing methods able to discriminate between the different ncRNA classes. Moreover, the lack of knowledge about the complete mechanisms in regulatory processes, together with the development of high-throughput technologies, has made bioinformatics tools necessary to give biologists and clinicians a deeper understanding of the functional roles of ncRNAs. In this work, we introduce a new ncRNA classification tool, nRC (non-coding RNA Classifier). Our approach is based on feature extraction from the ncRNA secondary structure together with a supervised classification algorithm implementing a deep learning architecture based on convolutional neural networks.
RESULTS: We tested our approach on the classification of 13 different ncRNA classes. We obtained classification scores using the most common statistical measures. In particular, we reach an accuracy and sensitivity score of about 74%.
CONCLUSION: The proposed method outperforms other similar classification methods based on secondary structure features and machine learning algorithms, including the RNAcon tool that, to date, is the reference classifier. The nRC tool is freely available as a docker image at https://hub.docker.com/r/tblab/nrc/. The source code of the nRC tool is also available at https://github.com/IcarPA-TBlab/nrc.
Affiliation(s)
- Antonino Fiannaca
- ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa, Palermo, 90146 Italy
| | - Massimo La Rosa
- ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa, Palermo, 90146 Italy
| | - Laura La Paglia
- ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa, Palermo, 90146 Italy
| | - Riccardo Rizzo
- ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa, Palermo, 90146 Italy
| | - Alfonso Urso
- ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa, Palermo, 90146 Italy
|
24
|
Shabash B, Wiese KC. RNA Visualization: Relevance and the Current State-of-the-Art Focusing on Pseudoknots. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:696-712. [PMID: 26915129 DOI: 10.1109/tcbb.2016.2522421] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 06/05/2023]
Abstract
RNA visualization is crucial for understanding the relationship between RNA structure and function, as well as for developing better RNA structure prediction algorithms. However, in the context of RNA visualization, one key structure remains difficult to visualize: pseudoknots. Pseudoknots occur in RNA folding when two secondary structural components form base pairs between them. The three-dimensional nature of these components makes them challenging to visualize in two-dimensional media, such as print media or screens. In this review, we focus on the advancements that have been made in the field of RNA visualization in two-dimensional media in the past two decades. The review aims at presenting all relevant aspects of pseudoknot visualization. We start with an overview of several pseudoknotted structures and their relevance in RNA function. Next, we discuss the theoretical basis for RNA structural topology classification and present RNA classification systems for both pseudoknotted and non-pseudoknotted RNAs. Each description of an RNA classification system is followed by a discussion of the software tools and algorithms developed to date to visualize RNA, comparing the different tools' strengths and shortcomings.
|
25
|
Abstract
The secondary structure of an RNA molecule represents the base-pairing interactions within the molecule and fundamentally determines its overall structure. In this chapter, we give an overview of the main approaches and existing tools for predicting RNA secondary structures, as well as methods for identifying noncoding RNAs from genomic sequences or RNA sequencing data. We then focus on the identification of a well-known class of small noncoding RNAs, namely microRNAs, which play very important roles in many biological processes through post-transcriptional regulation of gene expression and whose dysregulation has been shown to be involved in several human diseases.
Affiliation(s)
- Fariza Tahi
- IBISC, UEVE/Genopole, 23 bv. de France, 91000, Evry, France.
- IPS2, University of Paris-Saclay, 91190, Gif-sur-Yvette, France.
| | - Van Du T Tran
- Vital-IT group, SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Anouar Boucheham
- IBISC, UEVE/Genopole, 23 bv. de France, 91000, Evry, France
- College of NTIC, Constantine University 2, Constantine, Algeria
|
26
|
Long Noncoding RNA Identification: Comparing Machine Learning Based Tools for Long Noncoding Transcripts Discrimination. BioMed Research International 2016; 2016:8496165. [PMID: 28042575 PMCID: PMC5153550 DOI: 10.1155/2016/8496165] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 08/12/2016] [Revised: 10/05/2016] [Accepted: 10/13/2016] [Indexed: 12/27/2022]
Abstract
Long noncoding RNAs (lncRNAs) are noncoding RNAs longer than 200 nucleotides that have attracted growing interest in recent years. Many studies have confirmed that the human genome contains many thousands of lncRNAs, which exert great influence over some critical regulators of cellular processes. With the advent of high-throughput sequencing technologies, a great quantity of sequences awaits analysis. Thus, many programs have been developed to distinguish coding from long noncoding transcripts. Different programs are generally designed for different circumstances, and it is sensible and practical to select a method appropriate to the situation at hand. In this review, several popular methods and their advantages, disadvantages, and application scopes are summarised to help readers choose a suitable method and obtain more reliable results.
|
27
|
Reducing the bottleneck of graph-based data mining by improving the efficiency of labeled graph isomorphism testing. Data Knowl Eng 2014. [DOI: 10.1016/j.datak.2014.02.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/20/2022]
|
28
|
Panwar B, Arora A, Raghava GPS. Prediction and classification of ncRNAs using structural information. BMC Genomics 2014; 15:127. [PMID: 24521294 PMCID: PMC3925371 DOI: 10.1186/1471-2164-15-127] [Citation(s) in RCA: 67] [Impact Index Per Article: 6.7] [Received: 01/25/2013] [Accepted: 02/04/2014] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND: Evidence is accumulating that non-coding transcripts, previously thought to be functionally inert, play important roles in various cellular activities. High-throughput techniques like next-generation sequencing have resulted in the generation of vast amounts of sequence data. It is therefore desirable not only to discriminate coding from non-coding transcripts but also to assign the non-coding RNA (ncRNA) transcripts to their respective classes (families). Although there are several algorithms available for this task, their classification performance remains a major concern. Given the crucial role that non-coding transcripts play in cellular processes, algorithms are needed that can classify ncRNA transcripts precisely.
RESULTS: In this study, we first develop prediction tools to discriminate coding from non-coding transcripts and then classify ncRNAs into their respective classes. In comparison with existing methods that employ multiple features, our SVM-based method, using a single feature (tri-nucleotide composition), achieved an MCC of 0.98. Knowing that the structure of an ncRNA transcript could provide insights into its biological function, we use graph properties of predicted ncRNA structures to classify the transcripts into 18 different non-coding RNA classes. We developed classification models using a variety of algorithms (BayesNet, NaiveBayes, MultilayerPerceptron, IBk, libSVM, SMO and RandomForest) and observed that the model based on RandomForest performed better than the other models. Compared with the GraPPLE study, sensitivity (for 13 classes) and specificity (for 14 classes) were higher. Moreover, the overall sensitivity of 0.43 exceeds that of GraPPLE (0.33), and the overall MCC of 0.40 (versus 0.29 for GraPPLE) was significantly higher for our method. This clearly demonstrates that our models are more accurate than existing models.
CONCLUSIONS: This work conclusively demonstrates that a simple feature, tri-nucleotide composition, is sufficient to discriminate between coding and non-coding RNA sequences. Similarly, a feature set based on graph properties, combined with the RandomForest algorithm, is most suitable for classifying different ncRNA classes. We have also developed an online and standalone tool, RNAcon (http://crdd.osdd.net/raghava/rnacon).
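To make the two quantities named in this abstract concrete, the following is a minimal sketch of how a tri-nucleotide composition vector and a Matthews correlation coefficient (MCC) might be computed. It is an illustration only, not the RNAcon implementation: the function names, the use of overlapping windows, the normalization to frequencies, and the handling of ambiguous bases are all assumptions.

```python
import math
from itertools import product

ALPHABET = "ACGU"
# All 64 possible trinucleotides in a fixed, reproducible order.
TRINUCLEOTIDES = ["".join(p) for p in product(ALPHABET, repeat=3)]


def trinucleotide_composition(seq):
    """Return 64 trinucleotide frequencies (hypothetical helper).

    Assumption: overlapping windows of length 3, normalized by the
    number of windows that contain only A, C, G, U.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA or RNA input
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:  # skip windows with ambiguous bases such as N
            counts[tri] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRINUCLEOTIDES)
    return [counts[t] / total for t in TRINUCLEOTIDES]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

A vector produced this way could be passed to any standard classifier; the abstract reports an SVM for the coding versus non-coding step, but the kernel and parameters are not given here.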
Affiliation(s)
- Bharat Panwar
- Bioinformatics Centre, Institute of Microbial Technology (CSIR), Sector 39A, Chandigarh, India
- Amit Arora
- Bioinformatics Centre, Institute of Microbial Technology (CSIR), Sector 39A, Chandigarh, India
- Gajendra PS Raghava
- Bioinformatics Centre, Institute of Microbial Technology (CSIR), Sector 39A, Chandigarh, India
|
29
|
Functional Annotation of Small Noncoding RNAs Target Genes Provides Evidence for a Deregulated Ubiquitin-Proteasome Pathway in Spinocerebellar Ataxia Type 1. J Nucleic Acids 2012; 2012:672536. [PMID: 23094141 PMCID: PMC3471453 DOI: 10.1155/2012/672536] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 05/14/2012] [Accepted: 07/30/2012] [Indexed: 01/30/2023] Open
Abstract
Spinocerebellar ataxia type 1 (SCA1) is a neurodegenerative disorder caused by the expansion of CAG repeats in the ataxin 1 (ATXN1) gene. In affected cerebellar neurons of patients, mutant ATXN1 accumulates in ubiquitin-positive nuclear inclusions, indicating that protein misfolding is involved in SCA1 pathogenesis. In this study, we functionally annotated the target genes of the small noncoding RNAs (ncRNAs) that were selectively activated in the affected brain compartments. The primary targets of these RNAs, which exhibited a significant enrichment in the cerebellum and cortex of SCA1 patients, were members of the ubiquitin-proteasome system. Thus, we identified and functionally annotated a plausible regulatory pathway that may serve as a potential target to modulate the outcome of neurodegenerative diseases.
|
30
|
BRASERO: A Resource for Benchmarking RNA Secondary Structure Comparison Algorithms. Adv Bioinformatics 2012; 2012:893048. [PMID: 22675348 PMCID: PMC3366197 DOI: 10.1155/2012/893048] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 09/28/2011] [Accepted: 02/22/2012] [Indexed: 11/23/2022] Open
Abstract
The pairwise comparison of RNA secondary structures is a fundamental problem, with direct application in mining databases for annotating putative noncoding RNA candidates in newly sequenced genomes. An increasing number of software tools are available for comparing RNA secondary structures, based on different models (such as ordered trees or forests, arc annotated sequences, and multilevel trees) and computational principles (edit distance, alignment). We describe here the website BRASERO that offers tools for evaluating such software tools on real and synthetic datasets.
|
31
|
Golbabapour S, Abdulla MA, Hajrezaei M. A concise review on epigenetic regulation: insight into molecular mechanisms. Int J Mol Sci 2011; 12:8661-94. [PMID: 22272098 PMCID: PMC3257095 DOI: 10.3390/ijms12128661] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Received: 08/09/2011] [Revised: 11/07/2011] [Accepted: 11/10/2011] [Indexed: 12/17/2022] Open
Abstract
Epigenetic mechanisms are responsible for the regulation of transcription of imprinted genes and those that induce a totipotent state. Starting just after fertilization, DNA methylation pattern undergoes establishment, reestablishment and maintenance. These modifications are important for normal embryo and placental developments. Throughout life and passing to the next generation, epigenetic events establish, maintain, erase and reestablish. In the context of differentiated cell reprogramming, demethylation and activation of genes whose expressions contribute to the pluripotent state is the crux of the matter. In this review, firstly, regulatory epigenetic mechanisms related to somatic cell nuclear transfer (SCNT) reprogramming are discussed, followed by embryonic development, and placental epigenetic issues.
Affiliation(s)
- Shahram Golbabapour
- Department of Molecular Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur 50603, Malaysia
- Mahmood Ameen Abdulla
- Department of Molecular Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur 50603, Malaysia
- Maryam Hajrezaei
- Department of Molecular Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur 50603, Malaysia
|
32
|
Laing C, Schlick T. Computational approaches to RNA structure prediction, analysis, and design. Curr Opin Struct Biol 2011; 21:306-18. [PMID: 21514143 PMCID: PMC3112238 DOI: 10.1016/j.sbi.2011.03.015] [Citation(s) in RCA: 121] [Impact Index Per Article: 9.3] [Received: 02/07/2011] [Revised: 03/24/2011] [Accepted: 03/29/2011] [Indexed: 12/19/2022]
Abstract
RNA molecules are important cellular components involved in many fundamental biological processes. Understanding the mechanisms behind their functions requires RNA tertiary structure knowledge. Although modeling approaches for the study of RNA structures and dynamics lag behind efforts in protein folding, much progress has been achieved in the past two years. Here, we review recent advances in RNA folding algorithms, RNA tertiary motif discovery, applications of graph theory approaches to RNA structure and function, and in silico generation of RNA sequence pools for aptamer design. Advances within each area can be combined to impact many problems in RNA structure and function.
Affiliation(s)
- Christian Laing
- Department of Chemistry, Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA
- Tamar Schlick
- Department of Chemistry, Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA
|
33
|
Xiao J, Tang X, Li Y, Fang Z, Ma D, He Y, Li M. Identification of microRNA precursors based on random forest with network-level representation method of stem-loop structure. BMC Bioinformatics 2011; 12:165. [PMID: 21575268 PMCID: PMC3118167 DOI: 10.1186/1471-2105-12-165] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Received: 07/20/2010] [Accepted: 05/17/2011] [Indexed: 11/17/2022] Open
Abstract
Background: MicroRNAs (miRNAs) play a key role in regulating various biological processes, for example by participating in the post-transcriptional pathway and affecting the stability and/or translation of mRNA. Current methods have extracted feature information at different levels, among which the characteristic stem-loop structure makes the greatest contribution to the prediction of putative miRNA precursors (pre-miRNAs). We find that none of these features alone is capable of identifying new pre-miRNAs accurately.
Results: In the present work, a pre-miRNA stem-loop secondary structure is translated into a network, which provides a novel perspective for its structural analysis. Network parameters are used to construct a prediction model, achieving an area under the receiver operating characteristic curve (AUC) of 0.956. Moreover, by repeating the same method on two independent datasets, accuracies of 0.976 and 0.913 are achieved, respectively.
Conclusions: Network parameters effectively characterize pre-miRNA secondary structure, which improves our prediction model in both prediction ability and computational efficiency. Additionally, as a complement to the feature extraction methods of previous studies, these multifaceted features can reflect natural properties of miRNAs and be used for comprehensive and systematic analysis of miRNAs.
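As a rough illustration of what translating a stem-loop into a network can mean, the sketch below converts a dot-bracket secondary structure into an undirected graph and derives one simple network parameter. The abstract does not specify the network construction or the parameters used, so everything here is an assumption: nodes are nucleotides, edges are backbone bonds plus base pairs, and `dot_bracket_to_edges` and `average_degree` are hypothetical helper names.

```python
def dot_bracket_to_edges(structure):
    """Convert a dot-bracket string into a set of undirected edges (i, j), i < j.

    Assumed graph model: one node per nucleotide, one edge per backbone
    bond between sequence neighbors, and one edge per base pair.
    """
    edges = set()
    stack = []  # positions of currently unmatched '('
    for i, ch in enumerate(structure):
        if i > 0:
            edges.add((i - 1, i))  # backbone bond
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            edges.add((stack.pop(), i))  # base pair
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return edges


def average_degree(structure):
    """Average node degree of the assumed structure graph."""
    n = len(structure)
    return 2 * len(dot_bracket_to_edges(structure)) / n if n else 0.0
```

For example, `dot_bracket_to_edges("((...))")` yields six backbone edges plus the base pairs (0, 6) and (1, 5). Parameters of this kind could then serve as classifier inputs; the title names a random forest, but its settings are not given in this entry.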
Affiliation(s)
- Jiamin Xiao
- College of Chemistry and State Key Laboratory of Biotherapy, Sichuan University, Chengdu 610064, PR China
|
34
|
Herbig A, Nieselt K. nocoRNAc: characterization of non-coding RNAs in prokaryotes. BMC Bioinformatics 2011; 12:40. [PMID: 21281482 PMCID: PMC3230914 DOI: 10.1186/1471-2105-12-40] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Received: 08/17/2010] [Accepted: 01/31/2011] [Indexed: 11/10/2022] Open
Abstract
Background: Interest in non-coding RNAs (ncRNAs) has risen steadily during the past few years because of the wide spectrum of biological processes in which they are involved. This has led to the discovery of numerous ncRNA genes across many species. However, for most organisms the non-coding transcriptome remains largely unexplored. Various experimental techniques for the identification of ncRNA transcripts are available, but as these methods are costly and time-consuming, there is a need for computational methods that allow the detection of functional RNAs in complete genomes in order to suggest elements for further experiments. Several programs for the genome-wide prediction of functional RNAs have been developed, but most of them predict a genomic locus with no indication of whether the element is transcribed.
Results: We present nocoRNAc, a program for the genome-wide prediction of ncRNA transcripts in bacteria. nocoRNAc incorporates various procedures for the detection of transcriptional features, which are then integrated with functional ncRNA loci to determine the transcript coordinates. We applied RNAz and nocoRNAc to the genome of Streptomyces coelicolor and detected more than 800 putative ncRNA transcripts, most of them located antisense to protein-coding regions. Using a custom-designed microarray, we profiled the expression of about 400 of these elements and found more than 300 to be transcribed; 38 of them are predicted novel ncRNA genes in intergenic regions. The expression patterns of many ncRNAs are as complex as those of the protein-coding genes; in particular, many antisense ncRNAs show a high expression correlation with their protein-coding partner.
Conclusions: We have developed nocoRNAc, a framework that facilitates the automated characterization of functional ncRNAs. nocoRNAc increases the confidence of predicted ncRNA loci, especially if they contain transcribed ncRNAs. nocoRNAc is not restricted to intergenic regions but is applicable to the prediction of ncRNA transcripts in whole microbial genomes. The software, as well as a user guide and example data, is available at http://www.zbit.uni-tuebingen.de/pas/nocornac.htm.
Affiliation(s)
- Alexander Herbig
- Center for Bioinformatics Tübingen, University of Tübingen, Sand 14, 72076 Tübingen, Germany
|
35
|
Toledo-Arana A, Solano C. Deciphering the physiological blueprint of a bacterial cell: revelations of unanticipated complexity in transcriptome and proteome. Bioessays 2010; 32:461-7. [PMID: 20486131 DOI: 10.1002/bies.201000020] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022]
Abstract
During the last few months, several pioneer genome-wide transcriptomic, proteomic and metabolomic studies have revolutionised the understanding of bacterial biological processes, leading to a picture that resembles eukaryotic complexity. Technological advances such as next-generation high-throughput sequencing and high-density oligonucleotide microarrays have allowed the determination, in several bacteria, of the entire boundaries of all expressed transcripts. Consequently, novel RNA-mediated regulatory mechanisms have been discovered including multifunctional RNAs. Moreover, resolution of bacterial proteome organisation (interactome) and global protein localisation (localizome) have unveiled an unanticipated complexity that highlights the significance of protein multifunctionality and localisation in the cell. Also, analysis of a complete bacterial metabolic network has again revealed a high fraction of multifunctional enzymes and an unexpectedly high level of metabolic responses and adaptation. Altogether, these novel approaches have permitted the deciphering of the entire physiological landscape of one of the smallest bacteria, Mycoplasma pneumoniae. Here, we summarise and discuss recent findings aimed at defining the blueprint of any prokaryote.
Affiliation(s)
- Alejandro Toledo-Arana
- Laboratory of Microbial Biofilms, Instituto de Agrobiotecnología, Universidad Pública de Navarra-CSIC-Gobierno de Navarra, Campus de Arrosadía, Pamplona, Spain.
|
36
|
Schudoma C, May P, Nikiforova V, Walther D. Sequence-structure relationships in RNA loops: establishing the basis for loop homology modeling. Nucleic Acids Res 2009; 38:970-80. [PMID: 19923230 PMCID: PMC2817452 DOI: 10.1093/nar/gkp1010] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Indexed: 11/13/2022] Open
Abstract
The specific function of RNA molecules frequently resides in their seemingly unstructured loop regions. We performed a systematic analysis of RNA loops extracted from experimentally determined three-dimensional structures of RNA molecules. A comprehensive loop-structure data set was created and organized into distinct clusters based on structural and sequence similarity. We detected clear evidence of the hallmark of homology present in the sequence-structure relationships in loops. Loops differing by <25% in sequence identity fold into very similar structures. Thus, our results support the application of homology modeling for RNA loop model building. We established a threshold that may guide the sequence divergence-based selection of template structures for RNA loop homology modeling. Of all possible sequences that are, under the assumption of isosteric relationships, theoretically compatible with actual sequences observed in RNA structures, only a small fraction is contained in the Rfam database of RNA sequences and classes implying that the actual RNA loop space may consist of a limited number of unique loop structures and conserved sequences. The loop-structure data sets are made available via an online database, RLooM. RLooM also offers functionalities for the modeling of RNA loop structures in support of RNA engineering and design efforts.
Affiliation(s)
- Christian Schudoma
- Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany.
|