1
Yao HB, Hou ZJ, Zhang WG, Li H, Chen Y. Prediction of MicroRNA-Disease Potential Association Based on Sparse Learning and Multilayer Random Walks. J Comput Biol 2024; 31:241-256. [PMID: 38377572 DOI: 10.1089/cmb.2023.0266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2024]
Abstract
More and more studies have shown that microRNAs (miRNAs) play an indispensable role in the study of complex diseases in humans. Traditional biological experiments to detect miRNA-disease associations are expensive and time-consuming. Therefore, it is necessary to propose efficient and meaningful computational models to predict miRNA-disease associations. In this study, we aim to propose a miRNA-disease association prediction model based on sparse learning and multilayer random walks (SLMRWMDA). The miRNA-disease association matrix is decomposed and reconstructed by the sparse learning method to obtain richer association information, and at the same time, the initial probability matrix for the random walk with restart algorithm is obtained. The disease similarity network, miRNA similarity network, and miRNA-disease association network are used to construct heterogeneous networks, and the stable probability is obtained based on the topological structure features of diseases and miRNAs through a multilayer random walk algorithm to predict miRNA-disease potential association. The experimental results show that the prediction accuracy of this model is significantly improved compared with the previous related models. We evaluated the model using global leave-one-out cross-validation (global LOOCV) and fivefold cross-validation (5-fold CV). The area under the curve (AUC) value for the LOOCV is 0.9368. The mean AUC value for 5-fold CV is 0.9335 and the variance is 0.0004. In the case study, the results show that SLMRWMDA is effective in inferring the potential association of miRNA-disease.
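The random walk with restart step named in this abstract can be illustrated with a minimal sketch. This is not the SLMRWMDA implementation: it runs a plain single-network walk rather than the multilayer walk on a heterogeneous network, and the function name, the restart probability of 0.3, and the toy graph are all assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walk that restarts at `seed`.

    adjacency: square list-of-lists of non-negative edge weights.
    seed: initial probability mass per node (normalised internally).
    restart: probability of jumping back to the seed distribution at each step.
    """
    n = len(adjacency)
    # Column-normalise the adjacency matrix so each column holds transition probabilities.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    total = sum(seed)
    p0 = [s / total for s in seed]
    p = p0[:]
    for _ in range(max_iter):
        # One step: follow an edge with probability (1 - restart), else return to the seed.
        nxt = [
            (1 - restart) * sum(transition[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        # Stop once the L1 change between iterations is below the tolerance.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical 4-node path graph 0-1-2-3, with the walk seeded at node 0.
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(path, seed=[1, 0, 0, 0])
print([round(s, 3) for s in scores])
```

In this toy graph the stable probabilities fall off with distance from the seed node, which is the property such models rely on when ranking candidate associations.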
Affiliation(s)
- Hai-Bin Yao
  - Computer Science and Artificial Intelligence and Aliyun School of Big Data, Changzhou University, Changzhou, China
- Zhen-Jie Hou
  - Computer Science and Artificial Intelligence and Aliyun School of Big Data, Changzhou University, Changzhou, China
- Wen-Guang Zhang
  - Life Sciences, Inner Mongolia Agricultural University, Hohhot, China
- Han Li
  - Computer Science and Artificial Intelligence and Aliyun School of Big Data, Changzhou University, Changzhou, China
- Yan Chen
  - Computer Science and Artificial Intelligence and Aliyun School of Big Data, Changzhou University, Changzhou, China
2
Nath A, Bora U. RNAinsecta: A tool for prediction of precursor microRNA in insects and search for their target in the model organism Drosophila melanogaster. PLoS One 2023; 18:e0287323. [PMID: 37812647 PMCID: PMC10561860 DOI: 10.1371/journal.pone.0287323] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Accepted: 06/03/2023] [Indexed: 10/11/2023]
Abstract
INTRODUCTION AND BACKGROUND Pre-microRNAs are the hairpin loops from which microRNAs are produced; microRNAs have been found to negatively regulate gene expression in several organisms. In insects, microRNAs participate in several biological processes including metamorphosis, reproduction and immune response. Numerous tools have been designed in recent years to predict novel pre-microRNAs using binary machine learning classifiers, where prediction models are trained with true and pseudo pre-microRNA hairpin loops. Currently, there is no tool designed exclusively for insect pre-microRNA detection. AIM To apply machine learning algorithms to develop an open source tool for prediction of novel precursor microRNA in insects and to search for their miRNA targets in the model insect organism, Drosophila melanogaster. METHODS Machine learning algorithms such as Random Forest, Support Vector Machine, Logistic Regression and K-Nearest Neighbours were trained on features of true and false insect pre-microRNAs with 10-fold cross-validation on SMOTE and Near-Miss datasets. miRNA target IDs were collected from miRTarBase and their corresponding transcripts were collected from FlyBase. We used the miRanda algorithm for target searching. RESULTS In our experiment, SMOTE performed significantly better than Near-Miss, and it was therefore used for modelling. We kept the best-performing parameters after obtaining initial mean cross-validation accuracy scores >90%. The Support Vector Machine model achieved an accuracy of 92.19%, while the Random Forest model attained an accuracy of 80.28% on our validation dataset. These models are hosted online as a web application called RNAinsecta. RNAinsecta also provides target searching in Drosophila melanogaster for the predicted pre-microRNAs.
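The SMOTE oversampling mentioned in this abstract creates synthetic minority-class samples by interpolating between a real sample and one of its nearest minority neighbours. The following is a minimal sketch of that idea only, not the RNAinsecta pipeline and not the API of any library; the function name, the two-feature toy data, and the choice of k are assumptions.

```python
import random


def smote_like_oversample(minority, n_new, k=2, rng=None):
    """Generate `n_new` synthetic points by interpolating between minority samples.

    minority: list of feature vectors (lists of floats), at least two of them.
    k: number of nearest minority neighbours considered for each chosen sample.
    rng: optional random.Random instance, for reproducibility.
    """
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [m for m in minority if m is not x]
        neighbours = sorted(
            others, key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m))
        )[:k]
        neighbour = rng.choice(neighbours)
        # The new point lies on the segment between x and the chosen neighbour.
        u = rng.random()
        synthetic.append([a + u * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Hypothetical minority class with two features per hairpin.
true_pre_mirnas = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for point in smote_like_oversample(true_pre_mirnas, n_new=3):
    print([round(v, 2) for v in point])
```

Near-Miss takes the opposite route and undersamples the majority class, which is why the two are compared as alternative ways of balancing the training set.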
Affiliation(s)
- Adhiraj Nath
  - Department of BSBE, IIT Guwahati, North Guwahati, Assam, India
- Utpal Bora
  - Department of BSBE, IIT Guwahati, North Guwahati, Assam, India
3
Loganathan T, Doss C GP. Non-coding RNAs in human health and disease: potential function as biomarkers and therapeutic targets. Funct Integr Genomics 2023; 23:33. [PMID: 36625940 PMCID: PMC9838419 DOI: 10.1007/s10142-022-00947-4] [Citation(s) in RCA: 84] [Impact Index Per Article: 42.0] [Received: 08/25/2022] [Revised: 12/14/2022] [Accepted: 12/15/2022] [Indexed: 01/11/2023]
Abstract
Human diseases have been a critical threat from the beginning of human history. Knowing the origin, course of action and treatment of any disease state is essential. A microscopic approach to the molecular field is a more coherent and accurate way to explore the mechanism, progression, and therapy with the introduction and evolution of technology than a macroscopic approach. Non-coding RNAs (ncRNAs) play increasingly important roles in detecting, developing, and treating all abnormalities related to physiology, pathology, genetics, epigenetics, cancer, and developmental diseases. Noncoding RNAs are becoming increasingly crucial as powerful, multipurpose regulators of all biological processes. Parallel to this, a rising amount of scientific information has revealed links between abnormal noncoding RNA expression and human disorders. Numerous non-coding transcripts with unknown functions have been found in addition to advancements in RNA-sequencing methods. Non-coding linear RNAs come in a variety of forms, including circular RNAs with a continuous closed loop (circRNA), long non-coding RNAs (lncRNA), and microRNAs (miRNA). This comprises specific information on their biogenesis, mode of action, physiological function, and significance concerning disease (such as cancer or cardiovascular diseases and others). This study review focuses on non-coding RNA as specific biomarkers and novel therapeutic targets.
Affiliation(s)
- Tamizhini Loganathan
  - Laboratory of Integrative Genomics, Department of Integrative Biology, School of Biosciences and Technology, Vellore Institute of Technology (VIT), Vellore- 632014, Tamil Nadu, India
- George Priya Doss C
  - Laboratory of Integrative Genomics, Department of Integrative Biology, School of Biosciences and Technology, Vellore Institute of Technology (VIT), Vellore- 632014, Tamil Nadu, India
4
Stegmayer G, Di Persia LE, Rubiolo M, Gerard M, Pividori M, Yones C, Bugnon LA, Rodriguez T, Raad J, Milone DH. Predicting novel microRNA: a comprehensive comparison of machine learning approaches. Brief Bioinform 2020; 20:1607-1620. [PMID: 29800232 DOI: 10.1093/bib/bby037] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 10/24/2017] [Revised: 03/26/2018] [Indexed: 12/25/2022]
Abstract
MOTIVATION The importance of microRNAs (miRNAs) is widely recognized in the community nowadays because these short segments of RNA can play several roles in almost all biological processes. The computational prediction of novel miRNAs involves training a classifier for identifying sequences having the highest chance of being precursors of miRNAs (pre-miRNAs). The big issue with this task is that well-known pre-miRNAs are usually few in comparison with the hundreds of thousands of candidate sequences in a genome, which results in high class imbalance. This imbalance has a strong influence on most standard classifiers, and if not properly addressed in the model and the experiments, not only performance reported can be completely unrealistic but also the classifier will not be able to work properly for pre-miRNA prediction. Besides, another important issue is that for most of the machine learning (ML) approaches already used (supervised methods), it is necessary to have both positive and negative examples. The selection of positive examples is straightforward (well-known pre-miRNAs). However, it is difficult to build a representative set of negative examples because they should be sequences with hairpin structure that do not contain a pre-miRNA. RESULTS This review provides a comprehensive study and comparative assessment of methods from these two ML approaches for dealing with the prediction of novel pre-miRNAs: supervised and unsupervised training. We present and analyze the ML proposals that have appeared during the past 10 years in literature. They have been compared in several prediction tasks involving two model genomes and increasing imbalance levels. This work provides a review of existing ML approaches for pre-miRNA prediction and fair comparisons of the classifiers with same features and data sets, instead of just a revision of published software tools. 
The results and the discussion can help the community to select the most adequate bioinformatics approach according to the prediction task at hand. The comparative results obtained suggest that from low to mid-imbalance levels between classes, supervised methods can be the best. However, at very high imbalance levels, closer to real case scenarios, models including unsupervised and deep learning can provide better performance.
Affiliation(s)
- Georgina Stegmayer
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Leandro E Di Persia
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Mariano Rubiolo
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Matias Gerard
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Milton Pividori
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Cristian Yones
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Leandro A Bugnon
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Tadeo Rodriguez
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Jonathan Raad
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Diego H Milone
  - sinc(i), Research Institute for Signals, Systems and Computational Intelligence (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
5
Larue CT, Ream JE, Zhou X, Moshiri F, Howe A, Goley M, Sparks OC, Voss ST, Hall E, Ellis C, Weihe J, Qi Q, Ribeiro D, Wei X, Guo S, Evdokimov AG, Varagona MJ, Roberts JK. Microbial HemG-type protoporphyrinogen IX oxidase enzymes for biotechnology applications in plant herbicide tolerance traits. PEST MANAGEMENT SCIENCE 2020; 76:1031-1038. [PMID: 31503398 DOI: 10.1002/ps.5613] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 06/27/2019] [Revised: 08/24/2019] [Accepted: 09/06/2019] [Indexed: 06/10/2023]
Abstract
BACKGROUND Protoporphyrinogen IX oxidase (PPO)-inhibiting herbicides act by inhibiting a key enzyme in the heme and chlorophyll biosynthetic pathways in plants. This enzyme, the PPO enzyme, is conserved across plant species. However, some microbes are known to utilize a unique family of PPO enzymes, the HemG family. This enzyme family carries out the same enzymatic step as the plant PPO enzymes, but does not share sequence homology with the plant PPO enzymes. RESULTS Bioinformatic analysis was used to identify putative HemG PPO enzyme variants from microbial sources. A subset of these variants was cloned and characterized. HemG PPO variants were characterized for functionality and tolerance to PPO-inhibiting herbicides. HemG PPO variants that exhibited insensitivity to PPO-inhibiting herbicides were identified for further characterization. Expression of selected variants in maize, soybean, cotton and canola resulted in plants that displayed tolerance to applications of PPO-inhibiting herbicides. CONCLUSION Selected microbial-sourced HemG PPO enzyme variants present an opportunity for building new herbicide tolerance biotechnology traits. These traits provide tolerance to PPO-inhibiting herbicides and, therefore, could provide additional tools for farmers to employ in their weed management systems. © 2019 Society of Chemical Industry.
Affiliation(s)
- Erin Hall
  - Bayer Crop Science, Chesterfield, MO, USA
- Qungang Qi
  - Bayer Crop Science, Chesterfield, MO, USA
6
Bugnon LA, Yones C, Raad J, Milone DH, Stegmayer G. Genome-wide hairpins datasets of animals and plants for novel miRNA prediction. Data Brief 2019; 25:104209. [PMID: 31453279 PMCID: PMC6700487 DOI: 10.1016/j.dib.2019.104209] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 04/05/2019] [Revised: 06/16/2019] [Accepted: 06/25/2019] [Indexed: 01/19/2023]
Abstract
This article makes available several genome-wide datasets that can be used for training microRNA (miRNA) classifiers. The hairpin sequences available are from the genomes of Homo sapiens, Arabidopsis thaliana, Anopheles gambiae, Caenorhabditis elegans and Drosophila melanogaster. Each dataset provides the genome data divided into sequences and a set of computed features for predictions. Each sequence has one label: i) “positive”, meaning that it is a well-known pre-miRNA according to miRBase v21; or ii) “unlabeled”, indicating that the sequence does not (yet) have a known function and could be a candidate novel pre-miRNA. Because selecting an informative feature set is very important for a good pre-miRNA classifier, a representative feature set with large discriminative power has been calculated and is also provided for each genome. This feature set contains typical information about sequence, topology and structure. The datasets are publicly shared at https://sourceforge.net/projects/sourcesinc/files/mirdata/.
Affiliation(s)
- L A Bugnon
  - Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
- C Yones
  - Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
- J Raad
  - Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
- D H Milone
  - Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
- G Stegmayer
  - Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
7
Lai WF, Lin M, Wong WT. Tackling Aging by Using miRNA as a Target and a Tool. Trends Mol Med 2019; 25:673-684. [PMID: 31126873 DOI: 10.1016/j.molmed.2019.04.007] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 01/08/2019] [Revised: 04/12/2019] [Accepted: 04/17/2019] [Indexed: 12/15/2022]
Abstract
miRNA is a class of short noncoding RNA that regulates gene expression at the post-transcriptional level. Evidence of age-associated changes in miRNA expression has been collected in models ranging from nematodes to humans; however, there has been little discussion of how to turn our knowledge of miRNA biology into antiaging therapy. This opinion article provides a snapshot of our current understanding of the roles of miRNA in modulating the aging process. We discuss major chemical techniques for modifying the miRNA structure as well as developing delivery systems for intervention. Finally, technical needs to be met for bench-to-clinic translation of miRNA-based interventions are highlighted for future research.
Affiliation(s)
- Wing-Fu Lai
  - Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hong Kong Special Administrative Region, China; Health Science Centre, Shenzhen University, Shenzhen, China
- Marie Lin
  - Health Science Centre, Shenzhen University, Shenzhen, China
- Wing-Tak Wong
  - Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hong Kong Special Administrative Region, China
8
Yones C, Stegmayer G, Milone DH. Genome-wide pre-miRNA discovery from few labeled examples. Bioinformatics 2018; 34:541-549. [PMID: 29028911 DOI: 10.1093/bioinformatics/btx612] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 03/03/2017] [Accepted: 09/22/2017] [Indexed: 12/16/2022]
Abstract
Motivation Although many machine learning techniques have been proposed for distinguishing miRNA hairpins from other stem-loop sequences, most of the current methods use supervised learning, which requires a very good set of positive and negative examples. Those methods have important practical limitations when they have to be applied to a real prediction task. First, there is the challenge of dealing with a scarce number of positive (well-known) pre-miRNA examples. Secondly, it is very difficult to build a good set of negative examples for representing the full spectrum of non-miRNA sequences. Thirdly, in any genome, there is a huge class imbalance (1:10 000) that is well known to particularly affect supervised classifiers. Results To enable efficient and speedy genome-wide predictions of novel miRNAs, we present miRNAss, which is a novel method based on semi-supervised learning. It takes advantage of the information provided by the unlabeled stem-loops, thereby improving the prediction rates, even when the number of labeled examples is low and not representative of the classes. An automatic method for searching negative examples to initialize the algorithm is also proposed so as to spare the user this difficult task. miRNAss obtained better prediction rates and shorter execution times than state-of-the-art supervised methods. It was validated with genome-wide data from three model species, with more than one million hairpin sequences each, thereby demonstrating its applicability to a real prediction task. Availability and implementation An R package can be downloaded from https://cran.r-project.org/package=miRNAss. In addition, a web-demo for testing the algorithm is available at http://fich.unl.edu.ar/sinc/web-demo/mirnass. All the datasets that were used in this study and the sets of predicted pre-miRNA are available on http://sourceforge.net/projects/sourcesinc/files/mirnass. Contact cyones@sinc.unl.edu.ar.
Supplementary information Supplementary data are available at Bioinformatics online.
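The effect of the 1:10 000 class imbalance cited in this abstract can be made concrete with a short calculation. The sketch below is illustrative only: the sensitivity and specificity values and the sample counts are hypothetical and are not results from the miRNAss paper.

```python
def precision_under_imbalance(sensitivity, specificity, n_pos, n_neg):
    """Expected precision of a classifier applied to n_pos positives and n_neg negatives."""
    true_positives = sensitivity * n_pos
    false_positives = (1 - specificity) * n_neg
    return true_positives / (true_positives + false_positives)


# Hypothetical classifier: 95% sensitivity, 99% specificity.
balanced = precision_under_imbalance(0.95, 0.99, n_pos=100, n_neg=100)
genome_wide = precision_under_imbalance(0.95, 0.99, n_pos=100, n_neg=1_000_000)
print(f"balanced test set:   precision = {balanced:.3f}")
print(f"1:10 000 imbalance:  precision = {genome_wide:.4f}")
```

Under these assumed rates, almost every predicted pre-miRNA in the balanced setting is correct, whereas in the genome-wide setting fewer than one prediction in a hundred is, because false positives from the huge negative class swamp the true positives.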
Affiliation(s)
- C Yones
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), UNL-CONICET. Ciudad Universitaria, 4to piso FICH, Santa Fe 3000, Argentina
- G Stegmayer
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), UNL-CONICET. Ciudad Universitaria, 4to piso FICH, Santa Fe 3000, Argentina
- D H Milone
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), UNL-CONICET. Ciudad Universitaria, 4to piso FICH, Santa Fe 3000, Argentina
9
Rorbach G, Unold O, Konopka BM. Distinguishing mirtrons from canonical miRNAs with data exploration and machine learning methods. Sci Rep 2018; 8:7560. [PMID: 29765080 PMCID: PMC5953923 DOI: 10.1038/s41598-018-25578-3] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 11/22/2017] [Accepted: 04/13/2018] [Indexed: 12/13/2022]
Abstract
Mirtrons are non-canonical microRNAs encoded in introns, the biogenesis of which starts with splicing. They are not processed by Drosha and enter the canonical pathway at the Exportin-5 level. Mirtrons are much less evolutionarily conserved than canonical miRNAs. Due to these differences, canonical miRNA predictors are not applicable to mirtron prediction. Identification of the differences is important for designing mirtron prediction algorithms and may help to improve the understanding of mirtron functioning. So far, only simple, single-feature comparisons have been reported. These are insensitive to complex feature relations. We quantified miRNAs with 25 features and showed that it is impossible to distinguish the two miRNA species using simple thresholds on any single feature. However, when using Principal Component Analysis, mirtrons and canonical miRNAs are grouped separately. Moreover, several methodologically diverse machine learning classifiers delivered high classification performance. Using feature selection algorithms, we found features (e.g. bulges in the stem region), previously reported as divergent between the two classes, that did not contribute to improving classification accuracy, which suggests that they are not biologically meaningful. Finally, we proposed a combination of the most important features (including guanine content, hairpin free energy and hairpin length) which convey a specific pattern, crucial for identifying mirtrons.
Affiliation(s)
- Grzegorz Rorbach
  - Department of Computer Engineering, Faculty of Electronics, Wroclaw University of Science and Technology, Wroclaw, Poland
- Olgierd Unold
  - Department of Computer Engineering, Faculty of Electronics, Wroclaw University of Science and Technology, Wroclaw, Poland
- Bogumil M Konopka
  - Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
10
Stegmayer G, Yones C, Kamenetzky L, Milone DH. High Class-Imbalance in pre-miRNA Prediction: A Novel Approach Based on deepSOM. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:1316-1326. [PMID: 27295687 DOI: 10.1109/tcbb.2016.2576459] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 06/06/2023]
Abstract
The computational prediction of novel microRNA within a full genome involves identifying sequences having the highest chance of being a miRNA precursor (pre-miRNA). These sequences are usually named candidates to miRNA. The well-known pre-miRNAs are usually only a few in comparison to the hundreds of thousands of potential candidates to miRNA that have to be analyzed, which makes this task a high class-imbalance classification problem. The classical way of approaching it has been training a binary classifier in a supervised manner, using well-known pre-miRNAs as positive class and artificially defining the negative class. However, although the selection of positive labeled examples is straightforward, it is very difficult to build a set of negative examples in order to obtain a good set of training samples for a supervised method. In this work, we propose a novel and effective way of approaching this problem using machine learning, without the definition of negative examples. The proposal is based on clustering unlabeled sequences of a genome together with well-known miRNA precursors for the organism under study, which allows for the quick identification of the best candidates to miRNA as those sequences clustered with known precursors. Furthermore, we propose a deep model to overcome the problem of having very few positive class labels. They are always maintained in the deep levels as positive class while less likely pre-miRNA sequences are filtered level after level. Our approach has been compared with other methods for pre-miRNA prediction in several species, showing effective predictivity of novel miRNAs. Additionally, we show that our approach has a lower training time and allows for better graphical navigability and interpretation of the results. A web-demo interface to try deepSOM is available at http://fich.unl.edu.ar/sinc/web-demo/deepsom/.
11
Saçar Demirci MD, Baumbach J, Allmer J. On the performance of pre-microRNA detection algorithms. Nat Commun 2017; 8:330. [PMID: 28839141 PMCID: PMC5571158 DOI: 10.1038/s41467-017-00403-z] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 05/29/2017] [Accepted: 06/23/2017] [Indexed: 01/31/2023]
Abstract
MicroRNAs are crucial for post-transcriptional gene regulation, and their dysregulation has been associated with diseases like cancer and, therefore, their analysis has become popular. The experimental discovery of miRNAs is cumbersome and, thus, many computational tools have been proposed. Here we assess 13 ab initio pre-miRNA detection approaches using all relevant, published, and novel data sets while judging algorithm performance based on ten intrinsic performance measures. We present an extensible framework, izMiR, which allows for the unbiased comparison of existing algorithms, adding new ones, and combining multiple approaches into ensemble methods. In an exhaustive attempt, we condense the results of millions of computations and show that no method is clearly superior; however, we provide a guideline for biomedical researchers to select a tool. Finally, we demonstrate that combining all of the methods into one ensemble approach, for the first time, allows reliable purely computational pre-miRNA detection in large eukaryotic genomes. As the experimental discovery of microRNAs (miRNAs) is cumbersome, computational tools have been developed for the prediction of pre-miRNAs. Here the authors develop a framework to assess the performance of existing and novel pre-miRNA prediction tools and provide guidelines for selecting an appropriate approach for a given data set.
Affiliation(s)
- Jan Baumbach
  - Computational Systems Biology, Max Planck Institute for Informatics, 66123, Saarbrücken, Germany
  - Computational Biology, University of Southern Denmark, DK-5230, Odense M, Denmark
- Jens Allmer
  - Molecular Biology and Genetics, Izmir Institute of Technology, Urla, Izmir, 35430, Turkey
  - Bionia Incorporated, IZTEKGEB A8, Urla, Izmir, 35430, Turkey
12
A Review on Recent Computational Methods for Predicting Noncoding RNAs. BIOMED RESEARCH INTERNATIONAL 2017; 2017:9139504. [PMID: 28553651 PMCID: PMC5434267 DOI: 10.1155/2017/9139504] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 11/29/2016] [Revised: 02/06/2017] [Accepted: 02/15/2017] [Indexed: 12/20/2022]
Abstract
Noncoding RNAs (ncRNAs) play important roles in various cellular activities and diseases. In this paper, we present a comprehensive review of computational methods for ncRNA prediction, which are generally grouped into four categories: (1) homology-based methods, that is, comparative methods involving evolutionarily conserved RNA sequences and structures, (2) de novo methods using RNA sequence and structure features, (3) transcriptional sequencing and assembling based methods, that is, methods designed for single- and paired-end reads generated from next-generation RNA sequencing, and (4) RNA family specific methods, for example, methods specific for microRNAs and long noncoding RNAs. In the end, we summarize the advantages and limitations of these methods and point out a few possible future directions for ncRNA prediction. In conclusion, many computational methods have been demonstrated to be effective in predicting ncRNAs for further experimental validation. They are critical in reducing the huge number of potential ncRNAs and pointing the community to high-confidence candidates. In the future, highly efficient mapping technology and more intrinsic sequence features (e.g., motif and k-mer frequencies) and structure features (e.g., minimum free energy, conserved stem-loop, or graph structures) are suggested to be combined with the next- and third-generation sequencing platforms to improve ncRNA prediction.
13
Mugunga I, Ju Y, Liu X, Huang X. Computational prediction of human disease-related microRNAs by path-based random walk. Oncotarget 2017; 8:58526-58535. [PMID: 28938576 PMCID: PMC5601672 DOI: 10.18632/oncotarget.17226] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 02/08/2017] [Accepted: 03/22/2017] [Indexed: 01/09/2023]
Abstract
MicroRNAs (miRNAs) are a class of small, endogenous RNAs that are 21–25 nucleotides in length. In animals and plants, miRNAs target specific genes for degradation or translation repression. Discovering disease-related miRNA is fundamental for understanding the pathogenesis of diseases. The association between miRNA and a disease is mainly determined via biological investigation, which is complicated by increased biological information due to big data from different databases. Researchers have utilized different computational methods to harmonize experimental approaches to discover miRNA that articulates restrictively in specific environmental situations. In this work, we present a prediction model that is based on the theory of path features and random walk to obtain a relevancy score of miRNA-related disease. In this model, highly ranked scores are potential miRNA-disease associations. Features were extracted from positive and negative samples of miRNA-disease association. Then, we compared our method with other presented models using the five-fold cross-validation method, which obtained an area under the receiver operating characteristic curve of 88.6%. This indicated that our method has a better performance compared to previous methods and will help future biological investigations.
Affiliation(s)
- Israel Mugunga
  - Department of Computer Science, Xiamen University, Xiamen, 361005, China
- Ying Ju
  - Department of Computer Science, Xiamen University, Xiamen, 361005, China
- Xiangrong Liu
  - Department of Computer Science, Xiamen University, Xiamen, 361005, China
- Xiaoyang Huang
  - Department of Computer Science, Xiamen University, Xiamen, 361005, China
|
14
|
Computational Approaches and Related Tools to Identify MicroRNAs in a Species: A Bird’s Eye View. Interdiscip Sci 2017; 10:616-635. [DOI: 10.1007/s12539-017-0223-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 06/20/2016] [Revised: 12/20/2016] [Accepted: 03/09/2017] [Indexed: 12/26/2022]
|
15
|
Abstract
The secondary structure of an RNA molecule represents the base-pairing interactions within the molecule and fundamentally determines its overall structure. In this chapter, we overview the main approaches and existing tools for predicting RNA secondary structures, as well as methods for identifying noncoding RNAs from genomic sequences or RNA sequencing data. We then focus on the identification of a well-known class of small noncoding RNAs, namely microRNAs, which play very important roles in many biological processes by post-transcriptionally regulating gene expression and whose dysregulation has been shown to be involved in several human diseases.
Affiliation(s)
- Fariza Tahi
  - IBISC, UEVE/Genopole, 23 bv. de France, 91000, Evry, France.
  - IPS2, University of Paris-Saclay, 91190, Gif-sur-Yvette, France.
- Van Du T Tran
  - Vital-IT group, SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Anouar Boucheham
  - IBISC, UEVE/Genopole, 23 bv. de France, 91000, Evry, France
  - College of NTIC, Constantine University 2, Constantine, Algeria
|
16
|
MicroRNA discovery in the human parasite Echinococcus multilocularis from genome-wide data. Genomics 2016; 107:274-80. [DOI: 10.1016/j.ygeno.2016.04.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 12/15/2015] [Revised: 04/06/2016] [Accepted: 04/18/2016] [Indexed: 11/17/2022]
|
17
|
Peace RJ, Biggar KK, Storey KB, Green JR. A framework for improving microRNA prediction in non-human genomes. Nucleic Acids Res 2015; 43:e138. [PMID: 26163062 PMCID: PMC4787757 DOI: 10.1093/nar/gkv698] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 11/14/2014] [Accepted: 06/28/2015] [Indexed: 11/12/2022] Open
Abstract
The prediction of novel pre-microRNA (miRNA) from genomic sequence has received considerable attention recently. However, the majority of studies have focused on the human genome. Previous studies have demonstrated that sensitivity (correctly detecting true miRNA) is sustained when human-trained methods are applied to other species; however, they have failed to report the dramatic drop in specificity (the ability to correctly reject non-miRNA sequences) in non-human genomes. Considering that the ratio of true miRNA sequences to pseudo-miRNA sequences is on the order of 1:1000, such low specificity prevents the application of most existing tools to non-human genomes, as the number of false positives overwhelms the true predictions. We here introduce a framework (SMIRP) for creating species-specific miRNA prediction systems, leveraging sequence conservation and phylogenetic distance information. Substantial improvements in specificity and precision are obtained for four non-human test species when our framework is applied to three different prediction systems representing two types of classifiers (support vector machine and Random Forest), based on three different feature sets, with both human-specific and taxon-wide training data. The SMIRP framework is potentially applicable to all miRNA prediction systems and we expect substantial improvement in precision and specificity, while sustaining sensitivity, independent of the machine learning technique chosen.
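The effect of a 1:1000 class ratio on precision can be made concrete with a little arithmetic. The sketch below is only an illustration: the 1:1000 ratio comes from the abstract, while the sensitivity and specificity values and the helper name `precision_at_ratio` are hypothetical and are not results reported by the authors.

```python
def precision_at_ratio(sensitivity, specificity, positives, negatives):
    """Expected precision TP / (TP + FP) for given class counts."""
    true_positives = sensitivity * positives
    false_positives = (1.0 - specificity) * negatives
    return true_positives / (true_positives + false_positives)


# One true miRNA per 1000 pseudo-miRNA hairpins, as in the abstract.
# The sensitivity (0.90) and the three specificities are assumed values.
for specificity in (0.90, 0.99, 0.999):
    p = precision_at_ratio(0.90, specificity, positives=1, negatives=1000)
    print(f"specificity={specificity:.3f}  precision={p:.4f}")
```

Under these assumed values, even a specificity of 0.99 leaves roughly ten false positives for every true hit, which is the situation the abstract describes as false positives overwhelming true predictions.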
Affiliation(s)
- Robert J Peace
  - Department of Systems and Computer Engineering, Carleton University, Ottawa, Canada
- Kyle K Biggar
  - Institute of Biochemistry and Department of Biology, Carleton University, Ottawa, Canada
  - Department of Biochemistry, University of Western Ontario, London, Canada
- Kenneth B Storey
  - Institute of Biochemistry and Department of Biology, Carleton University, Ottawa, Canada
- James R Green
  - Department of Systems and Computer Engineering, Carleton University, Ottawa, Canada
|
18
|
Tran VDT, Tempel S, Zerath B, Zehraoui F, Tahi F. miRBoost: boosting support vector machines for microRNA precursor classification. RNA (NEW YORK, N.Y.) 2015; 21:775-85. [PMID: 25795417 PMCID: PMC4408786 DOI: 10.1261/rna.043612.113] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 11/22/2013] [Accepted: 12/20/2014] [Indexed: 06/04/2023]
Abstract
Identification of microRNAs (miRNAs) is an important step toward understanding post-transcriptional gene regulation and miRNA-related pathology. Difficulties in identifying miRNAs through experimental techniques combined with the huge amount of data from new sequencing technologies have made in silico discrimination of bona fide miRNA precursors from non-miRNA hairpin-like structures an important topic in bioinformatics. Among various techniques developed for this classification problem, machine learning approaches have proved to be the most promising. However these approaches require the use of training data, which is problematic due to an imbalance in the number of miRNAs (positive data) and non-miRNAs (negative data), which leads to a degradation of their performance. In order to address this issue, we present an ensemble method that uses a boosting technique with support vector machine components to deal with imbalanced training data. Classification is performed following a feature selection on 187 novel and existing features. The algorithm, miRBoost, performed better in comparison with state-of-the-art methods on imbalanced human and cross-species data. It also showed the highest ability among the tested methods for discovering novel miRNA precursors. In addition, miRBoost was over 1400 times faster than the second most accurate tool tested and was significantly faster than most of the other tools. miRBoost thus provides a good compromise between prediction efficiency and execution time, making it highly suitable for use in genome-wide miRNA precursor prediction. The software miRBoost is available on our web server http://EvryRNA.ibisc.univ-evry.fr.
Affiliation(s)
- Van Du T Tran
  - IBISC - IBGBI, University of Evry, 91037 Evry CEDEX, France
  - Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Sebastien Tempel
  - IBISC - IBGBI, University of Evry, 91037 Evry CEDEX, France
  - LCB, CNRS UMR 7283, 13009 Marseille, France
- Fariza Tahi
  - IBISC - IBGBI, University of Evry, 91037 Evry CEDEX, France
|
19
|
Cohen A, Combes V, Grau GER. MicroRNAs and Malaria - A Dynamic Interaction Still Incompletely Understood. JOURNAL OF NEUROINFECTIOUS DISEASES 2015; 6:165. [PMID: 26005686 PMCID: PMC4441219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Malaria is a mosquito-borne infectious disease caused by parasitic protozoa of the genus Plasmodium. It remains a major problem affecting humans today, especially children. However, the pathogenesis of malaria, especially severe malaria, remains incompletely understood, hindering our ability to treat this disease. Of recent interest is the role that small, non-coding RNAs play in the progression, pathogenesis of, and resistance to, malaria. Independent studies have now revealed the presence of microRNA (miRNA) in the malaria parasite, vector, and host, though these studies are relatively few. Here, we review these studies, focusing on the roles specific miRNA have in the disease, and how they may be harnessed for therapeutic purposes.
Affiliation(s)
- Georges ER Grau
  - Corresponding author: Grau GER, Medical Foundation Building (K25), 92-94 Parramatta Rd, Camperdown NSW 2050, Australia, Tel: +61 2 9036 3260
|
20
|
Karathanou K, Theofilatos K, Kleftogiannis D, Alexakos C, Likothanassis S, Tsakalidis A, Mavroudi S. ncRNAclass: A Web Platform for Non-Coding RNA Feature Calculation and MicroRNAs and Targets Prediction. INT J ARTIF INTELL T 2015. [DOI: 10.1142/s0218213015400023] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Under the central dogma of biology, it was commonly accepted that most genetic information is expressed through proteins. Recent experimental techniques have revealed that the majority of the genomes of mammals and other complex organisms is in fact transcribed into non-coding RNAs. Typically, non-coding RNAs are small nucleotide sequences that are not translated into proteins and have a profound regulatory role. Recent advances in computational biosciences have linked their abnormal function to many diseases and prompted a rethinking of basic therapeutic strategies. The effective identification of non-coding RNAs and their biological role emerges as a new and challenging bioinformatics problem. ncRNAclass ( http://biotools.ceid.upatras.gr/ncrnaclass/ ) is a web platform that allows for efficient computation of a set of features that can effectively describe the broad class of non-coding RNAs. Moreover, it enables the calculation of features that include information about the targeting behavior of miRNAs. The tool operates under a user-friendly interface, and its pilot implementation incorporates prediction models for the well-known class of microRNAs and for predicting their mRNA targets. The prediction models are based on two novel evolutionary machine learning algorithms that achieve very high classification performance in comparison with existing methods. The platform is also equipped with a data warehouse of manually curated sequences that enables fast information retrieval and data mining utilities.
Affiliation(s)
- Christos Alexakos
  - Department of Computer Engineering and Informatics, University of Patras, Greece
- Spiros Likothanassis
  - Department of Computer Engineering and Informatics, University of Patras, Greece
- Seferina Mavroudi
  - Department of Computer Engineering and Informatics, University of Patras, Greece
  - Department of Social Work, School of Sciences of Health and Care Technological Educational Institute of Patras, Greece
|
21
|
ElGokhy SM, ElHefnawi M, Shoukry A. Ensemble-based classification approach for micro-RNA mining applied on diverse metagenomic sequences. BMC Res Notes 2014; 7:286. [PMID: 24884968 PMCID: PMC4051165 DOI: 10.1186/1756-0500-7-286] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/09/2013] [Accepted: 04/22/2014] [Indexed: 01/23/2023] Open
Abstract
Background MicroRNAs (miRNAs) are endogenous ∼22 nt RNAs that have been identified in many species as powerful regulators of gene expression. Experimental identification of miRNAs is still slow since miRNAs are difficult to isolate by cloning due to their low expression, low stability, tissue specificity and the high cost of the cloning procedure. Thus, computational identification of miRNAs from genomic sequences provides a valuable complement to cloning. Different approaches for identification of miRNAs have been proposed based on homology, thermodynamic parameters, and cross-species comparisons. Results The present paper focuses on the integration of miRNA classifiers in a meta-classifier and the identification of miRNAs from metagenomic sequences collected from different environments. An ensemble of classifiers is proposed for miRNA hairpin prediction based on four well-known classifiers (Triplet-SVM, MiPred, Virgo and EumiR), which use non-identical features and have been trained on different data. Their decisions are combined using a single-hidden-layer neural network to increase the accuracy of the predictions. Our ensemble classifier achieved 89.3% accuracy, 82.2% F-measure, 74% sensitivity, 97% specificity, 92.5% precision and 88.2% negative predictive value when tested on real miRNA and pseudo sequence data. The area under the receiver operating characteristic curve of our classifier is 0.9, which represents a high performance index. The proposed classifier yields a significant performance improvement relative to Triplet-SVM, Virgo and EumiR and a minor refinement over MiPred. The developed ensemble classifier is used for miRNA prediction in mine drainage, groundwater and marine metagenomic sequences downloaded from the NCBI Sequence Read Archive. By consulting the miRBase repository, 179 miRNAs have been identified as highly probable miRNAs. Our new approach could thus be used for mining metagenomic sequences and finding new and homologous miRNAs.
Conclusions The paper investigates a computational tool for miRNA prediction in genomic or metagenomic data. It has been applied to three metagenomic samples from different environments (mine drainage, groundwater and marine metagenomic sequences). The prediction results provide a set of highly promising miRNA hairpin candidates for cloning-based prediction methods. Among the results of the ensemble prediction are pre-miRNA candidates that have been validated using miRBase but were not recognized by some of the base classifiers.
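The reported F-measure can be cross-checked against the reported precision and sensitivity. The short Python check below assumes that "F-measure" means the usual harmonic mean of precision and recall (F1); the abstract does not state which variant was used, and the function name `f_measure` is chosen for this example.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


# Precision (92.5%) and sensitivity (74%) as reported in the abstract.
f1 = f_measure(0.925, 0.74)
print(f"{f1:.1%}")
```

Under that assumption the two reported figures reproduce the reported 82.2% F-measure.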
Affiliation(s)
- Sherin M ElGokhy
  - Department of Computer Science and Engineering, Egypt-Japan University of Science and Technology (E-JUST), 21934, New Borg El-Arab, Alexandria, Egypt.
|
22
|
Wang C, Wei L, Guo M, Zou Q. Computational approaches in detecting non-coding RNA. Curr Genomics 2014; 14:371-7. [PMID: 24396270 PMCID: PMC3861888 DOI: 10.2174/13892029113149990005] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Received: 04/26/2013] [Revised: 07/18/2013] [Accepted: 07/18/2013] [Indexed: 12/21/2022] Open
Abstract
The important role of non-coding RNAs (ncRNAs) in the cell has made their identification a critical issue in biological research. However, traditional approaches such as RT-PCR and Northern blot are costly. With recent progress in bioinformatics and computational prediction technology, the discovery of ncRNAs has become realistically possible. This paper aims to introduce the major computational approaches to the identification of ncRNAs, including homology search, de novo prediction and mining of deep sequencing data. Furthermore, related software tools are compared and reviewed, along with a discussion of future improvements.
Affiliation(s)
- Chunyu Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Leyi Wei
  - School of Information Science and Technology, Xiamen University, Xiamen 361005, China
- Maozu Guo
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Quan Zou
  - School of Information Science and Technology, Xiamen University, Xiamen 361005, China
|
23
|
Identification of differentially expressed genes in American cockroach ovaries and testes by suppression subtractive hybridization and the prediction of its miRNAs. Mol Genet Genomics 2013; 288:627-38. [PMID: 23996145 DOI: 10.1007/s00438-013-0777-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/17/2013] [Accepted: 08/19/2013] [Indexed: 10/26/2022]
Abstract
Studies on the cockroach have contributed to our understanding of several important developmental processes, especially those that can be easily studied in the embryo. However, our knowledge on late events such as gonad differentiation in the cockroach is still limited. The major aim of the present study was to identify sex-specific genes between adult female and male Periplaneta americana. Two cDNA libraries were constructed using the suppression subtractive hybridization method; a total of 433 and 599 unique sequences were obtained from the forward library and the reverse library, respectively, by cluster assembly, and sequence alignment of 1,032 expressed sequence tags. The analysis of the differentially expressed gene functions allowed these genes to be categorized into three groups: biological process, molecular function, and cellular component. The differentially expressed genes were suggested to be related to the development of the gonads of P. americana. Twelve differentially expressed genes were randomly selected and verified using relative quantitative real-time polymerase chain reaction (qRT-PCR). Meanwhile, by adopting a range of filtering criteria, we predicted two potential microRNA sequences for P. americana, pam-miR100-3p and pam-miR7. To confirm the expression of potential microRNAs (miRNAs) in American cockroach, a qRT-PCR approach was also employed. The data presented here offer the insights into the molecular foundation of sex differences in American cockroach, and the first report for the miRNAs in this species. In addition, the results can be used as a reference for unraveling candidate genes associated with the sex and reproduction of cockroaches.
|
24
|
Wang Q, Wei L, Guan X, Wu Y, Zou Q, Ji Z. Briefing in family characteristics of microRNAs and their applications in cancer research. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2013; 1844:191-7. [PMID: 23954304 DOI: 10.1016/j.bbapap.2013.08.002] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Received: 12/12/2012] [Revised: 07/19/2013] [Accepted: 08/07/2013] [Indexed: 12/19/2022]
Abstract
MicroRNAs (miRNAs) are endogenous, short, non-coding RNA molecules that are directly involved in the post-transcriptional regulation of gene expression. Dysregulation of miRNAs is usually associated with diseases. Since miRNAs in a family tend to share functional characteristics, proper assignment of miRNA families is a useful heuristic for better understanding the nature of miRNAs and their clinical potential. In this review, we briefly discuss recent progress in miRNA research, particularly its impact on proteins and its clinical application in cancer research, from the perspective of miRNA families. This article is part of a Special Issue entitled: Computational Proteomics, Systems Biology & Clinical Implications. Guest Editor: Yudong Cai.
Affiliation(s)
- Qicong Wang
  - School of Information Science and Technology, Xiamen University, Xiamen, 361005 Fujian, PR China
|
25
|
Song X, Wang M, Chen YPP, Wang H, Han P, Sun H. Prediction of pre-miRNA with multiple stem-loops using pruning algorithm. Comput Biol Med 2013; 43:409-16. [DOI: 10.1016/j.compbiomed.2013.02.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 10/12/2011] [Revised: 02/01/2013] [Accepted: 02/05/2013] [Indexed: 01/28/2023]
|
26
|
Kleftogiannis D, Korfiati A, Theofilatos K, Likothanassis S, Tsakalidis A, Mavroudi S. Where we stand, where we are moving: Surveying computational techniques for identifying miRNA genes and uncovering their regulatory role. J Biomed Inform 2013; 46:563-73. [PMID: 23501016 DOI: 10.1016/j.jbi.2013.02.002] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 10/13/2012] [Revised: 01/08/2013] [Accepted: 02/12/2013] [Indexed: 12/19/2022]
Abstract
Traditional biology was forced to restate some of its principles when microRNA (miRNA) genes and their regulatory role were first discovered. Typically, miRNAs are small non-coding RNA molecules that can bind to the 3' untranslated region (UTR) of their mRNA targets, leading to cleavage or translational repression. Existing experimental techniques for their identification and for the prediction of their target genes share some important limitations, such as low coverage, time-consuming experiments and high-cost reagents. Hence, many computational methods have been proposed to overcome these limitations. Recently, many researchers have emphasized the development of computational approaches to predict the participation of miRNA genes in regulatory networks and to analyze their transcription mechanisms. All these approaches have certain advantages and disadvantages, which are described in the present survey. Our work differs from existing review papers in updating the list of methodologies and emphasizing the computational issues that arise in miRNA data analysis. Furthermore, in the present survey, the various miRNA data analysis steps are treated as an integrated procedure whose aim is to uncover the regulatory role and mechanisms of the miRNA genes. This integrated view of the miRNA data analysis steps may be extremely useful for all researchers, even if they work on just a single step.
Affiliation(s)
- Dimitrios Kleftogiannis
  - King Abdullah University of Science and Technology (KAUST), Computer Science and Mathematical Sciences and Engineering Division, Thuwal, Saudi Arabia
|
27
|
Wang Z, He K, Wang Q, Yang Y, Pan Y. The prediction of the porcine pre-microRNAs in genome-wide based on support vector machine (SVM) and homology searching. BMC Genomics 2012; 13:729. [PMID: 23268561 PMCID: PMC3545972 DOI: 10.1186/1471-2164-13-729] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/26/2011] [Accepted: 12/22/2012] [Indexed: 12/19/2022] Open
Abstract
Background MicroRNAs (miRNAs) are a class of small non-coding RNAs that regulate gene expression by targeting mRNAs for translational repression or mRNA degradation. Although many miRNAs have been discovered and studied in human and mouse, few studies have focused on porcine miRNAs, especially at the genome-wide level. Results Here, we adopted computational approaches including support vector machine (SVM) and homology searching to perform a global scan for porcine pre-miRNAs. In our study, we built an SVM-based porcine pre-miRNA classifier with a sensitivity of 100%, a specificity of 91.2% and a total prediction accuracy of 95.6%. Moreover, 2204 novel porcine pre-miRNA candidates were found using the SVM-based pre-miRNA classifier. In addition, 116 porcine pre-miRNA candidates were detected by homology searching. Conclusions We identified porcine pre-miRNAs genome-wide through computational approaches using pig data sets and set up a porcine pre-miRNA library, which provides a genome-level overview of porcine pre-miRNAs and should benefit subsequent experimental research on porcine miRNA function and expression.
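The three reported figures can be related through the definition of accuracy as a class-weighted average of sensitivity and specificity. The Python check below assumes a test set with equal numbers of positives and negatives; the abstract does not give the class counts, so this balance is an inference made for illustration, and the function name `accuracy` is chosen for this example.

```python
def accuracy(sensitivity, specificity, positives, negatives):
    """Overall accuracy as the class-weighted mean of sensitivity and specificity."""
    correct = sensitivity * positives + specificity * negatives
    return correct / (positives + negatives)


# Reported sensitivity (100%) and specificity (91.2%); equal class sizes assumed.
acc = accuracy(1.00, 0.912, positives=1, negatives=1)
print(f"{acc:.1%}")
```

With equal class sizes the reported sensitivity and specificity give the reported 95.6% accuracy, which is consistent with, but does not prove, a balanced test set.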
Affiliation(s)
- Zhen Wang
  - School of Agriculture and Biology, Department of Animal Science, Shanghai Jiao Tong University, Shanghai, 200240, PR China
|
28
|
Abstract
microRNAs (miRNAs) are small endogenous non-coding RNAs that function as the universal specificity factors in post-transcriptional gene silencing. Discovering miRNAs, identifying their targets and further inferring miRNA functions have been a critical strategy for understanding normal biological processes of miRNAs and their roles in the development of disease. In this review, we focus on computational methods of inferring miRNA functions, including miRNA functional annotation and inferring miRNA regulatory modules, by integrating heterogeneous data sources. We also briefly introduce the research in miRNA discovery and miRNA-target identification with an emphasis on the challenges to computational biology.
Affiliation(s)
- Bing Liu
  - School of Biomedical Sciences and Pharmacy, The University of Newcastle, University Drive, Callaghan NSW 2308, Australia.
|
29
|
Shen W, Chen M, Wei G, Li Y. MicroRNA prediction using a fixed-order Markov model based on the secondary structure pattern. PLoS One 2012; 7:e48236. [PMID: 23118959 PMCID: PMC3484136 DOI: 10.1371/journal.pone.0048236] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 02/09/2012] [Accepted: 09/28/2012] [Indexed: 12/21/2022] Open
Abstract
Predicting miRNAs is an arduous task, due to the diversity of the precursors and complexity of enzyme processes. Although several prediction approaches have reached impressive performances, few of them could achieve a full-function recognition of mature miRNA directly from the candidate hairpins across species. Therefore, researchers continue to seek a more powerful model close to biological recognition to miRNA structure. In this report, we describe a novel miRNA prediction algorithm, known as FOMmiR, using a fixed-order Markov model based on the secondary structural pattern. For a training dataset containing 809 human pre-miRNAs and 6441 human pseudo-miRNA hairpins, the model's parameters were defined and evaluated. The results showed that FOMmiR reached 91% accuracy on the human dataset through 5-fold cross-validation. Moreover, for the independent test datasets, the FOMmiR presented an outstanding prediction in human and other species including vertebrates, Drosophila, worms and viruses, even plants, in contrast to the well-known algorithms and models. Especially, the FOMmiR was not only able to distinguish the miRNA precursors from the hairpins, but also locate the position and strand of the mature miRNA. Therefore, this study provides a new generation of miRNA prediction algorithm, which successfully realizes a full-function recognition of the mature miRNAs directly from the hairpin sequences. And it presents a new understanding of the biological recognition based on the strongest signal's location detected by FOMmiR, which might be closely associated with the enzyme cleavage mechanism during the miRNA maturation.
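To give a feel for how a fixed-order Markov model over structural patterns can separate hairpin classes, here is a minimal Python sketch. It is not FOMmiR: it assumes a first-order chain over dot-bracket strings, add-one smoothing, and a log-odds score between a "hairpin" model and a "decoy" model. All training strings, names (`train`, `log_odds`) and parameters are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "(.)"  # dot-bracket symbols: paired-open, unpaired, paired-close


def train(structures, order=1, pseudocount=1.0):
    """Estimate P(symbol | previous `order` symbols) with additive smoothing."""
    counts = Counter()
    for s in structures:
        for i in range(order, len(s)):
            counts[(s[i - order:i], s[i])] += 1

    def prob(context, symbol):
        total = sum(counts[(context, a)] for a in ALPHABET)
        return (counts[(context, symbol)] + pseudocount) / (
            total + pseudocount * len(ALPHABET)
        )

    return prob


def log_odds(structure, pos_prob, neg_prob, order=1):
    """Sum of log-likelihood ratios over all transitions in the string."""
    score = 0.0
    for i in range(order, len(structure)):
        context, symbol = structure[i - order:i], structure[i]
        score += math.log(pos_prob(context, symbol) / neg_prob(context, symbol))
    return score


# Toy training data (assumed): clean stem-loops versus fragmented structures.
hairpins = ["((((((...))))))", "(((((....)))))"]
decoys = ["..(..)..(..)...", ".(.).(.)..(.).."]
pos_model = train(hairpins)
neg_model = train(decoys)

hairpin_score = log_odds("((((...))))", pos_model, neg_model)
decoy_score = log_odds(".(.)..(.).", pos_model, neg_model)
print(hairpin_score > 0, decoy_score < 0)
```

A positive score means the structure is better explained by the hairpin model than by the decoy model. The published method additionally locates the mature miRNA within the hairpin, which this sketch does not attempt.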
Affiliation(s)
- Wei Shen
  - Medical Research Center, Southwest Hospital, Third Military Medical University, Chongqing, China
- Ming Chen
  - Medical Research Center, Southwest Hospital, Third Military Medical University, Chongqing, China
- Guo Wei
  - Medical Research Center, Southwest Hospital, Third Military Medical University, Chongqing, China
- Yan Li
  - Medical Research Center, Southwest Hospital, Third Military Medical University, Chongqing, China
  - Bioinformatics Laboratory, Chongqing Key Laboratory for Disease Proteomics, Chongqing, China
|
30
|
Adenosine A2A receptor upregulation in human PMNs is controlled by miRNA-214, miRNA-15, and miRNA-16. Shock 2012; 37:156-63. [PMID: 22249219 DOI: 10.1097/shk.0b013e31823f16bc] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Indexed: 10/15/2022]
Abstract
Immunosuppressive signaling via the adenosine A2A receptor (A2AR) is an important pathway to control inflammation. In immune cells, expression levels of A2ARs influence responsiveness to inflammatory stimuli. However, mechanisms driving expressional changes of A2ARs are still largely elusive. In the current study, we have investigated the impact of microRNAs (miRNAs) on A2AR expression in human polymorphonuclear leukocytes (PMNs) and T cells. Bioinformatic analyses and reporter gene assays revealed that A2AR expression is controlled by miRNA-214, miRNA-15, and miRNA-16. We detected all three miRNAs in both human PMNs and T cells. However, in PMNs, up to 10-fold higher levels of miRNA-16 and miRNA-214 were detected as compared with T cells. Upon in vitro stimulation, no significant expressional changes occurred. Expression levels of all three miRNAs strongly differed between individuals. A2AR expression also exhibited significant differences between PMNs and T cells: In PMNs, more than a 60-fold increase was seen upon LPS stimulation, whereas in T cells only a 2-fold increase was observed upon anti-CD3/CD28 activation. The extent of A2AR upregulation in PMNs strongly differed between individuals (from less than 10-fold to more than 100-fold). In PMNs, the increase in A2AR mRNA expression upon stimulation was inversely correlated with the expression levels of miRNA-214, miRNA-15, and miRNA-16 (R = -0.87, P < 0.0001); no correlation was found in human T cells. These results indicate that individual miRNA profiles gain important influence on A2AR expression regulation in PMNs upon stimulation. Determination of miRNA expression levels may help to identify patients with an increased risk for severe inflammation.
|
31
|
Liu X, He S, Skogerbø G, Gong F, Chen R. Integrated sequence-structure motifs suffice to identify microRNA precursors. PLoS One 2012; 7:e32797. [PMID: 22438883 PMCID: PMC3305290 DOI: 10.1371/journal.pone.0032797] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 09/08/2011] [Accepted: 01/31/2012] [Indexed: 11/18/2022] Open
Abstract
Background Upwards of 1200 miRNA loci have hitherto been annotated in the human genome. The specific features defining a miRNA precursor and deciding its recognition and subsequent processing are not yet exhaustively described and miRNA loci can thus not be computationally identified with sufficient confidence. Results We rendered pre-miRNA and non-pre-miRNA hairpins as strings of integrated sequence-structure information, and used the software Teiresias to identify sequence-structure motifs (ss-motifs) of variable length in these data sets. Using only ss-motifs as features in a Support Vector Machine (SVM) algorithm for pre-miRNA identification achieved 99.2% specificity and 97.6% sensitivity on a human test data set, which is comparable to previously published algorithms employing combinations of sequence-structure and additional features. Further analysis of the ss-motif information contents revealed strongly significant deviations from those of the respective training sets, revealing important potential clues as to how the sequence and structural information of RNA hairpins are utilized by the miRNA processing apparatus. Conclusion Integrated sequence-structure motifs of variable length apparently capture nearly all information required to distinguish miRNA precursors from other stem-loop structures.
Affiliation(s)
- Xiuqin Liu
  - School of Mathematics and Physics, University of Science and Technology Beijing, Beijing, PR China
  - The Key Laboratory of Random Complex Structures and Data, Chinese Academy of Sciences, Beijing, PR China
  - National Center for Mathematics and Interdisciplinary Sciences, Chinese Academy of Sciences, Beijing, PR China
- Shunmin He
  - Institute of Zoology, Chinese Academy of Sciences, Beijing, PR China
- Geir Skogerbø
  - National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, PR China
- Fuzhou Gong
  - The Key Laboratory of Random Complex Structures and Data, Chinese Academy of Sciences, Beijing, PR China
  - National Center for Mathematics and Interdisciplinary Sciences, Chinese Academy of Sciences, Beijing, PR China
  - Institute of Applied Mathematics, Academy of Mathematics and System Sciences, Chinese Academy of Sciences, Beijing, PR China
  - * E-mail: (FG); (RC)
- Runsheng Chen
  - National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, PR China
  - National Center for Mathematics and Interdisciplinary Sciences, Chinese Academy of Sciences, Beijing, PR China
  - * E-mail: (FG); (RC)
|
32
|
Abstract
miRNAs are small non-coding RNA structures which play important roles in biological processes. Finding miRNA precursors in genomes is therefore an important task, where computational methods are required. The goal of these methods is to select potential pre-miRNAs which could be validated by experimental methods. With the new generation of sequencing techniques, it is important to have fast algorithms that are able to treat whole genomes in acceptable times. We developed an algorithm based on an original method in which an approximation of the miRNA hairpin is first searched for, before the pre-miRNA structure is reconstituted. The approximation step allows a substantial decrease in the number of possibilities and thus in the time required for searching. Our method was tested on different genomic sequences, and was compared with CID-miRNA, miRPara and VMir. It gives better sensitivity and selectivity in almost all cases. It is also faster than CID-miRNA, miRPara and VMir: it takes ≈ 30 s to process a 1 MB sequence, whereas VMir takes 30 min, miRPara takes 20 h and CID-miRNA takes 55 h. We present here a fast ab-initio algorithm for searching for miRNA precursors in genomes, called miRNAFold. miRNAFold is available at http://EvryRNA.ibisc.univ-evry.fr/.
Affiliation(s)
- Sébastien Tempel
  - Laboratoire IBISC, Université d'Evry-Val d'Essonne/Genopole, 23 Boulevard de France, 91034 Evry, France
|
33
|
Wu Y, Wei B, Liu H, Li T, Rayner S. MiRPara: a SVM-based software tool for prediction of most probable microRNA coding regions in genome scale sequences. BMC Bioinformatics 2011; 12:107. [PMID: 21504621 PMCID: PMC3110143 DOI: 10.1186/1471-2105-12-107] [Citation(s) in RCA: 138] [Impact Index Per Article: 9.9] [Received: 09/02/2010] [Accepted: 04/19/2011] [Indexed: 02/07/2023] Open
Abstract
Background: MicroRNAs are a family of ~22 nt small RNAs that can regulate gene expression at the post-transcriptional level. Identification of these molecules and their targets can aid understanding of regulatory processes. Recently, high-throughput sequencing (HTS) has become a common identification method, but there are two major limitations associated with the technique. Firstly, the method has low efficiency, with typically less than 1 in 10,000 sequences representing miRNA reads, and secondly, the method preferentially targets highly expressed miRNAs. If sequences are available, computational methods can provide a screening step to investigate the value of an HTS study and aid interpretation of results. However, current methods can only predict miRNAs for short fragments and have usually been trained against small datasets which don't always reflect the diversity of these molecules. Results: We have developed a software tool, miRPara, that predicts the most probable mature miRNA coding regions from genome scale sequences in a species-specific manner. We classified sequences from miRBase into animal, plant and overall categories and used a support vector machine to train three models based on an initial set of 77 parameters related to the physical properties of the pre-miRNA and its miRNAs. By applying parameter filtering we found that a subset of ~25 parameters produced higher prediction ability compared to the full set. Our software achieves an accuracy of up to 80% against experimentally verified mature miRNAs, making it one of the most accurate methods available. Conclusions: miRPara is an effective tool for locating miRNA coding regions in genome sequences and can be used as a screening step prior to HTS experiments. It is available at http://www.whiov.ac.cn/bioinformatics/mirpara
Affiliation(s)
- Yonggan Wu
  - Bioinformatics Group, State Key Laboratory of Virology, Wuhan Institute of Virology, Chinese Academy of Science, PR of China
|
34
|
Zhang Y, Yang Y, Zhang H, Jiang X, Xu B, Xue Y, Cao Y, Zhai Q, Zhai Y, Xu M, Cooke HJ, Shi Q. Prediction of novel pre-microRNAs with high accuracy through boosting and SVM. Bioinformatics 2011; 27:1436-7. [PMID: 21436129 DOI: 10.1093/bioinformatics/btr148] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 12/16/2022]
Abstract
High-throughput deep-sequencing technology has generated an unprecedented number of expressed short sequence reads, presenting not only an opportunity but also a challenge for prediction of novel microRNAs. To verify the existence of candidate microRNAs, we have to show that these short sequences can be processed from candidate pre-microRNAs. However, it is laborious and time-consuming to verify these using existing experimental techniques. Therefore, here, we describe a new method, miRD, which is constructed using two feature selection strategies based on support vector machines (SVMs) and a boosting method. It is a high-efficiency tool for novel pre-microRNA prediction with accuracy up to 94.0% among different species. Availability: miRD is implemented in PHP/PERL+MySQL+R and can be freely accessed at http://mcg.ustc.edu.cn/rpg/mird/mird.php.
Affiliation(s)
- Yuanwei Zhang
  - Department of Life Science, Hefei National Laboratory for Physical Sciences, Microscale and School of Life Sciences, University of Science and Technology of China, Hefei, China
|
35
|
Abstract
Originally discovered in C. elegans, microRNAs (miRNAs) are small RNAs that regulate fundamental cellular processes in diverse organisms. MiRNAs are encoded within the genome and are initially transcribed as primary transcripts that can be several kilobases in length. Primary transcripts are successively cleaved by two RNase III enzymes, Drosha in the nucleus and Dicer in the cytoplasm, to produce ∼70 nucleotide (nt) long precursor miRNAs and 22 nt long mature miRNAs, respectively. Mature miRNAs regulate gene expression post-transcriptionally by imperfectly binding target mRNAs in association with the multiprotein RNA induced silencing complex (RISC). The conserved sequence, expression pattern, and function of some miRNAs across distinct species as well as the importance of specific miRNAs in many biological pathways have led to an explosion in the study of miRNA biogenesis, miRNA target identification, and miRNA target regulation. Many advances in our understanding of miRNA biology have come from studies in the powerful model organism C. elegans. This chapter reviews the current methods used in C. elegans to study miRNA biogenesis, small RNA populations, miRNA-protein complexes, and miRNA target regulation.
Affiliation(s)
- Shih-Peng Chan
  - Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut, USA
- Frank J Slack
  - Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut, USA
- Amy E Pasquinelli
  - Department of Biology, University of California San Diego, La Jolla, California, USA
|
36
|
MiRenSVM: towards better prediction of microRNA precursors using an ensemble SVM classifier with multi-loop features. BMC Bioinformatics 2010; 11 Suppl 11:S11. [PMID: 21172046 PMCID: PMC3024864 DOI: 10.1186/1471-2105-11-s11-s11] [Citation(s) in RCA: 71] [Impact Index Per Article: 4.7] [Indexed: 12/26/2022] Open
Abstract
Background: MicroRNAs (miRNAs) are derived from larger hairpin RNA precursors and play essential regulatory roles in both animals and plants. A number of computational methods for miRNA gene finding have been proposed in the past decade, yet the problem is far from solved, especially when considering the imbalance between known miRNAs and unidentified miRNAs, and pre-miRNAs with multi-loops or higher minimum free energy (MFE). This paper presents a new computational approach, miRenSVM, for finding miRNA genes. Aiming at better prediction performance, an ensemble support vector machine (SVM) classifier is established to deal with the imbalance issue, and multi-loop features are included for identifying those pre-miRNAs with multi-loops. Results: We collected a representative dataset, which contains 697 real miRNA precursors identified by experimental procedure and other computational methods, and 5428 pseudo ones from several datasets. Experiments showed that miRenSVM achieved a 96.5% specificity and a 93.05% sensitivity on the dataset. Compared with the state-of-the-art approaches, miRenSVM obtained better prediction results. We also applied our method to predict 14 Homo sapiens pre-miRNAs and 13 Anopheles gambiae pre-miRNAs that first appeared in miRBase13.0; miRenSVM achieved a 100% prediction rate. Furthermore, performance evaluation was conducted over 27 additional species in miRBase13.0, and 92.84% (4863/5238) of animal pre-miRNAs were correctly identified by miRenSVM. Conclusion: MiRenSVM is an ensemble support vector machine (SVM) classification system for better detecting miRNA genes, especially those with multi-loop secondary structure.
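The rates quoted in this abstract are simple ratios, and the following minimal Python sketch shows how such figures are computed. Only the counts 4863 and 5238 come from the abstract; the function names `sensitivity` and `specificity` are illustrative and are not taken from the miRenSVM software.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of real pre-miRNAs that are predicted as real."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of pseudo hairpins that are predicted as pseudo."""
    return tn / (tn + fp)


# Counts reported in the abstract for animal pre-miRNAs in miRBase13.0.
identified, total = 4863, 5238

# Every one of the 5238 entries is a real pre-miRNA, so the reported
# detection rate is a sensitivity: hits / (hits + misses).
rate = sensitivity(tp=identified, fn=total - identified)
print(f"{100 * rate:.2f}%")  # prints 92.84%
```

The 96.5% specificity and 93.05% sensitivity quoted for the 697/5428 dataset follow the same definitions, but the abstract does not give the underlying counts, so they are not reproduced here.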
|
37
|
Huang Y, Zou Q, Wang SP, Tang SM, Zhang GZ, Shen XJ. The discovery approaches and detection methods of microRNAs. Mol Biol Rep 2010; 38:4125-35. [PMID: 21107708 DOI: 10.1007/s11033-010-0532-1] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.7] [Received: 07/17/2010] [Accepted: 11/15/2010] [Indexed: 12/28/2022]
Abstract
MicroRNAs (miRNAs) are small, highly conserved, non-coding RNAs that regulate gene expression of target mRNAs through cleavage or translational inhibition. Computer-based approaches for miRNA gene identification are considered indispensable in miRNA research. Similarly, experimental approaches for detection of miRNAs are crucial to the testing and validation of computational algorithms. The detection of miRNAs in tissues or cells can supply valuable information for investigating the biological function of these molecules. Selective and highly sensitive detection methods will pave the way for extended understanding of miRNA function within organisms. In this review, we summarize the various computational methods for identification of miRNAs as well as the methodologies that have been developed to detect miRNAs.
Affiliation(s)
- Yong Huang
  - Jiangsu University of Science and Technology, Zhenjiang, 212018, Jiangsu, People's Republic of China
|
38
|
Chellappan P, Xia J, Zhou X, Gao S, Zhang X, Coutino G, Vazquez F, Zhang W, Jin H. siRNAs from miRNA sites mediate DNA methylation of target genes. Nucleic Acids Res 2010; 38:6883-94. [PMID: 20621980 PMCID: PMC2978365 DOI: 10.1093/nar/gkq590] [Citation(s) in RCA: 130] [Impact Index Per Article: 8.7] [Received: 03/29/2010] [Revised: 06/15/2010] [Accepted: 06/16/2010] [Indexed: 11/17/2022] Open
Abstract
Arabidopsis microRNA (miRNA) genes (MIR) give rise to 20- to 22-nt miRNAs that are generated predominantly by the type III endoribonuclease Dicer-like 1 (DCL1) but do not require any RNA-dependent RNA Polymerases (RDRs) or RNA Polymerase IV (Pol IV). Here, we identify a novel class of non-conserved MIR genes that give rise to two small RNA species, a 20- to 22-nt species and a 23- to 27-nt species, at the same site. Genetic analysis using small RNA pathway mutants reveals that the 20- to 22-nt small RNAs are typical miRNAs generated by DCL1 and are associated with Argonaute 1 (AGO1). In contrast, the accumulation of the 23- to 27-nt small RNAs from the miRNA-generating sites is dependent on DCL3, RDR2 and Pol IV, components of the typical heterochromatic small interfering RNA (hc-siRNA) pathway. We further demonstrate that these MIR-derived siRNAs associate with AGO4 and direct DNA methylation at some of their target loci in trans. In addition, we find that at the miRNA-generating sites, some conserved canonical MIR genes also produce siRNAs, which also induce DNA methylation at some of their target sites. Our systematic examination of published small RNA deep sequencing datasets of rice and moss suggests that this type of dual functional MIRs exist broadly in plants.
Affiliation(s)
- Padmanabhan Chellappan, Jing Xia, Xuefeng Zhou, Shang Gao, Xiaoming Zhang, Gabriela Coutino, Franck Vazquez, Weixiong Zhang, Hailing Jin
  - Department of Plant Pathology and Microbiology, Center for Plant Cell Biology and Institute for Integrative Genome Biology, University of California, Riverside, California, CA 92521, Department of Computer Science and Engineering, Washington University in St Louis, St Louis, MO 63130, USA, Botanical Institute of Basel, Zurich-Basel Plant Science Center, University of Basel, Switzerland and Department of Genetics, Washington University School of Medicine, St Louis, MO 63110, USA
|
39
|
Mathelier A, Carbone A. MIReNA: finding microRNAs with high accuracy and no learning at genome scale and from deep sequencing data. Bioinformatics 2010; 26:2226-34. [PMID: 20591903 DOI: 10.1093/bioinformatics/btq329] [Citation(s) in RCA: 102] [Impact Index Per Article: 6.8] [Indexed: 12/14/2022]
Abstract
Motivation: MicroRNAs (miRNAs) are a class of endogenes derived from a precursor (pre-miRNA) and involved in post-transcriptional regulation. Experimental identification of novel miRNAs is difficult because they are often transcribed under specific conditions and cell types. Several computational methods were developed to detect new miRNAs starting from known ones or from deep sequencing data, and to validate their pre-miRNAs. Results: We present a genome-wide search algorithm, called MIReNA, that looks for miRNA sequences by exploring a multidimensional space defined by only five (physical and combinatorial) parameters characterizing acceptable pre-miRNAs. MIReNA validates pre-miRNAs with high sensitivity and specificity, and detects new miRNAs by homology from known miRNAs or from deep sequencing data. A performance comparison between MIReNA and four available predictive systems has been done. The MIReNA approach is strikingly simple, but it turns out to be at least as powerful as more sophisticated algorithmic methods. MIReNA obtains better results than three known algorithms that validate pre-miRNAs. It demonstrates that machine learning is not a necessary algorithmic approach for computational validation of pre-miRNAs. In particular, machine learning algorithms can only confirm pre-miRNAs that look like known ones, which is a limitation when exploring species with no known pre-miRNAs. The possibility to adapt the search to specific species, possibly characterized by specific properties of their miRNAs and pre-miRNAs, is a major feature of MIReNA. A parameter adjustment calibrates specificity and sensitivity in MIReNA, a key feature for predictive systems, which is not present in machine learning approaches. Comparison of MIReNA with miRDeep using deep sequencing data to predict miRNAs highlights a highly specific predictive power of MIReNA. Availability: http://www.ihes.fr/carbone/data8/.
Affiliation(s)
- Anthony Mathelier
  - UPMC Université Paris 06, FRE3214, Génomique Analytique, Paris, France
|
40
|
Liu Q, Cui J, Yang Q, Xu Y. In-silico prediction of blood-secretory human proteins using a ranking algorithm. BMC Bioinformatics 2010; 11:250. [PMID: 20465853 PMCID: PMC2877692 DOI: 10.1186/1471-2105-11-250] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Received: 02/04/2010] [Accepted: 05/14/2010] [Indexed: 01/19/2023] Open
Abstract
Background: Computational identification of blood-secretory proteins, especially proteins with differentially expressed genes in diseased tissues, can provide highly useful information in linking transcriptomic data to proteomic studies for targeted disease biomarker discovery in serum. Results: A new algorithm for prediction of blood-secretory proteins is presented using an information-retrieval technique called manifold ranking. On a dataset containing 305 known blood-secretory human proteins and a large number of other proteins that are either not blood-secretory or unknown, the new method performs better than the previously published method, measured in terms of the area under the recall-precision curve (AUC). A key advantage of the presented method is that it does not explicitly require a negative training set, which could often be noisy or difficult to derive for most biological problems, hence making our method more applicable than classification-based data mining methods in general biological studies. Conclusion: We believe that our program will prove to be very useful to biomedical researchers who are interested in finding serum markers, especially when they have candidate proteins derived through transcriptomic or proteomic analyses of diseased tissues. A computer program has been developed for prediction of blood-secretory proteins based on manifold ranking, which is accessible at our website http://csbl.bmb.uga.edu/publications/materials/qiliu/blood_secretory_protein.html.
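To illustrate why a ranking approach of this kind needs no negative training set, here is a minimal pure-Python sketch of manifold ranking in its commonly used iterative form, f ← αSf + (1 − α)y, where S is a symmetrically normalized affinity matrix and y marks only the known positives. It is an assumption that the paper uses this formulation; the Gaussian affinity, the parameters `alpha`, `sigma` and `iters`, and the toy points are all hypothetical and not taken from the abstract.

```python
import math


def manifold_rank(points, positives, alpha=0.9, sigma=1.0, iters=200):
    """Score every point by its graph proximity to the known positives."""
    n = len(points)

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Gaussian affinity matrix with a zero diagonal (hypothetical choice).
    w = [[0.0 if i == j else math.exp(-sq_dist(points[i], points[j]) / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in w]

    # Symmetric normalization: S = D^(-1/2) W D^(-1/2).
    s = [[w[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
         for i in range(n)]

    # Only positives are labeled; everything else starts at 0 (unlabeled).
    y = [1.0 if i in positives else 0.0 for i in range(n)]
    f = y[:]
    for _ in range(iters):
        f = [alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


# Toy feature vectors: two tight clusters; only point 0 is a known positive.
toy_points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
scores = manifold_rank(toy_points, positives={0})
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

In this toy setup the unlabeled point that sits next to the known positive receives a higher score than the points in the distant cluster, without any of them ever being labeled as negative.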
Affiliation(s)
- Qi Liu
  - Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
|
41
|
New syntax to describe local continuous structure-sequence information for recognizing new pre-miRNAs. J Theor Biol 2010; 264:578-84. [PMID: 20202471 DOI: 10.1016/j.jtbi.2010.02.037] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 11/05/2009] [Revised: 02/21/2010] [Accepted: 02/22/2010] [Indexed: 12/16/2022]
Abstract
As an important complement to experimental identification of pre-miRNA, computational prediction methods are attracting more and more attention. Features extracted from pre-miRNA are the key to computational prediction. Among the features, local continuous structure-sequence information is usually employed by existing computational methods. As more and more species-specific miRNAs have been identified, a new syntax is required to describe pre-miRNA local continuous structure-sequence features. Therefore, we proposed here the use of couplet syntax to describe pre-miRNA intrinsic features. When tested on a dataset from miRBase12.0 with the use of features extracted by couplet syntax, the SVM classifier achieves a sensitivity of 81.98% and specificity of 87.16% on a human test set and a sensitivity of 86.71% on all other species. The obtained results indicate that the proposed couplet syntax can describe the intrinsic features of pre-miRNA better than traditional methods. By means of describing pre-miRNA secondary structure more precisely and masking frequently mutated nucleotides, couplet syntax provides a powerful feature-describing method that can be applied to many computational prediction methods.
|
42
|
Li S, Mead EA, Liang S, Tu Z. Direct sequencing and expression analysis of a large number of miRNAs in Aedes aegypti and a multi-species survey of novel mosquito miRNAs. BMC Genomics 2009; 10:581. [PMID: 19961592 PMCID: PMC2797818 DOI: 10.1186/1471-2164-10-581] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.1] [Received: 07/09/2009] [Accepted: 12/04/2009] [Indexed: 11/16/2022] Open
Abstract
Background: MicroRNAs (miRNAs) are a novel class of gene regulators whose biogenesis involves hairpin structures called precursor miRNAs, or pre-miRNAs. A pre-miRNA is processed to make a miRNA:miRNA* duplex, which is then separated to generate a mature miRNA and a miRNA*. The mature miRNAs play key regulatory roles during embryonic development as well as other cellular processes. They are also implicated in control of viral infection as well as innate immunity. Direct experimental evidence for mosquito miRNAs has been recently reported in anopheline mosquitoes based on small-scale cloning efforts. Results: We obtained approximately 130,000 small RNA sequences from the yellow fever mosquito, Aedes aegypti, by 454 sequencing of samples that were isolated from mixed-age embryos and from midguts of sugar-fed and blood-fed females. We also performed bioinformatics analysis on the Ae. aegypti genome assembly to identify evidence for additional miRNAs. The combination of these approaches uncovered 98 different pre-miRNAs in Ae. aegypti which could produce 86 distinct miRNAs. Thirteen miRNAs, including eight novel miRNAs identified in this study, are currently only found in mosquitoes. We also identified five potential revisions to previously annotated miRNAs at the miRNA termini, two cases of highly abundant miRNA* sequences, 14 miRNA clusters, and 17 cases where more than one pre-miRNA hairpin produces the same or highly similar mature miRNAs. A number of miRNAs showed higher levels in midguts from blood-fed females than in those from sugar-fed females, which was confirmed by northern blots on two of these miRNAs. Northern blots also revealed several miRNAs that showed stage-specific expression. Detailed expression analysis of eight of the 13 mosquito-specific miRNAs in four divergent mosquito genera identified cases of clearly conserved expression patterns and obvious differences. Four of the 13 miRNAs are specific to certain lineage(s) within mosquitoes.
Conclusion: This study provides the first systematic analysis of miRNAs in Ae. aegypti and offers a substantially expanded list of miRNAs for all mosquitoes. New insights were gained on the evolution of conserved and lineage-specific miRNAs in mosquitoes. The expression profiles of a few miRNAs suggest stage-specific functions and functions related to embryonic development or blood feeding. A better understanding of the functions of these miRNAs will offer new insights in mosquito biology and may lead to novel approaches to combat mosquito-borne infectious diseases.
Affiliation(s)
- Song Li
  - Department of Biochemistry, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
|
43
|
Morozova O, Hirst M, Marra MA. Applications of new sequencing technologies for transcriptome analysis. Annu Rev Genomics Hum Genet 2009; 10:135-51. [PMID: 19715439 DOI: 10.1146/annurev-genom-082908-145957] [Citation(s) in RCA: 348] [Impact Index Per Article: 21.8] [Indexed: 11/09/2022]
Abstract
Transcriptome analysis has been a key area of biological inquiry for decades. Over the years, research in the field has progressed from candidate gene-based detection of RNAs using Northern blotting to high-throughput expression profiling driven by the advent of microarrays. Next-generation sequencing technologies have revolutionized transcriptomics by providing opportunities for multidimensional examinations of cellular transcriptomes in which high-throughput expression data are obtained at a single-base resolution.
Affiliation(s)
- Olena Morozova
  - BC Cancer Agency, Genome Sciences Center, Vancouver, BC V5Z 4S6, Canada.
|
44
|
Creighton CJ, Reid JG, Gunaratne PH. Expression profiling of microRNAs by deep sequencing. Brief Bioinform 2009; 10:490-7. [PMID: 19332473 DOI: 10.1093/bib/bbp019] [Citation(s) in RCA: 221] [Impact Index Per Article: 13.8] [Indexed: 11/12/2022] Open
Abstract
MicroRNAs are short non-coding RNAs that regulate the stability and translation of mRNAs. Profiling experiments, using microarray or deep sequencing technology, have identified microRNAs that are preferentially expressed in certain tissues, specific stages of development, or disease states such as cancer. Deep sequencing utilizes massively parallel sequencing, generating millions of small RNA sequence reads from a given sample. Profiling of microRNAs by deep sequencing measures absolute abundance and allows for the discovery of novel microRNAs that have eluded previous cloning and standard sequencing efforts. Public databases provide in silico predictions of microRNA gene targets by various algorithms. To better determine which of these predictions represent true positives, microRNA expression data can be integrated with gene expression data to identify putative microRNA:mRNA functional pairs. Here we discuss tools and methodologies for the analysis of microRNA expression data from deep sequencing.
Affiliation(s)
- Chad J Creighton
  - Dan L. Duncan Cancer Center Division of Biostatistics, Baylor College of Medicine, Houston, TX 77030, USA.
|
45
|
Jiang Y, Cukic B, Adjeroh DA, Skinner HD, Lin J, Shen QJ, Jiang BH. An algorithm for identifying novel targets of transcription factor families: application to hypoxia-inducible factor 1 targets. Cancer Inform 2009; 7:75-89. [PMID: 19352460 PMCID: PMC2664698 DOI: 10.4137/cin.s1054] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 12/27/2022] Open
Abstract
Efficient and effective analysis of the growing genomic databases requires the development of adequate computational tools. We introduce a fast method based on the suffix tree data structure for predicting novel targets of hypoxia-inducible factor 1 (HIF-1) from huge genome databases. The suffix tree data structure has two powerful applications here: one is to extract unknown patterns from multiple strings/sequences in linear time; the other is to search multiple strings/sequences using multiple patterns in linear time. Using 15 known HIF-1 target gene sequences as a training set, we used suffix trees to extract 105 common patterns that occur in all 15 training genes. Using these 105 common patterns along with known subsequences surrounding HIF-1 binding sites from the literature, the algorithm searches a genome database that contains 2,078,786 DNA sequences. It reported 258 potentially novel HIF-1 targets, including 25 known HIF-1 targets. Based on microarray studies from the literature, 17 of these 258 putative genes were confirmed to be upregulated by HIF-1 or hypoxia. We further studied one of the potential targets, COX-2, in the laboratory and showed that it was a biologically relevant HIF-1 target. These results demonstrate that our methodology is an effective computational approach for identifying novel HIF-1 targets.
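The two steps described in this abstract (extract patterns shared by every training sequence, then scan a database with those patterns) can be sketched in a few lines of Python. Note the substitution: this sketch uses plain set intersection over fixed-length substrings instead of a suffix tree, so it does not have the linear-time behavior the abstract describes. The function names, the pattern length `k`, and the toy sequences are hypothetical and not taken from the paper.

```python
def common_patterns(sequences, k):
    """Return every length-k substring that occurs in all sequences."""
    if not sequences:
        return set()

    def kmers(seq):
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}

    shared = kmers(sequences[0])
    for seq in sequences[1:]:
        shared &= kmers(seq)  # keep only patterns seen in every sequence so far
    return shared


def search(database, patterns):
    """Map the index of each database sequence to the patterns it contains."""
    hits = {}
    for idx, seq in enumerate(database):
        found = sorted(p for p in patterns if p in seq)
        if found:
            hits[idx] = found
    return hits


# Toy training sequences (made up for illustration, not real gene sequences).
training = ["ACGTGCA", "TTACGTG", "GACGTGT"]
patterns = common_patterns(training, k=5)
print(patterns)  # prints {'ACGTG'}

# Toy "database": only the first sequence contains the shared pattern.
print(search(["GGACGTGAA", "TTTTTTT"], patterns))  # prints {0: ['ACGTG']}
```

A suffix-tree implementation would handle variable-length patterns and scale to millions of sequences; the set-based version is only meant to make the extract-then-search idea concrete.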
Affiliation(s)
- Yue Jiang
  - Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV 26506, USA.
|