1
Jaiswal S, Iquebal MA, Arora V, Sheoran S, Sharma P, Angadi UB, Dahiya V, Singh R, Tiwari R, Singh GP, Rai A, Kumar D. Development of species specific putative miRNA and its target prediction tool in wheat (Triticum aestivum L.). Sci Rep 2019; 9:3790. [PMID: 30846812 PMCID: PMC6405928 DOI: 10.1038/s41598-019-40333-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/15/2017] [Accepted: 02/06/2019] [Indexed: 01/22/2023]
Abstract
MicroRNAs are 20-24 nt, non-coding, single-stranded molecules that regulate traits and stress response. Their tissue- and time-specific expression limits detection, which is a major challenge in their discovery. Wheat has only 119 miRNAs in miRBase because conservation-based methodology excludes both old and new miRNA genes. This stems from the origin of hexaploid wheat through three successive hybridizations, with older AA and BB subgenomes and a younger DD subgenome. Species-specific miRNA prediction (the SMIRP concept), based on 152 thermodynamic features of the training dataset and a support vector machine learning approach, improved prediction accuracy to 97.7%. This has been implemented in TamiRPred (http://webtom.cabgrid.res.in/tamirpred). We also report the highest number of putative wheat miRNA genes (4464) from the whole genome sequence, populated in a database developed in PHP and MySQL. TamiRPred predicted 2092 (>45.10%) additional miRNAs that were not predicted by miRLocator. Predicted miRNAs have been validated against miRBase, small RNA libraries, secondary structure, degradome datasets, star miRNAs and binding sites in wheat coding regions. This tool can accelerate miRNA polymorphism discovery for use in wheat trait improvement. Because it predicts chromosome-wise miRNA genes with their physical locations, these genes can be transferred using linked SSR markers. This prediction approach can serve as a model for other polyploid crops.
Affiliation(s)
- Sarika Jaiswal
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, Library Avenue, PUSA, New Delhi, 110012, India
- M A Iquebal
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, Library Avenue, PUSA, New Delhi, 110012, India
- Vasu Arora
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, Library Avenue, PUSA, New Delhi, 110012, India
- Sonia Sheoran
- ICAR-Indian Institute of Wheat and Barley Research, Karnal, Haryana, 132001, India
- Pradeep Sharma
- ICAR-Indian Institute of Wheat and Barley Research, Karnal, Haryana, 132001, India
- U B Angadi
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, Library Avenue, PUSA, New Delhi, 110012, India
- Vikas Dahiya
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, Library Avenue, PUSA, New Delhi, 110012, India
- Rajender Singh
- ICAR-Indian Institute of Wheat and Barley Research, Karnal, Haryana, 132001, India
- Ratan Tiwari
- ICAR-Indian Institute of Wheat and Barley Research, Karnal, Haryana, 132001, India
- G P Singh
- ICAR-Indian Institute of Wheat and Barley Research, Karnal, Haryana, 132001, India
- Anil Rai
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, Library Avenue, PUSA, New Delhi, 110012, India
- Dinesh Kumar
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, Library Avenue, PUSA, New Delhi, 110012, India.
2
Jebessa E, Ouyang H, Abdalla BA, Li Z, Abdullahi AY, Liu Q, Nie Q, Zhang X. Characterization of miRNA and their target gene during chicken embryo skeletal muscle development. Oncotarget 2017; 9:17309-17324. [PMID: 29707110 PMCID: PMC5915118 DOI: 10.18632/oncotarget.22457] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 03/10/2017] [Accepted: 10/11/2017] [Indexed: 11/30/2022]
Abstract
MicroRNAs (miRNAs) are non-coding RNAs that regulate mRNA expression by degradation or translational inhibition. We investigated the underlying molecular mechanisms of skeletal muscle development based on differentially expressed genes and miRNAs. We compared mRNA and miRNA from chicken skeletal muscle at embryonic days E11 and E16 and one day post-hatch (P1). The interaction networks were constructed according to target prediction results and an integration analysis pairing up-regulated genes with down-regulated miRNAs, or down-regulated genes with up-regulated miRNAs, at |log2 fold change| ≥ 1.75 and P < 0.005. The miRNA-mRNA integration analysis showed a large number of mRNAs regulated by a small number of miRNAs. In the E11_VS_E16 comparison group, we identified biological processes including muscle maintenance, myoblast proliferation and muscle thin filament formation. The E11_VS_P1 comparison group included negative regulation of axon extension, sarcomere organization, cell redox homeostasis and kinase inhibitor activity. The E16_VS_P1 comparison group contained genes for the negative regulation of anti-apoptosis and axon extension as well as glomerular basement membrane development. Functional in vitro assays indicated that overexpression of miR-222a and miR-126-5p in DF-1 cells significantly reduced the mRNA levels of the target genes CPEB3 and FGFR3, respectively. These integrated analyses provide several candidates for future studies of miRNA-target function in the regulation of embryonic muscle development and growth.
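The filtering and pairing step described in this abstract can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: the record layout, the example values, and the names `Change`, `is_significant` and `inverse_pairs` are hypothetical. Only the two thresholds (|log2 fold change| ≥ 1.75 and P < 0.005) come from the abstract.

```python
from typing import NamedTuple

LOG2FC_CUTOFF = 1.75  # threshold stated in the abstract
P_CUTOFF = 0.005      # threshold stated in the abstract


class Change(NamedTuple):
    """Differential-expression result for one gene or miRNA (hypothetical layout)."""
    name: str
    log2fc: float
    p_value: float


def is_significant(c: Change) -> bool:
    """Apply the |log2FC| >= 1.75 and P < 0.005 filter."""
    return abs(c.log2fc) >= LOG2FC_CUTOFF and c.p_value < P_CUTOFF


def inverse_pairs(mirnas, mrnas, predicted_targets):
    """Keep predicted miRNA-target pairs whose expression moves in opposite directions."""
    sig_mirna = {m.name: m for m in mirnas if is_significant(m)}
    sig_mrna = {g.name: g for g in mrnas if is_significant(g)}
    pairs = []
    for mirna_name, gene_name in predicted_targets:
        m = sig_mirna.get(mirna_name)
        g = sig_mrna.get(gene_name)
        if m is None or g is None:
            continue  # one side did not pass the significance filter
        # Up-regulated miRNA with down-regulated gene, or the reverse.
        if m.log2fc * g.log2fc < 0:
            pairs.append((mirna_name, gene_name))
    return pairs


# Made-up values for illustration only; not data from the study.
mirnas = [Change("miR-A", 2.1, 0.001), Change("miR-B", -0.9, 0.0001)]
mrnas = [Change("GENE1", -2.4, 0.002), Change("GENE2", 3.0, 0.001)]
targets = [("miR-A", "GENE1"), ("miR-A", "GENE2"), ("miR-B", "GENE1")]

print(inverse_pairs(mirnas, mrnas, targets))
```

In this toy input only the miR-A/GENE1 pair survives: miR-B fails the fold-change cutoff, and miR-A and GENE2 change in the same direction.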
Affiliation(s)
- Endashaw Jebessa
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, South China Agricultural University, Guangzhou 510642, Guangdong, China; Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding and Key Lab of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou 510642, China
- Hongjia Ouyang
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, South China Agricultural University, Guangzhou 510642, Guangdong, China; Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding and Key Lab of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou 510642, China
- Bahareldin Ali Abdalla
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, South China Agricultural University, Guangzhou 510642, Guangdong, China; Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding and Key Lab of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou 510642, China
- Zhenhui Li
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, South China Agricultural University, Guangzhou 510642, Guangdong, China; Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding and Key Lab of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou 510642, China
- Auwalu Yusuf Abdullahi
- Department of Animal Nutrition and Feed Science, College of Animal Science, South China Agricultural University, Guangzhou 510642, Guangdong, China
- Qingshen Liu
- Department of Animal Production and Management, College of Animal Science, South China Agricultural University, Guangzhou 510642, Guangdong, China
- Qinghua Nie
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, South China Agricultural University, Guangzhou 510642, Guangdong, China; Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding and Key Lab of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou 510642, China
- Xiquan Zhang
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, South China Agricultural University, Guangzhou 510642, Guangdong, China; Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding and Key Lab of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou 510642, China
3
Saçar Demirci MD, Baumbach J, Allmer J. On the performance of pre-microRNA detection algorithms. Nat Commun 2017; 8:330. [PMID: 28839141 PMCID: PMC5571158 DOI: 10.1038/s41467-017-00403-z] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 05/29/2017] [Accepted: 06/23/2017] [Indexed: 01/31/2023]
Abstract
MicroRNAs are crucial for post-transcriptional gene regulation, and their dysregulation has been associated with diseases like cancer and, therefore, their analysis has become popular. The experimental discovery of miRNAs is cumbersome and, thus, many computational tools have been proposed. Here we assess 13 ab initio pre-miRNA detection approaches using all relevant, published, and novel data sets while judging algorithm performance based on ten intrinsic performance measures. We present an extensible framework, izMiR, which allows for the unbiased comparison of existing algorithms, adding new ones, and combining multiple approaches into ensemble methods. In an exhaustive attempt, we condense the results of millions of computations and show that no method is clearly superior; however, we provide a guideline for biomedical researchers to select a tool. Finally, we demonstrate that combining all of the methods into one ensemble approach, for the first time, allows reliable purely computational pre-miRNA detection in large eukaryotic genomes. As the experimental discovery of microRNAs (miRNAs) is cumbersome, computational tools have been developed for the prediction of pre-miRNAs. Here the authors develop a framework to assess the performance of existing and novel pre-miRNA prediction tools and provide guidelines for selecting an appropriate approach for a given data set.
Affiliation(s)
- Jan Baumbach
- Computational Systems Biology, Max Planck Institute for Informatics, 66123, Saarbrücken, Germany.
- Computational Biology, University of Southern Denmark, DK-5230, Odense M, Denmark.
- Jens Allmer
- Molecular Biology and Genetics, Izmir Institute of Technology, Urla, Izmir, 35430, Turkey
- Bionia Incorporated, IZTEKGEB A8, Urla, Izmir, 35430, Turkey
4
Amirkhah R, Meshkin HN, Farazmand A, Rasko JEJ, Schmitz U. Computational and Experimental Identification of Tissue-Specific MicroRNA Targets. Methods Mol Biol 2017; 1580:127-147. [PMID: 28439832 DOI: 10.1007/978-1-4939-6866-4_11] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 06/07/2023]
Abstract
In this chapter we discuss computational methods for the prediction of microRNA (miRNA) targets. More specifically, we consider machine learning-based approaches and explain why these methods have been relatively unsuccessful in reducing the number of false positive predictions. Further we suggest approaches designed to improve their performance by considering tissue-specific target regulation. We argue that the miRNA targetome differs depending on the tissue type and introduce a novel algorithm that predicts miRNA targets specifically for colorectal cancer. We discuss features of miRNAs and target sites that affect target recognition, and how next-generation sequencing data can support the identification of novel miRNAs, differentially expressed miRNAs and their tissue-specific mRNA targets. In addition, we introduce some experimental approaches for the validation of miRNA targets as well as web-based resources sharing predicted and validated miRNA target interactions.
Affiliation(s)
- Raheleh Amirkhah
- Reza Institute of Cancer Bioinformatics and Personalized Medicine, Mashhad, Iran
- Hojjat Naderi Meshkin
- Stem Cells and Regenerative Medicine Research Group, Academic Center for Education, Culture Research (ACECR), Khorasan Razavi Branch, Mashhad, Iran
- Ali Farazmand
- Department of Cell and Molecular Biology, School of Biology, College of Science, University of Tehran, Tehran, Iran
- John E J Rasko
- Gene & Stem Cell Therapy Program, Centenary Institute, Camperdown; Sydney Medical School, University of Sydney, Camperdown, NSW, 2050, Australia
- Ulf Schmitz
- Gene & Stem Cell Therapy Program, Centenary Institute, Camperdown; Sydney Medical School, University of Sydney, Camperdown, NSW, 2050, Australia.
5
Khalifa W, Yousef M, Saçar Demirci MD, Allmer J. The impact of feature selection on one and two-class classification performance for plant microRNAs. PeerJ 2016; 4:e2135. [PMID: 27366641 PMCID: PMC4924126 DOI: 10.7717/peerj.2135] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 03/08/2016] [Accepted: 05/25/2016] [Indexed: 11/23/2022]
Abstract
MicroRNAs (miRNAs) are short nucleotide sequences that form a typical hairpin structure which is recognized by a complex enzyme machinery. It ultimately leads to the incorporation of 18–24 nt long mature miRNAs into RISC where they act as recognition keys to aid in regulation of target mRNAs. Determining miRNAs experimentally is involved and, therefore, machine learning is used to complement such endeavors. The success of machine learning mostly depends on proper input data and appropriate features for parameterization of the data. Although two-class classification (TCC) is generally used in the field, one-class classification (OCC) has been tried for pre-miRNA detection because negative examples are hard to come by. Since both positive and negative examples are currently somewhat limited, feature selection can prove to be vital for furthering the field of pre-miRNA detection. In this study, we compare the performance of OCC and TCC using eight feature selection methods and seven different plant species providing positive pre-miRNA examples. Feature selection was very successful for OCC, where the best feature selection method achieved an average accuracy of 95.6%, thereby being ∼29% better than the worst method, which achieved 66.9% accuracy. While the performance is comparable to TCC, which performs up to 3% better than OCC, TCC is much less affected by feature selection and its largest performance gap is ∼13%, which only occurs for two of the feature selection methodologies. We conclude that feature selection is crucially important for OCC and that it can perform on par with TCC given the proper set of features.
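The distinction between one-class and two-class classification drawn in this abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: a single made-up numeric feature, a mean-and-standard-deviation acceptance interval standing in for a one-class learner, and a nearest-centroid rule standing in for a two-class learner. None of it reproduces the classifiers, features or data used in the study.

```python
from statistics import mean, stdev


def train_one_class(positives, k=2.0):
    """One-class rule: learn an acceptance interval from positive examples only."""
    mu, sigma = mean(positives), stdev(positives)
    low, high = mu - k * sigma, mu + k * sigma
    return lambda x: low <= x <= high


def train_two_class(positives, negatives):
    """Two-class rule: nearest centroid, which needs negative examples as well."""
    mu_pos, mu_neg = mean(positives), mean(negatives)
    return lambda x: abs(x - mu_pos) < abs(x - mu_neg)


def accuracy(classifier, positives, negatives):
    """Fraction of held-out examples that the classifier labels correctly."""
    correct = sum(classifier(x) for x in positives)
    correct += sum(not classifier(x) for x in negatives)
    return correct / (len(positives) + len(negatives))


# Hypothetical feature values; not taken from any pre-miRNA data set.
train_pos = [0.80, 0.85, 0.90, 0.95, 1.00]
train_neg = [0.20, 0.30, 0.40, 0.50]
test_pos = [0.82, 0.93, 1.05]
test_neg = [0.25, 0.45, 0.88]

occ = train_one_class(train_pos)             # never sees train_neg
tcc = train_two_class(train_pos, train_neg)  # uses both classes

print("OCC accuracy:", accuracy(occ, test_pos, test_neg))
print("TCC accuracy:", accuracy(tcc, test_pos, test_neg))
```

The point of the sketch is the training signature: `train_one_class` never touches negative examples, which is what makes the approach attractive when reliable negatives are scarce.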
Affiliation(s)
- Waleed Khalifa
- Computer Science, The College of Sakhnin, Sakhnin, Israel; The Institute of Applied Research- The Galilee Society, Shefa Amr, Israel
- Malik Yousef
- Computer Science, The College of Sakhnin, Sakhnin, Israel; The Institute of Applied Research- The Galilee Society, Shefa Amr, Israel
- Jens Allmer
- Molecular Biology and Genetics, Izmir Institute of Technology, Urla, Izmir, Turkey; IZTEKGEB, Bionia Incorporated, Urla, Izmir, Turkey
6
Feature Selection Has a Large Impact on One-Class Classification Accuracy for MicroRNAs in Plants. Adv Bioinformatics 2016; 2016:5670851. [PMID: 27190509 PMCID: PMC4844869 DOI: 10.1155/2016/5670851] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 10/31/2015] [Accepted: 03/16/2016] [Indexed: 11/17/2022]
Abstract
MicroRNAs (miRNAs) are short RNA sequences involved in posttranscriptional gene regulation. Their experimental analysis is complicated and, therefore, needs to be supplemented with computational miRNA detection. Currently, computational miRNA detection is mainly performed using machine learning and in particular two-class classification. For machine learning, the miRNAs need to be parametrized and more than 700 features have been described. Positive training examples for machine learning are readily available, but negative data is hard to come by. Therefore, it seems preferable to use one-class classification instead of two-class classification. Previously, we were able to almost reach two-class classification accuracy using one-class classifiers. In this work, we employ feature selection procedures in conjunction with one-class classification and show that there is up to 36% difference in accuracy among these feature selection methods. The best feature set allowed the training of a one-class classifier which achieved an average accuracy of ~95.6%, thereby outperforming previous two-class-based plant miRNA detection approaches by about 0.5%. We believe that this can be improved upon in the future by rigorous filtering of the positive training examples and by improving current feature clustering algorithms to better target pre-miRNA feature selection.
7
Wong JJL, Au AYM, Gao D, Pinello N, Kwok CT, Thoeng A, Lau KA, Gordon JEA, Schmitz U, Feng Y, Nguyen TV, Middleton R, Bailey CG, Holst J, Rasko JEJ, Ritchie W. RBM3 regulates temperature sensitive miR-142-5p and miR-143 (thermomiRs), which target immune genes and control fever. Nucleic Acids Res 2016; 44:2888-97. [PMID: 26825461 PMCID: PMC4824108 DOI: 10.1093/nar/gkw041] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 08/24/2015] [Accepted: 01/13/2016] [Indexed: 12/27/2022]
Abstract
Fever is commonly used to diagnose disease and is consistently associated with increased mortality in critically ill patients. However, the molecular controls of elevated body temperature are poorly understood. We discovered that the expression of RNA-binding motif protein 3 (RBM3), known to respond to cold stress and to modulate microRNA (miRNA) expression, was reduced in 30 patients with fever, and in THP-1-derived macrophages maintained at a fever-like temperature (40°C). Notably, RBM3 expression is reduced during fever whether or not infection is demonstrable. Reduced RBM3 expression resulted in increased expression of RBM3-targeted temperature-sensitive miRNAs, which we termed thermomiRs. ThermomiRs such as miR-142-5p and miR-143 in turn target endogenous pyrogens including IL-6, IL6ST, TLR2, PGE2 and TNF to complete a negative feedback mechanism, which may be crucial to prevent pathological hyperthermia. Using normal PBMCs that were exogenously exposed to fever-like temperature (40°C), we further demonstrate the trend by which decreased levels of RBM3 were associated with increased levels of miR-142-5p and miR-143 and vice versa over a 24 h time course. Collectively, our results indicate the existence of a negative feedback loop that regulates fever via reduced RBM3 levels and increased expression of miR-142-5p and miR-143.
Affiliation(s)
- Justin J-L Wong
- Gene & Stem Cell Therapy Program, Centenary Institute, Camperdown 2050, Australia; Sydney Medical School, University of Sydney, NSW 2006, Australia
- Amy Y M Au
- Gene & Stem Cell Therapy Program, Centenary Institute, Camperdown 2050, Australia; Sydney Medical School, University of Sydney, NSW 2006, Australia
- Dadi Gao
- Gene & Stem Cell Therapy Program, Centenary Institute, Camperdown 2050, Australia; Sydney Medical School, University of Sydney, NSW 2006, Australia; Bioinformatics Laboratory, Centenary Institute, Camperdown 2050, Australia
- Natalia Pinello
- Gene & Stem Cell Therapy Program, Centenary Institute, Camperdown 2050, Australia; Sydney Medical School, University of Sydney, NSW 2006, Australia
- Chau-To Kwok
- Gene & Stem Cell Therapy Program, Centenary Institute, Camperdown 2050, Australia; Sydney Medical School, University of Sydney, NSW 2006, Australia
- Annora Thoeng
- Gene & Stem Cell Therapy Program, Centenary Institute, Camperdown 2050, Australia; Sydney Medical School, University of Sydney, NSW 2006, Australia
- Katherine A Lau
- Gene & Stem Cell Therapy Program, Centenary Institute, Camperdown 2050, Australia; Sydney Medical School, University of Sydney, NSW 2006, Australia
- Jane E A Gordon
- Gene & Stem Cell Therapy Program, Centenary Institute, Camperdown 2050, Australia; Sydney Medical School, University of Sydney, NSW 2006, Australia
- Ulf Schmitz
- Gene & Stem Cell Therapy Program, Centenary Institute, Camperdown 2050, Australia; Sydney Medical School, University of Sydney, NSW 2006, Australia
- Yue Feng
- Gene & Stem Cell Therapy Program, Centenary Institute, Camperdown 2050, Australia; Sydney Medical School, University of Sydney, NSW 2006, Australia
- Trung V Nguyen
- Gene & Stem Cell Therapy Program, Centenary Institute, Camperdown 2050, Australia; Sydney Medical School, University of Sydney, NSW 2006, Australia
- Robert Middleton
- Gene & Stem Cell Therapy Program, Centenary Institute, Camperdown 2050, Australia; Sydney Medical School, University of Sydney, NSW 2006, Australia; Bioinformatics Laboratory, Centenary Institute, Camperdown 2050, Australia
- Charles G Bailey
- Gene & Stem Cell Therapy Program, Centenary Institute, Camperdown 2050, Australia; Sydney Medical School, University of Sydney, NSW 2006, Australia
- Jeff Holst
- Sydney Medical School, University of Sydney, NSW 2006, Australia; Origins of Cancer Program, Centenary Institute, Camperdown 2050, Australia
- John E J Rasko
- Gene & Stem Cell Therapy Program, Centenary Institute, Camperdown 2050, Australia; Sydney Medical School, University of Sydney, NSW 2006, Australia; Cell and Molecular Therapies, Royal Prince Alfred Hospital, Camperdown 2050, Australia
- William Ritchie
- Gene & Stem Cell Therapy Program, Centenary Institute, Camperdown 2050, Australia; Sydney Medical School, University of Sydney, NSW 2006, Australia; Bioinformatics Laboratory, Centenary Institute, Camperdown 2050, Australia; CNRS, UMR 5203, Montpellier 34094, France
8
Menor M, Ching T, Zhu X, Garmire D, Garmire LX. mirMark: a site-level and UTR-level classifier for miRNA target prediction. Genome Biol 2015; 15:500. [PMID: 25344330 PMCID: PMC4243195 DOI: 10.1186/s13059-014-0500-5] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 07/18/2014] [Indexed: 02/07/2023]
Abstract
MiRNAs play important roles in many diseases including cancers. However, computational prediction of miRNA target genes is challenging and the accuracies of existing methods remain poor. We report mirMark, a new machine learning-based method of miRNA target prediction at the site and UTR levels. This method uses experimentally verified miRNA targets from miRecords and mirTarBase as training sets and considers over 700 features. By combining Correlation-based Feature Selection with a variety of statistical or machine learning methods for the site- and UTR-level classifiers, mirMark significantly improves the overall predictive performance compared to existing publicly available methods. MirMark is available from https://github.com/lanagarmire/MirMark.
Affiliation(s)
- Mark Menor
- Department of Information and Computer Sciences, University of Hawaii at Manoa, Honolulu, HI 96822, USA
9
Yousef M, Allmer J, Khalifa W. Sequence Motif-Based One-Class Classifiers Can Achieve Comparable Accuracy to Two-Class Learners for Plant microRNA Detection. J Biomed Sci Eng 2015. [DOI: 10.4236/jbise.2015.810065] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 11/20/2022]
10
Rawlings-Goss RA, Campbell MC, Tishkoff SA. Global population-specific variation in miRNA associated with cancer risk and clinical biomarkers. BMC Med Genomics 2014; 7:53. [PMID: 25169894 PMCID: PMC4159108 DOI: 10.1186/1755-8794-7-53] [Citation(s) in RCA: 76] [Impact Index Per Article: 7.6] [Received: 05/05/2014] [Accepted: 08/12/2014] [Indexed: 12/30/2022]
Abstract
Background: MiRNA expression profiling is being actively investigated as a clinical biomarker and diagnostic tool to detect multiple cancer types and stages as well as other complex diseases. Initial investigations, however, have not comprehensively taken into account genetic variability affecting miRNA expression and/or function in populations of different ethnic backgrounds. Therefore, more complete surveys of miRNA genetic variability are needed to assess global patterns of miRNA variation within and between diverse human populations and their effect on clinically relevant miRNA genes.
Methods: Genetic variation in 1524 miRNA genes was examined using whole genome sequencing (60x coverage) in a panel of 69 unrelated individuals from 14 global populations, including European, Asian and African populations.
Results: We identified 33 previously undescribed miRNA variants, and 31 miRNA containing variants that are globally population-differentiated in frequency between African and non-African populations (PD-miRNA). The top 1% of PD-miRNA were significantly enriched for regulation of genes involved in glucose/insulin metabolism and cell division (p < 10^-7), most significantly the mitosis pathway, which is strongly linked to cancer onset. Overall, we identify 7 PD-miRNAs that are currently implicated as cancer biomarkers or diagnostics: hsa-mir-202, hsa-mir-423, hsa-mir-196a-2, hsa-mir-520h, hsa-mir-647, hsa-mir-943, and hsa-mir-1908. Notably, hsa-mir-202, a potential breast cancer biomarker, was found to show significantly high allele frequency differentiation at SNP rs12355840, which is known to affect miRNA expression levels in vivo and subsequently breast cancer mortality.
Conclusion: MiRNA expression profiles represent a promising new category of disease biomarkers. However, population specific genetic variation can affect the prevalence and baseline expression of these miRNAs in diverse populations. Consequently, miRNA genetic and expression level variation among ethnic groups may be contributing in part to health disparities observed in multiple forms of cancer, specifically breast cancer, and will be an essential consideration when assessing the utility of miRNA biomarkers for the clinic.
11
Abstract
MicroRNAs (miRNAs) are single-stranded, small, noncoding RNAs of about 22 nucleotides in length, which control gene expression at the posttranscriptional level through translational inhibition, degradation, adenylation, or destabilization of their target mRNAs. Although hundreds of miRNAs have been identified in various species, many more may still remain unknown. Therefore, discovery of new miRNA genes is an important step for understanding miRNA-mediated posttranscriptional regulation mechanisms. It seems that biological approaches to identify miRNA genes might be limited in their ability to detect rare miRNAs and are further limited to the tissues examined and the developmental stage of the organism under examination. These limitations have led to the development of sophisticated computational approaches attempting to identify possible miRNAs in silico. In this chapter, we discuss computational problems in miRNA prediction studies and review some of the many machine learning methods that have been tried to address the issues.
12
Lopes IDON, Schliep A, de Carvalho ACPDLF. The discriminant power of RNA features for pre-miRNA recognition. BMC Bioinformatics 2014; 15:124. [PMID: 24884650 PMCID: PMC4046174 DOI: 10.1186/1471-2105-15-124] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 10/25/2013] [Accepted: 04/08/2014] [Indexed: 12/26/2022]
Abstract
BACKGROUND: Computational discovery of microRNAs (miRNA) is based on pre-determined sets of features from miRNA precursors (pre-miRNA). Some feature sets are composed of sequence-structure patterns commonly found in pre-miRNAs, while others are a combination of more sophisticated RNA features. In this work, we analyze the discriminant power of seven feature sets, which are used in six pre-miRNA prediction tools. The analysis is based on the classification performance achieved with these feature sets for the training algorithms used in these tools. We also evaluate feature discrimination through the F-score and feature importance in the induction of random forests.
RESULTS: Small or non-significant differences were found among the estimated classification performances of classifiers induced using sets with diversification of features, despite the wide differences in their dimension. Inspired by these results, we obtained a lower-dimensional feature set, which achieved a sensitivity of 90% and a specificity of 95%. These estimates are within 0.1% of the maximal values obtained with any feature set (SELECT, Section "Results and discussion") while it is 34 times faster to compute. Even compared to another feature set (FS2, see Section "Results and discussion"), which is the computationally least expensive feature set of those from the literature which perform within 0.1% of the maximal values, it is 34 times faster to compute. The results obtained by the tools used as references in the experiments carried out showed that five out of these six tools have lower sensitivity or specificity.
CONCLUSION: In miRNA discovery the number of putative miRNA loci is in the order of millions. Analysis of putative pre-miRNAs using a computationally expensive feature set would be wasteful or even unfeasible for large genomes. In this work, we propose a relatively inexpensive feature set and explore most of the learning aspects implemented in current ab-initio pre-miRNA prediction tools, which may lead to the development of efficient ab-initio pre-miRNA discovery tools. The material to reproduce the main results from this paper can be downloaded from http://bioinformatics.rutgers.edu/Static/Software/discriminant.tar.gz.
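For readers less familiar with the two measures quoted in this abstract, the short sketch below shows how sensitivity and specificity are computed from a confusion matrix. The counts are hypothetical and were chosen only so that the results match the reported 90% and 95%; they are not the study's data, and the function names are illustrative.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real pre-miRNAs that the classifier recovers: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of non-pre-miRNAs that the classifier rejects: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for 100 positive and 100 negative test hairpins.
tp, fn = 90, 10
tn, fp = 95, 5

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```

Because candidate loci number in the millions, even a specificity of 95% leaves many false positives in absolute terms, which is why both measures are reported together.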
Affiliation(s)
- Ivani de O N Lopes
- Empresa Brasileira de Pesquisa Agropecuária, Embrapa Soja, Caixa Postal 231, Londrina-PR, CEP 86001-970, Brasil.
13
Friedländer MR, Lizano E, Houben AJS, Bezdan D, Báñez-Coronel M, Kudla G, Mateu-Huertas E, Kagerbauer B, González J, Chen KC, LeProust EM, Martí E, Estivill X. Evidence for the biogenesis of more than 1,000 novel human microRNAs. Genome Biol 2014; 15:R57. [PMID: 24708865 PMCID: PMC4054668 DOI: 10.1186/gb-2014-15-4-r57] [Citation(s) in RCA: 191] [Impact Index Per Article: 19.1] [Received: 09/13/2013] [Accepted: 04/07/2014] [Indexed: 01/29/2023]
Abstract
BACKGROUND: MicroRNAs (miRNAs) are established regulators of development, cell identity and disease. Although nearly two thousand human miRNA genes are known and new ones are continuously discovered, no attempt has been made to gauge the total miRNA content of the human genome.
RESULTS: Employing an innovative computational method on massively pooled small RNA sequencing data, we report 2,469 novel human miRNA candidates of which 1,098 are validated by in-house and published experiments. Almost 300 candidates are robustly expressed in a neuronal cell system and are regulated during differentiation or when biogenesis factors Dicer, Drosha, DGCR8 or Ago2 are silenced. To improve expression profiling, we devised a quantitative miRNA capture system. In a kidney cell system, 400 candidates interact with DGCR8 at transcript positions that suggest miRNA hairpin recognition, and 1,000 of the new miRNA candidates interact with Ago1 or Ago2, indicating that they are directly bound by miRNA effector proteins. From kidney cell CLASH experiments, in which miRNA-target pairs are ligated and sequenced, we observe hundreds of interactions between novel miRNAs and mRNA targets. The novel miRNA candidates are specifically but lowly expressed, raising the possibility that not all may be functional. Interestingly, the majority are evolutionarily young and overrepresented in the human brain.
CONCLUSIONS: In summary, we present evidence that the complement of human miRNA genes is substantially larger than anticipated, and that more are likely to be discovered in the future as more tissues and experimental conditions are sequenced to greater depth.
Affiliation(s)
- Marc R Friedländer
- Genomics and Disease Group, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Centro de Investigación Biomédica en Red Epidemiología y Salud Pública (CIBERESP), Barcelona 08003, Catalonia, Spain
- Hospital del Mar Research Institute (IMIM), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Esther Lizano
- Genomics and Disease Group, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Centro de Investigación Biomédica en Red Epidemiología y Salud Pública (CIBERESP), Barcelona 08003, Catalonia, Spain
- Hospital del Mar Research Institute (IMIM), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Anna JS Houben
- Genomics and Disease Group, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Centro de Investigación Biomédica en Red Epidemiología y Salud Pública (CIBERESP), Barcelona 08003, Catalonia, Spain
- Hospital del Mar Research Institute (IMIM), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Daniela Bezdan
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Genomic and Epigenomic Variation in Disease Group, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Mónica Báñez-Coronel
- Genomics and Disease Group, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Centro de Investigación Biomédica en Red Epidemiología y Salud Pública (CIBERESP), Barcelona 08003, Catalonia, Spain
- Hospital del Mar Research Institute (IMIM), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Grzegorz Kudla
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, Scotland
- Elisabet Mateu-Huertas
- Genomics and Disease Group, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Centro de Investigación Biomédica en Red Epidemiología y Salud Pública (CIBERESP), Barcelona 08003, Catalonia, Spain
- Hospital del Mar Research Institute (IMIM), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Birgit Kagerbauer
- Genomics and Disease Group, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Centro de Investigación Biomédica en Red Epidemiología y Salud Pública (CIBERESP), Barcelona 08003, Catalonia, Spain
- Hospital del Mar Research Institute (IMIM), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Justo González
- Genomics and Disease Group, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Centro de Investigación Biomédica en Red Epidemiología y Salud Pública (CIBERESP), Barcelona 08003, Catalonia, Spain
- Hospital del Mar Research Institute (IMIM), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Kevin C Chen
- Department of Genetics, Rutgers, State University of New Jersey, Frelinghuysen Road 174, Piscataway, NJ 08854, USA
- BioMaPS Institute for Quantitative Biology, Rutgers, State University of New Jersey, Frelinghuysen Road 174, Piscataway, NJ 08854, USA
- Emily M LeProust
- Genomics Solution Unit, Agilent Technologies Inc., Santa Clara, CA 95051, USA
- Eulàlia Martí
- Genomics and Disease Group, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Centro de Investigación Biomédica en Red Epidemiología y Salud Pública (CIBERESP), Barcelona 08003, Catalonia, Spain
- Hospital del Mar Research Institute (IMIM), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Xavier Estivill
- Genomics and Disease Group, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Centro de Investigación Biomédica en Red Epidemiología y Salud Pública (CIBERESP), Barcelona 08003, Catalonia, Spain
- Hospital del Mar Research Institute (IMIM), Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
14
Warris S, Boymans S, Muiser I, Noback M, Krijnen W, Nap JP. Fast selection of miRNA candidates based on large-scale pre-computed MFE sets of randomized sequences. BMC Res Notes 2014; 7:34. [PMID: 24418292 PMCID: PMC3895842 DOI: 10.1186/1756-0500-7-34] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 08/06/2013] [Accepted: 01/07/2014] [Indexed: 11/29/2022] Open
Abstract
Background: Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than the MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screening. Results: Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation. This allows on-the-fly calculation of the normal distribution for any candidate sequence composition. Conclusion: The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this property alone is not sufficiently discriminative to distinguish miRNAs from other sequences, the MFE-based P-value should be added to the parameters used in the selection of potential miRNA candidates for experimental verification.
Affiliation(s)
- Jan-Peter Nap
- Expertise Centre ALIFE, Institute for Life Science & Technology, Hanze University of Applied Sciences, Groningen, The Netherlands.
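The MFE-based P-value described in the Warris et al. abstract above can be illustrated with a minimal Python sketch. It assumes that the mean and standard deviation of the MFE distribution for randomized sequences of the same composition are already available; the abstract obtains them by interpolation from pre-computed sets, which is not reproduced here. The function name `mfe_p_value` and all numeric values are hypothetical and are not taken from the paper.

```python
import math


def mfe_p_value(candidate_mfe, random_mean, random_sd):
    """One-sided P-value under a normal model of randomized-sequence MFEs.

    Returns the probability that a randomized sequence with the same
    composition folds with an MFE at least as low as the candidate's.
    """
    if random_sd <= 0:
        raise ValueError("random_sd must be positive")
    z = (candidate_mfe - random_mean) / random_sd
    # Lower tail of the standard normal: P(Z <= z) = 0.5 * erfc(-z / sqrt(2))
    return 0.5 * math.erfc(-z / math.sqrt(2))


# Hypothetical inputs (kcal/mol): candidate hairpin MFE and the parameters
# of the normal distribution fitted to its randomized counterparts.
p = mfe_p_value(candidate_mfe=-45.0, random_mean=-30.0, random_sd=5.0)
print(f"P-value: {p:.5f}")
```

A small P-value indicates that the candidate folds more stably than expected for its composition. As the abstract notes, this is one selection parameter among several rather than a classifier on its own.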
15
Beamer LC, Linder L, Wu B, Eggert J. The Impact of Genomics on Oncology Nursing. Nurs Clin North Am 2013; 48:585-626. [DOI: 10.1016/j.cnur.2013.09.007] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/14/2022]
16
Gao D, Middleton R, Rasko JEJ, Ritchie W. miREval 2.0: a web tool for simple microRNA prediction in genome sequences. Bioinformatics 2013; 29:3225-6. [PMID: 24048357 DOI: 10.1093/bioinformatics/btt545] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Indexed: 11/14/2022]
Abstract
RESULT We have developed miREval 2.0, an online tool that can simultaneously search up to 100 sequences for novel microRNAs (miRNAs) in multiple organisms. miREval 2.0 uses multiple published in silico approaches to detect miRNAs in sequences of interest. This tool can be used to discover miRNAs from DNA sequences or to validate candidates from sequencing data. AVAILABILITY http://mimirna.centenary.org.au/mireval/.
Affiliation(s)
- Dadi Gao
- Bioinformatics Laboratory, Centenary Institute, Gene and Stem Cell Therapy Program, Centenary Institute, University of Sydney, Sydney, New South Wales, Australia and Cell and Molecular Therapies, Royal Prince Alfred Hospital, Camperdown, New South Wales 2050, Australia
17
Allmer J, Yousef M. Computational methods for ab initio detection of microRNAs. Front Genet 2012; 3:209. [PMID: 23087705 PMCID: PMC3467617 DOI: 10.3389/fgene.2012.00209] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 07/27/2012] [Accepted: 09/26/2012] [Indexed: 12/21/2022] Open
Abstract
MicroRNAs are small RNA sequences of 18–24 nucleotides in length, which serve as templates to drive post-transcriptional gene silencing. The canonical microRNA pathway starts with transcription from DNA and is followed by processing via the microprocessor complex, yielding a hairpin structure, which is then exported into the cytosol, where it is processed by Dicer and incorporated into the RNA-induced silencing complex. All of these biogenesis steps add to the overall specificity of miRNA production and effect. Unfortunately, their modes of action are just beginning to be elucidated, and therefore computational prediction algorithms cannot model the process but are usually forced to employ machine learning approaches. This work focuses on ab initio prediction methods throughout; homology-based miRNA detection methods are therefore not discussed. Current ab initio prediction algorithms, their ties to data mining, and their prediction accuracy are detailed.
Affiliation(s)
- Jens Allmer
- Department of Molecular Biology and Genetics, Izmir Institute of Technology Urla, Turkey
18
Lertampaiporn S, Thammarongtham C, Nukoolkit C, Kaewkamnerdpong B, Ruengjitchatchawalya M. Heterogeneous ensemble approach with discriminative features and modified-SMOTEbagging for pre-miRNA classification. Nucleic Acids Res 2012; 41:e21. [PMID: 23012261 PMCID: PMC3592496 DOI: 10.1093/nar/gks878] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Indexed: 11/14/2022] Open
Abstract
An ensemble classifier approach for microRNA precursor (pre-miRNA) classification was proposed, based on combining a set of heterogeneous algorithms, including support vector machine (SVM), k-nearest neighbors (kNN) and random forest (RF), and then aggregating their predictions through a voting system. In addition, classification performance was improved using discriminative features, namely self-containment and its derivatives, which capture the unique structural robustness characteristics of pre-miRNAs. These features are applicable across different species. Applying preprocessing methods, both a correlation-based feature selection (CFS) with a genetic algorithm (GA) search method and a modified Synthetic Minority Oversampling Technique (SMOTE) bagging rebalancing method, further improved the performance of this ensemble. The overall prediction accuracy obtained via 10 runs of 5-fold cross validation (CV) was 96.54%, with a sensitivity of 94.8% and a specificity of 98.3%; this is a better trade-off between sensitivity and specificity than that of other state-of-the-art methods. The ensemble model was applied to animal, plant and virus pre-miRNAs and achieved high accuracy (>93%). Exploiting the discriminative set of selected features also suggests that pre-miRNAs possess high intrinsic structural robustness compared with other stem loops. Our heterogeneous ensemble method gave relatively more reliable predictions than methods using single classifiers. Our program is available at http://ncrna-pred.com/premiRNA.html.
Affiliation(s)
- Supatcha Lertampaiporn
- Biological Engineering Program, King Mongkut's University of Technology Thonburi, Bang Mod, Thung Khru, Bangkok 10140, Thailand
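The voting step described in the Lertampaiporn et al. abstract above can be sketched in a few lines of Python. This is only an illustration of plain majority voting over per-classifier labels, under the assumption that each base classifier has already produced a 0/1 label per sequence; the abstract does not specify the exact voting scheme, and the function names and example labels below are hypothetical.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label chosen by most classifiers; raise on a tie."""
    (label, count), *rest = Counter(votes).most_common()
    if rest and rest[0][1] == count:
        raise ValueError("tie between labels; use an odd number of voters")
    return label


def ensemble_predict(per_classifier_predictions):
    """Combine equal-length prediction lists, one list per classifier."""
    return [majority_vote(v) for v in zip(*per_classifier_predictions)]


# Hypothetical outputs for four candidate hairpins
# (1 = pre-miRNA, 0 = other stem loop).
svm_pred = [1, 0, 1, 1]
knn_pred = [1, 0, 0, 1]
rf_pred = [0, 0, 1, 1]

combined = ensemble_predict([svm_pred, knn_pred, rf_pred])
print(combined)
```

With three binary voters a tie cannot occur, so the tie check only matters if the sketch is reused with an even number of classifiers or more than two classes.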