1
Villalba-Bermell P, Marquez-Molins J, Gomez G. A multispecies study reveals the diversity and potential regulatory role of long noncoding RNAs in cucurbits. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2024; 120:799-817. [PMID: 39254680 DOI: 10.1111/tpj.17013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Revised: 07/31/2024] [Accepted: 08/23/2024] [Indexed: 09/11/2024]
Abstract
Plant long noncoding RNAs (lncRNAs) exhibit features such as tissue-specific expression, spatiotemporal regulation, and stress responsiveness. Although diverse studies support the regulatory role of lncRNAs in model plants, our knowledge about lncRNAs in crops is limited. We employ a custom pipeline on a dataset of over 1000 RNA-seq samples across nine representative species of the family Cucurbitaceae to predict 91 209 nonredundant lncRNAs. The lncRNAs were characterized according to three confidence levels and classified by their genomic context into intergenic, natural antisense, intronic, and sense-overlapping. Compared with protein-coding genes, lncRNAs were, on average, expressed at low levels and displayed significantly higher specificity when considering tissue, developmental stages, and stress responsiveness. The evolutionary analysis indicates higher positional conservation than sequence conservation, probably linked to the conserved modular motifs within syntenic lncRNAs. Moreover, a positive correlation between the expression of intergenic/natural antisense lncRNAs and their closest/parental gene was observed. For those intergenic, the correlation decreases with the distance to the neighboring gene, supporting that their potential cis-regulatory effect is within a short-range. Furthermore, the analysis of developmental studies showed that a conserved NAT-lncRNA family is differentially expressed in a coordinated way with their cognate sense protein-coding genes. These genes code for proteins associated with phloem development, thus providing insights about the potential involvement of some of the identified lncRNAs in a developmental process. We expect that this extensive inventory will constitute a valuable resource for further research lines focused on elucidating the regulatory mechanisms mediated by lncRNAs in cucurbits.
Affiliation(s)
- Pascual Villalba-Bermell
  - Institute for Integrative Systems Biology (I2SysBio), Consejo Superior de Investigaciones Científicas (CSIC) - Universitat de València (UV), Parc Científic, Cat. Agustín Escardino 9, 46980, Paterna, Spain
- Joan Marquez-Molins
  - Institute for Integrative Systems Biology (I2SysBio), Consejo Superior de Investigaciones Científicas (CSIC) - Universitat de València (UV), Parc Científic, Cat. Agustín Escardino 9, 46980, Paterna, Spain
- Gustavo Gomez
  - Institute for Integrative Systems Biology (I2SysBio), Consejo Superior de Investigaciones Científicas (CSIC) - Universitat de València (UV), Parc Científic, Cat. Agustín Escardino 9, 46980, Paterna, Spain
2
Zhang Y, Huang J, Xie F, Huang Q, Jiao H, Cheng W. Identification of plant microRNAs using convolutional neural network. FRONTIERS IN PLANT SCIENCE 2024; 15:1330854. [PMID: 38567128 PMCID: PMC10985208 DOI: 10.3389/fpls.2024.1330854] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 02/22/2024] [Indexed: 04/04/2024]
Abstract
MicroRNAs (miRNAs) are of significance in tuning and buffering gene expression. Despite abundant analysis tools that have been developed in the last two decades, plant miRNA identification from next-generation sequencing (NGS) data remains challenging. Here, we show that we can train a convolutional neural network to accurately identify plant miRNAs from NGS data. Based on our methods, we also present a user-friendly pure Java-based software package called Small RNA-related Intelligent and Convenient Analysis Tools (SRICATs). SRICATs encompasses all the necessary steps for plant miRNA analysis. Our results indicate that SRICATs outperforms currently popular software tools on the test data from five plant species. For non-commercial users, SRICATs is freely available at https://sourceforge.net/projects/sricats.
3
Do VQ, Hoang-Thi C, Pham TT, Bui NL, Kim DT, Chu DT. Computational tools supporting known miRNA identification. PROGRESS IN MOLECULAR BIOLOGY AND TRANSLATIONAL SCIENCE 2024; 203:225-242. [PMID: 38360000 DOI: 10.1016/bs.pmbts.2023.12.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/17/2024]
Abstract
The study of small RNAs is a rapidly expanding field. Besides microRNAs, other functional short RNA molecules that regulate gene expression have been found in animals and plants. MicroRNAs play a significant role in host-microbe interactions, and parasite microRNAs may affect the host's innate immunity. Furthermore, short RNAs are intriguing candidates for non-invasive biomarkers because they can be found in physiological fluids. These trends suggest that, for many researchers, quick and simple techniques for expression profiling and downstream analysis of miRNA-seq data are crucial. We selected sRNAtoolbox to make integrated sRNA research easier. Each of its tools can be used separately or to explore and analyze sRNAbench results in further depth. A special focus was placed on the tools' usability. We review available miRNA research tools to give an overview of how they have been evaluated, focusing mainly on sRNAtoolbox.
Affiliation(s)
- Van-Quy Do
  - Faculty of Applied Sciences, International School, Vietnam National University, Hanoi, Vietnam; Center for Biomedicine and Community Health, International School, Vietnam National University, Hanoi, Vietnam
- Chuc Hoang-Thi
  - Faculty of Applied Sciences, International School, Vietnam National University, Hanoi, Vietnam; Center for Biomedicine and Community Health, International School, Vietnam National University, Hanoi, Vietnam
- Thanh-Truong Pham
  - Faculty of Applied Sciences, International School, Vietnam National University, Hanoi, Vietnam; Center for Biomedicine and Community Health, International School, Vietnam National University, Hanoi, Vietnam
- Nhat-Le Bui
  - Faculty of Applied Sciences, International School, Vietnam National University, Hanoi, Vietnam; Center for Biomedicine and Community Health, International School, Vietnam National University, Hanoi, Vietnam
- Dinh-Thai Kim
  - Faculty of Applied Sciences, International School, Vietnam National University, Hanoi, Vietnam
- Dinh-Toi Chu
  - Faculty of Applied Sciences, International School, Vietnam National University, Hanoi, Vietnam; Center for Biomedicine and Community Health, International School, Vietnam National University, Hanoi, Vietnam
4
Wajnberg G, Allain EP, Roy JW, Srivastava S, Saucier D, Morin P, Marrero A, O’Connell C, Ghosh A, Lewis SM, Ouellette RJ, Crapoulet N. Application of annotation-agnostic RNA sequencing data analysis tools for biomarker discovery in liquid biopsy. FRONTIERS IN BIOINFORMATICS 2023; 3:1127661. [PMID: 37252342 PMCID: PMC10213969 DOI: 10.3389/fbinf.2023.1127661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2022] [Accepted: 04/17/2023] [Indexed: 05/31/2023]
Abstract
RNA sequencing analysis is an important field in the study of extracellular vesicles (EVs), as these particles contain a variety of RNA species that may have diagnostic, prognostic and predictive value. Many of the bioinformatics tools currently used to analyze EV cargo rely on third-party annotations. Recently, analysis of unannotated expressed RNAs has become of interest, since these may provide complementary information to traditional annotated biomarkers or may help refine biological signatures used in machine learning by including unknown regions. Here we perform a comparative analysis of annotation-free and classical read-summarization tools for the analysis of RNA sequencing data generated for EVs isolated from persons with amyotrophic lateral sclerosis (ALS) and healthy donors. Differential expression analysis and digital-droplet PCR validation of unannotated RNAs confirmed their existence and demonstrated the usefulness of including such potential biomarkers in transcriptome analysis. We show that find-then-annotate methods perform similarly to standard tools for the analysis of known features, and can also identify unannotated expressed RNAs, two of which were validated as overexpressed in ALS samples. We demonstrate that these tools can therefore be used for a stand-alone analysis or easily integrated into current workflows, and may be useful for re-analysis, as annotations can be integrated post hoc.
Affiliation(s)
- Eric P. Allain
  - Atlantic Cancer Research Institute, Moncton, NB, Canada
  - Department of Clinical Genetics, Vitalité Health Network, Dr. Georges-L.-Dumont University Hospital Centre, Moncton, NB, Canada
  - Department of Chemistry and Biochemistry, Université de Moncton, Moncton, NB, Canada
  - Beatrice Hunter Cancer Research Institute, Halifax, NS, Canada
- Jeremy W. Roy
  - Atlantic Cancer Research Institute, Moncton, NB, Canada
  - Beatrice Hunter Cancer Research Institute, Halifax, NS, Canada
- Daniel Saucier
  - Department of Chemistry and Biochemistry, Université de Moncton, Moncton, NB, Canada
- Pier Morin
  - Department of Chemistry and Biochemistry, Université de Moncton, Moncton, NB, Canada
- Alier Marrero
  - Dr. Georges-L.-Dumont University Hospital Centre, Moncton, NB, Canada
- Anirban Ghosh
  - Atlantic Cancer Research Institute, Moncton, NB, Canada
- Stephen M. Lewis
  - Atlantic Cancer Research Institute, Moncton, NB, Canada
  - Department of Chemistry and Biochemistry, Université de Moncton, Moncton, NB, Canada
  - Beatrice Hunter Cancer Research Institute, Halifax, NS, Canada
- Rodney J. Ouellette
  - Atlantic Cancer Research Institute, Moncton, NB, Canada
  - Department of Chemistry and Biochemistry, Université de Moncton, Moncton, NB, Canada
  - Beatrice Hunter Cancer Research Institute, Halifax, NS, Canada
  - Dr. Georges-L.-Dumont University Hospital Centre, Moncton, NB, Canada
5
Identification of Known and Novel Arundo donax L. MicroRNAs and Their Targets Using High-Throughput Sequencing and Degradome Analysis. Life (Basel) 2022; 12:life12050651. [PMID: 35629319 PMCID: PMC9142972 DOI: 10.3390/life12050651] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/07/2022] [Revised: 04/21/2022] [Accepted: 04/22/2022] [Indexed: 11/17/2022]
Abstract
MicroRNAs (miRNAs) are a class of non-coding molecules involved in the regulation of a variety of biological processes. They have been identified and characterized in several plant species, but only limited data are available for Arundo donax L., one of the most promising bioenergy crops. Here we identified, for the first time, A. donax conserved and novel miRNAs together with their targets, through a combined analysis of high-throughput sequencing of small RNAs, transcriptome and degradome data. A total of 134 conserved miRNAs, belonging to 45 families, and 27 novel miRNA candidates were identified, along with the corresponding primary and precursor miRNA sequences. A total of 96 targets, 69 for known miRNAs and 27 for novel miRNA candidates, were also identified by degradome analysis, and selected slice sites were validated by 5′-RACE. The identified set of conserved and novel candidate miRNAs, together with their targets, extends our knowledge about miRNAs in monocots and paves the way to further investigations on miRNA-mediated regulatory processes in A. donax, Poaceae and other bioenergy crops.
6
Chao H, Hu Y, Zhao L, Xin S, Ni Q, Zhang P, Chen M. Biogenesis, Functions, Interactions, and Resources of Non-Coding RNAs in Plants. Int J Mol Sci 2022; 23:ijms23073695. [PMID: 35409060 PMCID: PMC8998614 DOI: 10.3390/ijms23073695] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 02/28/2022] [Revised: 03/19/2022] [Accepted: 03/23/2022] [Indexed: 12/14/2022]
Abstract
Plant transcriptomes encompass a large number of functional non-coding RNAs (ncRNAs), only some of which have protein-coding capacity. Since their initial discovery, ncRNAs have been classified into two broad categories based on their biogenesis and mechanisms of action: housekeeping ncRNAs and regulatory ncRNAs. With advances in RNA sequencing technology and computational methods, bioinformatics resources continue to emerge and update rapidly, including workflows for in silico ncRNA analysis, up-to-date platforms, databases, and tools dedicated to ncRNA identification and functional annotation. In this review, we aim to describe the biogenesis, biological functions, and interactions with DNA, RNA, proteins, and microorganisms of five major regulatory ncRNAs (miRNA, siRNA, tsRNA, circRNA, lncRNA) in plants. Then, we systematically summarize tools for analysis and prediction of plant ncRNAs, as well as databases. Furthermore, we discuss the in silico analysis process of these ncRNAs and present a protocol for step-by-step computational analysis of ncRNAs. In general, this review will help researchers better understand the world of ncRNAs at multiple levels.
Affiliation(s)
- Peijing Zhang
  - Correspondence: (P.Z.); (M.C.); Tel./Fax: +86-(0)571-88206612 (M.C.)
- Ming Chen
  - Correspondence: (P.Z.); (M.C.); Tel./Fax: +86-(0)571-88206612 (M.C.)
7
Bell J, Hendrix DA. Predicting Drosha and Dicer Cleavage Sites with DeepMirCut. Front Mol Biosci 2022; 8:799056. [PMID: 35141278 PMCID: PMC8819831 DOI: 10.3389/fmolb.2021.799056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2021] [Accepted: 12/28/2021] [Indexed: 11/13/2022]
Abstract
MicroRNAs are a class of small RNAs involved in post-transcriptional gene silencing with roles in disease and development. Many computational tools have been developed to identify novel microRNAs. However, there have been no attempts to predict cleavage sites for Drosha from primary sequence, or to identify cleavage sites using deep neural networks. Here, we present DeepMirCut, a recurrent neural network-based software that predicts both Dicer and Drosha cleavage sites. We built a microRNA primary sequence database including flanking genomic sequences for 34,713 microRNA annotations. We compare models trained on sequence data, sequence and secondary structure data, as well as input data with annotated structures. Our best model is able to predict cuts within closer average proximity than results reported for other methods. We show that a guanine nucleotide before and a uracil nucleotide after Dicer cleavage sites on the 3' arm of the microRNA precursor had a positive effect on predictions while the opposite order (U before, G after) had a negative effect. Our analysis was also able to predict several positions where bulges had either positive or negative effects on the score. We expect that our approach and the data we have curated will enable several future studies.
Affiliation(s)
- Jimmy Bell
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, United States
- David A. Hendrix
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, United States
  - Department of Biochemistry and Biophysics, Oregon State University, Corvallis, OR, United States
8
Jeseničnik T, Štajner N, Radišek S, Mishra AK, Košmelj K, Kunej U, Jakše J. Discovery of microRNA-like Small RNAs in Pathogenic Plant Fungus Verticillium nonalfalfae Using High-Throughput Sequencing and qPCR and RLM-RACE Validation. Int J Mol Sci 2022; 23:900. [PMID: 35055083 PMCID: PMC8778906 DOI: 10.3390/ijms23020900] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/14/2021] [Revised: 01/07/2022] [Accepted: 01/10/2022] [Indexed: 02/06/2023]
Abstract
Verticillium nonalfalfae (V. nonalfalfae) is one of the most problematic hop (Humulus lupulus L.) pathogens, as the highly virulent fungal pathotypes cause severe annual yield losses due to infections of entire hop fields. In recent years, the RNA interference (RNAi) mechanism has become one of the main areas of focus in plant-fungal pathogen interaction studies and has been implicated as one of the major contributors to fungal pathogenicity. MicroRNA-like RNAs (milRNAs) have been identified in several important plant pathogenic fungi; however, to date, no milRNA has been reported in the V. nonalfalfae species. In the present study, using a high-throughput sequencing approach and extensive bioinformatics analysis, a total of 156 milRNA precursors were identified in the annotated V. nonalfalfae genome, and 27 of these milRNA precursors were selected as true milRNA candidates, with appropriate microRNA hairpin secondary structures. The stem-loop RT-qPCR assay was used for milRNA validation; a total of nine V. nonalfalfae milRNAs were detected, and their expression was confirmed. The milRNA expression patterns, determined by the absolute quantification approach, imply that milRNAs play an important role in the pathogenicity of highly virulent V. nonalfalfae pathotypes. Computational analysis predicted milRNA targets in the V. nonalfalfae genome and in the host hop transcriptome, and the activity of milRNA-mediated RNAi target cleavage was subsequently confirmed for two selected endogenous fungal target gene models using the 5' RLM-RACE approach.
Affiliation(s)
- Taja Jeseničnik
  - Department of Agronomy, Biotechnical Faculty, University of Ljubljana, 1000 Ljubljana, Slovenia; (T.J.); (N.Š.); (K.K.); (U.K.)
- Nataša Štajner
  - Department of Agronomy, Biotechnical Faculty, University of Ljubljana, 1000 Ljubljana, Slovenia; (T.J.); (N.Š.); (K.K.); (U.K.)
- Sebastjan Radišek
  - Plant Protection Department, Slovenian Institute of Hop Research and Brewing, 3310 Žalec, Slovenia
- Ajay Kumar Mishra
  - Biology Centre of the Czech Academy of Sciences, Department of Molecular Genetics, Institute of Plant Molecular Biology, Branišovská 31, 37005 České Budějovice, Czech Republic
- Katarina Košmelj
  - Department of Agronomy, Biotechnical Faculty, University of Ljubljana, 1000 Ljubljana, Slovenia; (T.J.); (N.Š.); (K.K.); (U.K.)
- Urban Kunej
  - Department of Agronomy, Biotechnical Faculty, University of Ljubljana, 1000 Ljubljana, Slovenia; (T.J.); (N.Š.); (K.K.); (U.K.)
- Jernej Jakše
  - Department of Agronomy, Biotechnical Faculty, University of Ljubljana, 1000 Ljubljana, Slovenia; (T.J.); (N.Š.); (K.K.); (U.K.)
9
Saçar Demirci MD. Computational Detection of Pre-microRNAs. Methods Mol Biol 2022; 2257:167-174. [PMID: 34432278 DOI: 10.1007/978-1-0716-1170-8_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
MicroRNA (miRNA) studies have been one of the most popular research areas in recent years. Although thousands of miRNAs have been detected in several species, the majority remains unidentified. Thus, finding novel miRNAs is a vital element for investigating miRNA mediated posttranscriptional gene regulation machineries. Furthermore, experimental methods have challenging inadequacies in their capability to detect rare miRNAs, and are also limited to the state of the organism under examination (e.g., tissue type, developmental stage, stress-disease conditions). These issues have initiated the creation of high-level computational methodologies endeavoring to distinguish potential miRNAs in silico. On the other hand, most of these tools suffer from high numbers of false positives and/or false negatives and as a result they do not provide enough confidence for validating all their predictions experimentally. In this chapter, computational difficulties in detection of pre-miRNAs are discussed and a machine learning based approach that has been designed to address these issues is reviewed.
10
Garg V, Varshney RK. Analysis of Small RNA Sequencing Data in Plants. Methods Mol Biol 2022; 2443:497-509. [PMID: 35037223 DOI: 10.1007/978-1-0716-2067-0_26] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Over the past decades, next-generation sequencing (NGS) has been employed extensively for investigating the regulatory mechanisms of small RNAs. Several bioinformatics tools are available for aiding biologists to extract meaningful information from enormous amounts of data generated by NGS platforms. This chapter describes a detailed methodology for analyzing small RNA sequencing data using different open source tools. We elaborate on various steps involved in analysis, from processing the raw sequencing reads to identifying miRNAs, their targets, and differential expression studies.
Affiliation(s)
- Vanika Garg
  - Center of Excellence in Genomics & Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, Telangana, India
- Rajeev K Varshney
  - Center of Excellence in Genomics & Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, Telangana, India
  - State Agricultural Biotechnology Centre, Centre for Crop and Food Innovation, Food Futures Institute, Murdoch University, Murdoch, WA, Australia
11
PlantMirP2: An Accurate, Fast and Easy-To-Use Program for Plant Pre-miRNA and miRNA Prediction. Genes (Basel) 2021; 12:genes12081280. [PMID: 34440454 PMCID: PMC8392394 DOI: 10.3390/genes12081280] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/19/2021] [Revised: 08/19/2021] [Accepted: 08/19/2021] [Indexed: 01/01/2023]
Abstract
MicroRNAs (miRNAs) are short non-coding ribonucleic acid molecules that can regulate gene expression. The computational identification of plant miRNAs is of great significance to understanding biological functions. In our previous studies, we first put forward and further developed a set of knowledge-based energy features to construct two plant pre-miRNA prediction tools (plantMirP and riceMirP). However, these two tools cannot be used for miRNA prediction from NGS (Next-Generation Sequencing) data. In addition, to further improve prediction performance and accessibility, plantMirP2 has been developed. Based on the latest dataset, plantMirP2 achieves a promising performance: 0.9968 (Area Under Curve, AUC), 0.9754 (accuracy), 0.9675 (sensitivity) and 0.9876 (specificity). Additionally, comparisons with other plant pre-miRNA tools show that plantMirP2 performs better. Finally, the webserver and stand-alone version of plantMirP2 are available.
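The accuracy, sensitivity and specificity quoted above are standard binary-classification metrics. As a minimal sketch (not plantMirP2's own evaluation code), they can be derived from a confusion matrix as follows. The function name and the counts in the usage line are hypothetical and chosen only for illustration; AUC is omitted because it needs ranked prediction scores rather than counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: real pre-miRNAs predicted as positive/negative.
    tn/fp: non-miRNA sequences predicted as negative/positive.
    All names here are illustrative assumptions, not taken from plantMirP2.
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    return {
        "accuracy": (tp + tn) / (positives + negatives),
        "sensitivity": tp / positives,   # true positive rate (recall)
        "specificity": tn / negatives,   # true negative rate
    }


# Hypothetical counts: 100 real pre-miRNAs and 100 negative sequences.
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
print(metrics)
```

Reporting sensitivity and specificity separately, as the abstract does, shows how errors are split between missed miRNAs and false hits, which a single accuracy figure hides.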
12
Zhao Y, Kuang Z, Wang Y, Li L, Yang X. MicroRNA annotation in plants: current status and challenges. Brief Bioinform 2021; 22:6180404. [PMID: 33754625 DOI: 10.1093/bib/bbab075] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 12/15/2020] [Revised: 02/01/2021] [Accepted: 02/15/2021] [Indexed: 11/14/2022]
Abstract
Over the last two decades, studies on microRNAs (miRNAs) and the numbers of annotated miRNAs in plants and animals have surged. Herein, we reviewed the current progress and challenges of miRNA annotation in plants. By comparing plant and animal miRNAs, we pinpointed the difficulties of plant miRNA annotation and proposed potential solutions. Recalling the history of methods and criteria in plant miRNA annotation, we detailed how the major advances were made and evolved. By collecting and categorizing bioinformatics tools for plant miRNA annotation, we surveyed their advantages and disadvantages, especially for those that mimic the miRNA biogenesis pathway by parsing deeply sequenced small RNA (sRNA) libraries. In addition, we summarized all available databases hosting plant miRNAs and suggested potential optimizations, such as how to increase the signal-to-noise ratio (SNR) in these databases. Finally, we discussed the challenges and perspectives of plant miRNA annotation, and indicated the possibilities offered by an all-in-one tool and platform that integrates artificial intelligence.
Affiliation(s)
- Yongxin Zhao
  - Beijing Academy of Agriculture and Forestry Sciences, China
- Zheng Kuang
  - Peking University and Beijing Academy of Agriculture and Forestry Sciences, China
- Lei Li
  - School of Advanced Agricultural Sciences and School of Life Sciences at the Peking University, China
- Xiaozeng Yang
  - Beijing Academy of Agriculture and Forestry Sciences, China
13
MicroRNAs Regulating Autophagy in Neurodegeneration. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2021; 1208:191-264. [PMID: 34260028 DOI: 10.1007/978-981-16-2830-6_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2023]
Abstract
Social and economic impacts of neurodegenerative diseases (NDs) become more prominent in our constantly aging population. Currently, due to the lack of knowledge about the aetiology of most NDs, only symptomatic treatment is available for patients. Hence, researchers and clinicians are in need of solid studies on pathological mechanisms of NDs. Autophagy promotes degradation of pathogenic proteins in NDs, while microRNAs post-transcriptionally regulate multiple signalling networks including autophagy. This chapter will critically discuss current research advancements in the area of microRNAs regulating autophagy in NDs. Moreover, we will introduce basic strategies and techniques used in microRNA research. Delineation of the mechanisms contributing to NDs will result in development of better approaches for their early diagnosis and effective treatment.
14
"Mind the Gap": Hi-C Technology Boosts Contiguity of the Globe Artichoke Genome in Low-Recombination Regions. G3-GENES GENOMES GENETICS 2020; 10:3557-3564. [PMID: 32817122 PMCID: PMC7534446 DOI: 10.1534/g3.120.401446] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
Globe artichoke (Cynara cardunculus var. scolymus; 2n=2x=34) is grown largely in the Mediterranean region, with Italy the leading world producer; however, over time, its cultivation has spread to the Americas and China. In 2016, we released the first (v1.0) globe artichoke genome sequence (http://www.artichokegenome.unito.it/). Its assembly was generated using ∼133-fold Illumina sequencing data, covering 725 of the 1,084 Mb genome, of which 526 Mb (73%) were anchored to 17 chromosomal pseudomolecules. Based on v1.0 sequencing data, we generated a new genome assembly (v2.0), obtained from a Hi-C (Dovetail) genomic library, which improves the scaffold N50 from 126 kb to 44.8 Mb (∼356-fold increase) and N90 from 29 kb to 17.8 Mb (∼685-fold increase). While the L90 of the v1.0 sequence included 6,123 scaffolds, that of the new v2.0 includes just 15 super-scaffolds, a number close to the haploid chromosome number of the species. The newly generated super-scaffolds were assigned to pseudomolecules using reciprocal BLAST procedures. The cumulative size of unplaced scaffolds in v2.0 was reduced by 165 Mb, increasing the anchored genome sequence to 94%. The marked improvement is mainly attributable to the ability of the proximity ligation-based approach to deal with both heterochromatic (e.g., peri-centromeric) and euchromatic regions during the assembly procedure, which made it possible to physically locate low-recombination regions. The new high-quality reference genome enhances the taxonomic breadth of the data available for comparative plant genomics and led to a new, accurate gene prediction (28,632 genes), thus promoting the map-based cloning of economically important genes.
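For readers unfamiliar with the contiguity statistics quoted above: under the usual definition, Nx is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least x% of the total assembly length, and Lx is the number of scaffolds in that set. The following is a minimal sketch under that definition; the function name and the toy scaffold lengths are hypothetical and are not the artichoke data.

```python
def nx_lx(lengths, percent):
    """Return (Nx, Lx) for a list of scaffold lengths.

    percent is an integer such as 50 or 90. Integer arithmetic is used for the
    coverage threshold so that no floating-point rounding is involved.
    """
    ordered = sorted(lengths, reverse=True)   # longest scaffolds first
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Stop at the first scaffold where cumulative length reaches x% of total.
        if running * 100 >= percent * total:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy assembly of seven scaffolds (arbitrary units), total length 300.
toy = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = nx_lx(toy, 50)
n90, l90 = nx_lx(toy, 90)
print(n50, l50, n90, l90)
```

A higher N50 or N90 together with a lower L90 indicates that the same amount of sequence is held in fewer, longer scaffolds, which is the direction of change the abstract reports between v1.0 and v2.0.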
15
Bugnon LA, Yones C, Milone DH, Stegmayer G. Genome-wide discovery of pre-miRNAs: comparison of recent approaches based on machine learning. Brief Bioinform 2020; 22:5894456. [PMID: 34020552 DOI: 10.1093/bib/bbaa184] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/26/2020] [Revised: 07/13/2020] [Accepted: 07/18/2020] [Indexed: 01/12/2023]
Abstract
MOTIVATION: The genome-wide discovery of microRNAs (miRNAs) involves identifying sequences having the highest chance of being a novel miRNA precursor (pre-miRNA), within all the possible sequences in a complete genome. The known pre-miRNAs are usually just a few in comparison to the millions of candidates that have to be analyzed. This is of particular interest in non-model species and recently sequenced genomes, where the challenge is to find potential pre-miRNAs only from the sequenced genome. The task is unfeasible without the help of computational methods, such as deep learning. However, it is still very difficult to find an accurate predictor, with a low false positive rate in this genome-wide context. Although there are many available tools, these have not been tested in realistic conditions, with sequences from whole genomes and the high class imbalance inherent to such data.

RESULTS: In this work, we review six recent methods for tackling this problem with machine learning. We compare the models in five genome-wide datasets: Arabidopsis thaliana, Caenorhabditis elegans, Anopheles gambiae, Drosophila melanogaster, Homo sapiens. The models have been designed for the pre-miRNAs prediction task, where there is a class of interest that is significantly underrepresented (the known pre-miRNAs) with respect to a very large number of unlabeled samples. It was found that for the smaller genomes and smaller imbalances, all methods perform in a similar way. However, for larger datasets such as the H. sapiens genome, it was found that deep learning approaches using raw information from the sequences reached the best scores, achieving low numbers of false positives.

AVAILABILITY: The source code to reproduce these results is in: http://sourceforge.net/projects/sourcesinc/files/gwmirna. Additionally, the datasets are freely available in: https://sourceforge.net/projects/sourcesinc/files/mirdata.
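The emphasis on false positives under class imbalance follows from simple arithmetic: when negative candidates vastly outnumber true pre-miRNAs, even a small false positive rate can produce more false hits than true ones. A minimal sketch of that effect is given below; the function name and every number in the usage lines are hypothetical and none is taken from the paper.

```python
def expected_precision(n_pos, n_neg, sensitivity, false_positive_rate):
    """Expected fraction of predicted positives that are real positives.

    n_pos / n_neg: number of true pre-miRNAs and of negative candidates.
    sensitivity: fraction of true pre-miRNAs that the model recovers.
    false_positive_rate: fraction of negatives wrongly predicted as positive.
    """
    true_positives = sensitivity * n_pos
    false_positives = false_positive_rate * n_neg
    predicted = true_positives + false_positives
    if predicted == 0:
        return 0.0  # the model predicts nothing as positive
    return true_positives / predicted


# Same hypothetical classifier (90% sensitivity, 1% false positive rate)
# applied to a balanced set and to a genome-wide, highly imbalanced set.
balanced = expected_precision(1_000, 1_000, 0.9, 0.01)
genome_wide = expected_precision(1_000, 1_000_000, 0.9, 0.01)
print(f"balanced: {balanced:.3f}, genome-wide: {genome_wide:.3f}")
```

With these illustrative numbers the classifier is unchanged, yet most genome-wide predictions would be false positives, which is why evaluating tools on whole genomes rather than balanced benchmarks matters.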
Affiliation(s)
- Leandro A Bugnon
- Research Institute for Signals, Systems and Computational Intelligence sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe, Argentina
| | - Cristian Yones
- Research Institute for Signals, Systems and Computational Intelligence sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe, Argentina
| | - Diego H Milone
- Research Institute for Signals, Systems and Computational Intelligence sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe, Argentina
| | - Georgina Stegmayer
- Research Institute for Signals, Systems and Computational Intelligence sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe, Argentina
| |
|
16
|
Volk MJ, Lourentzou I, Mishra S, Vo LT, Zhai C, Zhao H. Biosystems Design by Machine Learning. ACS Synth Biol 2020; 9:1514-1533. [PMID: 32485108 DOI: 10.1021/acssynbio.0c00129] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022]
Abstract
Biosystems such as enzymes, pathways, and whole cells have been increasingly explored for biotechnological applications. However, the intricate connectivity and resulting complexity of biosystems poses a major hurdle in designing biosystems with desirable features. As -omics and other high throughput technologies have been rapidly developed, the promise of applying machine learning (ML) techniques in biosystems design has started to become a reality. ML models enable the identification of patterns within complicated biological data across multiple scales of analysis and can augment biosystems design applications by predicting new candidates for optimized performance. ML is being used at every stage of biosystems design to help find nonobvious engineering solutions with fewer design iterations. In this review, we first describe commonly used models and modeling paradigms within ML. We then discuss some applications of these models that have already shown success in biotechnological applications. Moreover, we discuss successful applications at all scales of biosystems design, including nucleic acids, genetic circuits, proteins, pathways, genomes, and bioprocesses. Finally, we discuss some limitations of these methods and potential solutions as well as prospects of the combination of ML and biosystems design.
|
17
|
Whole genome resequencing of four Italian sweet pepper landraces provides insights on sequence variation in genes of agronomic value. Sci Rep 2020; 10:9189. [PMID: 32514106 PMCID: PMC7280500 DOI: 10.1038/s41598-020-66053-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2020] [Accepted: 05/07/2020] [Indexed: 11/08/2022] Open
Abstract
Sweet pepper (Capsicum annuum L.) is a high value crop and one of the most widely grown vegetables belonging to the Solanaceae family. In addition to commercial varieties and F1 hybrids, a multitude of landraces are grown, whose genetic combination is the result of hundreds of years of random, environmental, and farmer selection. High genetic diversity exists in the landrace gene pool, which however has scarcely been studied, thus limiting their cultivation. We re-sequenced four pepper inbred lines, within as many Italian landraces, which are representative of as many fruit types: big sized blocky with sunken apex ('Quadrato') and protruding apex or heart shaped ('Cuneo'), elongated ('Corno') and smaller sized sub-spherical ('Tumaticot'). Each genomic sequence was obtained through the Illumina platform at coverage ranging from 39 to 44×, and reconstructed at a chromosome scale. About 35.5k genes were predicted in each inbred line, of which 22,017 were shared among them and the reference genome (accession 'CM334'). Distinctive variations in miRNAs, resistance gene analogues (RGAs) and susceptibility genes (S-genes) were detected. A detailed survey of the SNP/Indels occurring in genes affecting fruit size, shape and quality identified the highest frequencies of variation in regulatory regions. Many structural variations were identified as presence/absence variations (PAVs), notably in resistance gene analogues (RGAs) and in the capsanthin/capsorubin synthase (CCS) gene. The large allelic diversity observed in the four inbred lines suggests their potential use as a pre-breeding resource and represents a one-stop resource for C. annuum genomics and a key tool for dissecting the path from sequence variation to phenotype.
|
18
|
Raad J, Stegmayer G, Milone DH. Complexity measures of the mature miRNA for improving pre-miRNAs prediction. Bioinformatics 2019; 36:2319-2327. [DOI: 10.1093/bioinformatics/btz940] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2019] [Revised: 12/10/2019] [Accepted: 12/17/2019] [Indexed: 12/20/2022] Open
Abstract
Motivation: The discovery of microRNA (miRNA) in the last decade has certainly changed the understanding of gene regulation in the cell. Although a large number of algorithms with different features have been proposed, they still predict an impractical amount of false positives. Most of the proposed features are based on the structure of precursors of the miRNA only, not considering the important and relevant information contained in the mature miRNA. Such new kind of features could certainly improve the performance of the predictors of new miRNAs. Results: This paper presents three new features that are based on the sequence information contained in the mature miRNA. We will show how these new features, when used by a classical supervised machine learning approach as well as by more recent proposals based on deep learning, improve the prediction performance in a significant way. Moreover, several experimental conditions were defined and tested to evaluate the novel features impact in situations close to genome-wide analysis. The results show that the incorporation of new features based on the mature miRNA allows to improve the detection of new miRNAs independently of the classifier used. Availability and implementation: https://sourceforge.net/projects/sourcesinc/files/cplxmirna/. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jonathan Raad
- Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
| | - Georgina Stegmayer
- Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
| | - Diego H Milone
- Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina
| |
|
19
|
Mármol-Sánchez E, Cirera S, Quintanilla R, Pla A, Amills M. Discovery and annotation of novel microRNAs in the porcine genome by using a semi-supervised transductive learning approach. Genomics 2019; 112:2107-2118. [PMID: 31816430 DOI: 10.1016/j.ygeno.2019.12.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2019] [Revised: 11/13/2019] [Accepted: 12/05/2019] [Indexed: 12/15/2022]
Abstract
Despite the broad variety of available microRNA (miRNA) prediction tools, their application to the discovery and annotation of novel miRNA genes in domestic species is still limited. In this study we designed a comprehensive pipeline (eMIRNA) for miRNA identification in the yet poorly annotated porcine genome and demonstrated the usefulness of implementing a motif search positional refinement strategy for the accurate determination of precursor miRNA boundaries. The small RNA fraction from gluteus medius skeletal muscle of 48 Duroc gilts was sequenced and used for the prediction of novel miRNA loci. Additionally, we selected the human miRNA annotation for a homology-based search of porcine miRNAs with orthologous genes in the human genome. A total of 20 novel expressed miRNAs were identified in the porcine muscle transcriptome and 27 additional novel porcine miRNAs were also detected by homology-based search using the human miRNA annotation. The existence of three selected novel miRNAs (ssc-miR-483, ssc-miR-484 and ssc-miR-200a) was further confirmed by reverse transcription quantitative real-time PCR analyses in the muscle and liver tissues of Göttingen minipigs. In summary, the eMIRNA pipeline presented in the current work allowed us to expand the catalogue of porcine miRNAs and showed better performance than other commonly used miRNA prediction approaches. More importantly, the flexibility of our pipeline makes possible its application in other yet poorly annotated non-model species.
Affiliation(s)
- Emilio Mármol-Sánchez
- Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain.
| | - Susanna Cirera
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Grønnegårdsvej 3, 2nd Floor, 1870 Frederiksberg C, Denmark
| | - Raquel Quintanilla
- Animal Breeding and Genetics Program, Institute for Research and Technology in Food and Agriculture (IRTA), Torre Marimon, 08140 Caldes de Montbui, Spain
| | - Albert Pla
- Department of Medical Genetics, University of Oslo and Oslo University Hospital, Oslo, Norway
| | - Marcel Amills
- Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain; Departament de Ciència Animal i dels Aliments, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
| |
|
20
|
Wang H, Ma Y, Dong C, Li C, Wang J, Liu D. CL-PMI: A Precursor MicroRNA Identification Method Based on Convolutional and Long Short-Term Memory Networks. Front Genet 2019; 10:967. [PMID: 31681416 PMCID: PMC6798641 DOI: 10.3389/fgene.2019.00967] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2019] [Accepted: 09/10/2019] [Indexed: 12/23/2022] Open
Abstract
MicroRNAs (miRNAs) are the major class of gene-regulating molecules that bind mRNAs. They function mainly as translational repressors in mammals. Therefore, how to identify miRNAs is one of the most important problems in medical treatment. Many known pre-miRNAs have a hairpin ring structure containing more structural features, and it is difficult to identify mature miRNAs because of their short length. Therefore, most research focuses on the identification of pre-miRNAs. Most computational models rely on manual feature extraction to identify pre-miRNAs and do not consider the sequential and spatial characteristics of pre-miRNAs, resulting in a loss of information. As the number of unidentified pre-miRNAs is far greater than that of known pre-miRNAs, there is a dataset imbalance problem, which leads to a degradation of the performance of pre-miRNA identification methods. In order to overcome the limitations of existing methods, we propose a pre-miRNA identification algorithm based on a cascaded CNN-LSTM framework, called CL-PMI. We used a convolutional neural network to automatically extract features and obtain pre-miRNA spatial information. We also employed long short-term memory (LSTM) to capture time characteristics of pre-miRNAs and improve attention mechanisms for long-term dependence modeling. Focal loss was used to improve the dataset imbalance. Compared with existing methods, CL-PMI achieved better performance on all datasets. The results demonstrate that this method can effectively identify pre-miRNAs by simultaneously considering their spatial and sequential information, as well as dealing with imbalance in the datasets.
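The abstract names focal loss as the remedy for dataset imbalance. As a minimal sketch, the standard binary focal loss (as introduced for dense object detection) can be written as below. The default values of `gamma` and `alpha` are the commonly quoted ones and are an assumption here; the abstract does not state which settings CL-PMI uses, and this is not the authors' implementation.

```python
import math


def binary_cross_entropy(p, y):
    """Plain cross-entropy for one example; p is the predicted P(positive), y is 0 or 1."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -math.log(p if y == 1 else 1.0 - p)


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one example: -alpha_t * (1 - p_t)**gamma * log(p_t).

    p_t is the probability assigned to the true class, so well-classified
    examples (p_t near 1) are down-weighted by the (1 - p_t)**gamma factor.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy negative (p = 0.01, y = 0) contributes far less under focal loss
# than under cross-entropy, so the abundant negatives dominate training less.
print(binary_cross_entropy(0.01, 0), binary_focal_loss(0.01, 0))
```

With `gamma=0` the modulating factor disappears and the loss reduces to class-weighted cross-entropy, which is a convenient sanity check.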
Affiliation(s)
- Huiqing Wang
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, China
| | - Yue Ma
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, China
| | - Chunlin Dong
- Dryland Agriculture Research Center, Shanxi Academy of Agricultural Sciences, Taiyuan, China
| | - Chun Li
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, China
| | - Jingjing Wang
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, China
| | - Dan Liu
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, China
| |
|
21
|
miRWoods: Enhanced precursor detection and stacked random forests for the sensitive detection of microRNAs. PLoS Comput Biol 2019; 15:e1007309. [PMID: 31596843 PMCID: PMC6785219 DOI: 10.1371/journal.pcbi.1007309] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2019] [Accepted: 08/05/2019] [Indexed: 12/29/2022] Open
Abstract
MicroRNAs are conserved, endogenous small RNAs with critical post-transcriptional regulatory functions throughout eukaryota, including prominent roles in development and disease. Despite much effort, microRNA annotations still contain errors and are incomplete due especially to challenges related to identifying valid miRs that have small numbers of reads, to properly locating hairpin precursors and to balancing precision and recall. Here, we present miRWoods, which solves these challenges using a duplex-focused precursor detection method and stacked random forests with specialized layers to detect mature and precursor microRNAs, and has been tuned to optimize the harmonic mean of precision and recall. We trained and tuned our discovery pipeline on data sets from the well-annotated human genome, and evaluated its performance on data from mouse. Compared to existing approaches, miRWoods better identifies precursor spans, and can balance sensitivity and specificity for an overall greater prediction accuracy, recalling an average of 10% more annotated microRNAs, and correctly predicts substantially more microRNAs with only one read. We apply this method to the under-annotated genomes of Felis catus (domestic cat) and Bos taurus (cow). We identified hundreds of novel microRNAs in small RNA sequencing data sets from muscle and skin from cat, from 10 tissues from cow and also from human and mouse cells. Our novel predictions include a microRNA in an intron of tyrosine kinase 2 (TYK2) that is present in both cat and cow, as well as a family of mirtrons with two instances in the human genome. Our predictions support a more expanded miR-2284 family in the bovine genome, a larger mir-548 family in the human genome, and a larger let-7 family in the feline genome. 
While the computational prediction of microRNA loci from high-throughput sequence data is well-studied, challenges persist in defining the minimum number of reads required for a locus to be evaluated, as well as in defining the precursor span. We present a new method, “miRWoods”, which has greater recall of known microRNAs, while also achieving as good or better overall performance. Our approach uses improved duplex-based methods of precursor detection and a pair of random forest layers that sensitively detect mature products and precursors. We trained our model on data from human, and confirmed that it can successfully be applied cross-species by evaluating predictions for the mouse genome. We then applied our approach to new sequencing data mapped to the under-annotated genomes of cow and cat. We were able to use miRWoods to improve annotations for cat and cow microRNAs, and found novel microRNAs in human and mouse, and identified errors in current annotations.
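The "harmonic mean of precision and recall" that miRWoods is tuned to optimize is the F1 score. The following is a small sketch of that metric computed from confusion-matrix counts; the function name and the example counts are hypothetical and do not come from the paper.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall, computed from raw counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical counts: perfect recall with 50% precision is penalized more
# by the harmonic mean than it would be by a simple average (0.75).
print(f1_score(true_positives=50, false_positives=50, false_negatives=0))
```

Because the harmonic mean is pulled toward the smaller of the two values, a predictor cannot score well by maximizing recall alone, which matches the balance between sensitivity and precision described above.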
|
22
|
Chen L, Heikkinen L, Wang C, Yang Y, Sun H, Wong G. Trends in the development of miRNA bioinformatics tools. Brief Bioinform 2019; 20:1836-1852. [PMID: 29982332 PMCID: PMC7414524 DOI: 10.1093/bib/bby054] [Citation(s) in RCA: 415] [Impact Index Per Article: 69.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2018] [Revised: 05/18/2018] [Indexed: 12/13/2022] Open
Abstract
MicroRNAs (miRNAs) are small noncoding RNAs that regulate gene expression via recognition of cognate sequences and interference of transcriptional, translational or epigenetic processes. Bioinformatics tools developed for miRNA study include those for miRNA prediction and discovery, structure, analysis and target prediction. We manually curated 95 review papers and ∼1000 miRNA bioinformatics tools published since 2003. We classified and ranked them based on citation number or PageRank score, and then performed network analysis and text mining (TM) to study the miRNA tools development trends. Five key trends were observed: (1) miRNA identification and target prediction have been hot spots in the past decade; (2) manual curation and TM are the main methods for collecting miRNA knowledge from literature; (3) most early tools are well maintained and widely used; (4) classic machine learning methods retain their utility; however, novel ones have begun to emerge; (5) disease-associated miRNA tools are emerging. Our analysis yields significant insight into the past development and future directions of miRNA tools.
Affiliation(s)
- Liang Chen
- Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R, China
| | - Liisa Heikkinen
- Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R, China
| | - Changliang Wang
- Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R, China
| | - Yang Yang
- Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R, China
| | - Huiyan Sun
- Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
| | - Garry Wong
- Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R, China
| |
|
23
|
Barchi L, Pietrella M, Venturini L, Minio A, Toppino L, Acquadro A, Andolfo G, Aprea G, Avanzato C, Bassolino L, Comino C, Molin AD, Ferrarini A, Maor LC, Portis E, Reyes-Chin-Wo S, Rinaldi R, Sala T, Scaglione D, Sonawane P, Tononi P, Almekias-Siegl E, Zago E, Ercolano MR, Aharoni A, Delledonne M, Giuliano G, Lanteri S, Rotino GL. A chromosome-anchored eggplant genome sequence reveals key events in Solanaceae evolution. Sci Rep 2019; 9:11769. [PMID: 31409808 PMCID: PMC6692341 DOI: 10.1038/s41598-019-47985-w] [Citation(s) in RCA: 115] [Impact Index Per Article: 19.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2018] [Accepted: 07/05/2019] [Indexed: 11/30/2022] Open
Abstract
With approximately 450 species, spiny Solanum species constitute the largest monophyletic group in the Solanaceae family, but a high-quality genome assembly from this group is presently missing. We obtained a chromosome-anchored genome assembly of eggplant (Solanum melongena), containing 34,916 genes, confirming that the diploid gene number in the Solanaceae is around 35,000. Comparative genomic studies with tomato (S. lycopersicum), potato (S. tuberosum) and pepper (Capsicum annuum) highlighted the rapid evolution of miRNA:mRNA regulatory pairs and R-type defense genes in the Solanaceae, and provided a genomic basis for the lack of steroidal glycoalkaloid compounds in the Capsicum genus. Using parsimony methods, we reconstructed the putative chromosomal complements of the key founders of the main Solanaceae clades and the rearrangements that led to the karyotypes of extant species and their ancestors. From 10% to 15% of the genes present in the four genomes were syntenic paralogs (ohnologs) generated by the pre-γ, γ and T paleopolyploidy events, and were enriched in transcription factors. Our data suggest that the basic gene network controlling fruit ripening is conserved in different Solanaceae clades, and that climacteric fruit ripening involves a differential regulation of relatively few components of this network, including CNR and ethylene biosynthetic genes.
Affiliation(s)
- Lorenzo Barchi
- University of Torino - DISAFA - Plant Genetics and Breeding, Largo Braccini 2, 10095, Grugliasco, Torino, Italy
| | - Marco Pietrella
- Italian National Agency for New Technologies, Energy and Sustainable Development (ENEA), Casaccia Res Ctr, Via Anguillarese 301, 00123, Roma, Italy.,Council for Agricultural Research and Economics (CREA), Research Centre for Olive, Citrus and Tree Fruit, 47121, Forlì, Italy
| | - Luca Venturini
- Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134, Verona, Italy.,Department of Life Sciences, Natural History Museum, Cromwell Rd, Kensington, London, United Kingdom
| | - Andrea Minio
- Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134, Verona, Italy
| | - Laura Toppino
- Council for Agricultural Research and Economics (CREA), Research Centre for Genomics and Bioinformatics, 26836, Montanaso Lombardo, LO, Italy
| | - Alberto Acquadro
- University of Torino - DISAFA - Plant Genetics and Breeding, Largo Braccini 2, 10095, Grugliasco, Torino, Italy
| | - Giuseppe Andolfo
- Department of Agricultural Sciences, University of Naples Federico II, 80055, Portici, Italy
| | - Giuseppe Aprea
- Italian National Agency for New Technologies, Energy and Sustainable Development (ENEA), Casaccia Res Ctr, Via Anguillarese 301, 00123, Roma, Italy
| | - Carla Avanzato
- Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134, Verona, Italy
| | - Laura Bassolino
- Council for Agricultural Research and Economics (CREA), Research Centre for Genomics and Bioinformatics, 26836, Montanaso Lombardo, LO, Italy
| | - Cinzia Comino
- University of Torino - DISAFA - Plant Genetics and Breeding, Largo Braccini 2, 10095, Grugliasco, Torino, Italy
| | - Alessandra Dal Molin
- Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134, Verona, Italy
| | - Alberto Ferrarini
- Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134, Verona, Italy
| | - Louise Chappell Maor
- Department of Plant and Environmental Sciences, Weizmann Institute of Science, Rehovot, 7610001, Israel
| | - Ezio Portis
- University of Torino - DISAFA - Plant Genetics and Breeding, Largo Braccini 2, 10095, Grugliasco, Torino, Italy
| | - Sebastian Reyes-Chin-Wo
- UC Davis Genome Center-GBSF, 451 Health Sciences Drive, University of California, Davis, CA, 95616, USA
| | - Riccardo Rinaldi
- University of Torino - DISAFA - Plant Genetics and Breeding, Largo Braccini 2, 10095, Grugliasco, Torino, Italy
| | - Tea Sala
- Council for Agricultural Research and Economics (CREA), Research Centre for Genomics and Bioinformatics, 26836, Montanaso Lombardo, LO, Italy
| | - Davide Scaglione
- IGA Technology Services, Via J. Linussio, 51, 33100, Udine, Italy
| | - Prashant Sonawane
- Department of Plant and Environmental Sciences, Weizmann Institute of Science, Rehovot, 7610001, Israel
| | - Paola Tononi
- Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134, Verona, Italy
| | - Efrat Almekias-Siegl
- Department of Plant and Environmental Sciences, Weizmann Institute of Science, Rehovot, 7610001, Israel
| | - Elisa Zago
- Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134, Verona, Italy
| | | | - Asaph Aharoni
- Department of Plant and Environmental Sciences, Weizmann Institute of Science, Rehovot, 7610001, Israel
| | - Massimo Delledonne
- Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134, Verona, Italy.
| | - Giovanni Giuliano
- Italian National Agency for New Technologies, Energy and Sustainable Development (ENEA), Casaccia Res Ctr, Via Anguillarese 301, 00123, Roma, Italy.
| | - Sergio Lanteri
- University of Torino - DISAFA - Plant Genetics and Breeding, Largo Braccini 2, 10095, Grugliasco, Torino, Italy.
| | - Giuseppe Leonardo Rotino
- Council for Agricultural Research and Economics (CREA), Research Centre for Genomics and Bioinformatics, 26836, Montanaso Lombardo, LO, Italy
| |
|
24
|
Computational Resources for Prediction and Analysis of Functional miRNA and Their Targetome. Methods Mol Biol 2019; 1912:215-250. [PMID: 30635896 DOI: 10.1007/978-1-4939-8982-9_9] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Abstract
microRNAs are evolutionarily conserved, endogenously produced, noncoding RNAs (ncRNAs) of approximately 19-24 nucleotides (nts) in length known to exhibit gene silencing of complementary target sequences. Their deregulated expression is reported in various disease conditions and thus has therapeutic implications. In the last decade, various computational resources have been published in this field. In this chapter, we have reviewed bioinformatics resources, i.e., miRNA-centered databases, algorithms, and tools to predict miRNA targets. The first section lists more than 75 databases, which mainly cover information regarding miRNA registries, targets, disease associations, differential expression, interactions with other noncoding RNAs, and all-in-one resources. In the algorithms section, we have compiled about 140 algorithms from eight subcategories, viz. for the prediction of precursor (pre-) and mature miRNAs. These algorithms are developed on various sequence, structure, and thermodynamic based features incorporated into different machine learning techniques (MLTs). In addition, computational identification of miRNAs from high-throughput next generation sequencing (NGS) data and their variants, viz. isomiRs, differential expression, miR-SNPs, and functional annotation, are discussed. Prediction and analysis of miRNAs and their associated targets are also evaluated under the miR-targets section, providing knowledge regarding novel miRNA targets and complex host-pathogen interactions. In conclusion, we have provided a comprehensive review of in silico resources published in miRNA research to help the scientific community stay updated and choose the appropriate tool according to their needs.
|
25
|
Fu X, Zhu W, Cai L, Liao B, Peng L, Chen Y, Yang J. Improved Pre-miRNAs Identification Through Mutual Information of Pre-miRNA Sequences and Structures. Front Genet 2019; 10:119. [PMID: 30858864 PMCID: PMC6397858 DOI: 10.3389/fgene.2019.00119] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2018] [Accepted: 02/04/2019] [Indexed: 11/30/2022] Open
Abstract
Playing critical roles as post-transcriptional regulators, microRNAs (miRNAs) are a family of short non-coding RNAs that are derived from longer transcripts called precursor miRNAs (pre-miRNAs). Experimental methods to identify pre-miRNAs are expensive and time-consuming, which presents the need for computational alternatives. In recent years, the accuracy of computational methods to predict pre-miRNAs has been increasing significantly. However, there are still several drawbacks. First, these methods usually only consider base frequencies or sequence information while ignoring the information between bases. Second, feature extraction methods based on secondary structures usually only consider the global characteristics while ignoring the mutual influence of the local structures. Third, methods integrating high-dimensional feature information are computationally inefficient. In this study, we have proposed a novel mutual information-based feature representation algorithm for pre-miRNA sequences and secondary structures, which is capable of catching the interactions between sequence bases and local features of the RNA secondary structure. In addition, the feature space is smaller than that of most popular methods, which makes our method computationally more efficient than the competitors. Finally, we applied these features to train a support vector machine model to predict pre-miRNAs and compared the results with other popular predictors. As a result, our method outperforms others based on both 5-fold cross-validation and the Jackknife test.
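To illustrate what "information between bases" can mean, the sketch below computes the textbook mutual information between each base and its immediate successor in a sequence. This is a generic illustration under stated assumptions: the function name is hypothetical, only adjacent positions are considered, and the abstract does not specify the exact feature definition used by the authors, so this should not be read as their algorithm.

```python
import math
from collections import Counter


def adjacent_base_mutual_information(sequence):
    """Mutual information (in bits) between a base and the base that follows it.

    Probabilities are estimated from the adjacent pairs observed in the sequence:
    MI = sum over (a, b) of p(a, b) * log2(p(a, b) / (p_first(a) * p_second(b))).
    """
    pairs = list(zip(sequence, sequence[1:]))
    if not pairs:
        return 0.0
    n = len(pairs)
    joint = Counter(pairs)
    first = Counter(a for a, _ in pairs)   # marginal of the leading base
    second = Counter(b for _, b in pairs)  # marginal of the following base
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((first[a] / n) * (second[b] / n)))
    return mi


# A strictly alternating sequence has highly predictable neighbours (high MI),
# while a homopolymer carries no information between positions (MI of zero).
print(adjacent_base_mutual_information("AUAUAUAU"))
print(adjacent_base_mutual_information("AAAA"))
```

A single-base frequency vector would treat "AUAUAUAU" and "AAUUAAUU" almost identically, whereas a pairwise measure like this one separates them, which is the kind of dependency the abstract says frequency-only features miss.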
Affiliation(s)
- Xiangzheng Fu
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Wen Zhu
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Lijun Cai
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Bo Liao
- College of Information Science and Engineering, Hunan University, Changsha, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
| | - Yifan Chen
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Jialiang Yang
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| |
|
26
|
Yu D, Wan Y, Ito H, Ma X, Xie T, Wang T, Shao C, Meng Y. PmiRDiscVali: an integrated pipeline for plant microRNA discovery and validation. BMC Genomics 2019; 20:133. [PMID: 30760208 PMCID: PMC6375137 DOI: 10.1186/s12864-019-5478-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2018] [Accepted: 01/24/2019] [Indexed: 11/10/2022] Open
Abstract
Background: MicroRNAs (miRNAs) constitute a well-known small RNA (sRNA) species with important regulatory roles. To date, several bioinformatics tools have been developed for large-scale prediction of miRNAs based on high-throughput sequencing data. However, some of these tools become invalid without reference genomes, while some tools cannot supply user-friendly outputs. Besides, most of the current tools focus on the importance of secondary structures and sRNA expression patterns for miRNA prediction, while they do not pay attention to miRNA processing for reliability check. Results: Here, we reported a pipeline PmiRDiscVali for plant miRNA discovery and partial validation. This pipeline integrated the popular tool miRDeep-P for plant miRNA prediction, making PmiRDiscVali compatible for both reference-based and de novo predictions. To check the prediction reliability, we adopted the concept that the miRNA processing intermediates could be tracked by degradome sequencing (degradome-seq) during the development of PmiRDiscVali. A case study was performed by using the public sequencing data of Dendrobium officinale, in order to show the clear and concise presentation of the prediction results. Conclusion: Summarily, the integrated pipeline PmiRDiscVali, featured with degradome-seq data-based validation and vivid result presentation, should be useful for large-scale identification of plant miRNA candidates. Electronic supplementary material: The online version of this article (10.1186/s12864-019-5478-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Dongliang Yu
- College of Life and Environmental Sciences, Hangzhou Normal University, Xuelin Street 16#, Xiasha, Hangzhou, 310036, People's Republic of China
| | - Ying Wan
- College of Life and Environmental Sciences, Hangzhou Normal University, Xuelin Street 16#, Xiasha, Hangzhou, 310036, People's Republic of China
| | - Hidetaka Ito
- Faculty of Science, Hokkaido University, Kita10 Nishi8, Kita-ku, Sapporo, Hokkaido, 060-0810, Japan
| | - Xiaoxia Ma
- College of Life and Environmental Sciences, Hangzhou Normal University, Xuelin Street 16#, Xiasha, Hangzhou, 310036, People's Republic of China
| | - Tian Xie
- Holistic Integrative Pharmacy Institutes, Hangzhou Normal University, Wenyixi Road 1378#, Hangzhou, 311121, People's Republic of China.
| | - Tingzhang Wang
- Key Laboratory of microbiological technology and Bioinformatics in Zhejiang Province, Hangzhou, 310036, People's Republic of China
| | - Chaogang Shao
- College of Life Sciences, Huzhou University, Huzhou, 313000, People's Republic of China
| | - Yijun Meng
- College of Life and Environmental Sciences, Hangzhou Normal University, Xuelin Street 16#, Xiasha, Hangzhou, 310036, People's Republic of China.
| |
|
27
|
Armenta-Medina A, Gillmor CS. An Introduction to Methods for Discovery and Functional Analysis of MicroRNAs in Plants. Methods Mol Biol 2019; 1932:1-14. [PMID: 30701488 DOI: 10.1007/978-1-4939-9042-9_1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/09/2023]
Abstract
MicroRNAs play important roles in posttranscriptional regulation of plant development, metabolism, and abiotic stress responses. The recent generation of massive amounts of small RNA sequence data, along with development of bioinformatic tools to identify miRNAs and their mRNA targets, has led to an explosion of newly identified putative miRNAs in plants. Genome editing techniques like CRISPR-Cas9 will allow us to study the biological role of these potential novel miRNAs by efficiently targeting both the miRNA and its mRNA target. In this chapter, we review bioinformatic tools and experimental methods for the identification and functional characterization of miRNAs and their target mRNAs in plants.
Affiliation(s)
- Alma Armenta-Medina
- Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Unidad de Genómica Avanzada, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (CINVESTAV-IPN), Irapuato, Guanajuato, Mexico
| | - C Stewart Gillmor
- Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Unidad de Genómica Avanzada, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (CINVESTAV-IPN), Irapuato, Guanajuato, Mexico.
| |
|
28
|
Saçar Demirci MD, Yousef M, Allmer J. Computational Prediction of Functional MicroRNA-mRNA Interactions. Methods Mol Biol 2019; 1912:175-196. [PMID: 30635894 DOI: 10.1007/978-1-4939-8982-9_7] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 12/20/2022]
Abstract
Proteins have a strong influence on the phenotype and their aberrant expression leads to diseases. MicroRNAs (miRNAs) are short RNA sequences which posttranscriptionally regulate protein expression. This regulation is driven by miRNAs acting as recognition sequences for their target mRNAs within a larger regulatory machinery. A miRNA can have many target mRNAs and an mRNA can be targeted by many miRNAs which makes it difficult to experimentally discover all miRNA-mRNA interactions. Therefore, computational methods have been developed for miRNA detection and miRNA target prediction. An abundance of available computational tools makes selection difficult. Additionally, interactions are not currently the focus of investigation although they more accurately define the regulation than pre-miRNA detection or target prediction could perform alone. We define an interaction including the miRNA source and the mRNA target. We present computational methods allowing the investigation of these interactions as well as how they can be used to extend regulatory pathways. Finally, we present a list of points that should be taken into account when investigating miRNA-mRNA interactions. In the future, this may lead to better understanding of functional interactions which may pave the way for disease marker discovery and design of miRNA-based drugs.
Affiliation(s)
| | - Malik Yousef
- Department of Community Information Systems, Zefat Academic College, Zefat, Israel
| | - Jens Allmer
- Applied Bioinformatics, Bioscience, Wageningen University & Research, Wageningen, The Netherlands.
| |
|
29
|
Devi K, Dey KK, Singh S, Mishra SK, Modi MK, Sen P. Identification and validation of plant miRNA from NGS data—an experimental approach. Brief Funct Genomics 2018; 18:13-22. [DOI: 10.1093/bfgp/ely034] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 12/21/2017] [Revised: 09/17/2018] [Accepted: 10/02/2018] [Indexed: 12/18/2022]
Affiliation(s)
- Kamalakshi Devi
- Department of Agricultural Biotechnology, Assam Agricultural University, Jorhat, India
| | - Kuntal Kumar Dey
- Distributed Information Centre, Department of Agricultural Biotechnology, Assam Agricultural University, Jorhat, India
| | - Sanjay Singh
- Department of Agricultural Biotechnology, Assam Agricultural University, Jorhat, India
| | | | - Mahendra Kumar Modi
- Department of Agricultural Biotechnology, Assam Agricultural University, Jorhat, India
- Distributed Information Centre, Department of Agricultural Biotechnology, Assam Agricultural University, Jorhat, India
| | - Priyabrata Sen
- Department of Agricultural Biotechnology, Assam Agricultural University, Jorhat, India
| |
|
30
|
Bisgin H, Gong B, Wang Y, Tong W. Evaluation of Bioinformatics Approaches for Next-Generation Sequencing Analysis of microRNAs with a Toxicogenomics Study Design. Front Genet 2018; 9:22. [PMID: 29467792 PMCID: PMC5808213 DOI: 10.3389/fgene.2018.00022] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 11/16/2017] [Accepted: 01/17/2018] [Indexed: 12/18/2022]
Abstract
MicroRNAs (miRNAs) are key post-transcriptional regulators that affect protein translation by targeting mRNAs. Their role in disease etiology and toxicity are well recognized. Given the rapid advancement of next-generation sequencing techniques, miRNA profiling has been increasingly conducted with RNA-seq, namely miRNA-seq. Analysis of miRNA-seq data requires several steps: (1) mapping the reads to miRBase, (2) considering mismatches during the hairpin alignment (windowing), and (3) counting the reads (quantification). The choice made in each step with respect to the parameter settings could affect miRNA quantification, differentially expressed miRNAs (DEMs) detection and novel miRNA identification. Furthermore, these parameters do not act in isolation and their joint effects impact miRNA-seq results and interpretation. In toxicogenomics, the variation associated with parameter setting should not overpower the treatment effect (such as the dose/time-dependent effect). In this study, four commonly used miRNA-seq analysis tools (i.e., miRDeep2, miRExpress, miRNAkey, sRNAbench) were comparatively evaluated with a standard toxicogenomics study design. We tested 30 different parameter settings on miRNA-seq data generated from thioacetamide-treated rat liver samples for three dose levels across four time points, followed by four normalization options. Because both miRExpress and miRNAkey yielded larger variation than that of the treatment effects across multiple parameter settings, our analyses mainly focused on the side-by-side comparison between miRDeep2 and sRNAbench. While the number of miRNAs detected by miRDeep2 was almost the subset of those detected by sRNAbench, the number of DEMs identified by both tools was comparable under the same parameter settings and normalization method. Change in the number of nucleotides out of the mature sequence in the hairpin alignment (window option) yielded the largest variation for miRNA quantification and DEMs detection. 
However, such a variation is relatively small compared to the treatment effect when the study focused on DEMs that are more critical to interpret the toxicological effect. While the normalization methods introduced a large variation in DEMs, toxic behavior of thioacetamide showed consistency in the trend of time-dose responses. Overall, miRDeep2 was found to be preferable over other choices when the window option allowed up to three nucleotides from both ends.
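The three analysis steps named in this abstract (mapping, windowing, counting) can be illustrated with a minimal sketch. It assumes reads have already been aligned to hairpin precursors and reduces the "window option" to a single rule: a read is assigned to a mature miRNA if it lies entirely within the mature coordinates extended by `window` nucleotides at both ends. The function name `count_reads`, the tuple layouts, and the containment rule are hypothetical simplifications, not the behaviour of miRDeep2, sRNAbench, or any other tool evaluated in the study.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical layouts: (hairpin_id, start, end), 0-based inclusive hairpin coordinates.
Mature = Tuple[str, int, int]
Read = Tuple[str, int, int]


def count_reads(matures: Dict[str, Mature],
                reads: Iterable[Read],
                window: int = 3) -> Counter:
    """Count aligned reads per mature miRNA.

    A read is counted for a mature miRNA when it maps to the same hairpin and
    lies entirely inside the mature interval widened by `window` nucleotides
    on each side (an assumed simplification of the window option).
    """
    counts = Counter({name: 0 for name in matures})
    for hairpin, r_start, r_end in reads:
        for name, (m_hairpin, m_start, m_end) in matures.items():
            if hairpin != m_hairpin:
                continue
            if r_start >= m_start - window and r_end <= m_end + window:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    matures = {"miR-a": ("hp1", 10, 31)}
    reads = [("hp1", 10, 31), ("hp1", 8, 30), ("hp1", 5, 31), ("hp2", 10, 31)]
    for w in (0, 3, 5):
        print(w, count_reads(matures, reads, window=w)["miR-a"])
```

Running the example with different `window` values shows why this parameter can shift quantification: widening the window admits reads that a stricter setting discards.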
Affiliation(s)
- Halil Bisgin
- Department of Computer Science, Engineering, and Physics, University of Michigan-Flint, Flint, MI, United States
| | - Binsheng Gong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research (FDA), Jefferson, AR, United States
| | - Yuping Wang
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research (FDA), Jefferson, AR, United States
| | - Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research (FDA), Jefferson, AR, United States
| |
|
31
|
Vitsios DM, Kentepozidou E, Quintais L, Benito-Gutiérrez E, van Dongen S, Davis MP, Enright AJ. Mirnovo: genome-free prediction of microRNAs from small RNA sequencing data and single-cells using decision forests. Nucleic Acids Res 2017; 45:e177. [PMID: 29036314 PMCID: PMC5716205 DOI: 10.1093/nar/gkx836] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 07/27/2017] [Accepted: 09/20/2017] [Indexed: 12/21/2022]
Abstract
The discovery of microRNAs (miRNAs) remains an important problem, particularly given the growth of high-throughput sequencing, cell sorting and single cell biology. While a large number of miRNAs have already been annotated, there may well be large numbers of miRNAs that are expressed in very particular cell types and remain elusive. Sequencing allows us to quickly and accurately identify the expression of known miRNAs from small RNA-Seq data. The biogenesis of miRNAs leads to very specific characteristics observed in their sequences. In brief, miRNAs usually have a well-defined 5′ end and a more flexible 3′ end with the possibility of 3′ tailing events, such as uridylation. Previous approaches to the prediction of novel miRNAs usually involve the analysis of structural features of miRNA precursor hairpin sequences obtained from genome sequence. We surmised that it may be possible to identify miRNAs by using these biogenesis features observed directly from sequenced reads, solely or in addition to structural analysis from genome data. To this end, we have developed mirnovo, a machine learning based algorithm, which is able to identify known and novel miRNAs in animals and plants directly from small RNA-Seq data, with or without a reference genome. This method performs comparably to existing tools, however is simpler to use with reduced run time. Its performance and accuracy has been tested on multiple datasets, including species with poorly assembled genomes, RNaseIII (Drosha and/or Dicer) deficient samples and single cells (at both embryonic and adult stage).
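The biogenesis signature described in this abstract (a well-defined 5′ end and a more flexible 3′ end) can be turned into a simple read-level feature. The sketch below is only an illustration under stated assumptions: reads are given as hypothetical `(start, end, count)` tuples for one candidate locus, and "precision" is defined here as the fraction of read abundance sharing the most common 5′ start or 3′ end. It is not mirnovo's feature set or its decision-forest classifier.

```python
from collections import Counter
from typing import Iterable, Tuple


def end_precision(reads: Iterable[Tuple[int, int, int]]) -> Tuple[float, float]:
    """Return (5' precision, 3' precision) for reads at one candidate locus.

    Each read is (start, end, count). Precision is the share of total read
    abundance that uses the most frequent start (5') or end (3') position.
    """
    starts: Counter = Counter()
    ends: Counter = Counter()
    total = 0
    for start, end, count in reads:
        starts[start] += count
        ends[end] += count
        total += count
    if total == 0:
        raise ValueError("no reads supplied for this locus")
    return max(starts.values()) / total, max(ends.values()) / total


if __name__ == "__main__":
    # Hypothetical locus: most reads share the same 5' start, 3' ends vary more.
    locus = [(100, 121, 80), (100, 122, 15), (101, 121, 5)]
    five_prime, three_prime = end_precision(locus)
    print(five_prime, three_prime)
```

For a miRNA-like locus one would expect the first value to be higher than the second; features of this kind could then be passed to a classifier.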
Affiliation(s)
- Dimitrios M Vitsios
- European Molecular Biology Laboratory-European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Elissavet Kentepozidou
- European Molecular Biology Laboratory-European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Leonor Quintais
- European Molecular Biology Laboratory-European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Elia Benito-Gutiérrez
- Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK
| | - Stijn van Dongen
- European Molecular Biology Laboratory-European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Matthew P Davis
- European Molecular Biology Laboratory-European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Anton J Enright
- European Molecular Biology Laboratory-European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
|
32
|
Xia J, Li L, Li T, Fang Z, Zhang K, Zhou J, Peng H, Zhang W. Detecting and characterizing microRNAs of diverse genomic origins via miRvial. Nucleic Acids Res 2017; 45:e176. [PMID: 29036674 PMCID: PMC5716067 DOI: 10.1093/nar/gkx834] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 02/06/2017] [Accepted: 09/12/2017] [Indexed: 12/11/2022]
Abstract
MicroRNAs form an essential class of post-transcriptional gene regulators in eukaryotic species, and play critical parts in development, disease, and stress responses. MicroRNAs may originate from various genomic loci, have structural characteristics, and appear in canonical or modified forms, making them subtle to detect and analyze. We present miRvial, a robust computational method and companion software package that supports parameter adjustment and visual inspection of candidate microRNAs. Extensive results comparing miRvial and six existing microRNA finding methods on six model organisms, Mus musculus, Drosophila melanogaster, Arabidopsis thaliana, Oryza sativa, Physcomitrella patens and Chlamydomonas reinhardtii, demonstrated the utility and rigor of miRvial in detecting novel microRNAs and characterizing features of microRNAs. Experimental validation of several novel microRNAs in C. reinhardtii that were predicted by miRvial but missed by the other methods illustrated the superior performance of miRvial over the existing methods. miRvial is open source and available at https://github.com/SystemsBiologyOfJianghanUniversity/miRvial.
Affiliation(s)
- Jing Xia
- Institute for Systems Biology, Jianghan University, Wuhan, Hubei 430056, China; Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO 63130, USA
| | - Lun Li
- Institute for Systems Biology, Jianghan University, Wuhan, Hubei 430056, China
| | - Tiantian Li
- Institute for Systems Biology, Jianghan University, Wuhan, Hubei 430056, China
| | - Zhiwei Fang
- Institute for Systems Biology, Jianghan University, Wuhan, Hubei 430056, China
| | - Kevin Zhang
- Institute for Systems Biology, Jianghan University, Wuhan, Hubei 430056, China; Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO 63130, USA
| | - Junfei Zhou
- Institute for Systems Biology, Jianghan University, Wuhan, Hubei 430056, China
| | - Hai Peng
- Institute for Systems Biology, Jianghan University, Wuhan, Hubei 430056, China
| | - Weixiong Zhang
- Institute for Systems Biology, Jianghan University, Wuhan, Hubei 430056, China; Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO 63130, USA; Department of Genetics, Washington University School of Medicine, St. Louis, MO 63130, USA
| |
|
33
|
Varicella-Zoster Virus Expresses Multiple Small Noncoding RNAs. J Virol 2017; 91:JVI.01710-17. [PMID: 29021397 DOI: 10.1128/jvi.01710-17] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 10/03/2017] [Accepted: 10/03/2017] [Indexed: 12/11/2022]
Abstract
Many herpesviruses express small noncoding RNAs (sncRNAs), including microRNAs (miRNAs), that may play roles in regulating lytic and latent infections. None have yet been reported in varicella-zoster virus (VZV; also known as human herpesvirus 3 [HHV-3]). Here we analyzed next-generation sequencing (NGS) data for small RNAs in VZV-infected fibroblasts and human embryonic stem cell-derived (hESC) neurons. Two independent bioinformatics analyses identified more than 20 VZV-encoded 20- to 24-nucleotide RNAs, some of which are predicted to have stem-loop precursors potentially representing miRNAs. These sequences are perfectly conserved between viruses from three clades of VZV. One NGS-identified sequence common to both bioinformatics analyses mapped to the repeat regions of the VZV genome, upstream of the predicted promoter of the immediate early gene open reading frame 63 (ORF63). This miRNA candidate was detected in each of 3 independent biological repetitions of NGS of RNA from fibroblasts and neurons productively infected with VZV using TaqMan quantitative PCR (qPCR). Importantly, transfected synthetic RNA oligonucleotides antagonistic to the miRNA candidate significantly enhanced VZV plaque growth rates. The presence of 6 additional small noncoding RNAs was also verified by TaqMan qPCR in productively infected fibroblasts and ARPE19 cells. Our results show VZV, like other human herpesviruses, encodes several sncRNAs and miRNAs, and some may regulate infection of host cells. IMPORTANCE: Varicella-zoster virus is an important human pathogen, with herpes zoster being a major health issue in the aging and immunocompromised populations. Small noncoding RNAs (sncRNAs) are recognized as important actors in modulating gene expression, and this study demonstrates the first reported VZV-encoded sncRNAs. Many are clustered to a small genomic region, as seen in other human herpesviruses. 
At least one VZV sncRNA was expressed in productive infection of neurons and fibroblasts that is likely to reduce viral replication. Since sncRNAs have been suggested to be potential targets for antiviral therapies, identification of these molecules in VZV may provide a new direction for development of treatments for painful herpes zoster.
|
34
|
Bortolomeazzi M, Gaffo E, Bortoluzzi S. A survey of software tools for microRNA discovery and characterization using RNA-seq. Brief Bioinform 2017; 20:918-930. [DOI: 10.1093/bib/bbx148] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 07/14/2017] [Revised: 10/12/2017] [Indexed: 01/08/2023]
Affiliation(s)
| | - Enrico Gaffo
- Department of Molecular Medicine, University of Padova, Padova, Italy
| | | |
|
35
|
Banerjee S, Sirohi A, Ansari AA, Gill SS. Role of small RNAs in abiotic stress responses in plants. Plant Gene 2017. [DOI: 10.1016/j.plgene.2017.04.005] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Indexed: 12/30/2022]
|
36
|
Paicu C, Mohorianu I, Stocks M, Xu P, Coince A, Billmeier M, Dalmay T, Moulton V, Moxon S. miRCat2: accurate prediction of plant and animal microRNAs from next-generation sequencing datasets. Bioinformatics 2017; 33:2446-2454. [PMID: 28407097 PMCID: PMC5870699 DOI: 10.1093/bioinformatics/btx210] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Received: 01/17/2017] [Revised: 03/28/2017] [Accepted: 04/10/2017] [Indexed: 01/05/2023]
Abstract
MOTIVATION: MicroRNAs are a class of ∼21-22 nt small RNAs which are excised from a stable hairpin-like secondary structure. They have important gene regulatory functions and are involved in many pathways including developmental timing, organogenesis and development in eukaryotes. There are several computational tools for miRNA detection from next-generation sequencing datasets. However, many of these tools suffer from high false positive and false negative rates. Here we present a novel miRNA prediction algorithm, miRCat2. miRCat2 incorporates a new entropy-based approach to detect miRNA loci, which is designed to cope with the high sequencing depth of current next-generation sequencing datasets. It has a user-friendly interface and produces graphical representations of the hairpin structure and plots depicting the alignment of sequences on the secondary structure. RESULTS: We test miRCat2 on a number of animal and plant datasets and present a comparative analysis with miRCat, miRDeep2, miRPlant and miReap. We also use mutants in the miRNA biogenesis pathway to evaluate the predictions of these tools. Results indicate that miRCat2 has an improved accuracy compared with other methods tested. Moreover, miRCat2 predicts several new miRNAs that are differentially expressed in wild-type versus mutants in the miRNA biogenesis pathway. AVAILABILITY AND IMPLEMENTATION: miRCat2 is part of the UEA small RNA Workbench and is freely available from http://srna-workbench.cmp.uea.ac.uk/. CONTACT: v.moulton@uea.ac.uk or s.moxon@uea.ac.uk.
Affiliation(s)
- Claudia Paicu
- The Earlham Institute, Norwich Research Park, Norwich, UK
- School of Computing Sciences, University of East Anglia, Norwich Research Park, Norwich, UK
| | - Irina Mohorianu
- School of Computing Sciences, University of East Anglia, Norwich Research Park, Norwich, UK
- School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, UK
| | - Matthew Stocks
- School of Computing Sciences, University of East Anglia, Norwich Research Park, Norwich, UK
| | - Ping Xu
- School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, UK
| | - Aurore Coince
- School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, UK
| | - Martina Billmeier
- School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, UK
| | - Tamas Dalmay
- School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, UK
| | - Vincent Moulton
- School of Computing Sciences, University of East Anglia, Norwich Research Park, Norwich, UK
| | - Simon Moxon
- School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, UK
| |
|
37
|
Genome reconstruction in Cynara cardunculus taxa gains access to chromosome-scale DNA variation. Sci Rep 2017; 7:5617. [PMID: 28717205 PMCID: PMC5514137 DOI: 10.1038/s41598-017-05085-7] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 08/02/2016] [Accepted: 05/24/2017] [Indexed: 11/12/2022]
Abstract
The genome sequence of globe artichoke (Cynara cardunculus L. var. scolymus, 2n = 2x = 34) is now available for use. A survey of C. cardunculus genetic resources is essential for understanding the evolution of the species, carrying out genetic studies and for application of breeding strategies. We report on the resequencing analyses (~35×) of four globe artichoke genotypes, representative of the core varietal types, as well as a genotype of the related taxon, cultivated cardoon. The genomes were reconstructed at a chromosomal scale and structurally/functionally annotated. Gene prediction indicated a similar number of genes, while distinctive variations in miRNAs and resistance gene analogues (RGAs) were detected. Overall, 23.5 M SNPs/indels were discovered (range 6.34 M to 14.50 M). The impact of some missense SNPs on the biological functions of genes involved in the biosynthesis of phenylpropanoid and sesquiterpene lactone secondary metabolites was predicted. The identified variants help to infer the domestication of the different globe artichoke varietal types, and represent key tools for dissecting the path from sequence variation to phenotype. The new genomic sequences are fully searchable through independent JBrowse interfaces (www.artichokegenome.unito.it), which allow the analysis of collinearity and the discovery of genomic variants, thus representing a one-stop resource for C. cardunculus genomics.
|
38
|
Computational Approaches and Related Tools to Identify MicroRNAs in a Species: A Bird’s Eye View. Interdiscip Sci 2017; 10:616-635. [DOI: 10.1007/s12539-017-0223-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 06/20/2016] [Revised: 12/20/2016] [Accepted: 03/09/2017] [Indexed: 12/26/2022]
|
39
|
Abstract
The secondary structure of an RNA molecule represents the base-pairing interactions within the molecule and fundamentally determines its overall structure. In this chapter, we overview the main approaches and existing tools for predicting RNA secondary structures, as well as methods for identifying noncoding RNAs from genomic sequences or RNA sequencing data. We then focus on the identification of a well-known class of small noncoding RNAs, namely microRNAs, which play very important roles in many biological processes through regulating post-transcriptionally the expression of genes and which dysregulation has been shown to be involved in several human diseases.
Affiliation(s)
- Fariza Tahi
- IBISC, UEVE/Genopole, 23 bv. de France, 91000, Evry, France.
- IPS2, University of Paris-Saclay, 91190, Gif-sur-Yvette, France.
| | - Van Du T Tran
- Vital-IT group, SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Anouar Boucheham
- IBISC, UEVE/Genopole, 23 bv. de France, 91000, Evry, France
- College of NTIC, Constantine University 2, Constantine, Algeria
| |
|
40
|
Xue B, Lipps D, Devineni S. Integrated Strategy Improves the Prediction Accuracy of miRNA in Large Dataset. PLoS One 2016; 11:e0168392. [PMID: 28002428 PMCID: PMC5176297 DOI: 10.1371/journal.pone.0168392] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 06/17/2016] [Accepted: 11/29/2016] [Indexed: 01/08/2023]
Abstract
MiRNAs are short non-coding RNAs of about 22 nucleotides, which play critical roles in gene expression regulation. The biogenesis of miRNAs is largely determined by the sequence and structural features of their parental RNA molecules. Based on these features, multiple computational tools have been developed to predict if RNA transcripts contain miRNAs or not. Although being very successful, these predictors started to face multiple challenges in recent years. Many predictors were optimized using datasets of hundreds of miRNA samples. The sizes of these datasets are much smaller than the number of known miRNAs. Consequently, the prediction accuracy of these predictors in large dataset becomes unknown and needs to be re-tested. In addition, many predictors were optimized for either high sensitivity or high specificity. These optimization strategies may bring in serious limitations in applications. Moreover, to meet continuously raised expectations on these computational tools, improving the prediction accuracy becomes extremely important. In this study, a meta-predictor mirMeta was developed by integrating a set of non-linear transformations with meta-strategy. More specifically, the outputs of five individual predictors were first preprocessed using non-linear transformations, and then fed into an artificial neural network to make the meta-prediction. The prediction accuracy of meta-predictor was validated using both multi-fold cross-validation and independent dataset. The final accuracy of meta-predictor in newly-designed large dataset is improved by 7% to 93%. The meta-predictor is also proved to be less dependent on datasets, as well as has refined balance between sensitivity and specificity. This study has two folds of importance: First, it shows that the combination of non-linear transformations and artificial neural networks improves the prediction accuracy of individual predictors. 
Second, a new miRNA predictor with significantly improved prediction accuracy is developed for the community for identifying novel miRNAs and the complete set of miRNAs. Source code is available at: https://github.com/xueLab/mirMeta
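The meta-strategy in this abstract (non-linear transformation of individual predictor outputs, then a neural network that makes the final call) can be sketched in a few lines. Everything specific below is an assumption made for illustration: the logit is used as the non-linear transformation, a single logistic unit stands in for the artificial neural network, and the weights are hand-set rather than trained. None of this is taken from the mirMeta source code.

```python
import math
from typing import Sequence


def logit(p: float, eps: float = 1e-6) -> float:
    """Map a probability-like score in [0, 1] to the real line (clipped for stability)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def meta_predict(scores: Sequence[float],
                 weights: Sequence[float],
                 bias: float = 0.0) -> float:
    """Combine individual predictor scores into one meta-score in (0, 1).

    Each score is logit-transformed, the results are combined linearly, and a
    sigmoid squashes the sum. The weights here are hypothetical; in practice
    they would be learned from labelled data.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    z = bias + sum(w * logit(s) for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Five hypothetical base predictors scoring one candidate hairpin.
    base_scores = [0.92, 0.80, 0.55, 0.97, 0.70]
    equal_weights = [0.2] * 5
    print(meta_predict(base_scores, equal_weights))
```

With equal weights that sum to one, identical inputs are returned unchanged (five scores of 0.9 give 0.9), which is a convenient sanity check before fitting the weights on real data.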
Affiliation(s)
- Bin Xue
- Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, Florida, United States of America
| | - David Lipps
- Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, Florida, United States of America
| | - Sree Devineni
- Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, Florida, United States of America
| |
|
41
|
Gonzalez-Ibeas D, Martinez-Garcia PJ, Famula RA, Delfino-Mix A, Stevens KA, Loopstra CA, Langley CH, Neale DB, Wegrzyn JL. Assessing the Gene Content of the Megagenome: Sugar Pine (Pinus lambertiana). G3 (BETHESDA, MD.) 2016; 6:3787-3802. [PMID: 27799338 PMCID: PMC5144951 DOI: 10.1534/g3.116.032805] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 06/24/2016] [Accepted: 07/13/2016] [Indexed: 02/06/2023]
Abstract
Sugar pine (Pinus lambertiana Douglas) is within the subgenus Strobus with an estimated genome size of 31 Gbp. Transcriptomic resources are of particular interest in conifers due to the challenges presented in their megagenomes for gene identification. In this study, we present the first comprehensive survey of the P. lambertiana transcriptome through deep sequencing of a variety of tissue types to generate more than 2.5 billion short reads. Third generation, long reads generated through PacBio Iso-Seq have been included for the first time in conifers to combat the challenges associated with de novo transcriptome assembly. A technology comparison is provided here to contribute to the otherwise scarce comparisons of second and third generation transcriptome sequencing approaches in plant species. In addition, the transcriptome reference was essential for gene model identification and quality assessment in the parallel project responsible for sequencing and assembly of the entire genome. In this study, the transcriptomic data were also used to address questions surrounding lineage-specific Dicer-like proteins in conifers. These proteins play a role in the control of transposable element proliferation and the related genome expansion in conifers.
Affiliation(s)
- Daniel Gonzalez-Ibeas
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269
| | | | - Randi A Famula
- Department of Plant Sciences, University of California, Davis, California 95616
| | - Annette Delfino-Mix
- United States Department of Agriculture Forest Service, Institute of Forest Genetics, Placerville, California 95667
| | - Kristian A Stevens
- Department of Evolution and Ecology, University of California, Davis, California 95616
| | - Carol A Loopstra
- Department of Ecosystem Science and Management, Texas A&M University, College Station, Texas 77843
| | - Charles H Langley
- Department of Evolution and Ecology, University of California, Davis, California 95616
| | - David B Neale
- Department of Plant Sciences, University of California, Davis, California 95616
| | - Jill L Wegrzyn
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269
| |
Collapse
|
42
|
Martínez-García PJ, Crepeau MW, Puiu D, Gonzalez-Ibeas D, Whalen J, Stevens KA, Paul R, Butterfield TS, Britton MT, Reagan RL, Chakraborty S, Walawage SL, Vasquez-Gross HA, Cardeno C, Famula RA, Pratt K, Kuruganti S, Aradhya MK, Leslie CA, Dandekar AM, Salzberg SL, Wegrzyn JL, Langley CH, Neale DB. The walnut (Juglans regia) genome sequence reveals diversity in genes coding for the biosynthesis of non-structural polyphenols. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2016; 87:507-32. [PMID: 27145194 DOI: 10.1111/tpj.13207] [Citation(s) in RCA: 131] [Impact Index Per Article: 14.6] [Received: 09/21/2015] [Revised: 04/22/2016] [Accepted: 04/27/2016] [Indexed: 05/18/2023]
Abstract
The Persian walnut (Juglans regia L.), a diploid species native to the mountainous regions of Central Asia, is the major walnut species cultivated for nut production and is one of the most widespread tree nut species in the world. The high nutritional value of J. regia nuts is associated with a rich array of polyphenolic compounds, whose complete biosynthetic pathways are still unknown. A J. regia genome sequence was obtained from the cultivar 'Chandler' to discover target genes and additional unknown genes. The 667-Mbp genome was assembled using two different methods (SOAPdenovo2 and MaSuRCA), with an N50 scaffold size of 464 955 bp (based on a genome size of 606 Mbp), 221 640 contigs and a GC content of 37%. Annotation with MAKER-P and other genomic resources yielded 32 498 gene models. Previous studies in walnut relying on tissue-specific methods have only identified a single polyphenol oxidase (PPO) gene (JrPPO1). Enabled by the J. regia genome sequence, a second homolog of PPO (JrPPO2) was discovered. In addition, about 130 genes in the large gallate 1-β-glucosyltransferase (GGT) superfamily were detected. Specifically, two genes, JrGGT1 and JrGGT2, were significantly homologous to the GGT from Quercus robur (QrGGT), which is involved in the synthesis of 1-O-galloyl-β-d-glucose, a precursor for the synthesis of hydrolysable tannins. The reference genome for J. regia provides meaningful insight into the complex pathways required for the synthesis of polyphenols. The walnut genome sequence provides important tools and methods to accelerate breeding and to facilitate the genetic dissection of complex traits.
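The abstract quotes an N50 scaffold size computed against an assumed genome size rather than against the assembly total. As a minimal sketch of that statistic (the function name `n50` and the toy lengths are illustrative assumptions, not the authors' code or data), the calculation could look like this:

```python
def n50(lengths, genome_size=None):
    """Return the N50 of `lengths`.

    If `genome_size` is given, half of that value is used as the target
    instead of half of the assembly total (often called NG50).
    """
    total = genome_size if genome_size is not None else sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the target is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return None  # scaffolds cover less than half of the assumed genome size


# Toy scaffold lengths (hypothetical, not walnut data).
scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(scaffolds))       # N50 against the assembly total
print(n50(scaffolds, 400))  # N50 against an assumed genome size of 400
```

Passing a genome size larger than the assembly total lowers the reported value, which is why the reference size matters when comparing assemblies.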
Affiliation(s)
- Marc W Crepeau
- Department of Evolution and Ecology, University of California, Davis, CA, 95616, USA
- Daniela Puiu
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, 21205, USA
- Daniel Gonzalez-Ibeas
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, 06269-3043, USA
- Jeanne Whalen
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, 06269-3043, USA
- Kristian A Stevens
- Department of Evolution and Ecology, University of California, Davis, CA, 95616, USA
- Robin Paul
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, 06269-3043, USA
- Russell L Reagan
- Department of Plant Sciences, University of California, Davis, CA, 95616, USA
- Sandeep Chakraborty
- Department of Plant Sciences, University of California, Davis, CA, 95616, USA
- Sriema L Walawage
- Department of Plant Sciences, University of California, Davis, CA, 95616, USA
- Charis Cardeno
- Department of Evolution and Ecology, University of California, Davis, CA, 95616, USA
- Randi A Famula
- Department of Plant Sciences, University of California, Davis, CA, 95616, USA
- Kevin Pratt
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, 06269-3043, USA
- Sowmya Kuruganti
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, 06269-3043, USA
- Charles A Leslie
- Department of Plant Sciences, University of California, Davis, CA, 95616, USA
- Abhaya M Dandekar
- Department of Plant Sciences, University of California, Davis, CA, 95616, USA
- Steven L Salzberg
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, 21205, USA
- Departments of Biomedical Engineering, Computer Science, and Biostatistics, Johns Hopkins University, Baltimore, MD, 21205, USA
- Jill L Wegrzyn
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, 06269-3043, USA
- Charles H Langley
- Department of Evolution and Ecology, University of California, Davis, CA, 95616, USA
- David B Neale
- Department of Plant Sciences, University of California, Davis, CA, 95616, USA
|
43
|
Ziemann M, Kaspi A, El-Osta A. Evaluation of microRNA alignment techniques. RNA (NEW YORK, N.Y.) 2016; 22:1120-38. [PMID: 27284164 PMCID: PMC4931105 DOI: 10.1261/rna.055509.115] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 12/01/2015] [Accepted: 05/04/2016] [Indexed: 05/26/2023]
Abstract
Genomic alignment of small RNA (smRNA) sequences such as microRNAs poses considerable challenges due to their short length (∼21 nucleotides [nt]) as well as the large size and complexity of plant and animal genomes. While several tools have been developed for high-throughput mapping of longer mRNA-seq reads (>30 nt), there are few that are specifically designed for mapping of smRNA reads including microRNAs. The accuracy of these mappers has not been systematically determined in the case of smRNA-seq. In addition, it is unknown whether these aligners accurately map smRNA reads containing sequence errors and polymorphisms. By using simulated read sets, we determine the alignment sensitivity and accuracy of 16 short-read mappers and quantify their robustness to mismatches, indels, and nontemplated nucleotide additions. These were explored in the context of a plant genome (Oryza sativa, ∼500 Mbp) and a mammalian genome (Homo sapiens, ∼3.1 Gbp). Analysis of simulated and real smRNA-seq data demonstrates that mapper selection impacts differential expression results and interpretation. These results will inform on best practice for smRNA mapping and enable more accurate smRNA detection and quantification of expression and RNA editing.
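With simulated reads the true origin of every read is known, so a mapper can be scored directly. The sketch below is one way such a scoring could be set up; the definitions used here (sensitivity as the fraction of reads aligned, accuracy as the fraction of aligned reads placed at their true locus), the function name `evaluate_mapper`, and the toy data are assumptions for illustration and may differ from the definitions used in the paper.

```python
def evaluate_mapper(truth, mapped):
    """Score a mapper on simulated reads.

    truth:  read_id -> (chrom, pos) for every simulated read
    mapped: read_id -> (chrom, pos) for reads the mapper aligned
    """
    aligned = [read for read in truth if read in mapped]
    correct = sum(1 for read in aligned if mapped[read] == truth[read])
    # Assumed definitions: see the lead-in paragraph.
    sensitivity = len(aligned) / len(truth) if truth else 0.0
    accuracy = correct / len(aligned) if aligned else 0.0
    return sensitivity, accuracy


# Hypothetical toy data: four simulated reads, three aligned, two correctly.
truth = {"r1": ("chr1", 100), "r2": ("chr1", 250),
         "r3": ("chr2", 40), "r4": ("chr3", 900)}
mapped = {"r1": ("chr1", 100), "r2": ("chr1", 251), "r3": ("chr2", 40)}

sensitivity, accuracy = evaluate_mapper(truth, mapped)
print(f"sensitivity={sensitivity:.2f} accuracy={accuracy:.2f}")
```

An exact-position match is a strict criterion; a tolerance window could be substituted if soft-clipping or trimmed ends are expected.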
Affiliation(s)
- Mark Ziemann
- Epigenetics in Human Health and Disease Laboratory, Baker IDI Heart and Diabetes Institute, The Alfred Medical Research and Education Precinct, Melbourne, Victoria 3004, Australia
- Epigenomics Profiling Facility, Baker IDI Heart and Diabetes Institute, The Alfred Medical Research and Education Precinct, Melbourne, Victoria 3004, Australia
- Antony Kaspi
- Epigenetics in Human Health and Disease Laboratory, Baker IDI Heart and Diabetes Institute, The Alfred Medical Research and Education Precinct, Melbourne, Victoria 3004, Australia
- Epigenomics Profiling Facility, Baker IDI Heart and Diabetes Institute, The Alfred Medical Research and Education Precinct, Melbourne, Victoria 3004, Australia
- Assam El-Osta
- Epigenetics in Human Health and Disease Laboratory, Baker IDI Heart and Diabetes Institute, The Alfred Medical Research and Education Precinct, Melbourne, Victoria 3004, Australia
- Epigenomics Profiling Facility, Baker IDI Heart and Diabetes Institute, The Alfred Medical Research and Education Precinct, Melbourne, Victoria 3004, Australia
|
44
|
Urgese G, Paciello G, Acquaviva A, Ficarra E. isomiR-SEA: an RNA-Seq analysis tool for miRNAs/isomiRs expression level profiling and miRNA-mRNA interaction sites evaluation. BMC Bioinformatics 2016; 17:148. [PMID: 27036505 PMCID: PMC4815201 DOI: 10.1186/s12859-016-0958-0] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 08/27/2015] [Accepted: 02/19/2016] [Indexed: 01/01/2023]
Abstract
Background: Massively parallel sequencing of transcriptomes has revealed the presence of many miRNAs and miRNA variants, named isomiRs, with a potential role in several cellular processes through their interaction with a target mRNA. Many methods and tools have recently been devised to detect and quantify miRNAs from sequencing data. However, all of them are implemented on top of general-purpose alignment methods, and thus provide results of limited accuracy and no information concerning isomiRs and conserved miRNA-mRNA interaction sites. Results: To overcome these limitations, we present a novel algorithm, isomiR-SEA, that provides users with accurate miRNA expression levels and precise classifications of both isomiRs and miRNA-mRNA interaction sites. Tags are mapped onto the known miRNA sequences by a specialized alignment algorithm built on biological evidence concerning miRNA structure. Specifically, isomiR-SEA checks for the presence of the miRNA seed in the input tags and evaluates, during all alignment phases, the positions of the encountered mismatches, which allows it to distinguish among the different isomiRs and conserved miRNA-mRNA interaction sites. Conclusions: The performance of isomiR-SEA was assessed on two public RNA-Seq datasets, showing that the implemented algorithm yields more reliable and accurate miRNA expression levels than two state-of-the-art tools used for comparison. Moreover, unlike the few methods currently available for isomiR detection, the proposed algorithm evaluates isomiRs and conserved miRNA-mRNA interaction sites in the first alignment phases, thus avoiding additional filtering stages that could cause the loss of useful information.
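The seed check described in the abstract can be illustrated with a small sketch. This is not the isomiR-SEA implementation: it assumes the seed is nucleotides 2 to 8 of the mature miRNA (a common convention), requires an exact seed match, and uses a toy 22-nt sequence; `seed_offset` is a hypothetical helper name.

```python
def seed_offset(mirna, tag, seed_start=1, seed_end=8):
    """Locate the miRNA seed inside a sequencing tag.

    Returns the shift of the tag's 5' end relative to the reference miRNA
    (0 = canonical start, negative = 5' trimmed, positive = 5' extended),
    or None when the exact seed is absent from the tag.
    The 0-based slice [1:8] corresponds to nucleotides 2-8 (an assumption).
    """
    seed = mirna[seed_start:seed_end]
    hit = tag.find(seed)
    if hit == -1:
        return None
    return hit - seed_start


mirna = "UGAGGUAGUAGGUUGUAUAGUU"              # toy reference sequence
print(seed_offset(mirna, mirna))              # canonical tag
print(seed_offset(mirna, mirna[1:]))          # tag missing the first nucleotide
print(seed_offset(mirna, "A" + mirna))        # tag with one extra 5' nucleotide
print(seed_offset(mirna, "UGAGCUAGUAGG"))     # seed disrupted by a mismatch
```

A tag whose seed is found at a shifted position is a candidate 5' isomiR, while a tag without the seed would be rejected before any further alignment in this simplified scheme.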
Affiliation(s)
- Gianvito Urgese
- Department of Control and Computer Engineering DAUIN, Politecnico di Torino, C.so Duca degli Abruzzi 24, Turin, 10129, IT, Italy
- Giulia Paciello
- Department of Control and Computer Engineering DAUIN, Politecnico di Torino, C.so Duca degli Abruzzi 24, Turin, 10129, IT, Italy
- Andrea Acquaviva
- Department of Control and Computer Engineering DAUIN, Politecnico di Torino, C.so Duca degli Abruzzi 24, Turin, 10129, IT, Italy
- Elisa Ficarra
- Department of Control and Computer Engineering DAUIN, Politecnico di Torino, C.so Duca degli Abruzzi 24, Turin, 10129, IT, Italy
|
45
|
Kwon YS, Song H. Analysis of microRNAs in a knock-in hESC line expressing epitope-tagged AGO2. Anim Cells Syst (Seoul) 2016. [DOI: 10.1080/19768354.2015.1137227] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/24/2022]
|
46
|
The genome sequence of the outbreeding globe artichoke constructed de novo incorporating a phase-aware low-pass sequencing strategy of F1 progeny. Sci Rep 2016; 6:19427. [PMID: 26786968 PMCID: PMC4726258 DOI: 10.1038/srep19427] [Citation(s) in RCA: 76] [Impact Index Per Article: 8.4] [Received: 08/20/2015] [Accepted: 12/11/2015] [Indexed: 12/13/2022]
Abstract
Globe artichoke (Cynara cardunculus var. scolymus) is an out-crossing, perennial, multi-use crop species that is grown worldwide and belongs to the Compositae, one of the most successful Angiosperm families. We describe the first genome sequence of globe artichoke. The assembly, comprising 13,588 scaffolds covering 725 of the 1,084 Mb genome, was generated using ~133-fold Illumina sequencing data and encodes 26,889 predicted genes. Re-sequencing (30×) of globe artichoke and cultivated cardoon (C. cardunculus var. altilis) parental genotypes and low-coverage (0.5 to 1×) genotyping-by-sequencing of 163 F1 individuals resulted in 73% of the assembled genome being anchored in 2,178 genetic bins ordered along 17 chromosomal pseudomolecules. This was achieved using a novel pipeline, SOILoCo (Scaffold Ordering by Imputation with Low Coverage), to detect heterozygous regions and assign parental haplotypes with low sequencing read depth and of unknown phase. SOILoCo provides a powerful tool for de novo genome analysis of outcrossing species. Our data will enable genome-scale analyses of evolutionary processes among crops, weeds, and wild species within and beyond the Compositae, and will facilitate the identification of economically important genes from related species.
|
47
|
Yu L, Shao C, Ye X, Meng Y, Zhou Y, Chen M. miRNA Digger: a comprehensive pipeline for genome-wide novel miRNA mining. Sci Rep 2016; 6:18901. [PMID: 26732371 PMCID: PMC4702050 DOI: 10.1038/srep18901] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 08/26/2015] [Accepted: 11/27/2015] [Indexed: 11/09/2022]
Abstract
MicroRNAs (miRNAs) are important regulators of gene expression. Recent advances in high-throughput sequencing (HTS) techniques have greatly facilitated large-scale detection of miRNAs. However, thorough discovery of novel miRNAs from the available HTS data sets remains a major challenge. In this study, we observed that Dicer-mediated cleavage sites for the processing of miRNA precursors could be mapped by using degradome sequencing data in both animals and plants. On this basis, a novel tool, miRNA Digger, was developed for systematic discovery of miRNA candidates through genome-wide screening of cleavage signals based on degradome sequencing data. To test its sensitivity and reliability, miRNA Digger was applied to discover miRNAs from four organs of Arabidopsis. The results revealed that a majority of the already known mature miRNAs, along with their miRNA*s, expressed in these four organs were successfully recovered. Notably, a total of 30 novel miRNA-miRNA* pairs that have not been registered in miRBase were discovered by miRNA Digger. After target prediction and degradome sequencing data-based validation, eleven miRNA-target interactions involving six of the novel miRNAs were identified. Taken together, miRNA Digger can be applied for sensitive detection of novel miRNAs, and it can be freely downloaded from http://www.bioinfolab.cn/miRNA_Digger/index.html.
Affiliation(s)
- Lan Yu
- College of Life Sciences, Huzhou University, Huzhou 313000, P.R. China
- Chaogang Shao
- College of Life Sciences, Huzhou University, Huzhou 313000, P.R. China
- Xinghuo Ye
- College of Life Sciences, Huzhou University, Huzhou 313000, P.R. China
- Yijun Meng
- College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou 310036, P.R. China
- Yincong Zhou
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, P.R. China
- Ming Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, P.R. China
|
48
|
Roth W, Hecker D, Fava E. Systems Biology Approaches to the Study of Biological Networks Underlying Alzheimer's Disease: Role of miRNAs. Methods Mol Biol 2016; 1303:349-377. [PMID: 26235078 DOI: 10.1007/978-1-4939-2627-5_21] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 06/04/2023]
Abstract
MicroRNAs (miRNAs) are emerging as significant regulators of mRNA complexity in the human central nervous system (CNS) thereby controlling distinct gene expression profiles in a spatio-temporal manner during development, neuronal plasticity, aging and (age-related) neurodegeneration, including Alzheimer's disease (AD). Increasing effort is expended towards dissecting and deciphering the molecular and genetic mechanisms of neurobiological and pathological functions of these brain-enriched miRNAs. Along these lines, recent data pinpoint distinct miRNAs and miRNA networks being linked to APP splicing, processing and Aβ pathology (Lukiw et al., Front Genet 3:327, 2013), and furthermore, to the regulation of tau and its cellular subnetworks (Lau et al., EMBO Mol Med 5:1613, 2013), altogether underlying the onset and propagation of Alzheimer's disease. MicroRNA profiling studies in Alzheimer's disease suffer from poor consensus which is an acknowledged concern in the field, and constitutes one of the current technical challenges. Hence, a strong demand for experimental and computational systems biology approaches arises, to incorporate and integrate distinct levels of information and scientific knowledge into a complex system of miRNA networks in the context of the transcriptome, proteome and metabolome in a given cellular environment. Here, we will discuss the state-of-the-art technologies and computational approaches on hand that may lead to a deeper understanding of the complex biological networks underlying the pathogenesis of Alzheimer's disease.
Affiliation(s)
- Wera Roth
- German Center for Neurodegenerative Diseases (DZNE), Ludwig-Erhard-Allee 2, 53175, Bonn, Germany
|
49
|
Alptekin B, Akpinar BA, Budak H. A Comprehensive Prescription for Plant miRNA Identification. FRONTIERS IN PLANT SCIENCE 2016; 7:2058. [PMID: 28174574 PMCID: PMC5258749 DOI: 10.3389/fpls.2016.02058] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 08/11/2016] [Accepted: 12/23/2016] [Indexed: 05/15/2023]
Abstract
microRNAs (miRNAs) are tiny ribo-regulatory molecules involved in various pathways essential for the persistence of cellular life, such as development, environmental adaptation, and stress response. In recent years, miRNAs have become a major focus in molecular biology because of their functional and diagnostic importance. This interest in miRNA research has resulted in the development of many specific software tools and pipelines for the identification of miRNAs and their specific targets, which is key to the elucidation of miRNA-modulated gene expression. While the well-recognized importance of miRNAs in clinical research has driven the emergence of many useful computational identification approaches in animals, fewer software tools and pipelines are available for plants. Additionally, existing approaches suffer from mis-identification and mis-annotation of plant miRNAs, since the miRNA mining process for plants is highly prone to false positives, particularly in cereals, which have highly repetitive genomes. Our group developed a homology-based in silico miRNA identification approach for plants, which utilizes two Perl scripts, "SUmirFind" and "SUmirFold", and this method has since helped identify many miRNAs, particularly from crop species such as Triticum or Aegilops. Herein, we describe a comprehensive updated guideline that implements two new scripts, "SUmirPredictor" and "SUmirLocator," together with refinements to our previous method, in order to identify genuine miRNAs with increased sensitivity in light of the miRNA identification problems in plants. Recent updates enable our method to provide more reliable and precise results in an automated fashion, in addition to solutions for the elimination of most false-positive predictions, miRNA naming, and miRNA mis-annotation. It also provides a comprehensive view of the genome/transcriptome-wide location of miRNA precursors as well as their association with transposable elements.
The "SUmirPredictor" and "SUmirLocator" scripts are freely available together with a reference high-confidence plant miRNA list.
Affiliation(s)
- Burcu Alptekin
- Cereal Genomics Lab, Department of Plant Sciences and Plant Pathology, Montana State University, Bozeman, MT, USA
- Bala A. Akpinar
- Sabanci University Nanotechnology Research and Application Centre, Sabanci University, Istanbul, Turkey
- Hikmet Budak
- Cereal Genomics Lab, Department of Plant Sciences and Plant Pathology, Montana State University, Bozeman, MT, USA
- *Correspondence: Hikmet Budak
|
50
|
Pian C, Zhang J, Chen YY, Chen Z, Li Q, Li Q, Zhang LY. OP-Triplet-ELM: Identification of real and pseudo microRNA precursors using extreme learning machine with optimal features. J Bioinform Comput Biol 2015; 14:1650006. [PMID: 26707924 DOI: 10.1142/s0219720016500062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
MicroRNAs (miRNAs) are a set of short (21-24 nt) non-coding RNAs that play significant regulatory roles in cells. Triplet-SVM-classifier and MiPred (random forest, RF) can distinguish real pre-miRNAs from other hairpin sequences with similar stem-loops (pseudo pre-miRNAs). However, the 32-dimensional local contiguous structure-sequence features can introduce considerable information redundancy. Therefore, it is essential to develop a method to reduce the dimension of the feature space. In this paper, we propose optimal features of local contiguous structure-sequences (OP-Triplet). These features effectively avoid the information redundancy and decrease the dimension of the feature vector from 32 to 8. In addition, a hybrid feature can be formed by combining minimum free energy (MFE) and structural diversity. We also introduce a neural network algorithm called extreme learning machine (ELM). The results show that the specificity and sensitivity of our method are 92.4% and 91.0%, respectively. Compared with Triplet-SVM-classifier, the total accuracy (ACC) of our ELM method increases by 5%. Compared with MiPred (RF) and miRANN, the total accuracy (ACC) of our ELM method increases by nearly 2%. Moreover, our method reduces the dimension of the feature space and the training time.
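The 32-dimensional local contiguous structure-sequence features mentioned in the abstract are usually built from the paired/unpaired state of three adjacent positions (2^3 = 8 patterns) combined with the middle nucleotide (4 options). The sketch below shows one way to compute such a vector from a sequence and its dot-bracket structure; the function name `triplet_features`, the handling of sequence ends, and the toy hairpin are assumptions, and the reduction from 32 to 8 optimal features is not shown because the abstract does not describe it.

```python
from collections import Counter
from itertools import product


def triplet_features(seq, structure):
    """Return normalized frequencies of the 32 structure-sequence triplets.

    seq:       RNA sequence over A, C, G, U
    structure: dot-bracket string of the same length
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    # Both bracket types mean "paired", so fold ')' into '('.
    paired = structure.replace(")", "(")
    keys = [nt + "".join(s) for nt in "ACGU" for s in product("(.", repeat=3)]
    counts = Counter()
    for i in range(1, len(seq) - 1):
        key = seq[i] + paired[i - 1:i + 2]  # middle nucleotide + 3-position state
        if key in keys:                      # skip ambiguous nucleotides such as N
            counts[key] += 1
    total = sum(counts.values())
    return {key: (counts[key] / total if total else 0.0) for key in keys}


# Hypothetical toy hairpin: 3-bp stem with a 3-nt loop.
features = triplet_features("GGGAAACCC", "(((...)))")
print(len(features))
print({k: round(v, 3) for k, v in features.items() if v > 0})
```

The resulting dictionary can be flattened in key order to obtain a fixed-length vector for a classifier such as an SVM or an ELM.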
Affiliation(s)
- Cong Pian
- College of Science, Nanjing Agricultural University, Nanjing 210095, P. R. China
- Jin Zhang
- College of Science, Nanjing Agricultural University, Nanjing 210095, P. R. China
- Yuan-Yuan Chen
- College of Science, Nanjing Agricultural University, Nanjing 210095, P. R. China
- Zhi Chen
- College of Science, Nanjing Agricultural University, Nanjing 210095, P. R. China
- Qin Li
- College of Science, Nanjing Agricultural University, Nanjing 210095, P. R. China
- Qiang Li
- College of Science, Nanjing Agricultural University, Nanjing 210095, P. R. China
- Liang-Yun Zhang
- College of Science, Nanjing Agricultural University, Nanjing 210095, P. R. China
|