1
Tian XC, Chen ZY, Nie S, Shi TL, Yan XM, Bao YT, Li ZC, Ma HY, Jia KH, Zhao W, Mao JF. Plant-LncPipe: a computational pipeline providing significant improvement in plant lncRNA identification. Hortic Res 2024; 11:uhae041. [PMID: 38638682 PMCID: PMC11024640 DOI: 10.1093/hr/uhae041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Accepted: 02/02/2024] [Indexed: 04/20/2024]
Abstract
Long non-coding RNAs (lncRNAs) play essential roles in various biological processes, such as chromatin remodeling, post-transcriptional regulation, and epigenetic modifications. Despite their critical functions in regulating plant growth, root development, and seed dormancy, the identification of plant lncRNAs remains a challenge due to the scarcity of specific and extensively tested identification methods. Most mainstream machine learning-based methods used for plant lncRNA identification were initially developed using human or other animal datasets, and their accuracy and effectiveness in predicting plant lncRNAs have not been fully evaluated or exploited. To overcome this limitation, we retrained several models, including CPAT, PLEK, and LncFinder, using plant datasets and compared their performance with mainstream lncRNA prediction tools such as CPC2, CNCI, RNAplonc, and LncADeep. Retraining these models significantly improved their performance, and two of the retrained models, LncFinder-plant and CPAT-plant, alongside their ensemble, emerged as the most suitable tools for plant lncRNA identification. This underscores the importance of model retraining in tackling the challenges associated with plant lncRNA identification. Finally, we developed a pipeline (Plant-LncPipe) that incorporates an ensemble of the two best-performing models and covers the entire data analysis process, including read mapping, transcript assembly, lncRNA identification, classification, and origin analysis, for the efficient identification of lncRNAs in plants. Plant-LncPipe is available at https://github.com/xuechantian/Plant-LncRNA-pipline.
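The abstract does not say how the two retrained models are combined. As a minimal sketch, assuming a consensus rule in which a transcript is kept only when both classifiers call it non-coding (an illustrative assumption, not confirmed by the source), the ensemble step might look like this. The function name, label strings, and transcript IDs are hypothetical.

```python
def consensus_lncrna(calls_a, calls_b, noncoding_label="noncoding"):
    """Return sorted transcript IDs that both classifiers label as non-coding.

    calls_a and calls_b map transcript ID -> predicted label. Transcripts
    scored by only one classifier are dropped (assumed consensus rule).
    """
    shared = calls_a.keys() & calls_b.keys()
    return sorted(
        tx for tx in shared
        if calls_a[tx] == noncoding_label and calls_b[tx] == noncoding_label
    )


# Hypothetical per-transcript calls from the two retrained models.
cpat_plant = {"tx1": "noncoding", "tx2": "coding", "tx3": "noncoding"}
lncfinder_plant = {"tx1": "noncoding", "tx2": "noncoding",
                   "tx3": "noncoding", "tx4": "noncoding"}

print(consensus_lncrna(cpat_plant, lncfinder_plant))
```

A strict intersection like this trades sensitivity for specificity; a union or a score-averaging rule would be equally plausible readings of "ensemble".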
Affiliation(s)
- Xue-Chan Tian
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Zhao-Yang Chen
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Shuai Nie
- Rice Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs & Guangdong Key Laboratory of New Technology in Rice Breeding, Guangzhou 510640, China
- Tian-Le Shi
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Xue-Mei Yan
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Yu-Tao Bao
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Zhi-Chao Li
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Hai-Yao Ma
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Kai-Hua Jia
- Key Laboratory of Crop Genetic Improvement & Ecology and Physiology, Institute of Crop Germplasm Resources, Shandong Academy of Agricultural Sciences, Jinan 250100, China
- Wei Zhao
- Department of Plant Physiology, Umeå Plant Science Centre (UPSC), Umeå University, Umeå 90187, Sweden
- Jian-Feng Mao
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Department of Plant Physiology, Umeå Plant Science Centre (UPSC), Umeå University, Umeå 90187, Sweden
2
Yadav VK, Jalmi SK, Tiwari S, Kerkar S. Deciphering shared attributes of plant long non-coding RNAs through a comparative computational approach. Sci Rep 2023; 13:15101. [PMID: 37699996 PMCID: PMC10497521 DOI: 10.1038/s41598-023-42420-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 09/10/2023] [Indexed: 09/14/2023]
Abstract
Over the past decade, long non-coding RNA (lncRNA), which lacks protein-coding potential, has emerged as an essential regulator of the genome. The present study examined 13,599 lncRNAs in Arabidopsis thaliana, 11,565 in Oryza sativa, and 32,397 in Zea mays for their characteristic features and explored the associated genomic and epigenomic features. We found that lncRNAs were distributed throughout the chromosomes and that, in lncRNA-transcribing regions, the Helitron family of transposable elements (TEs) was enriched while terminal inverted repeats were depleted. Our analyses determined that lncRNA-transcribing regions show rare or weak signals for most epigenetic marks except for H3K9me2 and cytosine methylation in all three plant species. LncRNAs showed preferential localization in the nucleus and cytoplasm; however, the ratio of cytoplasmic to nuclear distribution varied among the studied plant species. We identified several conserved endogenous target mimic sites in the lncRNAs among the studied plants. We found 233, 301, and 273 unique miRNAs potentially targeting the lncRNAs of A. thaliana, O. sativa, and Z. mays, respectively. Our study revealed that the miRNAs that interact with lncRNAs target genes involved in a diverse array of biological and molecular processes. The miRNA-targeted lncRNAs displayed a strong affinity for several transcription factors, including ERF and BBR-BPC, that are present in all three plants, suggesting conserved functions. Overall, the present study showed that plant lncRNAs exhibit conserved genomic and epigenomic characteristics and potentially govern the growth and development of plants.
Affiliation(s)
- Vikash Kumar Yadav
- School of Biological Sciences and Biotechnology, Goa University, Taleigao Plateau, Goa, 403206, India.
- National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi, 110067, India.
- Siddhi Kashinath Jalmi
- School of Biological Sciences and Biotechnology, Goa University, Taleigao Plateau, Goa, 403206, India
- Shalini Tiwari
- Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, 74078, OK, USA
- Savita Kerkar
- School of Biological Sciences and Biotechnology, Goa University, Taleigao Plateau, Goa, 403206, India
3
Singh A, AT V, Gupta K, Sharma S, Kumar S. Long non-coding RNA and microRNA landscape of two major domesticated cotton species. Comput Struct Biotechnol J 2023; 21:3032-3044. [PMID: 37266406 PMCID: PMC10229759 DOI: 10.1016/j.csbj.2023.05.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 05/11/2023] [Accepted: 05/11/2023] [Indexed: 06/03/2023]
Abstract
The allotetraploid cotton plants Gossypium hirsutum and Gossypium barbadense have been widely cultivated for their natural, renewable textile fibres. Even though ncRNAs in domesticated cotton species have been extensively studied, systematic identification and annotation of lncRNAs and miRNAs expressed in various tissues and developmental stages under various biological contexts are limited. This limits the understanding of their functions and hinders future research on these cotton species. Here, we report a high-confidence collection of lncRNAs and miRNAs from G. hirsutum and G. barbadense accessions, built from large-scale RNA-seq and small RNA-seq datasets and incorporated into a user-friendly database, CoNCRAtlas. This database provides broad and deep lncRNA and miRNA annotation based on the systematic integration of extensive annotations, such as expression patterns derived from transcriptome data analysis in thousands of samples, as well as multi-omics annotations. We expect this comprehensive resource to accelerate evolutionary and functional studies of ncRNAs and inform future breeding programs for cotton improvement. CoNCRAtlas is accessible at http://www.nipgr.ac.in/CoNCRAtlas/.
Affiliation(s)
- Ajeet Singh
- Bioinformatics Lab, National Institute of Plant Genome Research, New Delhi 110067, India
- Postdoctoral Associate, Ophthalmology, Baylor College of Medicine, Houston, TX, USA
- Vivek AT
- Bioinformatics Lab, National Institute of Plant Genome Research, New Delhi 110067, India
- Kanika Gupta
- Bioinformatics Lab, National Institute of Plant Genome Research, New Delhi 110067, India
- Shruti Sharma
- Bioinformatics Lab, National Institute of Plant Genome Research, New Delhi 110067, India
- Shailesh Kumar
- Bioinformatics Lab, National Institute of Plant Genome Research, New Delhi 110067, India
4
Mustafin RN, Khusnutdinova E. Perspective for Studying the Relationship of miRNAs with Transposable Elements. Curr Issues Mol Biol 2023; 45:3122-3145. [PMID: 37185728 PMCID: PMC10136691 DOI: 10.3390/cimb45040204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2023] [Revised: 03/07/2023] [Accepted: 04/03/2023] [Indexed: 05/17/2023]
Abstract
Transposable elements are important sources of miRNA and long non-coding RNA genes, and of their targets within protein-coding genes, in plants and animals. Therefore, the detection of expression levels of specific non-coding RNAs in various tissues and cells in normal and pathological conditions may indicate a programmed pattern of transposable elements' activation. This reflects the species-specific composition and distribution of transposable elements in genomes, which underlie gene regulation in every cell division, including during aging. TEs' expression is also regulated by epigenetic factors (DNA methylation, histone modifications), SIRT6, cytidine deaminases APOBEC3, APOBEC1, and other catalytic proteins, such as ERCC, TREX1, RB1, HELLS, and MEGP2. In evolution, protein-coding genes and their regulatory elements are derived from transposons. As part of non-coding regions and introns of genes, they are sensors for transcriptional and post-transcriptional control of expression, using miRNAs and long non-coding RNAs that arose from transposable elements in evolution. Methods (Orbld, ncRNAclassifier) and databases have been created for determining the occurrence of miRNAs from transposable elements in plants (PlanTE-MIR DB, PlaNC-TE), which can be used to design epigenetic gene networks in ontogenesis. Based on the data accumulated in the scientific literature, the presence of 467 transposon-derived miRNA genes in the human genome has been reliably established. It was proposed to create an updated and controlled online bioinformatics database of miRNAs derived from transposable elements in healthy individuals, as well as expression changes of these miRNAs during aging and various diseases, such as cancer and difficult-to-treat diseases. The use of the information obtained can open new horizons in the management of tissue and organ differentiation to slow down aging. In addition, the created database could become the basis for clarifying the mechanisms of pathogenesis of various diseases (imbalance in the activity of transposable elements, reflected in changes in the expression of miRNAs) and designing their targeted therapy using specific miRNAs as targets. This article provides examples of the detection of transposable element-derived miRNAs involved in the development of specific malignant neoplasms, aging, and idiopathic pulmonary fibrosis.
Affiliation(s)
- Rustam Nailevich Mustafin
- Department of Medical Genetics and Fundamental Medicine, Bashkir State Medical University, 450008 Ufa, Russia
- Elza Khusnutdinova
- Ufa Federal Research Centre, Institute of Biochemistry and Genetics, Russian Academy of Sciences, 450054 Ufa, Russia
5
Mokhtar MM, Alsamman AM, El Allali A. PlantLTRdb: An interactive database for 195 plant species LTR-retrotransposons. Front Plant Sci 2023; 14:1134627. [PMID: 36950350 PMCID: PMC10025401 DOI: 10.3389/fpls.2023.1134627] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 12/30/2022] [Accepted: 02/16/2023] [Indexed: 05/29/2023]
Abstract
LTR-retrotransposons (LTR-RTs) are a large group of transposable elements that replicate through an RNA intermediate and alter genome structure. The activities of LTR-RTs in plant genomes provide helpful information about genome evolution and gene function. LTR-RTs near or within genes can directly alter gene function. This work introduces PlantLTRdb, an intact LTR-RT database for 195 plant species. Using homology- and de novo structure-based methods, a total of 150.18 Gbp representing 3,079,469 pseudomolecules/scaffolds were analyzed to identify, characterize, and annotate LTR-RTs, estimate insertion ages, detect LTR-RT-gene chimeras, and determine nearby genes. Accordingly, 520,194 intact LTR-RTs were discovered, including 29,462 autonomous and 490,732 nonautonomous LTR-RTs. The autonomous LTR-RTs included 10,286 Gypsy and 19,176 Copia, while the nonautonomous were divided into 224,906 Gypsy, 218,414 Copia, 1,768 BARE-2, 3,147 TR-GAG and 42,497 unknown. Analysis of the identified LTR-RTs located within genes showed that a total of 36,236 LTR-RTs were LTR-RT-gene chimeras and 11,619 LTR-RTs were within pseudo-genes. In addition, 50,026 genes are within 1 kbp of LTR-RTs, and 250,587 had a distance of 1 to 10 kbp from LTR-RTs. PlantLTRdb allows researchers to search, visualize, BLAST and analyze plant LTR-RTs. PlantLTRdb can contribute to the understanding of structural variations, genome organization, functional genomics, and the development of LTR-RT target markers for molecular plant breeding. PlantLTRdb is available at https://bioinformatics.um6p.ma/PlantLTRdb.
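The category counts in this abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: the dictionary and function names are made up here, and the "unknown" nonautonomous count is read as 42,497, which is the only reading under which the subtotals add up to the reported figures.

```python
# Counts quoted in the PlantLTRdb abstract.
# Assumption: the "unknown" nonautonomous count is 42,497.
autonomous = {"Gypsy": 10_286, "Copia": 19_176}
nonautonomous = {
    "Gypsy": 224_906,
    "Copia": 218_414,
    "BARE-2": 1_768,
    "TR-GAG": 3_147,
    "unknown": 42_497,
}


def total(counts):
    """Sum the element counts of one LTR-RT category."""
    return sum(counts.values())


# Each subtotal should reproduce the figure given in the abstract.
assert total(autonomous) == 29_462
assert total(nonautonomous) == 490_732
assert total(autonomous) + total(nonautonomous) == 520_194

print(total(autonomous), total(nonautonomous))
```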
6
Pegler JL, Oultram JMJ, Mann CWG, Carroll BJ, Grof CPL, Eamens AL. Miniature Inverted-Repeat Transposable Elements: Small DNA Transposons That Have Contributed to Plant MICRORNA Gene Evolution. Plants (Basel) 2023; 12:1101. [PMID: 36903960 PMCID: PMC10004981 DOI: 10.3390/plants12051101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 02/23/2023] [Accepted: 02/24/2023] [Indexed: 06/18/2023]
Abstract
Angiosperms form the largest phylum within the Plantae kingdom and show remarkable genetic variation due to the considerable difference in the nuclear genome size of each species. Transposable elements (TEs), mobile DNA sequences that can amplify and change their chromosome position, account for much of the difference in nuclear genome size between individual angiosperm species. Considering the dramatic consequences of TE movement, including the complete loss of gene function, it is unsurprising that the angiosperms have developed elegant molecular strategies to control TE amplification and movement. Specifically, the RNA-directed DNA methylation (RdDM) pathway, directed by the repeat-associated small-interfering RNA (rasiRNA) class of small regulatory RNA, forms the primary line of defense to control TE activity in the angiosperms. However, the miniature inverted-repeat transposable element (MITE) species of TE has at times avoided the repressive effects imposed by the rasiRNA-directed RdDM pathway. MITE proliferation in angiosperm nuclear genomes is due to their preference to transpose within gene-rich regions, a pattern of transposition that has enabled MITEs to gain further transcriptional activity. The sequence-based properties of a MITE result in the synthesis of a noncoding RNA (ncRNA), which, after transcription, folds to form a structure that closely resembles those of the precursor transcripts of the microRNA (miRNA) class of small regulatory RNA. This shared folding structure results in a MITE-derived miRNA being processed from the MITE-transcribed ncRNA, and post-maturation, the MITE-derived miRNA can be used by the core protein machinery of the miRNA pathway to regulate the expression of protein-coding genes that harbor homologous MITE insertions. Here, we outline the considerable contribution that the MITE species of TE has made to expanding the miRNA repertoire of the angiosperms.
Affiliation(s)
- Joseph L. Pegler
- Centre for Plant Science, School of Environmental and Life Sciences, College of Engineering, Science and Environment, University of Newcastle, Callaghan, NSW 2308, Australia
- Jackson M. J. Oultram
- Centre for Plant Science, School of Environmental and Life Sciences, College of Engineering, Science and Environment, University of Newcastle, Callaghan, NSW 2308, Australia
- Christopher W. G. Mann
- School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia, QLD 4072, Australia
- Bernard J. Carroll
- School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia, QLD 4072, Australia
- Christopher P. L. Grof
- Centre for Plant Science, School of Environmental and Life Sciences, College of Engineering, Science and Environment, University of Newcastle, Callaghan, NSW 2308, Australia
- Andrew L. Eamens
- School of Health, University of the Sunshine Coast, Maroochydore, QLD 4558, Australia
7
dos Santos LB, Aono AH, Francisco FR, da Silva CC, Souza LM, de Souza AP. The rubber tree kinome: Genome-wide characterization and insights into coexpression patterns associated with abiotic stress responses. Front Plant Sci 2023; 14:1068202. [PMID: 36824205 PMCID: PMC9941580 DOI: 10.3389/fpls.2023.1068202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2022] [Accepted: 01/18/2023] [Indexed: 06/18/2023]
Abstract
The protein kinase (PK) superfamily constitutes one of the largest and most conserved protein families in eukaryotic genomes, comprising core components of signaling pathways in cell regulation. Despite its remarkable relevance, only a few kinase families have been studied in Hevea brasiliensis. A comprehensive characterization and global expression analysis of the PK superfamily, however, is currently lacking. In this study, with the aim of providing novel inferences about the mechanisms associated with the stress response developed by PKs and retained throughout evolution, we identified and characterized the entire set of PKs, also known as the kinome, present in the Hevea genome. Different RNA-sequencing datasets were employed to identify tissue-specific expression patterns and potential correspondences between different rubber tree genotypes. In addition, coexpression networks under several abiotic stress conditions, such as cold, drought and latex overexploitation, were employed to elucidate associations between families and tissues/stresses. A total of 1,809 PK genes were identified using the current reference genome assembly at the scaffold level, and 1,379 PK genes were identified using the latest chromosome-level assembly and combined into a single set of 2,842 PKs. These proteins were further classified into 20 different groups and 122 families, exhibiting high compositional similarities among family members and with two phylogenetically close species Manihot esculenta and Ricinus communis. Through the joint investigation of tandemly duplicated kinases, transposable elements, gene expression patterns, and coexpression events, we provided insights into the understanding of the cell regulation mechanisms in response to several conditions, which can often lead to a significant reduction in rubber yield.
Affiliation(s)
- Lucas Borges dos Santos
- Center for Molecular Biology and Genetic Engineering, State University of Campinas, Campinas, Brazil
- Alexandre Hild Aono
- Center for Molecular Biology and Genetic Engineering, State University of Campinas, Campinas, Brazil
- Felipe Roberto Francisco
- Center for Molecular Biology and Genetic Engineering, State University of Campinas, Campinas, Brazil
- Carla Cristina da Silva
- Center for Molecular Biology and Genetic Engineering, State University of Campinas, Campinas, Brazil
- Livia Moura Souza
- Center for Molecular Biology and Genetic Engineering, State University of Campinas, Campinas, Brazil
- São Francisco University (USF), Itatiba, Brazil
- Anete Pereira de Souza
- Center for Molecular Biology and Genetic Engineering, State University of Campinas, Campinas, Brazil
- Department of Plant Biology, Biology Institute, University of Campinas (UNICAMP), Campinas, Brazil
8
Zhang L, Zhang S, Wang R, Sun L. Genome-Wide Identification of Long Noncoding RNA and Their Potential Interactors in ISWI Mutants. Int J Mol Sci 2022; 23:6247. [PMID: 35682924 PMCID: PMC9181106 DOI: 10.3390/ijms23116247] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/27/2022] [Revised: 05/28/2022] [Accepted: 05/31/2022] [Indexed: 11/17/2022]
Abstract
Long non-coding RNAs (lncRNAs) have been identified as key regulators of gene expression and participate in many vital physiological processes. Chromatin remodeling, being an important epigenetic modification, has been identified in many biological activities as well. However, the regulatory mechanism of lncRNA in chromatin remodeling remains unclear. In order to characterize the genome-wide lncRNA expression and their potential interacting factors during this process in Drosophila, we investigated the expression pattern of lncRNAs and mRNAs based on the transcriptome analyses and found significant differences between lncRNAs and mRNAs. Then, we performed TSA-FISH experiments of candidate lncRNAs and their potential interactors that have different functions in Drosophila embryos to determine their expression pattern. In addition, we also analyzed the expression of transposable elements (TEs) and their interactors to explore their expression in ISWI mutants. Our results provide a new perspective for understanding the possible regulatory mechanism of lncRNAs and TEs as well as their targets in chromatin remodeling.
9
Oliveira LS, Patera AC, Domingues DS, Sanches DS, Lopes FM, Bugatti PH, Saito PTM, Maracaja-Coutinho V, Durham AM, Paschoal AR. Computational Analysis of Transposable Elements and CircRNAs in Plants. Methods Mol Biol 2022; 2362:147-172. [PMID: 34195962 DOI: 10.1007/978-1-0716-1645-1_9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/13/2022]
Abstract
This chapter provides two main contributions: (1) a description of computational tools and databases used to identify and analyze transposable elements (TEs) and circRNAs in plants; and (2) data analysis on public TE and circRNA data. Our goal is to highlight the primary information available in the literature on circular noncoding RNAs and transposable elements in plants. The exploratory analysis performed on publicly available circRNA and TE data supports a discussion of four sequence features. Finally, we investigate the circRNA:TE association in plants using the model organism Arabidopsis thaliana.
Affiliation(s)
- Liliane Santana Oliveira
- Department of Computer Science, Federal University of Technology-Paraná (UTFPR), Cornélio Procópio, PR, Brazil
- Embrapa Soja, Londrina, Paraná, Brazil
- Andressa Caroline Patera
- Department of Computer Science, Federal University of Technology-Paraná (UTFPR), Cornélio Procópio, PR, Brazil
- Douglas Silva Domingues
- Department of Computer Science, Federal University of Technology-Paraná (UTFPR), Cornélio Procópio, PR, Brazil
- Group of Genomics and Transcriptomes in Plants, Instituto de Biociências de Rio Claro, Universidade Estadual Paulista (UNESP), Rio Claro, SP, Brazil
- Danilo Sipoli Sanches
- Department of Computer Science, Federal University of Technology-Paraná (UTFPR), Cornélio Procópio, PR, Brazil
- Fabricio Martins Lopes
- Department of Computer Science, Federal University of Technology-Paraná (UTFPR), Cornélio Procópio, PR, Brazil
- Pedro Henrique Bugatti
- Department of Computer Science, Federal University of Technology-Paraná (UTFPR), Cornélio Procópio, PR, Brazil
- Priscila Tiemi Maeda Saito
- Department of Computer Science, Federal University of Technology-Paraná (UTFPR), Cornélio Procópio, PR, Brazil
- Vinicius Maracaja-Coutinho
- Centro de Modelamiento Molecular, Biofísica y Bioinformática-CM2B2, Facultad de Ciencias Quimicas y Farmaceuticas, Universidad de Chile, Santiago, Chile
- Alan Mitchell Durham
- Department of Computer Science, Instituto de Matemática e Estatística, Universidade de São Paulo (USP), Cidade Universitária, SP, Brazil
- Alexandre Rossi Paschoal
- Department of Computer Science, Federal University of Technology-Paraná (UTFPR), Cornélio Procópio, PR, Brazil.
10
Zeng C, Takeda A, Sekine K, Osato N, Fukunaga T, Hamada M. Bioinformatics Approaches for Determining the Functional Impact of Repetitive Elements on Non-coding RNAs. Methods Mol Biol 2022; 2509:315-340. [PMID: 35796972 DOI: 10.1007/978-1-0716-2380-0_19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
With a large number of annotated non-coding RNAs (ncRNAs), repetitive sequences are found to constitute functional components (termed as repetitive elements) in ncRNAs that perform specific biological functions. Bioinformatics analysis is a powerful tool for improving our understanding of the role of repetitive elements in ncRNAs. This chapter summarizes recent findings that reveal the role of repetitive elements in ncRNAs. Furthermore, relevant bioinformatics approaches are systematically reviewed, which promises to provide valuable resources for studying the functional impact of repetitive elements on ncRNAs.
Affiliation(s)
- Chao Zeng
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan.
- AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), Tokyo, Japan.
- Atsushi Takeda
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan
- Kotaro Sekine
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan
- Naoki Osato
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan
- Tsukasa Fukunaga
- Waseda Institute for Advanced Study, Waseda University, Tokyo, Japan
- Michiaki Hamada
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan.
- AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), Tokyo, Japan.
11
Abstract
In this era of big data, sets of methodologies and strategies are designed to extract knowledge from huge volumes of data. However, knowing where and how to get this information accurately and quickly is extremely important, given the diversity of genomes and the different ways of representing that information. Among the huge set of information and relationships that the genome carries, there are sequences called miRNAs (microRNAs). These sequences were described in the 1990s and are mainly involved in mechanisms of regulation and gene expression. With this in mind, this chapter focuses on exploring the available literature and providing useful and practical guidance on miRNA databases and tools. To that end, we organize and present this text in two parts: (a) up-to-date reviews and articles that best summarize and discuss the theme; and (b) our updated investigation of the miRNA literature and portals about databases and tools. Finally, we present the main challenge and a possible solution to improve resources and tools.
Affiliation(s)
- Tharcísio Soares de Amorim
- Department of Computer Science and Bioinformatics and Pattern Recognition Group, Universidade Tecnológica Federal do Paraná (UTFPR), Cornélio Procópio, Brazil
- Daniel Longhi Fernandes Pedro
- Department of Computer Science and Bioinformatics and Pattern Recognition Group, Universidade Tecnológica Federal do Paraná (UTFPR), Cornélio Procópio, Brazil
| | - Alexandre Rossi Paschoal
- Department of Computer Science and Bioinformatics and Pattern Recognition Group, Universidade Tecnológica Federal do Paraná (UTFPR), Cornélio Procópio, Brazil.
|
12
|
A Practical Guide on Computational Tools and Databases for Transposable Elements in Plants. Methods Mol Biol 2021. [PMID: 33900590 DOI: 10.1007/978-1-0716-1134-0_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
In the age of big data, obtaining precise information about the research topic of interest is extremely important. With this in mind, this chapter provides a practical guide to computational tools and databases for transposable elements (TEs) in plants. To that end, we organize the text in three sections: (1) a discussion of tools and databases on this theme; (2) a hands-on demonstration of how to use a few of them; and (3) an exploratory data analysis of public TE data. Finally, we present in depth the main challenges and possible solutions to improve resources and tools.
|
13
|
Ariel FD, Manavella PA. When junk DNA turns functional: transposon-derived non-coding RNAs in plants. JOURNAL OF EXPERIMENTAL BOTANY 2021; 72:4132-4143. [PMID: 33606874 DOI: 10.1093/jxb/erab073] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 10/23/2020] [Accepted: 02/12/2021] [Indexed: 05/05/2023]
Abstract
Transposable elements (TEs) are major contributors to genome complexity in eukaryotes. TE mobilization may cause genome instability, although it can also drive genome diversity throughout evolution. TE transposition may influence the transcriptional activity of neighboring genes by modulating the epigenomic profile of the region or by altering the relative position of regulatory elements. Notably, TEs have emerged in the last few years as an important source of functional long and small non-coding RNAs. A plethora of small RNAs derived from TEs have been linked to the trans regulation of gene activity at the transcriptional and post-transcriptional levels. Furthermore, TE-derived long non-coding RNAs have been shown to modulate gene expression by interacting with protein partners, sequestering active small RNAs, and forming duplexes with DNA or other RNA molecules. In this review, we summarize our current knowledge of the functional and mechanistic paradigms of TE-derived long and small non-coding RNAs and discuss their role in plant development and evolution.
Affiliation(s)
- Federico D Ariel
- Instituto de Agrobiotecnología del Litoral (CONICET-UNL), Facultad de Bioquímica y Ciencias Biológicas, Universidad Nacional del Litoral, 3000 Santa Fe, Argentina
- Pablo A Manavella
- Instituto de Agrobiotecnología del Litoral (CONICET-UNL), Facultad de Bioquímica y Ciencias Biológicas, Universidad Nacional del Litoral, 3000 Santa Fe, Argentina
|
14
|
Jha UC, Nayyar H, Jha R, Khurshid M, Zhou M, Mantri N, Siddique KHM. Long non-coding RNAs: emerging players regulating plant abiotic stress response and adaptation. BMC PLANT BIOLOGY 2020; 20:466. [PMID: 33046001 PMCID: PMC7549229 DOI: 10.1186/s12870-020-02595-x] [Citation(s) in RCA: 73] [Impact Index Per Article: 18.3] [Received: 03/27/2020] [Accepted: 08/12/2020] [Indexed: 05/13/2023]
Abstract
BACKGROUND: The immobile nature of plants means that they are frequently confronted by various biotic and abiotic stresses during their lifecycle. Among the abiotic stresses, water stress, temperature extremes, salinity, and heavy metal toxicity are the major ones challenging overall plant growth. Plants have evolved complex molecular mechanisms to adapt to these abiotic stresses. Long non-coding RNAs (lncRNAs), a diverse class of RNAs that contain > 200 nucleotides (nt), play an essential role in plant adaptation to various abiotic stresses.
RESULTS: LncRNAs play a significant role as 'biological regulators' for various developmental processes and biotic and abiotic stress responses in animals and plants at the transcription, post-transcription, and epigenetic level, targeting various stress-responsive mRNAs, regulatory gene(s) encoding transcription factors, and numerous microRNAs (miRNAs) that regulate the expression of different genes. However, the mechanistic role of lncRNAs at the molecular level, and possible target gene(s) contributing to plant abiotic stress response and adaptation, remain largely unknown. Here, we review various types of lncRNAs found in different plant species, with a focus on understanding the complex molecular mechanisms that contribute to abiotic stress tolerance in plants. We start by discussing the biogenesis, type and function, phylogenetic relationships, and sequence conservation of lncRNAs. Next, we review the role of lncRNAs controlling various abiotic stresses, including drought, heat, cold, heavy metal toxicity, and nutrient deficiency, with relevant examples from various plant species. Lastly, we briefly discuss the various lncRNA databases and the role of bioinformatics for predicting the structural and functional annotation of novel lncRNAs.
CONCLUSIONS: Understanding the intricate molecular mechanisms of stress-responsive lncRNAs is in its infancy. The availability of a comprehensive atlas of lncRNAs across whole genomes in crop plants, coupled with a comprehensive understanding of the complex molecular mechanisms that regulate various abiotic stress responses, will enable us to use lncRNAs as potential biomarkers for tailoring abiotic stress-tolerant plants in the future.
Affiliation(s)
- Uday Chand Jha
- ICAR-Indian Institute of Pulses Research (IIPR), Kanpur, 208024, India.
- Harsh Nayyar
- Department of Botany, Panjab University, Chandigarh, India
- Rintu Jha
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Muhammad Khurshid
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Institute of Biochemistry and Biotechnology, University of the Punjab, Lahore, Pakistan
- Meiliang Zhou
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Nitin Mantri
- School of Science, RMIT University, Plenty Road, Bundoora, Victoria 3083, Australia
- Kadambot H M Siddique
- The UWA Institute of Agriculture, The University of Western Australia, Perth, WA, 6001, Australia.
|
15
|
Measuring Performance Metrics of Machine Learning Algorithms for Detecting and Classifying Transposable Elements. Processes (Basel) 2020. [DOI: 10.3390/pr8060638] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/13/2022] Open
Abstract
Because of the promising results obtained by machine learning (ML) approaches in several fields, the use of ML to solve problems in bioinformatics is becoming more common every day. In genomics, a current issue is to detect and classify transposable elements (TEs) because of the tedious tasks involved in bioinformatics methods. Thus, ML was recently evaluated for TE datasets, demonstrating better results than bioinformatics applications. A crucial step for ML approaches is the selection of metrics that measure the realistic performance of algorithms. Each metric has specific characteristics and measures properties that may be different from the predicted results. Although the most commonly used way to compare measures is by using empirical analysis, a non-result-based methodology has been proposed, called measure invariance properties. These properties are calculated on the basis of whether a given measure changes its value under certain modifications in the confusion matrix, giving comparative parameters independent of the datasets. Measure invariance properties make metrics more or less informative, particularly on unbalanced, monomodal, or multimodal negative class datasets and for real or simulated datasets. Although several studies applied ML to detect and classify TEs, there are no works evaluating performance metrics in TE tasks. Here, we analyzed 26 different metrics utilized in binary, multiclass, and hierarchical classifications, through bibliographic sources, and their invariance properties. Then, we corroborated our findings utilizing freely available TE datasets and commonly used ML algorithms. Based on our analysis, the most suitable metrics for TE tasks must be stable, even using highly unbalanced datasets, multimodal negative class, and training datasets with errors or outliers. Based on these parameters, we conclude that the F1-score and the area under the precision-recall curve are the most informative metrics since they are calculated based on other metrics, providing insight into the development of an ML application.
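As a rough illustration of what a measure invariance property means, the sketch below checks one such property: whether a metric changes when only the number of true negatives in the confusion matrix grows. The counts, the `Confusion` class, and the helper names are hypothetical and chosen for this example only; they are not taken from the article, which examines 26 metrics and several properties.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Confusion:
    """Binary confusion matrix counts (hypothetical toy values below)."""
    tp: int
    fp: int
    fn: int
    tn: int


def accuracy(c: Confusion) -> float:
    # Accuracy uses all four cells, so it depends on true negatives.
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def f1_score(c: Confusion) -> float:
    # F1 = 2TP / (2TP + FP + FN); true negatives never appear.
    return 2 * c.tp / (2 * c.tp + c.fp + c.fn)


# Balanced toy dataset versus the same predictions on the positive
# class with a much larger negative class (only TN changes).
balanced = Confusion(tp=80, fp=20, fn=20, tn=80)
unbalanced = replace(balanced, tn=8000)

f1_is_tn_invariant = f1_score(balanced) == f1_score(unbalanced)
accuracy_is_tn_invariant = accuracy(balanced) == accuracy(unbalanced)

print("F1 invariant to TN change:", f1_is_tn_invariant)
print("Accuracy invariant to TN change:", accuracy_is_tn_invariant)
```

In this toy case the F1-score stays the same while accuracy rises as true negatives are added, which is one reason a metric that ignores true negatives can be more stable on highly unbalanced data.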
|
16
|
Regulatory networks of circRNAs related to transcription factors in Populus euphratica Oliv. heteromorphic leaves. Biosci Rep 2019; 39:221382. [PMID: 31790153 PMCID: PMC6911160 DOI: 10.1042/bsr20190540] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/05/2019] [Revised: 11/06/2019] [Accepted: 12/02/2019] [Indexed: 12/29/2022] Open
Abstract
Circular RNAs (circRNAs) are a novel class of non-coding RNAs that are characterized by a covalently closed circular structure. They have been widely found in Populus euphratica Oliv. heteromorphic leaves (P. hl). To study the role of circRNAs related to transcription factors (TFs) in the morphogenesis of P. hl, the expression profiles of circRNAs in linear, lanceolate, ovate, and broad-ovate leaves of P. euphratica were elucidated by strand-specific sequencing. We identified and characterized 22 circRNAs related to TFs in P. hl at the four developmental stages. Using the competing endogenous RNAs hypothesis as a guide, we constructed circRNA-miRNA-TF mRNA regulatory networks, which indicated that circRNAs antagonized microRNAs (miRNAs), thereby influencing the expression of the miRNA target genes and playing a significant role in transcriptional regulation. Gene ontology annotation of the target TF genes predicted that these circRNAs were associated mainly with the regulation of leaf development, leaf morphogenesis, signal transduction, and response to abiotic stress. These findings implied that the circRNAs affected the size and number of cells in P. hl by regulating the expression of TF mRNAs. Our results provide a basis for further studies of leaf development in poplar trees.
|