1
Hassan AH, Mokhtar MM, El Allali A. Transposable elements: multifunctional players in the plant genome. FRONTIERS IN PLANT SCIENCE 2024; 14:1330127. [PMID: 38239225 PMCID: PMC10794571 DOI: 10.3389/fpls.2023.1330127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Accepted: 12/06/2023] [Indexed: 01/22/2024]
Abstract
Transposable elements (TEs) are indispensable components of eukaryotic genomes that play diverse roles in gene regulation, recombination, and environmental adaptation. Their ability to mobilize within the genome leads to gene expression and DNA structure changes. TEs serve as valuable markers for genetic and evolutionary studies and facilitate genetic mapping and phylogenetic analysis. They also provide insight into how organisms adapt to a changing environment by promoting gene rearrangements that lead to new gene combinations. These repetitive sequences significantly impact genome structure, function and evolution. This review takes a comprehensive look at TEs and their applications in biotechnology, particularly in the context of plant biology, where they are now considered "genomic gold" due to their extensive functionalities. The article addresses various aspects of TEs in plant development, including their structure, epigenetic regulation, evolutionary patterns, and their use in gene editing and plant molecular markers. The goal is to systematically understand TEs and shed light on their diverse roles in plant biology.
Affiliation(s)
- Asmaa H. Hassan
- Bioinformatics Laboratory, College of Computing, Mohammed VI Polytechnic University, Ben Guerir, Morocco
- Agricultural Genetic Engineering Research Institute, Agriculture Research Center, Giza, Egypt
- Morad M. Mokhtar
- Bioinformatics Laboratory, College of Computing, Mohammed VI Polytechnic University, Ben Guerir, Morocco
- Agricultural Genetic Engineering Research Institute, Agriculture Research Center, Giza, Egypt
- Achraf El Allali
- Bioinformatics Laboratory, College of Computing, Mohammed VI Polytechnic University, Ben Guerir, Morocco
2
Mokhtar MM, El Allali A. MegaLTR: a web server and standalone pipeline for detecting and annotating LTR-retrotransposons in plant genomes. FRONTIERS IN PLANT SCIENCE 2023; 14:1237426. [PMID: 37810401 PMCID: PMC10552921 DOI: 10.3389/fpls.2023.1237426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 08/21/2023] [Indexed: 10/10/2023]
Abstract
LTR-retrotransposons (LTR-RTs) are a class of RNA-replicating transposable elements (TEs) that can alter genome structure and function by moving positions, repositioning genes, shifting exons, and causing chromosomal rearrangements. LTR-RTs are widespread in many plant genomes and constitute a significant portion of the genome. Their movement and activity in eukaryotic genomes can provide insight into genome evolution and gene function, especially when LTR-RTs are located near or within genes. Building redundant and non-redundant LTR-RT libraries and their annotations for species lacking this resource requires extensive bioinformatics pipelines and expensive computing power to analyze large amounts of genomic data. This increases the need for online services that provide computational resources with minimal overhead and maximum efficiency. Here, we present MegaLTR as a web server and standalone pipeline that detects intact LTR-RTs at the whole-genome level and integrates multiple tools for structure-based, homology-based, and de novo identification, classification, annotation, insertion time determination, and LTR-RT gene chimera analysis. MegaLTR also provides statistical analysis and visualization with multiple tools and can be used to accelerate plant species discovery and assist breeding programs in their efforts to improve genomic resources. We hope that the development of online services such as MegaLTR, which can analyze large amounts of genomic data, will become increasingly important for the automated detection and annotation of LTR-RT elements.
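The abstract mentions insertion time determination without giving the method. A common approach for intact LTR-RTs, sketched here as an assumption rather than as MegaLTR's actual implementation, compares the two LTRs of one element (identical at insertion) and converts their divergence K into an age with T = K / (2r). The function names and the default substitution rate below are illustrative placeholders; the rate in particular must be chosen per species.

```python
import math


def jukes_cantor_distance(ltr5: str, ltr3: str) -> float:
    """Jukes-Cantor corrected distance between two pre-aligned LTR sequences.

    Assumes the inputs are already aligned to equal length; positions that
    contain gaps or ambiguous bases in either sequence are skipped.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTR sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(ltr5.upper(), ltr3.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable positions")
    p = sum(a != b for a, b in pairs) / len(pairs)  # raw mismatch fraction
    if p >= 0.75:
        raise ValueError("divergence too high for the Jukes-Cantor correction")
    return abs(-0.75 * math.log(1.0 - 4.0 * p / 3.0))


def insertion_time_years(distance: float, rate: float = 1.3e-8) -> float:
    """Estimate insertion age as T = K / (2r).

    `rate` is substitutions per site per year; the default is only an
    illustrative placeholder, not a value taken from the abstract.
    """
    return distance / (2.0 * rate)


if __name__ == "__main__":
    left = "ACGT" * 25                 # toy 100 bp LTR
    right = "T" + left[1:]             # same LTR with one substitution
    k = jukes_cantor_distance(left, right)
    print(f"K = {k:.6f}, age = {insertion_time_years(k):.0f} years")
```

The factor of 2 reflects that both LTRs accumulate substitutions independently after insertion.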
Affiliation(s)
- Morad M. Mokhtar
- African Genome Center, Mohammed VI Polytechnic University, Benguerir, Morocco
- Achraf El Allali
- African Genome Center, Mohammed VI Polytechnic University, Benguerir, Morocco
3
Mokhtar MM, Alsamman AM, El Allali A. PlantLTRdb: An interactive database for 195 plant species LTR-retrotransposons. FRONTIERS IN PLANT SCIENCE 2023; 14:1134627. [PMID: 36950350 PMCID: PMC10025401 DOI: 10.3389/fpls.2023.1134627] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/30/2022] [Accepted: 02/16/2023] [Indexed: 05/29/2023]
Abstract
LTR-retrotransposons (LTR-RTs) are a large group of transposable elements that replicate through an RNA intermediate and alter genome structure. The activities of LTR-RTs in plant genomes provide helpful information about genome evolution and gene function. LTR-RTs near or within genes can directly alter gene function. This work introduces PlantLTRdb, an intact LTR-RT database for 195 plant species. Using homology- and de novo structure-based methods, a total of 150.18 Gbp representing 3,079,469 pseudomolecules/scaffolds were analyzed to identify, characterize, and annotate LTR-RTs, estimate insertion ages, detect LTR-RT-gene chimeras, and determine nearby genes. Accordingly, 520,194 intact LTR-RTs were discovered, including 29,462 autonomous and 490,732 nonautonomous LTR-RTs. The autonomous LTR-RTs included 10,286 Gypsy and 19,176 Copia, while the nonautonomous were divided into 224,906 Gypsy, 218,414 Copia, 1,768 BARE-2, 3,147 TR-GAG and 42,497 unknown. Analysis of the identified LTR-RTs located within genes showed that a total of 36,236 LTR-RTs were LTR-RT-gene chimeras and 11,619 LTR-RTs were within pseudo-genes. In addition, 50,026 genes are within 1 kbp of LTR-RTs, and 250,587 had a distance of 1 to 10 kbp from LTR-RTs. PlantLTRdb allows researchers to search, visualize, BLAST and analyze plant LTR-RTs. PlantLTRdb can contribute to the understanding of structural variations, genome organization, functional genomics, and the development of LTR-RT target markers for molecular plant breeding. PlantLTRdb is available at https://bioinformatics.um6p.ma/PlantLTRdb.
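The counts reported in this abstract can be cross-checked with simple arithmetic. The sketch below is not part of PlantLTRdb; it sums the per-category figures, taking the "unknown" nonautonomous category as 42,497, and compares the sums with the reported subtotals and overall total. All names in the code are illustrative.

```python
# Counts transcribed from the PlantLTRdb abstract; dictionary and variable
# names are illustrative and do not come from the database itself.
AUTONOMOUS = {"Gypsy": 10_286, "Copia": 19_176}
NONAUTONOMOUS = {
    "Gypsy": 224_906,
    "Copia": 218_414,
    "BARE-2": 1_768,
    "TR-GAG": 3_147,
    "unknown": 42_497,
}
REPORTED = {"autonomous": 29_462, "nonautonomous": 490_732, "intact": 520_194}


def reconcile() -> dict:
    """Return, for each reported figure, whether the category sums match it."""
    auto = sum(AUTONOMOUS.values())
    nonauto = sum(NONAUTONOMOUS.values())
    return {
        "autonomous": auto == REPORTED["autonomous"],
        "nonautonomous": nonauto == REPORTED["nonautonomous"],
        "intact": auto + nonauto == REPORTED["intact"],
    }


if __name__ == "__main__":
    print(reconcile())
```

With these inputs, all three comparisons hold: 10,286 + 19,176 = 29,462; the five nonautonomous categories sum to 490,732; and the two subtotals sum to 520,194.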
4
Kirov I, Merkulov P, Polkhovskaya E, Konstantinov Z, Kazancev M, Saenko K, Polkhovskiy A, Dudnikov M, Garibyan T, Demurin Y, Soloviev A. Epigenetic Stress and Long-Read cDNA Sequencing of Sunflower (Helianthus annuus L.) Revealed the Origin of the Plant Retrotranscriptome. PLANTS (BASEL, SWITZERLAND) 2022; 11:3579. [PMID: 36559691 PMCID: PMC9784723 DOI: 10.3390/plants11243579] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/17/2022] [Revised: 12/13/2022] [Accepted: 12/13/2022] [Indexed: 06/12/2023]
Abstract
Transposable elements (TEs) contribute not only to genome diversity but also to transcriptome diversity in plants. To unravel the sources of LTR retrotransposon (RTE) transcripts in sunflower, we exploited a recently developed transposon activation method ('TEgenesis') along with long-read cDNA Nanopore sequencing. This approach allows for the identification of 56 RTE transcripts from different genomic loci including full-length and non-autonomous RTEs. Using the mobilome analysis, we provided a new set of expressed and transpositionally active sunflower RTEs for future studies. Among them, a Ty3/Gypsy RTE called SUNTY3 exhibited ongoing transposition activity, as detected by eccDNA analysis. We showed that the sunflower genome contains a diverse set of non-autonomous RTEs encoding a single RTE protein, including the previously described TR-GAG (terminal repeat with the GAG domain) as well as new categories, TR-RT-RH, TR-RH, and TR-INT-RT. Our results demonstrate that 40% of the loci for RTE-related transcripts (nonLTR-RTEs) lack their LTR sequences and resemble conventional eukaryotic genes encoding RTE-related proteins with unknown functions. It was evident based on phylogenetic analysis that three nonLTR-RTEs encode GAG (HadGAG1-3) fused to a host protein. These HadGAG proteins have homologs found in other plant species, potentially indicating GAG domestication. Ultimately, we found that the sunflower retrotranscriptome originated from the transcription of active RTEs, non-autonomous RTEs, and gene-like RTE transcripts, including those encoding domesticated proteins.
Affiliation(s)
- Ilya Kirov
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Moscow Institute of Physics and Technology, 141701 Dolgoprudny, Russia
- Pavel Merkulov
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Moscow Institute of Physics and Technology, 141701 Dolgoprudny, Russia
- Ekaterina Polkhovskaya
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Zakhar Konstantinov
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Mikhail Kazancev
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Moscow Institute of Physics and Technology, 141701 Dolgoprudny, Russia
- Ksenia Saenko
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Federal Research Center of Biological Plant Protection, 350039 Krasnodar, Russia
- Alexander Polkhovskiy
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Moscow Institute of Physics and Technology, 141701 Dolgoprudny, Russia
- Skolkovo Institute of Science and Technology, 121205 Moscow, Russia
- Maxim Dudnikov
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Moscow Institute of Physics and Technology, 141701 Dolgoprudny, Russia
- Tsovinar Garibyan
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Yakov Demurin
- Pustovoit All-Russia Research Institute of Oilseed Crops, Filatova St. 17, 350038 Krasnodar, Russia
- Alexander Soloviev
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
5
Kirov I, Merkulov P, Dudnikov M, Polkhovskaya E, Komakhin RA, Konstantinov Z, Gvaramiya S, Ermolaev A, Kudryavtseva N, Gilyok M, Divashuk MG, Karlov GI, Soloviev A. Transposons Hidden in Arabidopsis thaliana Genome Assembly Gaps and Mobilization of Non-Autonomous LTR Retrotransposons Unravelled by Nanotei Pipeline. PLANTS (BASEL, SWITZERLAND) 2021; 10:2681. [PMID: 34961152 PMCID: PMC8704663 DOI: 10.3390/plants10122681] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/03/2021] [Revised: 11/26/2021] [Accepted: 12/02/2021] [Indexed: 06/12/2023]
Abstract
Long-read data is a great tool to discover new active transposable elements (TEs). However, no ready-to-use tools were available to gather this information from low-coverage ONT datasets. Here, we developed a novel pipeline, nanotei, that allows detection of TE-containing structural variants, including individual TE transpositions. We exploited this pipeline to identify TE insertions in the Arabidopsis thaliana genome. Using nanotei, we identified tens of TE copies, including ones for the well-characterized ONSEN retrotransposon family, that were hidden in genome assembly gaps. The results demonstrate that some TEs are inaccessible for analysis with the current A. thaliana (TAIR10.1) genome assembly. We further explored the mobilome of the ddm1 mutant with elevated TE activity. Nanotei captured all TEs previously known to be active in ddm1 and also identified transposition of non-autonomous TEs. Of them, one non-autonomous TE, derived from AT5TE33540, belongs to the TR-GAG retrotransposons, which have a single open reading frame (ORF) encoding the GAG protein. These results provide the first direct evidence that TR-GAGs and other non-autonomous LTR retrotransposons can transpose in the plant genome, albeit in the absence of most of the encoded proteins. In summary, nanotei is a useful tool to detect active TEs and their insertions in plant genomes using low-coverage data from Nanopore genome sequencing.
Affiliation(s)
- Ilya Kirov
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia; (P.M.); (M.D.); (E.P.); (R.A.K.); (Z.K.); (S.G.); (M.G.); (M.G.D.); (G.I.K.); (A.S.)
- Kurchatov Genomics Center of ARRIAB, All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Pavel Merkulov
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia; (P.M.); (M.D.); (E.P.); (R.A.K.); (Z.K.); (S.G.); (M.G.); (M.G.D.); (G.I.K.); (A.S.)
- Maxim Dudnikov
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia; (P.M.); (M.D.); (E.P.); (R.A.K.); (Z.K.); (S.G.); (M.G.); (M.G.D.); (G.I.K.); (A.S.)
- Kurchatov Genomics Center of ARRIAB, All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Ekaterina Polkhovskaya
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia; (P.M.); (M.D.); (E.P.); (R.A.K.); (Z.K.); (S.G.); (M.G.); (M.G.D.); (G.I.K.); (A.S.)
- Roman A. Komakhin
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia; (P.M.); (M.D.); (E.P.); (R.A.K.); (Z.K.); (S.G.); (M.G.); (M.G.D.); (G.I.K.); (A.S.)
- Zakhar Konstantinov
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia; (P.M.); (M.D.); (E.P.); (R.A.K.); (Z.K.); (S.G.); (M.G.); (M.G.D.); (G.I.K.); (A.S.)
- Sofya Gvaramiya
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia; (P.M.); (M.D.); (E.P.); (R.A.K.); (Z.K.); (S.G.); (M.G.); (M.G.D.); (G.I.K.); (A.S.)
- Aleksey Ermolaev
- Center of Molecular Biotechnology, Russian State Agrarian University-Moscow Timiryazev Agricultural Academy, 127550 Moscow, Russia; (A.E.); (N.K.)
- Natalya Kudryavtseva
- Center of Molecular Biotechnology, Russian State Agrarian University-Moscow Timiryazev Agricultural Academy, 127550 Moscow, Russia; (A.E.); (N.K.)
- Marina Gilyok
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia; (P.M.); (M.D.); (E.P.); (R.A.K.); (Z.K.); (S.G.); (M.G.); (M.G.D.); (G.I.K.); (A.S.)
- Mikhail G. Divashuk
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia; (P.M.); (M.D.); (E.P.); (R.A.K.); (Z.K.); (S.G.); (M.G.); (M.G.D.); (G.I.K.); (A.S.)
- Kurchatov Genomics Center of ARRIAB, All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Gennady I. Karlov
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia; (P.M.); (M.D.); (E.P.); (R.A.K.); (Z.K.); (S.G.); (M.G.); (M.G.D.); (G.I.K.); (A.S.)
- Alexander Soloviev
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia; (P.M.); (M.D.); (E.P.); (R.A.K.); (Z.K.); (S.G.); (M.G.); (M.G.D.); (G.I.K.); (A.S.)
6
Orozco-Arias S, Candamil-Cortés MS, Jaimes PA, Piña JS, Tabares-Soto R, Guyot R, Isaza G. K-mer-based machine learning method to classify LTR-retrotransposons in plant genomes. PeerJ 2021; 9:e11456. [PMID: 34055489 PMCID: PMC8140598 DOI: 10.7717/peerj.11456] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/17/2021] [Accepted: 04/24/2021] [Indexed: 12/15/2022]
Abstract
Every day more plant genomes are available in public databases and additional massive sequencing projects (i.e., that aim to sequence thousands of individuals) are formulated and released. Nevertheless, there are not enough automatic tools to analyze this large amount of genomic information. LTR retrotransposons are the most frequent repetitive sequences in plant genomes; however, their detection and classification are commonly performed using semi-automatic and time-consuming programs. Despite the availability of several bioinformatic tools that follow different approaches to detect and classify them, none of these tools can individually obtain accurate results. Here, we used Machine Learning algorithms based on k-mer counts to distinguish LTR retrotransposons from other genomic sequences and to classify them into lineages/families with an F1-Score of 95%, contributing to the development of an alignment-free, automatic method to analyze these sequences.
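To make the idea of k-mer counts as machine learning features concrete, the following minimal sketch turns a DNA sequence into a fixed-length count vector. It is an illustration under stated assumptions, not the authors' code: the function name `kmer_counts`, the choice of k, and the decision to ignore k-mers containing non-ACGT characters are all hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq: str, k: int = 3) -> list:
    """Return counts of every possible DNA k-mer, in lexicographic order.

    The vector always has 4**k entries, so sequences of any length map to
    features of the same size. K-mers containing characters other than
    A, C, G, T (for example N) are ignored.
    """
    seq = seq.upper()
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    observed = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [observed.get(kmer, 0) for kmer in vocabulary]


if __name__ == "__main__":
    features = kmer_counts("ACGTACGTNACG", k=2)
    print(len(features), sum(features))
```

A vector like this can be passed to any standard classifier; because no alignment against a reference library is needed, the approach scales to large numbers of sequences.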
Affiliation(s)
- Simon Orozco-Arias
- Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia; Department of Systems and Informatics, Universidad de Caldas, Manizales, Caldas, Colombia
- Paula A Jaimes
- Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
- Johan S Piña
- Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
- Reinel Tabares-Soto
- Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
- Romain Guyot
- Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia; Institut de Recherche pour le Développement, CIRAD, Univ. Montpellier, Montpellier, France
- Gustavo Isaza
- Department of Systems and Informatics, Universidad de Caldas, Manizales, Caldas, Colombia
7
Orozco-Arias S, Jaimes PA, Candamil MS, Jiménez-Varón CF, Tabares-Soto R, Isaza G, Guyot R. InpactorDB: A Classified Lineage-Level Plant LTR Retrotransposon Reference Library for Free-Alignment Methods Based on Machine Learning. Genes (Basel) 2021; 12:genes12020190. [PMID: 33525408 PMCID: PMC7910972 DOI: 10.3390/genes12020190] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 12/30/2020] [Revised: 01/21/2021] [Accepted: 01/22/2021] [Indexed: 12/04/2022]
Abstract
Long terminal repeat (LTR) retrotransposons are mobile elements that constitute the major fraction of most plant genomes. The identification and annotation of these elements via bioinformatics approaches represent a major challenge in the era of massive plant genome sequencing. In addition to their involvement in genome size variation, LTR retrotransposons are also associated with the function and structure of different chromosomal regions and can alter the function of coding regions, among others. Several sequence databases of plant LTR retrotransposons are available for public access, such as PGSB and RepetDB, or restricted access such as Repbase. Although these databases are useful to identify LTR-RTs in new genomes by similarity, the elements of these databases are not fully classified to the lineage (also called family) level. Here, we present InpactorDB, a semi-curated dataset composed of 130,439 elements from 195 plant genomes (belonging to 108 plant species) classified to the lineage level. This dataset has been used to train two deep neural networks (i.e., one fully connected and one convolutional) for the rapid classification of these elements. In lineage-level classification approaches, we obtain up to 98% performance, indicated by the F1-score, precision and recall scores.
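The abstract reports performance as F1-score, precision and recall. For readers unfamiliar with these metrics, the sketch below computes them for a single class from true-positive, false-positive and false-negative counts. The function name and the example counts are made up for illustration; the abstract does not say how the per-lineage scores were averaged, so this shows only the basic definitions.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Compute precision, recall and F1 for one class from raw counts.

    precision = tp / (tp + fp), recall = tp / (tp + fn),
    F1 = harmonic mean of precision and recall.
    Undefined ratios (zero denominators) are returned as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts for one lineage, chosen only for illustration.
    p, r, f1 = precision_recall_f1(tp=98, fp=2, fn=2)
    print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

In a multi-class setting such as lineage-level classification, these per-class values are typically combined by macro or weighted averaging.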
Affiliation(s)
- Simon Orozco-Arias
- Department of Computer Science, Universidad Autónoma de Manizales, 170002 Manizales, Colombia; (P.A.J.); (M.S.C.)
- Department of Systems and Informatics, Universidad de Caldas, 170002 Manizales, Colombia;
- Correspondence: (S.O.-A.); (R.G.)
- Paula A. Jaimes
- Department of Computer Science, Universidad Autónoma de Manizales, 170002 Manizales, Colombia; (P.A.J.); (M.S.C.)
- Mariana S. Candamil
- Department of Computer Science, Universidad Autónoma de Manizales, 170002 Manizales, Colombia; (P.A.J.); (M.S.C.)
- Reinel Tabares-Soto
- Department of Electronics and Automation, Universidad Autónoma de Manizales, 170002 Manizales, Colombia;
- Gustavo Isaza
- Department of Systems and Informatics, Universidad de Caldas, 170002 Manizales, Colombia;
- Romain Guyot
- Department of Electronics and Automation, Universidad Autónoma de Manizales, 170002 Manizales, Colombia;
- Institut de Recherche pour le Développement, CIRAD, University of Montpellier, 34394 Montpellier, France
- Correspondence: (S.O.-A.); (R.G.)
8
Maiwald S, Weber B, Seibt KM, Schmidt T, Heitkam T. The Cassandra retrotransposon landscape in sugar beet (Beta vulgaris) and related Amaranthaceae: recombination and re-shuffling lead to a high structural variability. ANNALS OF BOTANY 2021; 127:91-109. [PMID: 33009553 PMCID: PMC7750724 DOI: 10.1093/aob/mcaa176] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/28/2020] [Accepted: 09/28/2020] [Indexed: 05/26/2023]
Abstract
BACKGROUND AND AIMS: Plant genomes contain many retrotransposons and their derivatives, which are subject to rapid sequence turnover. As non-autonomous retrotransposons do not encode any proteins, they experience reduced selective constraints leading to their diversification into multiple families, usually limited to a few closely related species. In contrast, the non-coding Cassandra terminal repeat retrotransposons in miniature (TRIMs) are widespread in many plants. Their hallmark is a conserved 5S rDNA-derived promoter in their long terminal repeats (LTRs). As sugar beet (Beta vulgaris) has a well-described LTR retrotransposon landscape, we aim to characterize TRIMs in beet and related genomes.
METHODS: We identified Cassandra retrotransposons in the sugar beet reference genome and characterized their structural relationships. Genomic organization, chromosomal localization, and distribution of Cassandra-TRIMs across the Amaranthaceae were verified by Southern and fluorescent in situ hybridization.
KEY RESULTS: All 638 Cassandra sequences in the sugar beet genome contain conserved LTRs and thus constitute a single family. Nevertheless, variable internal regions required a subdivision into two Cassandra subfamilies within B. vulgaris. The related Chenopodium quinoa harbours a third subfamily. These subfamilies vary in their distribution within Amaranthaceae genomes, their insertion times and the degree of silencing by small RNAs. Cassandra retrotransposons gave rise to many structural variants, such as solo LTRs or tandemly arranged Cassandra retrotransposons. These Cassandra derivatives point to an interplay of template switch and recombination processes - mechanisms that likely caused Cassandra's subfamily formation and diversification.
CONCLUSIONS: We traced the evolution of Cassandra in the Amaranthaceae and detected a considerable variability within the short internal regions, whereas the LTRs are strongly conserved in sequence and length. Presumably these hallmarks make Cassandra a prime target for unequal recombination, resulting in the observed structural diversity, an example of the impact of LTR-mediated evolutionary mechanisms on the host genome.
Affiliation(s)
- Sophie Maiwald
- Institute of Botany, Technische Universität Dresden, Dresden, Germany
- Beatrice Weber
- Institute of Botany, Technische Universität Dresden, Dresden, Germany
- Kathrin M Seibt
- Institute of Botany, Technische Universität Dresden, Dresden, Germany
- Thomas Schmidt
- Institute of Botany, Technische Universität Dresden, Dresden, Germany
- Tony Heitkam
- Institute of Botany, Technische Universität Dresden, Dresden, Germany
9
Kirov I, Dudnikov M, Merkulov P, Shingaliev A, Omarov M, Kolganova E, Sigaeva A, Karlov G, Soloviev A. Nanopore RNA Sequencing Revealed Long Non-Coding and LTR Retrotransposon-Related RNAs Expressed at Early Stages of Triticale Seed Development. PLANTS 2020; 9:plants9121794. [PMID: 33348863 PMCID: PMC7765848 DOI: 10.3390/plants9121794] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 11/24/2020] [Revised: 12/10/2020] [Accepted: 12/15/2020] [Indexed: 01/22/2023]
Abstract
The intergenic space of plant genomes encodes many functionally important yet unexplored RNAs. The genomic loci encoding these RNAs are often considered “junk” DNA, as they are frequently associated with repeat-rich regions of the genome. The latter makes the annotations of these loci and the assembly of the corresponding transcripts using short RNAseq reads particularly challenging. Here, using long-read Nanopore direct RNA sequencing, we aimed to identify these “junk” RNA molecules, including long non-coding RNAs (lncRNAs) and transposon-derived transcripts expressed during early stages (10 days post anthesis) of seed development of triticale (AABBRR, 2n = 6x = 42), an interspecific hybrid between wheat and rye. Altogether, we found 796 lncRNAs and 20 LTR retrotransposon-related transcripts (RTE-RNAs) expressed at this stage, with most of them being previously unannotated and located in the intergenic as well as intronic regions. Sequence analysis of the lncRNAs provides evidence for the frequent exonization of class I (retrotransposons) and class II (DNA transposons) transposon sequences and suggests a direct influence of “junk” DNA on the structure and origin of lncRNAs. We show that the expression patterns of lncRNAs and RTE-related transcripts have high stage specificity. In turn, almost half of the lncRNAs located in Genomes A and B have the highest expression levels at 10–30 days post anthesis in wheat. Detailed analysis of the protein-coding potential of the RTE-RNAs showed that 75% of them carry open reading frames (ORFs) for a diverse set of GAG proteins, the main component of virus-like particles of LTR retrotransposons. We further experimentally demonstrated that some RTE-RNAs originate from autonomous LTR retrotransposons with ongoing transposition activity during early stages of triticale seed development. Overall, our results provide a framework for further exploration of the newly discovered lncRNAs and RTE-RNAs in functional and genome-wide association studies in triticale and wheat. Our study also demonstrates that Nanopore direct RNA sequencing is an indispensable tool for the elucidation of lncRNA and retrotransposon transcripts.
Affiliation(s)
- Ilya Kirov
- Laboratory of Marker-Assisted and Genomic Selection of Plants, All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya str. 42, 127550 Moscow, Russia; (M.D.); (P.M.); (A.S.); (M.O.); (E.K.); (A.S.); (G.K.); (A.S.)
- Kurchatov Genomics Center of ARRIAB, All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Street, 42, 127550 Moscow, Russia
- Correspondence:
- Maxim Dudnikov
- Laboratory of Marker-Assisted and Genomic Selection of Plants, All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya str. 42, 127550 Moscow, Russia; (M.D.); (P.M.); (A.S.); (M.O.); (E.K.); (A.S.); (G.K.); (A.S.)
- Kurchatov Genomics Center of ARRIAB, All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Street, 42, 127550 Moscow, Russia
- Pavel Merkulov
- Laboratory of Marker-Assisted and Genomic Selection of Plants, All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya str. 42, 127550 Moscow, Russia; (M.D.); (P.M.); (A.S.); (M.O.); (E.K.); (A.S.); (G.K.); (A.S.)
- Andrey Shingaliev
- Laboratory of Marker-Assisted and Genomic Selection of Plants, All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya str. 42, 127550 Moscow, Russia; (M.D.); (P.M.); (A.S.); (M.O.); (E.K.); (A.S.); (G.K.); (A.S.)
- Murad Omarov
- Laboratory of Marker-Assisted and Genomic Selection of Plants, All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya str. 42, 127550 Moscow, Russia; (M.D.); (P.M.); (A.S.); (M.O.); (E.K.); (A.S.); (G.K.); (A.S.)
- Faculty of Computer Science, National Research University Higher School of Economics, Pokrovsky Boulvar, 11, 109028 Moscow, Russia
- Elizaveta Kolganova
- Laboratory of Marker-Assisted and Genomic Selection of Plants, All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya str. 42, 127550 Moscow, Russia; (M.D.); (P.M.); (A.S.); (M.O.); (E.K.); (A.S.); (G.K.); (A.S.)
- Alexandra Sigaeva
- Laboratory of Marker-Assisted and Genomic Selection of Plants, All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya str. 42, 127550 Moscow, Russia; (M.D.); (P.M.); (A.S.); (M.O.); (E.K.); (A.S.); (G.K.); (A.S.)
- Gennady Karlov
- Laboratory of Marker-Assisted and Genomic Selection of Plants, All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya str. 42, 127550 Moscow, Russia; (M.D.); (P.M.); (A.S.); (M.O.); (E.K.); (A.S.); (G.K.); (A.S.)
- Alexander Soloviev
- Laboratory of Marker-Assisted and Genomic Selection of Plants, All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya str. 42, 127550 Moscow, Russia; (M.D.); (P.M.); (A.S.); (M.O.); (E.K.); (A.S.); (G.K.); (A.S.)
10
Kirov I, Omarov M, Merkulov P, Dudnikov M, Gvaramiya S, Kolganova E, Komakhin R, Karlov G, Soloviev A. Genomic and Transcriptomic Survey Provides New Insight into the Organization and Transposition Activity of Highly Expressed LTR Retrotransposons of Sunflower (Helianthus annuus L.). Int J Mol Sci 2020; 21:E9331. [PMID: 33297579 PMCID: PMC7730604 DOI: 10.3390/ijms21239331] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 11/02/2020] [Revised: 12/01/2020] [Accepted: 12/04/2020] [Indexed: 12/21/2022]
Abstract
LTR retrotransposons (RTEs) play a crucial role in plant genome evolution and adaptation. Although RTEs are generally silenced in somatic plant tissues under non-stressed conditions, some expressed RTEs (exRTEs) escape genome defense mechanisms. As our understanding of exRTE organization in plants is rudimentary, we systematically surveyed the genomic and transcriptomic organization and mobilome (transposition) activity of sunflower (Helianthus annuus L.) exRTEs. We identified 44 transcribed RTEs in the sunflower genome and demonstrated their distinct genomic features: more recent insertion time, longer open reading frame (ORF) length, and smaller distance to neighboring genes. We showed that GAG-encoding ORFs are present at significantly higher frequencies in exRTEs, compared with non-expressed RTEs. Most exRTEs exhibit variation in copy number among sunflower cultivars, and one exRTE, Gagarin, produces extrachromosomal circular DNA in seedlings, demonstrating recent and ongoing transposition activity. Nanopore direct RNA sequencing of full-length RTE RNA revealed complex patterns of alternative splicing in RTE RNAs, resulting in isoforms that carry ORFs for distinct RTE proteins. Together, our study demonstrates that tens of expressed sunflower RTEs with specific genomic organization shape the hidden layer of the transcriptome, pointing to the evolution of specific strategies that circumvent existing genome defense mechanisms.
Affiliation(s)
- Ilya Kirov
  - All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
  - Kurchatov Genomics Center of ARRIAB, All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Street, 42, 127550 Moscow, Russia
- Murad Omarov
  - All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
  - Faculty of Computer Science, National Research University Higher School of Economics, Pokrovsky Boulvar 11, 109028 Moscow, Russia
- Pavel Merkulov
  - All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Maxim Dudnikov
  - All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
  - Kurchatov Genomics Center of ARRIAB, All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Street, 42, 127550 Moscow, Russia
- Sofya Gvaramiya
  - All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Elizaveta Kolganova
  - All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Roman Komakhin
  - All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Gennady Karlov
  - All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Alexander Soloviev
  - All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
|
11
|
Orozco-Arias S, Tobon-Orozco N, Piña JS, Jiménez-Varón CF, Tabares-Soto R, Guyot R. TIP_finder: An HPC Software to Detect Transposable Element Insertion Polymorphisms in Large Genomic Datasets. BIOLOGY 2020; 9:E281. [PMID: 32917036 PMCID: PMC7563458 DOI: 10.3390/biology9090281] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/27/2020] [Revised: 09/01/2020] [Accepted: 09/07/2020] [Indexed: 12/12/2022]
Abstract
Transposable elements (TEs) are non-static genomic units capable of moving from one chromosomal location to another. Their insertion polymorphisms may cause beneficial mutations, such as the creation of new gene functions, or deleterious ones in eukaryotes, e.g., different types of cancer in humans. A particular type of TE, the LTR retrotransposons, comprises almost 8% of the human genome. Among LTR retrotransposons, human endogenous retroviruses (HERVs) bear structural and functional similarities to retroviruses. Several tools allow the detection of transposon insertion polymorphisms (TIPs) but fail to efficiently analyze large genomes or large datasets. Here, we developed a computational tool, named TIP_finder, able to detect mobile element insertions in very large genomes through high-performance computing (HPC) and parallel programming, using discordant read pair analysis. TIP_finder inputs are (i) short paired-end reads such as those obtained by Illumina sequencing, (ii) a chromosome-level reference genome sequence, and (iii) a database of consensus TE sequences. The HPC strategy we propose adds scalability and provides a useful tool to analyze huge genomic datasets in a reasonable running time. TIP_finder accelerates the detection of TIPs by up to 55 times in breast cancer datasets and 46 times in cancer-free datasets compared to the fastest available algorithms. TIP_finder applies a validated strategy to find TIPs, accelerates the process through HPC, and addresses the issues of runtime for large-scale analyses in the post-genomic era. TIP_finder version 1.0 is available at https://github.com/simonorozcoarias/TIP_finder.
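The idea behind discordant read pair analysis can be sketched in a few lines of Python. This is only an illustration and not TIP_finder's actual implementation: the `ReadPair` record, the `candidate_windows` function, the fixed window size, and the `min_support` threshold are all hypothetical names and values chosen for this example. It assumes an upstream aligner has already flagged pairs in which one mate maps to the reference genome and the other to a TE consensus sequence.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple


class ReadPair(NamedTuple):
    """Hypothetical record for one paired-end read after alignment."""
    chrom: str          # chromosome of the genome-mapped mate
    pos: int            # 0-based position of the genome-mapped mate
    mate_hits_te: bool  # True if the other mate aligned to a TE consensus


def candidate_windows(
    pairs: Iterable[ReadPair],
    window: int = 1000,
    min_support: int = 3,
) -> Dict[Tuple[str, int], int]:
    """Count discordant pairs per genomic window and keep supported windows.

    A window is reported as a candidate insertion site when at least
    `min_support` pairs place one mate in the window and the other in a TE.
    """
    counts: Counter = Counter()
    for pair in pairs:
        if pair.mate_hits_te:
            window_start = (pair.pos // window) * window
            counts[(pair.chrom, window_start)] += 1
    return {key: n for key, n in counts.items() if n >= min_support}


# Toy input: three discordant pairs cluster in chr1:1000-1999.
pairs = [
    ReadPair("chr1", 1500, True),
    ReadPair("chr1", 1620, True),
    ReadPair("chr1", 1999, True),
    ReadPair("chr1", 5200, True),   # isolated pair, below the threshold
    ReadPair("chr2", 1500, False),  # concordant pair, ignored
]
print(candidate_windows(pairs))
```

Because each window is counted independently, the counting step splits naturally by chromosome or by read batch, which is the kind of workload an HPC strategy can parallelize.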
Affiliation(s)
- Simon Orozco-Arias
  - Department of Computer Science, Universidad Autónoma de Manizales, Manizales 170002, Colombia
  - Department of Systems and Informatics, Universidad de Caldas, Manizales 170002, Colombia
- Nicolas Tobon-Orozco
  - Department of Computer Science, Universidad Autónoma de Manizales, Manizales 170002, Colombia
- Johan S. Piña
  - Department of Computer Science, Universidad Autónoma de Manizales, Manizales 170002, Colombia
- Reinel Tabares-Soto
  - Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales 170002, Colombia
- Romain Guyot
  - Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales 170002, Colombia
  - Institut de Recherche pour le Développement (IRD), CIRAD, Université de Montpellier, 34394 Montpellier, France
|
12
|
Abd El-Wahab MMH, Aljabri M, Sarhan MS, Osman G, Wang S, Mabrouk M, El-Shabrawi HM, Gabr AMM, Abd El-Haliem AM, O’Sullivan DM, El-Soda M. High-Density SNP-Based Association Mapping of Seed Traits in Fenugreek Reveals Homology with Clover. Genes (Basel) 2020; 11:E893. [PMID: 32764325 PMCID: PMC7464718 DOI: 10.3390/genes11080893] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/05/2020] [Revised: 07/28/2020] [Accepted: 08/02/2020] [Indexed: 12/02/2022]
Abstract
Fenugreek as a self-pollinated plant is ideal for genome-wide association mapping where traits can be marked by their association with natural mutations. However, fenugreek is poorly investigated at the genomic level due to the lack of information regarding its genome. To fill this gap, we genotyped a collection of 112 genotypes with 153,881 SNPs using double digest restriction site-associated DNA sequencing. We used 38,142 polymorphic SNPs to prove the suitability of the population for association mapping. One significant SNP was associated with both seed length and seed width, and another SNP was associated with seed color. Due to the lack of a comprehensive genetic map, it is neither possible to align the newly developed markers to chromosomes nor to predict the underlying genes. Therefore, systematic targeting of those markers to homologous genomes of other legumes can overcome those problems. A BLAST search using the genomic fenugreek sequence flanking the identified SNPs showed high homology with several members of the Trifolieae tribe indicating the potential of translational approaches to improving our understanding of the fenugreek genome. Using such a comprehensively-genotyped fenugreek population is the first step towards identifying genes underlying complex traits and to underpin fenugreek marker-assisted breeding programs.
Affiliation(s)
- Mustafa M. H. Abd El-Wahab
  - Department of Agronomy, Faculty of Agriculture, Cairo University, Giza 12613, Egypt
- Maha Aljabri
  - Department of Biology, Faculty of Applied Sciences, Umm Al-Qura University, Makkah 21955, Saudi Arabia
  - Research Laboratories Centre, Faculty of Applied Science, Umm Al-Qura University, Makkah 21955, Saudi Arabia
- Mohamed S. Sarhan
  - Environmental Studies and Research Unit, Cairo University, Giza 12613, Egypt
- Gamal Osman
  - Department of Biology, Faculty of Applied Sciences, Umm Al-Qura University, Makkah 21955, Saudi Arabia
  - Research Laboratories Centre, Faculty of Applied Science, Umm Al-Qura University, Makkah 21955, Saudi Arabia
  - Agricultural Genetic Engineering Research Institute (AGERI), ARC, Giza 12915, Egypt
- Shichen Wang
  - Genomics and Bioinformatics Service Texas A&M AgriLife Research, Amarillo College Station, Amarillo, TX 77845, USA
- Mahmoud Mabrouk
  - Department of Agronomy, Faculty of Agriculture, Cairo University, Giza 12613, Egypt
- Hattem M. El-Shabrawi
  - Plant Biotechnology Department, National Research Center, Giza 12622, Egypt
- Ahmed M. M. Gabr
  - Plant Biotechnology Department, National Research Center, Giza 12622, Egypt
- Ahmed M. Abd El-Haliem
  - Plant Physiology, University of Amsterdam, Swammerdam Institute for Life Sciences Amsterdam, 1098 XH Amsterdam, The Netherlands
- Donal M. O’Sullivan
  - School of Agriculture, Policy and Development, University of Reading, Whiteknights, Reading RG6 6AR, UK
- Mohamed El-Soda
  - Department of Genetics, Faculty of Agriculture, Cairo University, Giza 12613, Egypt
|
13
|
Measuring Performance Metrics of Machine Learning Algorithms for Detecting and Classifying Transposable Elements. Processes (Basel) 2020. [DOI: 10.3390/pr8060638] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/13/2022]
Abstract
Because of the promising results obtained by machine learning (ML) approaches in several fields, the use of ML to solve problems in bioinformatics is becoming more common every day. In genomics, a current issue is to detect and classify transposable elements (TEs) because of the tedious tasks involved in bioinformatics methods. Thus, ML was recently evaluated for TE datasets, demonstrating better results than bioinformatics applications. A crucial step for ML approaches is the selection of metrics that measure the realistic performance of algorithms. Each metric has specific characteristics and measures properties that may be different from the predicted results. Although the most commonly used way to compare measures is by using empirical analysis, a non-result-based methodology has been proposed, called measure invariance properties. These properties are calculated on the basis of whether a given measure changes its value under certain modifications in the confusion matrix, giving comparative parameters independent of the datasets. Measure invariance properties make metrics more or less informative, particularly on unbalanced, monomodal, or multimodal negative class datasets and for real or simulated datasets. Although several studies applied ML to detect and classify TEs, there are no works evaluating performance metrics in TE tasks. Here, we analyzed 26 different metrics utilized in binary, multiclass, and hierarchical classifications, through bibliographic sources, and their invariance properties. Then, we corroborated our findings utilizing freely available TE datasets and commonly used ML algorithms. Based on our analysis, the most suitable metrics for TE tasks must be stable, even using highly unbalanced datasets, multimodal negative class, and training datasets with errors or outliers. Based on these parameters, we conclude that the F1-score and the area under the precision-recall curve are the most informative metrics since they are calculated based on other metrics, providing insight into the development of an ML application.
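One invariance property of this kind can be checked with a small Python sketch: adding true negatives to a confusion matrix changes accuracy but leaves the F1-score untouched, because the F1-score does not depend on the true-negative count. The `Confusion` class, the helper functions, and the counts below are illustrative assumptions, not data or code from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion matrix counts (illustrative values only)."""
    tp: int
    fp: int
    fn: int
    tn: int


def accuracy(c: Confusion) -> float:
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def f1_score(c: Confusion) -> float:
    # Equivalent to the harmonic mean of precision and recall;
    # the true-negative count does not appear in the formula.
    return 2 * c.tp / (2 * c.tp + c.fp + c.fn)


def add_true_negatives(c: Confusion, extra: int) -> Confusion:
    """Modify the confusion matrix by enlarging only the negative class."""
    return Confusion(c.tp, c.fp, c.fn, c.tn + extra)


balanced = Confusion(tp=80, fp=20, fn=20, tn=80)
skewed = add_true_negatives(balanced, 7920)  # highly unbalanced variant

print("accuracy:", accuracy(balanced), accuracy(skewed))
print("F1-score:", f1_score(balanced), f1_score(skewed))
```

With the same classifier errors, accuracy rises simply because negatives dominate the skewed dataset, while the F1-score stays at 0.8 in both cases, which is the stability on unbalanced data that the abstract asks of a metric.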
|
14
|
Orozco-Arias S, Isaza G, Guyot R. Retrotransposons in Plant Genomes: Structure, Identification, and Classification through Bioinformatics and Machine Learning. Int J Mol Sci 2019; 20:E3837. [PMID: 31390781 PMCID: PMC6696364 DOI: 10.3390/ijms20153837] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 06/21/2019] [Revised: 07/31/2019] [Accepted: 08/02/2019] [Indexed: 01/26/2023]
Abstract
Transposable elements (TEs) are genomic units able to move within the genome of virtually all organisms. Due to their repetitive nature and high structural diversity, the identification and classification of TEs remain a challenge in sequenced genomes. Although TEs were initially regarded as "junk DNA", it has been demonstrated that they play key roles in chromosome structures, gene expression, and regulation, as well as adaptation and evolution. A highly reliable annotation of these elements is, therefore, crucial to better understand genome functions and their evolution. To date, much bioinformatics software has been developed to address TE detection and classification processes, but many problematic aspects remain, such as the reliability, precision, and speed of the analyses. Machine learning and deep learning are algorithms that can make automatic predictions and decisions in a wide variety of scientific applications. They have been tested in bioinformatics and, more specifically, for TE classification, with encouraging results. In this review, we will discuss important aspects of TEs, such as their structure, importance in the evolution and architecture of the host, and their current classifications and nomenclatures. We will also address current methods and their limitations in identifying and classifying TEs.
Affiliation(s)
- Simon Orozco-Arias
  - Department of Computer Science, Universidad Autónoma de Manizales, Manizales 170001, Colombia
  - Department of Systems and Informatics, Universidad de Caldas, Manizales 170001, Colombia
- Gustavo Isaza
  - Department of Systems and Informatics, Universidad de Caldas, Manizales 170001, Colombia
- Romain Guyot
  - Department of Electronics and Automatization, Universidad Autónoma de Manizales, Manizales 170001, Colombia
  - Institut de Recherche pour le Développement, CIRAD, University Montpellier, 34000 Montpellier, France
|
15
|
Suguiyama VF, Vasconcelos LAB, Rossi MM, Biondo C, de Setta N. The population genetic structure approach adds new insights into the evolution of plant LTR retrotransposon lineages. PLoS One 2019; 14:e0214542. [PMID: 31107873 PMCID: PMC6527191 DOI: 10.1371/journal.pone.0214542] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 12/06/2018] [Accepted: 03/14/2019] [Indexed: 12/30/2022]
Abstract
Long terminal repeat retrotransposons (LTR-RTs) in plant genomes differ in abundance, structure and genomic distribution, reflecting the large number of evolutionary lineages. Elements within lineages can be considered populations, in which each element is an individual in its genomic environment. It would therefore be reasonable to apply microevolutionary analyses, such as those used to study the genetic structure of natural populations, to understand transposable element (TE) evolution. Here, we applied a Bayesian method to infer genetic structure of populations together with classical phylogenetic and dating tools to analyze LTR-RT evolution using the monocot Setaria italica as a model species. In contrast to a phylogeny, the Bayesian clusterization method identifies populations by assigning individuals to one or more clusters according to the most probable scenario of admixture, based on genetic diversity patterns. In this work, each LTR-RT insertion was considered to be one individual and each LTR-RT lineage was considered to be a single species. Nine evolutionary lineages of LTR-RTs were identified in the S. italica genome that had different genetic structures with variable numbers of clusters and levels of admixture. Comprehensive analysis of the phylogenetic, clusterization and time of insertion data allowed us to hypothesize that admixed elements represent sequences that harbor ancestral polymorphic sequence signatures. In conclusion, application of microevolutionary concepts in genome evolution studies is suitable as a complementary approach to phylogenetic analyses to address the evolutionary history and functional features of TEs.
Affiliation(s)
- Vanessa Fuentes Suguiyama
  - Centro de Ciências Naturais e Humanas, Universidade Federal do ABC, São Bernardo do Campo, SP, Brazil
- Maria Magdalena Rossi
  - Departamento de Botânica, Instituto de Biociências, Universidade de São Paulo, São Paulo, SP, Brazil
- Cibele Biondo
  - Centro de Ciências Naturais e Humanas, Universidade Federal do ABC, São Bernardo do Campo, SP, Brazil
- Nathalia de Setta
  - Centro de Ciências Naturais e Humanas, Universidade Federal do ABC, São Bernardo do Campo, SP, Brazil
|
16
|
Inpactor, Integrated and Parallel Analyzer and Classifier of LTR Retrotransposons and Its Application for Pineapple LTR Retrotransposons Diversity and Dynamics. BIOLOGY 2018; 7:biology7020032. [PMID: 29799487 PMCID: PMC6022998 DOI: 10.3390/biology7020032] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 05/03/2018] [Revised: 05/16/2018] [Accepted: 05/22/2018] [Indexed: 12/22/2022]
Abstract
One particular class of Transposable Elements (TEs), called Long Terminal Repeat (LTR) retrotransposons, comprises the most abundant mobile elements in plant genomes. Their copy number can vary from several hundred to a few million copies per genome, deeply affecting genome organization and function. The detailed classification of LTR retrotransposons is an essential step to precisely understand their effect at the genome level, but remains challenging in large-sized genomes, requiring the use of optimized bioinformatics tools that can take advantage of supercomputers. Here, we propose a new tool: Inpactor, a parallel and scalable pipeline designed to classify LTR retrotransposons, to identify autonomous and non-autonomous elements, to build RT-based phylogenetic trees and to analyze their insertion times using High Performance Computing (HPC) techniques. Inpactor was tested on the classification and annotation of LTR retrotransposons in pineapple, a recently sequenced genome. Transposable elements make up 44% of the pineapple genome assembly, of which 23% were classified as LTR retrotransposons. Exceptionally, 16.4% of the pineapple genome assembly corresponded to only one lineage of the Gypsy superfamily: Del, suggesting that this particular lineage has undergone a significant increase in its copy numbers. As demonstrated for the pineapple genome, Inpactor provides comprehensive data on LTR retrotransposons’ classification and dynamics, allowing a fine understanding of their contribution to genome structure and evolution. Inpactor is available at https://github.com/simonorozcoarias/Inpactor.
|
17
|
de Castro Nunes R, Orozco-Arias S, Crouzillat D, Mueller LA, Strickler SR, Descombes P, Fournier C, Moine D, de Kochko A, Yuyama PM, Vanzela ALL, Guyot R. Structure and Distribution of Centromeric Retrotransposons at Diploid and Allotetraploid Coffea Centromeric and Pericentromeric Regions. FRONTIERS IN PLANT SCIENCE 2018; 9:175. [PMID: 29497436 PMCID: PMC5818461 DOI: 10.3389/fpls.2018.00175] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 10/20/2017] [Accepted: 01/30/2018] [Indexed: 05/18/2023]
Abstract
Centromeric regions of plants are generally composed of large arrays of satellites from a specific lineage of Gypsy LTR-retrotransposons, called Centromeric Retrotransposons. Repeated sequences interact with a specific H3 histone, playing a crucial role in kinetochore formation. To study the structure and composition of centromeric regions in the genus Coffea, we annotated and classified Centromeric Retrotransposon sequences from the allotetraploid C. arabica genome and its two diploid ancestors: Coffea canephora and C. eugenioides. Ten distinct CRC (Centromeric Retrotransposons in Coffea) families were found. The sequence mapping and FISH experiments of CRC Reverse Transcriptase domains in C. canephora, C. eugenioides, and C. arabica clearly indicate a strong and specific targeting mainly onto proximal chromosome regions, which can also be associated with heterochromatin. PacBio genome sequence analyses of putative centromeric regions on C. arabica and C. canephora chromosomes showed an exceptional density of one family of CRC elements and the complete absence of satellite arrays, contrasting with the usual structure of plant centromeres. Altogether, our data suggest a specific centromere organization in Coffea, contrasting with other plant genomes.
Affiliation(s)
- Renata de Castro Nunes
  - Laboratory of Cytogenetics and Plant Diversity, Department of General Biology, Center for Biological Sciences, State University of Londrina, Londrina, Brazil
- Simon Orozco-Arias
  - Department of Electronics and Automatization, Universidad Autónoma de Manizales, Colombia
- Lukas A. Mueller
  - Boyce Thompson Institute, Cornell University, Ithaca, NY, United States
- Suzy R. Strickler
  - Boyce Thompson Institute, Cornell University, Ithaca, NY, United States
- Deborah Moine
  - Nestlé Institute of Health Sciences, Lausanne, Switzerland
- Alexandre de Kochko
  - Institut de Recherche pour le Développement, UMR DIADE, EvoGec, Montpellier, France
- Priscila M. Yuyama
  - Laboratory of Cytogenetics and Plant Diversity, Department of General Biology, Center for Biological Sciences, State University of Londrina, Londrina, Brazil
- André L. L. Vanzela
  - Laboratory of Cytogenetics and Plant Diversity, Department of General Biology, Center for Biological Sciences, State University of Londrina, Londrina, Brazil
  - *Correspondence: André L. L. Vanzela
- Romain Guyot
  - Department of Electronics and Automatization, Universidad Autónoma de Manizales, Colombia
  - Institut de Recherche pour le Développement, CIRAD, Univ. Montpellier, UMR IPME, Montpellier, France
|
18
|
Guyot R, Darré T, Dupeyron M, de Kochko A, Hamon S, Couturon E, Crouzillat D, Rigoreau M, Rakotomalala JJ, Raharimalala NE, Akaffou SD, Hamon P. Partial sequencing reveals the transposable element composition of Coffea genomes and provides evidence for distinct evolutionary stories. Mol Genet Genomics 2016; 291:1979-90. [PMID: 27469896 DOI: 10.1007/s00438-016-1235-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 05/19/2016] [Accepted: 07/25/2016] [Indexed: 10/21/2022]
Abstract
The Coffea genus, with 124 described species, has a natural distribution spreading from inter-tropical Africa to the Western Indian Ocean Islands, India, Asia and up to Australasia. Two cultivated species, C. arabica and C. canephora, are intensively studied, while the breeding potential and the genome composition of all the wild species remain poorly characterized. Here, we report the characterization and comparison of the highly repeated transposable element content of 11 Coffea species representative of the natural biogeographic distribution. A total of 994 Mb from 454 reads were produced with a genome coverage ranging between 3.2 and 15.7 %. The analyses showed that highly repeated transposable elements, mainly LTR retrotransposons (LTR-RT), represent between 32 and 53 % of Coffea genomes depending on their biogeographic location and genome size. Species from West and Central Africa (Eucoffea) contained the highest LTR-RT content but with no strong variation relative to their genome size. In contrast, for the insular species (Mascarocoffea), a strong variation of LTR-RT content was observed, suggesting differential dynamics of these elements in this group. Two LTR-RT lineages, SIRE and Del, were clearly differentially accumulated between African and insular species, suggesting these lineages were associated with the genome divergence of Coffea species in Africa. Altogether, the information obtained in this study improves our knowledge and brings new data on the composition, the evolution and the divergence of wild Coffea genomes.
Affiliation(s)
- Romain Guyot
  - IRD UMR IPME, CoffeeAdapt, BP 64501, 34394, Montpellier Cedex 5, France
- Thibaud Darré
  - IRD UMR DIADE, EvoGeC, BP 64501, 34394, Montpellier Cedex 5, France
- Serge Hamon
  - IRD UMR DIADE, EvoGeC, BP 64501, 34394, Montpellier Cedex 5, France
- Dominique Crouzillat
  - Nestlé R&D Tours, 101 AV. G. Eiffel, Notre Dame d'Oé, BP 49716, 37097, Tours Cedex 2, France
- Michel Rigoreau
  - Nestlé R&D Tours, 101 AV. G. Eiffel, Notre Dame d'Oé, BP 49716, 37097, Tours Cedex 2, France
- Perla Hamon
  - IRD UMR DIADE, EvoGeC, BP 64501, 34394, Montpellier Cedex 5, France
|
19
|
Beulé T, Agbessi MD, Dussert S, Jaligot E, Guyot R. Genome-wide analysis of LTR-retrotransposons in oil palm. BMC Genomics 2015; 16:795. [PMID: 26470789 PMCID: PMC4608283 DOI: 10.1186/s12864-015-2023-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 06/23/2015] [Accepted: 10/07/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND The oil palm (Elaeis guineensis Jacq.) is a major cultivated crop and the world's largest source of edible vegetable oil. The genus Elaeis comprises two species: E. guineensis, the commercial African oil palm, and E. oleifera, which is used in oil palm genetic breeding. The recent publication of both the African oil palm genome assembly and the first draft sequence of its Latin American relative now allows us to tackle the challenge of understanding the genome composition, structure and evolution of these palm genomes through the annotation of their repeated sequences. METHODS In this study, we identified, annotated and compared Transposable Elements (TEs) from the African and Latin American oil palms. In a first step, Transposable Element databases were built through de novo detection in both genome sequences, and then the TE content of both genomes was estimated. Then putative full-length retrotransposons with Long Terminal Repeats (LTRs) were further identified in the E. guineensis genome for characterization of their structural diversity, copy number and chromosomal distribution. Finally, their relative expression in several tissues was determined through in silico analysis of publicly available transcriptome data. RESULTS Our results reveal a congruence in the transpositional history of LTR retrotransposons between E. oleifera and E. guineensis, especially the Sto-4 family. Also, we have identified and described 583 full-length LTR-retrotransposons in the Elaeis guineensis genome. Our work shows that these elements are most likely no longer mobile and that no recent insertion event has occurred. Moreover, the analysis of chromosomal distribution suggests a preferential insertion of Copia elements in gene-rich regions, whereas Gypsy elements appear to be evenly distributed throughout the genome. CONCLUSIONS Considering the high proportion of LTR retrotransposons in the oil palm genome, our work will contribute to a greater understanding of their impact on genome organization and evolution. Moreover, the knowledge gained from this study constitutes a valuable resource for both the improvement of genome annotation and the investigation of the evolutionary history of palms.
Affiliation(s)
- Thierry Beulé
  - CIRAD, UMR DIADE (IRD, UM), 34394, Montpellier, France
- Romain Guyot
  - IRD, UMR IPME (IRD, CIRAD, UM), 34394, Montpellier, France
|