1
Marczuk-Rojas JP, Salmerón A, Alcayde A, Isanbaev V, Carretero-Paulet L. Plastid DNA is a major source of nuclear genome complexity and of RNA genes in the orphan crop moringa. BMC Plant Biol 2024; 24:437. [PMID: 38773387 PMCID: PMC11110229 DOI: 10.1186/s12870-024-05158-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Accepted: 05/16/2024] [Indexed: 05/23/2024]
Abstract
BACKGROUND: Unlike transposable elements (TEs) and gene/genome duplication, the role of nuclear plastid DNA sequences (NUPTs) in shaping the evolution of genome architecture and function remains poorly studied. Here we investigate the functional and evolutionary fate of NUPTs in the orphan crop Moringa oleifera (moringa), whose genome contains the highest fraction of plastid DNA found so far in any plant genome, focusing on (i) potential biases in their distribution relative to specific nuclear genomic features, (ii) their contribution to the emergence of new genes and gene regions, and (iii) their impact on the expression of target nuclear genes. RESULTS: Consistent with their potential mutagenic effect, NUPTs are underrepresented among structural genes, although the overall transcription levels and expression breadth of affected genes were lower only when NUPTs involved exonic regions; the presence of plastid DNA generally did not result in broader expression, except in genes whose introns were affected by older NUPTs. In contrast, we found a strong enrichment of NUPTs among specific superfamilies of retrotransposons and several classes of RNA genes, including those participating in the protein biosynthetic machinery (i.e., rRNA and tRNA genes) and a specific class of regulatory RNAs. A significant fraction of NUPT-derived RNA genes was found to be functionally expressed, thus potentially contributing to the nuclear pool. CONCLUSIONS: Our results broaden our understanding of the molecular factors driving the evolution of nuclear genome architecture and function, and support plastid DNA in moringa as a major source of (i) genome complexity and (ii) the nuclear pool of RNA genes.
Affiliation(s)
- Juan Pablo Marczuk-Rojas
- Department of Biology and Geology, University of Almería, Ctra. Sacramento s/n, Almería, 04120, Spain
- "Pabellón de Historia Natural-Centro de Investigación de Colecciones Científicas de la Universidad de Almería" (PHN-CECOUAL), University of Almería, Ctra. Sacramento s/n, Almería, 04120, Spain
- Antonio Salmerón
- Department of Mathematics and Center for the Development and Transfer of Mathematical Research to Industry (CDTIME), University of Almería, Ctra. Sacramento s/n, Almería, 04120, Spain
- Alfredo Alcayde
- Department of Engineering, University of Almería, Ctra. Sacramento s/n, Almería, 04120, Spain
- Viktor Isanbaev
- Department of Engineering, University of Almería, Ctra. Sacramento s/n, Almería, 04120, Spain
- Lorenzo Carretero-Paulet
- Department of Biology and Geology, University of Almería, Ctra. Sacramento s/n, Almería, 04120, Spain
- "Pabellón de Historia Natural-Centro de Investigación de Colecciones Científicas de la Universidad de Almería" (PHN-CECOUAL), University of Almería, Ctra. Sacramento s/n, Almería, 04120, Spain
2
Delorean EE, Youngblood RC, Simpson SA, Schoonmaker AN, Scheffler BE, Rutter WB, Hulse-Kemp AM. Representing true plant genomes: haplotype-resolved hybrid pepper genome with trio-binning. Front Plant Sci 2023; 14:1184112. [PMID: 38034563 PMCID: PMC10687446 DOI: 10.3389/fpls.2023.1184112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2023] [Accepted: 10/17/2023] [Indexed: 12/02/2023]
Abstract
As sequencing costs decrease and high-fidelity long-read sequencing becomes more widely available, generating experiment-specific de novo genome assemblies becomes feasible. In many crop species, obtaining the genome of a hybrid or heterozygous individual is necessary for systems that do not tolerate inbreeding or for investigating important biological questions such as hybrid vigor. However, most genome assembly methods used in plants produce a merged, single-sequence representation that does not accurately represent either haplotype of a diploid individual. The resulting assembly is often fragmented and forms a mosaic of the two haplotypes, a phenomenon referred to as haplotype switching. Important haplotype-level information, such as causal mutations and structural variation, is therefore lost, which complicates the interpretation of downstream analyses. To overcome this challenge, we applied trio-binning, a method developed for animal genome assembly, to an intraspecific hybrid of chili pepper (Capsicum annuum L. cv. HDA149 × Capsicum annuum L. cv. HDA330). We tested all currently available software for performing trio-binning, combined with multiple scaffolding technologies including Bionano, to determine the optimal method for producing the best haplotype-resolved assembly. Ultimately, we produced highly contiguous, biologically true haplotype-resolved genome assemblies for each parent, with scaffold N50s of 266.0 Mb and 281.3 Mb and with 99.6% and 99.8% of each assembly positioned into chromosomes, respectively. The assemblies captured 3.10 Gb and 3.12 Gb of the estimated 3.5 Gb chili pepper genome. These assemblies represent the complete genome structure of the intraspecific hybrid, as well as the two parental genomes, and show measurable improvements over the currently available reference genomes. Our manuscript provides a valuable guide to applying trio-binning to other plant genomes.
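The scaffold N50 values above are a standard contiguity statistic. As a minimal sketch of the conventional definition (not code from the study), the Python function below sorts scaffold lengths from longest to shortest and returns the length at which the running total first reaches half of the total assembly length. The example lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a non-empty list of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Accumulate bases from the longest sequence downwards; the loop always
    # returns because the running total eventually equals the full total.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths in Mb, for illustration only.
# Total = 1050; 300 + 250 = 550 is the first running sum >= 525, so N50 = 250.
print(n50([300, 250, 200, 150, 100, 50]))
```

Sorting once and scanning keeps the sketch simple; the same loop gives N90 or other Nx values if the 50% threshold is changed.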
Affiliation(s)
- Emily E. Delorean
- Genomics and Bioinformatics Research Unit, USDA-ARS, Raleigh, NC, United States
- Crop and Soil Sciences Department, North Carolina State University, Raleigh, NC, United States
- Ramey C. Youngblood
- Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Starkville, MS, United States
- Sheron A. Simpson
- Genomics and Bioinformatics Research Unit, United States Department of Agriculture - Agricultural Research Service (USDA-ARS), Stoneville, MS, United States
- Ashley N. Schoonmaker
- Crop and Soil Sciences Department, North Carolina State University, Raleigh, NC, United States
- Brian E. Scheffler
- Genomics and Bioinformatics Research Unit, United States Department of Agriculture - Agricultural Research Service (USDA-ARS), Stoneville, MS, United States
- William B. Rutter
- US Vegetable Laboratory, United States Department of Agriculture - Agricultural Research Service (USDA-ARS), Charleston, SC, United States
- Amanda M. Hulse-Kemp
- Genomics and Bioinformatics Research Unit, USDA-ARS, Raleigh, NC, United States
- Crop and Soil Sciences Department, North Carolina State University, Raleigh, NC, United States
3
Mokhtar MM, El Allali A. MegaLTR: a web server and standalone pipeline for detecting and annotating LTR-retrotransposons in plant genomes. Front Plant Sci 2023; 14:1237426. [PMID: 37810401 PMCID: PMC10552921 DOI: 10.3389/fpls.2023.1237426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 08/21/2023] [Indexed: 10/10/2023]
Abstract
LTR-retrotransposons (LTR-RTs) are a class of transposable elements (TEs) that replicate through an RNA intermediate and can alter genome structure and function by changing position, repositioning genes, shifting exons, and causing chromosomal rearrangements. LTR-RTs are widespread in plant genomes, where they often make up a significant portion of the genome. Their movement and activity in eukaryotic genomes can provide insight into genome evolution and gene function, especially when LTR-RTs are located near or within genes. Building redundant and non-redundant LTR-RT libraries and their annotations for species lacking this resource requires extensive bioinformatics pipelines and substantial computing power to analyze large amounts of genomic data. This increases the need for online services that provide computational resources with minimal overhead and maximum efficiency. Here, we present MegaLTR, a web server and standalone pipeline that detects intact LTR-RTs at the whole-genome level and integrates multiple tools for structure-based, homology-based, and de novo identification, classification, annotation, insertion time determination, and LTR-RT gene chimera analysis. MegaLTR also provides statistical analysis and visualization with multiple tools, and can be used to accelerate plant species discovery and assist breeding programs in their efforts to improve genomic resources. We anticipate that online services such as MegaLTR, which can analyze large amounts of genomic data, will become increasingly important for the automated detection and annotation of LTR-RT elements.
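The abstract lists insertion time determination among MegaLTR's analyses but does not state the method. A widely used approach, sketched here under that assumption, compares the two LTRs of an intact element: they are identical at insertion, so their divergence K (here Jukes-Cantor corrected) grows with age, and the insertion time is estimated as T = K / (2r). The substitution rate r = 1.3e-8 per site per year is an assumed default often applied to plants, and the function names are hypothetical.

```python
import math


def jukes_cantor_distance(p):
    """Correct a raw proportion of differing sites p for multiple hits (Jukes-Cantor)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    # log1p keeps precision for small p: K = -3/4 * ln(1 - 4p/3)
    return -0.75 * math.log1p(-(4.0 / 3.0) * p)


def ltr_insertion_time(ltr5, ltr3, mutation_rate=1.3e-8):
    """Estimate the insertion age (years) of an LTR-RT from its two LTRs.

    Assumes ltr5 and ltr3 are already aligned strings of equal length;
    columns containing a gap ('-') in either LTR are skipped.
    The default mutation_rate is an assumption, not a MegaLTR parameter.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to equal length")
    compared = differences = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    k = jukes_cantor_distance(differences / compared)
    # Both LTRs accumulate substitutions independently, hence the factor 2.
    return k / (2 * mutation_rate)


# Hypothetical 100-bp aligned LTR pair differing at one site.
five_prime = "A" * 100
three_prime = "A" * 99 + "C"
print(round(ltr_insertion_time(five_prime, three_prime)))  # about 3.9e5 years
```

The estimate scales inversely with the assumed rate, so ages are only comparable when the same r is used throughout.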
Affiliation(s)
- Morad M. Mokhtar
- African Genome Center, Mohammed VI Polytechnic University, Benguerir, Morocco
- Achraf El Allali
- African Genome Center, Mohammed VI Polytechnic University, Benguerir, Morocco
4
Mokhtar MM, Abd-Elhalim HM, El Allali A. A large-scale assessment of the quality of plant genome assemblies using the LTR assembly index. AoB Plants 2023; 15:plad015. [PMID: 37197714 PMCID: PMC10184434 DOI: 10.1093/aobpla/plad015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Accepted: 04/01/2023] [Indexed: 05/19/2023]
Abstract
Recent advances in genome sequencing have led to an increase in the number of sequenced genomes. However, the presence of repetitive sequences complicates the assembly of plant genomes. The LTR assembly index (LAI) has recently been widely used to assess the quality of genome assemblies, as a higher LAI is associated with higher assembly quality. Here, we assessed the quality of 1664 assembled plant and algal genomes using LAI and reported the results in a data repository called PlantLAI (https://bioinformatics.um6p.ma/PlantLAI). In total, 55 117 586 pseudomolecules/scaffolds with a total length of 988.11 gigabase pairs were examined using the LAI workflow. A total of 6 583 551 accurate LTR-RTs were discovered, including 2 263 188 Copia, 2 933 052 Gypsy, and 1 387 311 of unknown superfamily. Only 1136 plant genomes were suitable for LAI calculation, with values ranging from 0 to 31.59. Based on the quality classification system, 476 diploid genomes were classified as draft, 472 as reference, and 135 as gold genomes. We also provide a free webtool to calculate the LAI of newly assembled genomes, with the option of saving the result in the repository. The data repository is designed to fill gaps in the reported LAI of existing genomes, while the webtool is designed to help researchers calculate the LAI of their newly sequenced genomes.
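For readers unfamiliar with LAI, the sketch below follows the definition as we understand it from the LTR_retriever authors: raw LAI is the percentage of total LTR-RT sequence contributed by intact LTR-RTs, an age correction adjusts for genome-wide LTR identity, and the draft/reference/gold classes use cut-offs of 10 and 20. Treat the constants (2.8138, 94, 10, 20) as assumptions to check against the LTR_retriever documentation. The genome-suitability checks that limited the analysis to 1136 genomes are not implemented, and the input values are hypothetical.

```python
def raw_lai(intact_ltr_bp, total_ltr_bp):
    """Raw LAI: percentage of all LTR-RT sequence that lies in intact LTR-RTs."""
    if total_ltr_bp <= 0:
        raise ValueError("total LTR-RT length must be positive")
    return 100.0 * intact_ltr_bp / total_ltr_bp


def corrected_lai(raw, genome_ltr_identity_pct):
    """Age-corrected LAI; constants assumed from LTR_retriever's published definition."""
    return raw + 2.8138 * (94.0 - genome_ltr_identity_pct)


def lai_category(lai):
    """Map an LAI value onto the draft / reference / gold classes (cut-offs 10 and 20)."""
    if lai < 10:
        return "draft"
    if lai < 20:
        return "reference"
    return "gold"


# Hypothetical genome: 12 Mb of intact LTR-RT sequence out of 80 Mb of LTR-RT sequence.
raw = raw_lai(12e6, 80e6)        # 15.0
lai = corrected_lai(raw, 94.5)   # 15.0 + 2.8138 * (-0.5), about 13.59
print(round(lai, 4), lai_category(lai))
```

Because raw LAI is a ratio, it is insensitive to genome size but sensitive to how many LTR-RTs a genome has at all, which is why low-LTR genomes are usually excluded before classification.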
Affiliation(s)
- Haytham M Abd-Elhalim
- Agricultural Genetic Engineering Research Institute, Agricultural Research Center, Giza 12619, Egypt
5
Mokhtar MM, Alsamman AM, El Allali A. PlantLTRdb: An interactive database for 195 plant species LTR-retrotransposons. Front Plant Sci 2023; 14:1134627. [PMID: 36950350 PMCID: PMC10025401 DOI: 10.3389/fpls.2023.1134627] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 12/30/2022] [Accepted: 02/16/2023] [Indexed: 05/29/2023]
Abstract
LTR-retrotransposons (LTR-RTs) are a large group of transposable elements that replicate through an RNA intermediate and alter genome structure. The activities of LTR-RTs in plant genomes provide helpful information about genome evolution and gene function. LTR-RTs near or within genes can directly alter gene function. This work introduces PlantLTRdb, an intact LTR-RT database for 195 plant species. Using homology-based and de novo structure-based methods, a total of 150.18 Gbp comprising 3,079,469 pseudomolecules/scaffolds was analyzed to identify, characterize, and annotate LTR-RTs, estimate insertion ages, detect LTR-RT-gene chimeras, and determine nearby genes. In total, 520,194 intact LTR-RTs were discovered, including 29,462 autonomous and 490,732 nonautonomous LTR-RTs. The autonomous LTR-RTs included 10,286 Gypsy and 19,176 Copia elements, while the nonautonomous elements were divided into 224,906 Gypsy, 218,414 Copia, 1,768 BARE-2, 3,147 TR-GAG, and 42,497 unknown. Analysis of the LTR-RTs located within genes showed that 36,236 LTR-RTs formed LTR-RT-gene chimeras and 11,619 LTR-RTs were within pseudogenes. In addition, 50,026 genes were within 1 kbp of LTR-RTs, and 250,587 genes were 1 to 10 kbp from LTR-RTs. PlantLTRdb allows researchers to search, visualize, BLAST, and analyze plant LTR-RTs. PlantLTRdb can contribute to the understanding of structural variation, genome organization, and functional genomics, and to the development of LTR-RT-based markers for molecular plant breeding. PlantLTRdb is available at https://bioinformatics.um6p.ma/PlantLTRdb.
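The gene-proximity figures above (within 1 kbp, 1 to 10 kbp) come from comparing gene and LTR-RT coordinates. PlantLTRdb's exact procedure is not described in the abstract, so the following is only a sketch under assumed conventions: closed intervals on the same sequence, a gap of 0 for overlaps (which would correspond to chimeras), and bin edges at 1,000 and 10,000 bp. Function names and coordinates are hypothetical.

```python
def interval_gap(a_start, a_end, b_start, b_end):
    """Gap in bp between two closed intervals on the same sequence; 0 if they overlap."""
    if a_end < b_start:
        return b_start - a_end
    if b_end < a_start:
        return a_start - b_end
    return 0


def nearest_ltr_gap(gene, ltrs):
    """Smallest gap between a gene (start, end) and any LTR-RT (start, end); None if none."""
    if not ltrs:
        return None
    return min(interval_gap(gene[0], gene[1], start, end) for start, end in ltrs)


def proximity_bin(gap):
    """Assign a gap to distance classes like those in the abstract (bin edges assumed)."""
    if gap is None:
        return "no LTR-RT on sequence"
    if gap == 0:
        return "overlapping"
    if gap <= 1_000:
        return "within 1 kbp"
    if gap <= 10_000:
        return "1-10 kbp"
    return "beyond 10 kbp"


# Hypothetical coordinates on one scaffold.
ltrs = [(6_500, 12_000), (40_000, 45_000)]
genes = {"geneA": (5_000, 6_000), "geneB": (20_000, 22_000), "geneC": (11_000, 13_000)}
for name, gene in genes.items():
    print(name, proximity_bin(nearest_ltr_gap(gene, ltrs)))
```

A linear scan keeps the example short; at genome scale, sorting LTR-RTs by start position or using an interval tree avoids comparing every gene with every element.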
6
Hassan AH, Mokhtar MM, El Allali A. TEMM: A Curated Data Resource for Transposon Element-Based Molecular Markers in Plants. Methods Mol Biol 2023; 2703:45-57. [PMID: 37646936 DOI: 10.1007/978-1-0716-3389-2_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
Transposable elements (TEs) are mobile genetic elements that can insert themselves into new locations and modify the plant genome. In recent years, they have been used as molecular markers in plant breeding programs. TE-based molecular markers (TE-markers) are divided into two categories depending on the transposition mechanism of the TEs. The first category is retrotransposon-based molecular markers, which include RBIP, IRAP, REMAP, and iPBS. The second category is DNA transposon-based markers, which include MITE, TE-junction, and CACTA TE-markers. These markers are useful tools for studying genetic diversity and can provide information on the phylogenetic and evolutionary history of plants. They can help breeding programs improve agronomic traits and develop new varieties. Overall, TE-markers play an important role in plant genetics and breeding and contribute to a better understanding of plant biology. Here, we present TEMM, a curated data resource for TE-markers in plants. Relevant research articles were screened to collect primer sequences and related information; only articles containing primer sequences were added to the data resource. TEMM contains 784 primers with their associated PCR reaction programs and their applications in various crops. These include 203 iPBS, 191 RBIP, 140 IRAP, 78 TE-junction, 76 IRAPS, 47 RBIP-IRAP, 16 IRAP-REMAP, 12 REMAP, 12 REMA-IRAP, 6 REMA, and 3 ISBP primers. The data resource is freely available at https://bioinformatics.um6p.ma/TEMM.
Affiliation(s)
- Asmaa H Hassan
- African Genome Center, Mohammed VI Polytechnic University, Ben Guerir, Morocco
- Morad M Mokhtar
- African Genome Center, Mohammed VI Polytechnic University, Ben Guerir, Morocco
- Achraf El Allali
- African Genome Center, Mohammed VI Polytechnic University, Ben Guerir, Morocco
7
Mokhtar MM, Fouad AS, Abd-Elhalim HM, El Allali A. CicerSpTEdb2.0: An Upgrade of Cicer Species Transposable Elements Database. Methods Mol Biol 2023; 2703:71-82. [PMID: 37646938 DOI: 10.1007/978-1-0716-3389-2_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
To meet the critical demand for data-driven research on LTR-RTs, we updated the CicerSpTEdb database to version 2.0, which includes more accurate intact LTR-RT elements with annotation of internal domains. We also added the ability to BLAST against TEs of Cicer species. In total, 3701 intact LTR-RTs were detected in the studied genomes, including 2840 Copia and 861 Gypsy elements. Of the 3701 intact LTR-RTs, 588 were in C. arietinum (475 Copia and 113 Gypsy), 1373 were in C. reticulatum (1041 Copia and 332 Gypsy), and 1740 were in C. echinospermum (1324 Copia and 416 Gypsy). Based on LTR-RT clades, the 3701 intact LTR-RTs identified in the studied genomes were classified as Ale (850), SIRE (740), unknown (455), Ikeros (323), Reina (290), Tork (290), Ivana (282), Tekay (197), Athila (128), TAR (99), CRM (31), and Ogre (16) elements. The updated CicerSpTEdb2.0 will be a valuable resource for the TEs of Cicer species and their comparative genomics. Database URL: http://cicersptedb.easyomics.org/index.php.
Affiliation(s)
- Morad M Mokhtar
- African Genome Center, Mohammed VI Polytechnic University, Ben Guerir, Morocco
- Ahmed S Fouad
- Botany and Microbiology Department, Faculty of Science, Cairo University, Giza, Egypt
- Haytham M Abd-Elhalim
- Agricultural Genetic Engineering Research Institute, Agricultural Research Center, Giza, Egypt
- Achraf El Allali
- African Genome Center, Mohammed VI Polytechnic University, Ben Guerir, Morocco
8
Cognat V, Pawlak G, Pflieger D, Drouard L. PlantRNA 2.0: an updated database dedicated to tRNAs of photosynthetic eukaryotes. Plant J 2022; 112:1112-1119. [PMID: 36196656 DOI: 10.1111/tpj.15997] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/18/2022] [Revised: 09/20/2022] [Accepted: 09/27/2022] [Indexed: 06/16/2023]
Abstract
PlantRNA (http://plantrna.ibmp.cnrs.fr/) is a comprehensive database of transfer RNA (tRNA) gene sequences retrieved from fully annotated nuclear, plastidial and mitochondrial genomes of photosynthetic organisms. In the first release (PlantRNA 1.0), tRNA genes from 11 organisms were annotated. In this second version, annotation was extended to 51 photosynthetic species covering the whole phylogenetic tree of photosynthetic organisms, from the most basal group of Archaeplastida, the glaucophyte Cyanophora paradoxa, to various land plants. tRNA genes from lower photosynthetic organisms such as streptophyte algae or lycophytes, as well as from extremophile photosynthetic species such as Eutrema parvulum, were incorporated into the database. In total, about 37 000 tRNA genes were accurately annotated. During annotation of the tRNA genes from the genome of the rhodophyte Chondrus crispus, non-canonical splicing sites in the D- or T-regions of tRNA molecules were identified and experimentally validated. As in PlantRNA 1.0, comprehensive biological information is included, such as 5'- and 3'-flanking sequences, A and B box sequences, the region of transcription initiation and poly(T) transcription termination stretches, tRNA intron sequences, and tRNA mitochondrial import.
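Nuclear tRNA genes are transcribed by RNA polymerase III, which terminates at runs of T's on the coding strand; these are the poly(T) termination stretches recorded above. As a minimal sketch (not PlantRNA's annotation pipeline), the function below finds the first such stretch in a 3'-flanking sequence. The minimum run length of four T's is an assumption, since reported thresholds vary, and the example flank is hypothetical.

```python
import re


def find_poly_t(flank_3prime, min_run=4):
    """Locate the first run of at least `min_run` T's in a 3'-flanking sequence.

    Returns 0-based, end-exclusive (start, end) coordinates relative to the
    start of the flank, or None if no such run exists. The default of 4 is
    an assumed threshold, not a PlantRNA parameter.
    """
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    # Greedy quantifier: the returned span covers the whole T run.
    match = re.search("T{%d,}" % min_run, flank_3prime.upper())
    return match.span() if match else None


# Hypothetical downstream flank of a tRNA gene.
print(find_poly_t("GCAGATTTTTAGC"))
```

Raising `min_run` makes the search stricter; scanning only a fixed window downstream of the gene would be a natural refinement.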
Affiliation(s)
- Valérie Cognat
- Institut de biologie moléculaire des plantes-CNRS, Université de Strasbourg, 12 rue du Général Zimmer, F-67084, Strasbourg, France
- Gael Pawlak
- Institut de biologie moléculaire des plantes-CNRS, Université de Strasbourg, 12 rue du Général Zimmer, F-67084, Strasbourg, France
- David Pflieger
- Institut de biologie moléculaire des plantes-CNRS, Université de Strasbourg, 12 rue du Général Zimmer, F-67084, Strasbourg, France
- Laurence Drouard
- Institut de biologie moléculaire des plantes-CNRS, Université de Strasbourg, 12 rue du Général Zimmer, F-67084, Strasbourg, France
9
Matson MEH, Liang Q, Lonardi S, Judelson HS. Karyotype variation, spontaneous genome rearrangements affecting chemical insensitivity, and expression level polymorphisms in the plant pathogen Phytophthora infestans revealed using its first chromosome-scale assembly. PLoS Pathog 2022; 18:e1010869. [PMID: 36215336 PMCID: PMC9584435 DOI: 10.1371/journal.ppat.1010869] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/30/2022] [Revised: 10/20/2022] [Accepted: 09/09/2022] [Indexed: 11/18/2022]
Abstract
Natural isolates of the potato and tomato pathogen Phytophthora infestans exhibit substantial variation in virulence, chemical sensitivity, ploidy, and other traits. A chromosome-scale assembly was developed to expand genomic resources for this oomycete and used to explore the basis of this variation. Using PacBio and Illumina data, a long-range linking library, and an optical map, an assembly was created and coalesced into 15 pseudochromosomes spanning 219 Mb using SNP-based genetic linkage data. De novo gene prediction combined with transcript evidence identified 19,981 protein-coding genes, plus about eight thousand tRNA genes. The chromosomes comprised a mosaic of gene-rich and gene-sparse regions plus very long centromeres. Genes exhibited a biased distribution across chromosomes, especially members of families encoding RXLR and CRN effectors, which clustered on certain chromosomes. Strikingly, half of the F1 progeny of diploid parents were polyploid or aneuploid. Substantial expression level polymorphisms between strains were identified, much of which could be attributed to differences in chromosome dosage, transposable element insertions, and adjacency to repetitive DNA. QTL analysis identified a locus on the right arm of chromosome 3 governing sensitivity to the crop protection chemical metalaxyl. Strains heterozygous for resistance often experienced megabase-sized deletions of that part of the chromosome when cultured on metalaxyl, increasing resistance due to loss of the sensitive allele. This study sheds light on diverse phenomena affecting variation in P. infestans and its relatives, helps explain the prevalence of polyploidy in natural populations, and provides a new foundation for biological and genetic investigations.
Affiliation(s)
- Michael E. H. Matson
- Department of Microbiology and Plant Pathology, University of California, Riverside, California, United States of America
- Qihua Liang
- Department of Computer Science and Engineering, University of California, Riverside, California, United States of America
- Stefano Lonardi
- Department of Computer Science and Engineering, University of California, Riverside, California, United States of America
- Howard S. Judelson
- Department of Microbiology and Plant Pathology, University of California, Riverside, California, United States of America
10
DeepPlnc: Bi-modal deep learning for highly accurate plant lncRNA discovery. Genomics 2022; 114:110443. [PMID: 35931273 DOI: 10.1016/j.ygeno.2022.110443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Revised: 06/27/2022] [Accepted: 07/29/2022] [Indexed: 11/24/2022]
Abstract
We present DeepPlnc, a bimodal CNN-based deep-learning system that identifies plant lncRNAs with high accuracy using sequence and structural properties. Unlike most existing software, it works accurately even when boundaries are ambiguous and sequences are incomplete. It scored consistently high on performance metrics, exceeding 98% accuracy when tested across a large number of validated instances. In multiple benchmarks it consistently outperformed all the compared tools, maintaining a highly significant lead of 2.5% to 4.6% over the second-best tool (p-value << 0.01). DeepPlnc was used to annotate a de novo assembled transcriptome of a Himalayan species, where it again proved better suited for genome and transcriptome annotation than the existing tools. DeepPlnc is freely available as a web server and stand-alone program at https://scbb.ihbt.res.in/DeepPlnc/.