51
Mokhtar MM, Alsamman AM, Abd-Elhalim HM, El Allali A. CicerSpTEdb: A web-based database for high-resolution genome-wide identification of transposable elements in Cicer species. PLoS One 2021; 16:e0259540. [PMID: 34762703 PMCID: PMC8584679 DOI: 10.1371/journal.pone.0259540] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/23/2021] [Accepted: 10/20/2021] [Indexed: 11/19/2022]
Abstract
Recently, Cicer species have experienced increased research interest due to their economic importance, especially in genetics, genomics, and crop improvement. The Cicer arietinum, Cicer reticulatum, and Cicer echinospermum genomes have been sequenced and provide valuable resources for trait improvement. Since the publication of the chickpea draft genome, progress has been made in genome assembly, functional annotation, and identification of polymorphic markers. However, work is still needed to identify transposable elements (TEs) and make them available to researchers. In this paper, we present CicerSpTEdb, a comprehensive TE database for Cicer species that aims to improve our understanding of the organization and structural variations of the chickpea genome. Using structure- and homology-based methods, 3942 C. echinospermum, 3579 C. reticulatum, and 2240 C. arietinum TEs were identified. Comparisons between Cicer species indicate that C. echinospermum has the highest number of LTR-RT and hAT TEs. C. reticulatum has more Mutator, PIF Harbinger, Tc1 Mariner, and CACTA TEs, while C. arietinum has the highest number of Helitrons. CicerSpTEdb enables users to search and visualize TEs by location and download their results. The database will provide a powerful resource that can assist in developing TE target markers for molecular breeding and answer related biological questions. Database URL: http://cicersptedb.easyomics.org/index.php.
Affiliation(s)
- Morad M. Mokhtar
- African Genome Center, Mohammed VI Polytechnic University, Ben Guerir, Morocco
| | | | - Haytham M. Abd-Elhalim
- Agricultural Genetic Engineering Research Institute, Agricultural Research Center, Giza, Egypt
| | - Achraf El Allali
- African Genome Center, Mohammed VI Polytechnic University, Ben Guerir, Morocco
52
Wittmeyer KT, Oppenheim SJ, Hopper KR. Assemblies of the genomes of parasitic wasps using meta-assembly and scaffolding with genetic linkage. G3 (Bethesda) 2021; 12:6423991. [PMID: 34751385 PMCID: PMC8727961 DOI: 10.1093/g3journal/jkab386] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/12/2021] [Accepted: 10/25/2021] [Indexed: 01/09/2023]
Abstract
Safe, effective biological-control introductions against invasive pests depend on narrowly host-specific natural enemies with the ability to adapt to a changing environment. As part of a project on the genetic architectures of these traits, we assembled and annotated the genomes of two aphid parasitoids, Aphelinus atriplicis and Aphelinus certus. We report here several assemblies of A. atriplicis made with Illumina and PacBio data, which we combined into a meta-assembly. We scaffolded the meta-assembly with markers from a genetic map of hybrids between A. atriplicis and A. certus. We used this genetic-linkage scaffolded (GLS) assembly of A. atriplicis to scaffold a de novo assembly of A. certus. The de novo assemblies of A. atriplicis differed in contiguity, and the meta-assembly of these assemblies was more contiguous than the best de novo assembly. Scaffolding with genetic-linkage data allowed chromosomal-level assembly of the A. atriplicis genome, and scaffolding a de novo assembly of A. certus with this GLS assembly greatly increased the contiguity of the A. certus assembly, to the point where it was also at the chromosomal level. However, completeness of the A. atriplicis assembly, as measured by percent complete, single-copy BUSCO hymenopteran genes, varied little among de novo assemblies and was not increased by meta-assembly or genetic scaffolding. Furthermore, the greater contiguity of the meta-assembly and GLS assembly had little or no effect on the numbers of genes identified or the proportions with homologs or functional annotations. Increased contiguity of the A. certus assembly provided modest improvement in assembly completeness, as measured by percent complete, single-copy BUSCO hymenopteran genes. The total genic sequence increased, and while the number of genes declined, gene length increased, which together suggest greater accuracy of gene models. More contiguous assemblies provide uses other than gene annotation, for example, identifying the genes associated with quantitative trait loci and understanding chromosomal rearrangements associated with speciation.
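Note: the abstract compares assemblies by contiguity but does not name the statistic used. A common choice is N50, and the sketch below assumes it purely for illustration: it is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The function name and the example lengths are hypothetical and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running sum to half of the
        # total assembly length defines the N50.
        if running >= half_total:
            return length


# Hypothetical lengths: joining contigs into one scaffold raises N50
# without adding any sequence.
fragmented = [100, 200, 300, 400]
scaffolded = [1000]
print(n50(fragmented), n50(scaffolded))
```

Because scaffolding only reorders and joins existing sequence, a higher N50 by itself says nothing about gene content, which is consistent with the abstract's observation that completeness changed little.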
Affiliation(s)
- Kameron T Wittmeyer
- USDA-ARS, Beneficial Insect Introductions Research Unit, Newark, DE 19713, USA
| | | | - Keith R Hopper
- USDA-ARS, Beneficial Insect Introductions Research Unit, Newark, DE 19713, USA. Corresponding author: USDA-ARS, Beneficial Insect Introductions Research Unit, 501 South Chapel Street, Newark, DE 19713, USA.
53
Hill AM, Rybarski JR, Hu K, Finkelstein IJ, Wilke CO. Opfi: A Python package for identifying gene clusters in large genomics and metagenomics data sets. Journal of Open Source Software 2021; 6:3678. [PMID: 35445164 PMCID: PMC9017871 DOI: 10.21105/joss.03678] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/14/2023]
Abstract
Gene clusters are sets of co-localized, often contiguous genes that together perform specific functions, many of which are relevant to biotechnology. There is a need for software tools that can extract candidate gene clusters from vast amounts of available genomic data. Therefore, we developed Opfi: a modular pipeline for identification of arbitrary gene clusters in assembled genomic or metagenomic sequences. Opfi contains functions for annotation, deduplication, and visualization of putative gene clusters. It utilizes a customizable rule-based filtering approach for selection of candidate systems that adhere to user-defined criteria. Opfi is implemented in Python, and is available on the Python Package Index and on Bioconda (Grüning et al., 2018).
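Note: the idea of rule-based filtering of candidate gene clusters can be sketched in plain Python. The code below does not use or reproduce the Opfi API. The `Gene` class, both function names, the rule parameters, and the example coordinates are all hypothetical, chosen only to show how co-localized genes might be grouped and then screened against user-defined criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int
    end: int


def group_by_distance(genes, max_gap):
    """Group genes into candidate clusters; a gap above max_gap bp starts a new one."""
    clusters = []
    current = []
    current_end = None
    for gene in sorted(genes, key=lambda g: g.start):
        if current and gene.start - current_end > max_gap:
            clusters.append(current)
            current = []
        current.append(gene)
        # Track the furthest end seen so overlapping genes are handled.
        current_end = gene.end if len(current) == 1 else max(current_end, gene.end)
    if current:
        clusters.append(current)
    return clusters


def passes_rules(cluster, required, excluded=(), min_genes=2):
    """Keep a cluster only if it meets every user-defined criterion."""
    names = {gene.name for gene in cluster}
    return (
        len(cluster) >= min_genes
        and set(required) <= names
        and not (set(excluded) & names)
    )


# Hypothetical annotations on one contig.
genes = [
    Gene("cas1", 100, 1000),
    Gene("cas2", 1050, 1400),
    Gene("cas9", 1500, 5600),
    Gene("tnpA", 50000, 51000),
    Gene("cas1", 52000, 52900),
]
candidates = group_by_distance(genes, max_gap=500)
hits = [c for c in candidates if passes_rules(c, required={"cas1", "cas2"})]
print([[g.name for g in c] for c in hits])
```

Keeping grouping and filtering as separate steps means the same candidates can be screened against several rule sets without recomputing the clusters.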
Affiliation(s)
- Alexis M Hill
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas 78712, USA
| | - James R Rybarski
- Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas 78712, USA
| | - Kuang Hu
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas 78712, USA
- Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas 78712, USA
| | - Ilya J Finkelstein
- Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas 78712, USA
- Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, Texas, 78712, USA
| | - Claus O Wilke
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas 78712, USA
54
Liao X, Li M, Hu K, Wu FX, Gao X, Wang J. A sensitive repeat identification framework based on short and long reads. Nucleic Acids Res 2021; 49:e100. [PMID: 34214175 PMCID: PMC8464074 DOI: 10.1093/nar/gkab563] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/07/2020] [Revised: 06/08/2021] [Accepted: 06/18/2021] [Indexed: 12/11/2022]
Abstract
Numerous studies have shown that repetitive regions in genomes play indispensable roles in the evolution, inheritance and variation of living organisms. However, most existing methods cannot achieve satisfactory performance on identifying repeats in terms of both accuracy and size, since NGS reads are too short to identify long repeats whereas SMS (Single Molecule Sequencing) long reads have high error rates. In this study, we present a novel identification framework, LongRepMarker, based on the global de novo assembly and k-mer based multiple sequence alignment for precisely marking long repeats in genomes. The major characteristics of LongRepMarker are as follows: (i) by introducing barcode linked reads and SMS long reads to assist the assembly of all short paired-end reads, it can identify the repeats to a greater extent; (ii) by finding the overlap sequences between assemblies or chromosomes, it locates the repeats faster and more accurately; (iii) by using the multi-alignment unique k-mers rather than the high frequency k-mers to identify repeats in overlap sequences, it can obtain the repeats more comprehensively and stably; (iv) by applying the parallel alignment model based on the multi-alignment unique k-mers, the efficiency of data processing can be greatly optimized and (v) by taking the corresponding identification strategies, structural variations that occur between repeats can be identified. Comprehensive experimental results show that LongRepMarker can achieve more satisfactory results than the existing de novo detection methods (https://github.com/BioinformaticsCSU/LongRepMarker).
Affiliation(s)
- Xingyu Liao
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, P.R. China
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
| | - Min Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, P.R. China
| | - Kang Hu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, P.R. China
| | - Fang-Xiang Wu
- Department of Mechanical Engineering and Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N5A9, Canada
| | - Xin Gao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
| | - Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, P.R. China
55
Qin P, Lu H, Du H, Wang H, Chen W, Chen Z, He Q, Ou S, Zhang H, Li X, Li X, Li Y, Liao Y, Gao Q, Tu B, Yuan H, Ma B, Wang Y, Qian Y, Fan S, Li W, Wang J, He M, Yin J, Li T, Jiang N, Chen X, Liang C, Li S. Pan-genome analysis of 33 genetically diverse rice accessions reveals hidden genomic variations. Cell 2021; 184:3542-3558.e16. [PMID: 34051138 DOI: 10.1016/j.cell.2021.04.046] [Citation(s) in RCA: 277] [Impact Index Per Article: 69.3] [Received: 01/10/2020] [Revised: 01/31/2021] [Accepted: 04/24/2021] [Indexed: 12/30/2022]
Abstract
Structural variations (SVs) and gene copy number variations (gCNVs) have contributed to crop evolution, domestication, and improvement. Here, we assembled 31 high-quality genomes of genetically diverse rice accessions. Coupling with two existing assemblies, we developed pan-genome-scale genomic resources including a graph-based genome, providing access to rice genomic variations. Specifically, we discovered 171,072 SVs and 25,549 gCNVs and used an Oryza glaberrima assembly to infer the derived states of SVs in the Oryza sativa population. Our analyses of SV formation mechanisms, impacts on gene expression, and distributions among subpopulations illustrate the utility of these resources for understanding how SVs and gCNVs shaped rice environmental adaptation and domestication. Our graph-based genome enabled genome-wide association study (GWAS)-based identification of phenotype-associated genetic variations undetectable when using only SNPs and a single reference assembly. Our work provides rich population-scale resources paired with easy-to-access tools to facilitate rice breeding as well as plant functional genomics and evolutionary biology research.
Affiliation(s)
- Peng Qin
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu, Sichuan, China.
| | - Hongwei Lu
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
| | - Huilong Du
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China; School of Life Sciences, Institute of Life Sciences and Green Development, Hebei University, Baoding, China
| | - Hao Wang
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu, Sichuan, China
| | - Weilan Chen
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu, Sichuan, China
| | - Zhuo Chen
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
| | - Qiang He
- School of Life Sciences, Institute of Life Sciences and Green Development, Hebei University, Baoding, China
| | - Shujun Ou
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, USA
| | - Hongyu Zhang
- School of Life Sciences, Institute of Life Sciences and Green Development, Hebei University, Baoding, China
| | - Xuanzhao Li
- School of Life Sciences, Institute of Life Sciences and Green Development, Hebei University, Baoding, China
| | - Xiuxiu Li
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
| | - Yan Li
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing, China
| | - Yi Liao
- Department of Ecology and Evolutionary Biology, University of California, Irvine, Irvine, CA, USA
| | - Qiang Gao
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing, China
| | - Bin Tu
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu, Sichuan, China
| | - Hua Yuan
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu, Sichuan, China
| | - Bingtian Ma
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu, Sichuan, China
| | - Yuping Wang
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu, Sichuan, China
| | - Yangwen Qian
- Biogle Genome Editing Center, Changzhou, Jiangsu, China
| | - Shijun Fan
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu, Sichuan, China
| | - Weitao Li
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu, Sichuan, China
| | - Jing Wang
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu, Sichuan, China
| | - Min He
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu, Sichuan, China
| | - Junjie Yin
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu, Sichuan, China
| | - Ting Li
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu, Sichuan, China
| | - Ning Jiang
- Department of Horticulture, Michigan State University, East Lansing, MI, USA
| | - Xuewei Chen
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu, Sichuan, China
| | - Chengzhi Liang
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China.
| | - Shigui Li
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu, Sichuan, China.
56
Tiedeman Z, Signor S. The transposable elements of the Drosophila serrata reference panel. Genome Biol Evol 2021; 13:6265467. [PMID: 33950180 PMCID: PMC8434751 DOI: 10.1093/gbe/evab100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/30/2021] [Indexed: 11/13/2022]
Abstract
Transposable elements (TEs) are an important component of the complex genomic ecosystem. Understanding the tempo and mode of TE proliferation, that is, whether it is maintained in transposition-selection balance or is induced periodically by environmental stress or other factors, is important for understanding the evolution of organismal genomes through time. Although TEs have been characterized in individuals or limited samples, a true understanding of the population genetics of TEs, and therefore the tempo and mode of transposition, is still lacking. Here, we characterize the TE landscape in an important model Drosophila, Drosophila serrata, using the D. serrata reference panel, which comprises 102 sequenced inbred genotypes. We annotate the families of TEs in the D. serrata genome and investigate variation in TE copy number between genotypes. We find that many TEs have low copy number in the population, but this varies by family and includes a single TE making up to 50% of the genome content of TEs. We find that some TEs proliferate in particular genotypes compared with population levels. In addition, we characterize variation in each TE family, allowing copy number to vary in each genotype, and find that some TEs have diversified very little between individuals, suggesting recent spread. TEs are important sources of spontaneous mutations in Drosophila, making up a large fraction of the total number of mutations in particular genotypes. Understanding the dynamics of TEs within populations will be an important step toward characterizing the origin of variation within and between species.
Affiliation(s)
- Zachery Tiedeman
- Department of Biological Sciences, North Dakota State University, Fargo, North Dakota, U.S.A
| | - Sarah Signor
- Department of Biological Sciences, North Dakota State University, Fargo, North Dakota, U.S.A
57
Abstract
Transposable elements (TEs) are important contributors to genome structure and evolution. With the growth of sequencing technologies, various computational pipelines and software programs have been developed to facilitate TE identification and annotation. These computational tools can be categorized into three types based on their underlying approach: homology-based, structural-based, and de novo methods. Each of these tools has advantages and disadvantages. In this chapter, we introduce EDTA (Extensive de novo TE Annotator), a new comprehensive pipeline composed of high-quality tools to identify and annotate all types of TEs. The development of EDTA is based on the benchmarking results of a collection of TE annotation methods. The selected programs are evaluated by their ability to identify true TEs as well as to exclude false candidates. Here, we present an overview of the EDTA pipeline and a detailed manual for its use. The source code of EDTA is available at https://github.com/oushujun/EDTA .
Affiliation(s)
- Weijia Su
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, USA
| | - Shujun Ou
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, USA
| | - Matthew B Hufford
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, USA
| | - Thomas Peterson
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, USA.
- Department of Agronomy, Iowa State University, Ames, IA, USA.
58
Feng C, Dai M, Liu Y, Chen M. Sequence repetitiveness quantification and de novo repeat detection by weighted k-mer coverage. Brief Bioinform 2020; 22:5855256. [PMID: 32591772 DOI: 10.1093/bib/bbaa086] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/23/2020] [Revised: 04/10/2020] [Accepted: 04/22/2020] [Indexed: 11/12/2022]
Abstract
DNA repeats are abundant in eukaryotic genomes and have been proved to play a vital role in genome evolution and regulation. A large number of approaches have been proposed to identify various repeats in the genome. Some de novo repeat identification tools can efficiently generate sequence repetitive scores based on k-mer counting for repeat detection. However, we noticed that these tools can still be improved in terms of repetitive score calculation, sensitivity to segmental duplications and detection specificity. Therefore, here, we present a new computational approach named Repeat Locator (RepLoc), which is based on weighted k-mer coverage to quantify the genome sequence repetitiveness and locate the repetitive sequences. According to the repetitiveness map of the human genome generated by RepLoc, we found that there may be relationships between sequence repetitiveness and genome structures. A comprehensive benchmark shows that RepLoc is a more efficient k-mer counting based tool for de novo repeat detection. The RepLoc software is freely available at http://bis.zju.edu.cn/reploc.
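Note: the abstract does not spell out RepLoc's weighting scheme, so the sketch below is not the RepLoc algorithm. It shows only the general idea of a k-mer-counting repetitiveness score, under the simplifying assumption that every k-mer overlapping a base contributes equally: each base is scored by the mean sequence-wide count of the k-mers that cover it. Function names and the example sequence are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def repetitiveness(seq, k):
    """Score each base by the mean count of the k-mers that cover it."""
    counts = kmer_counts(seq, k)
    totals = [0] * len(seq)
    cover = [0] * len(seq)
    for i in range(len(seq) - k + 1):
        count = counts[seq[i:i + k]]
        for j in range(i, i + k):
            totals[j] += count
            cover[j] += 1
    # Bases covered by no k-mer (sequence shorter than k) score 0.0.
    return [t / c if c else 0.0 for t, c in zip(totals, cover)]


# "AC" occurs twice, so the bases it covers score above the unique tail.
print(repetitiveness("ACACGT", k=2))
```

A score of 1.0 marks bases covered only by unique k-mers. Thresholding the per-base scores and merging adjacent high-scoring bases would be one simple way to turn such a map into candidate repeat intervals.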
Affiliation(s)
- Cong Feng
- Ming Chen's laboratory in Zhejiang University
| | - Min Dai
- Key Laboratory of Genetic Network Biology, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences
| | | | - Ming Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University
59
Ou S, Su W, Liao Y, Chougule K, Agda JRA, Hellinga AJ, Lugo CSB, Elliott TA, Ware D, Peterson T, Jiang N, Hirsch CN, Hufford MB. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol 2019. [PMID: 31843001 DOI: 10.1101/657890v1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 05/11/2023]
Abstract
BACKGROUND Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods exist for annotation of each class of TEs, but their relative performances have not been systematically compared. Moreover, a comprehensive pipeline is needed to produce a non-redundant library of TEs for species lacking this resource to generate whole-genome TE annotations. RESULTS We benchmark existing programs based on a carefully curated library of rice TEs. We evaluate the performance of methods annotating long terminal repeat (LTR) retrotransposons, terminal inverted repeat (TIR) transposons, short TIR transposons known as miniature inverted transposable elements (MITEs), and Helitrons. Performance metrics include sensitivity, specificity, accuracy, precision, FDR, and F1. Using the most robust programs, we create a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements. EDTA also deconvolutes nested TE insertions frequently found in highly repetitive genomic regions. Using other model species with curated TE libraries (maize and Drosophila), EDTA is shown to be robust across both plant and animal species. CONCLUSIONS The benchmarking results and pipeline developed here will greatly facilitate TE annotation in eukaryotic genomes. These annotations will promote a much more in-depth understanding of the diversity and evolution of TEs at both intra- and inter-species levels. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.
Affiliation(s)
- Shujun Ou
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, 50011, USA
| | - Weijia Su
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, 50011, USA
| | - Yi Liao
- Department of Ecology and Evolutionary Biology, University of California, Irvine, CA, 92697, USA
| | - Kapeel Chougule
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
| | - Jireh R A Agda
- Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario, N1G 2W1, Canada
| | - Adam J Hellinga
- Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario, N1G 2W1, Canada
| | | | - Tyler A Elliott
- Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario, N1G 2W1, Canada
| | - Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- USDA-ARS NEA Robert W. Holley Center for Agriculture and Health, Cornell University, Ithaca, NY, 14853, USA
| | - Thomas Peterson
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, 50011, USA
| | - Ning Jiang
- Department of Horticulture, Michigan State University, East Lansing, MI, 48824, USA.
| | - Candice N Hirsch
- Department of Agronomy and Plant Genetics, University of Minnesota, Saint Paul, MN, 55108, USA.
| | - Matthew B Hufford
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, 50011, USA.
60
Ou S, Su W, Liao Y, Chougule K, Agda JRA, Hellinga AJ, Lugo CSB, Elliott TA, Ware D, Peterson T, Jiang N, Hirsch CN, Hufford MB. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol 2019; 20:275. [PMID: 31843001 PMCID: PMC6913007 DOI: 10.1186/s13059-019-1905-y] [Citation(s) in RCA: 679] [Impact Index Per Article: 113.2] [Received: 05/24/2019] [Accepted: 11/28/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods exist for annotation of each class of TEs, but their relative performances have not been systematically compared. Moreover, a comprehensive pipeline is needed to produce a non-redundant library of TEs for species lacking this resource to generate whole-genome TE annotations. RESULTS We benchmark existing programs based on a carefully curated library of rice TEs. We evaluate the performance of methods annotating long terminal repeat (LTR) retrotransposons, terminal inverted repeat (TIR) transposons, short TIR transposons known as miniature inverted transposable elements (MITEs), and Helitrons. Performance metrics include sensitivity, specificity, accuracy, precision, FDR, and F1. Using the most robust programs, we create a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements. EDTA also deconvolutes nested TE insertions frequently found in highly repetitive genomic regions. Using other model species with curated TE libraries (maize and Drosophila), EDTA is shown to be robust across both plant and animal species. CONCLUSIONS The benchmarking results and pipeline developed here will greatly facilitate TE annotation in eukaryotic genomes. These annotations will promote a much more in-depth understanding of the diversity and evolution of TEs at both intra- and inter-species levels. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.
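Note: the six performance metrics named in the abstract follow from the standard confusion-matrix definitions. The sketch below uses those textbook formulas. How true and false positives and negatives are counted (per element or per base pair, for example) is left open here, and the function name and example counts are hypothetical rather than taken from the paper.

```python
def benchmark_metrics(tp, fp, tn, fn):
    """Compute standard classification metrics from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Return NaN rather than raising when a metric is undefined.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),           # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),           # true negative rate
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "FDR": ratio(fp, tp + fp),                   # equals 1 - precision
        "F1": ratio(2 * tp, 2 * tp + fp + fn),       # harmonic mean of precision and sensitivity
    }


# Hypothetical counts for one annotation program against a curated library.
for name, value in benchmark_metrics(tp=80, fp=20, tn=880, fn=20).items():
    print(f"{name}: {value:.3f}")
```

Reporting several metrics together matters when most of a genome is not the element class of interest, since accuracy and specificity can stay high there even when precision is poor.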
Affiliation(s)
- Shujun Ou
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011 USA
| | - Weijia Su
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA 50011 USA
| | - Yi Liao
- Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697 USA
| | - Kapeel Chougule
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724 USA
| | - Jireh R. A. Agda
- Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario N1G 2W1 Canada
| | - Adam J. Hellinga
- Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario N1G 2W1 Canada
| | | | - Tyler A. Elliott
- Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario N1G 2W1 Canada
| | - Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724 USA
- USDA-ARS NEA Robert W. Holley Center for Agriculture and Health, Cornell University, Ithaca, NY 14853 USA
| | - Thomas Peterson
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA 50011 USA
| | - Ning Jiang
- Department of Horticulture, Michigan State University, East Lansing, MI 48824 USA
| | - Candice N. Hirsch
- Department of Agronomy and Plant Genetics, University of Minnesota, Saint Paul, MN 55108 USA
| | - Matthew B. Hufford
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011 USA
61
Martin SL, Parent JS, Laforest M, Page E, Kreiner JM, James T. Population Genomic Approaches for Weed Science. Plants (Basel) 2019; 8:E354. [PMID: 31546893 PMCID: PMC6783936 DOI: 10.3390/plants8090354] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 08/16/2019] [Revised: 09/12/2019] [Accepted: 09/14/2019] [Indexed: 12/16/2022]
Abstract
Genomic approaches are opening avenues for understanding all aspects of biological life, especially as they begin to be applied to multiple individuals and populations. However, these approaches typically depend on the availability of a sequenced genome for the species of interest. While the number of genomes being sequenced is exploding, one group that has lagged behind is weeds. Although the power of genomic approaches for weed science has been recognized, what is needed to implement these approaches is unfamiliar to many weed scientists. In this review we attempt to address this problem by providing a primer on genome sequencing and examples of how genomics can help answer key questions in weed science, such as: (1) where do agricultural weeds come from; (2) what genes underlie herbicide resistance; and, more speculatively, (3) can we alter weed populations to make them easier to control? This review is intended as an introduction to orient weed scientists who are thinking about initiating genome sequencing projects to better understand weed populations, to highlight recent publications that illustrate the potential for these methods, and to provide direction to key tools and literature that will facilitate the development and execution of weed genomic projects.
Affiliation(s)
- Sara L Martin
- Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON K1A 0C6, Canada.
| | - Jean-Sebastien Parent
- Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON K1A 0C6, Canada.
| | - Martin Laforest
- Saint-Jean-sur-Richelieu Research and Development Centre, Agriculture and Agri-Food Canada, Saint-Jean-sur-Richelieu, QC J3B 3E6, Canada.
| | - Eric Page
- Harrow Research and Development Centre, Agriculture and Agri-Food Canada, Harrow, ON N0R 1G0, Canada.
| | - Julia M Kreiner
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON M5S 3B2, Canada.
| | - Tracey James
- Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON K1A 0C6, Canada.