1
Daigle A, Whitehouse LS, Zhao R, Emerson JJ, Schrider DR. Leveraging long-read assemblies and machine learning to enhance short-read transposable element detection and genotyping. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.02.11.637720. [PMID: 39990489 PMCID: PMC11844559 DOI: 10.1101/2025.02.11.637720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2025]
Abstract
Transposable elements (TEs) are parasitic genomic elements that are ubiquitous across the tree of life and play a crucial role in genome evolution. Advances in long-read sequencing have allowed highly accurate TE detection, though at a higher cost than short-read sequencing. Recent studies using long reads have shown that existing short-read TE detection methods perform inadequately when applied to real data. In this study, we use a machine learning approach (called TEforest) to discover and genotype TE insertions and deletions with short-read data by using TEs detected from long-read genome assemblies as training data. Our method first uses a highly sensitive algorithm to discover potential TE insertion or deletion sites in the genome, extracting relevant features from short-read alignments. To discriminate between true and false TE insertions, we train a random forest model with a labeled ground-truth dataset for which we have calculated the same set of short-read features. We conduct a comprehensive benchmark of TEforest and traditional TE detection methods using real data, finding that TEforest identifies more true positives and fewer false positives across datasets with different read lengths and coverages, while also accurately inferring genotypes and the precise breakpoints of insertions. By learning short-read signatures of TEs previously only discoverable using long reads, our approach bridges the gap between large-scale population genetic studies and the accuracy of long-read assemblies. This work provides a user-friendly tool to study the prevalence and phenotypic effects of TE insertions across the genome.
Affiliation(s)
- Austin Daigle
  - Department of Genetics, University of North Carolina, Chapel Hill, NC 27599
  - Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, NC 27599
- Logan S. Whitehouse
  - Department of Genetics, University of North Carolina, Chapel Hill, NC 27599
  - Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, NC 27599
- Roy Zhao
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697
- JJ Emerson
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697
- Daniel R. Schrider
  - Department of Genetics, University of North Carolina, Chapel Hill, NC 27599
2
Chen J, Garfinkel DJ, Bergman CM. Horizontal Transfer and Recombination Fuel Ty4 Retrotransposon Evolution in Saccharomyces. Genome Biol Evol 2025; 17:evaf004. [PMID: 39786570 PMCID: PMC11739139 DOI: 10.1093/gbe/evaf004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Revised: 09/26/2024] [Accepted: 12/27/2024] [Indexed: 01/12/2025]
Abstract
Horizontal transposon transfer (HTT) plays an important role in the evolution of eukaryotic genomes; however, the detailed evolutionary history and impact of most HTT events remain to be elucidated. To better understand the process of HTT in closely related microbial eukaryotes, we studied Ty4 retrotransposon subfamily content and sequence evolution across the genus Saccharomyces using short- and long-read whole genome sequence data, including new PacBio genome assemblies for two Saccharomyces mikatae strains. We find evidence for multiple independent HTT events introducing the Tsu4 subfamily into specific lineages of Saccharomyces paradoxus, Saccharomyces cerevisiae, Saccharomyces eubayanus, Saccharomyces kudriavzevii and the ancestor of the S. mikatae/Saccharomyces jurei species pair. In both S. mikatae and S. kudriavzevii, we identified novel Ty4 clades that were independently generated through recombination between resident and horizontally transferred subfamilies. Our results reveal that recurrent HTT and lineage-specific extinction events lead to a complex pattern of Ty4 subfamily content across the genus Saccharomyces. Moreover, our results demonstrate how HTT can lead to coexistence of related retrotransposon subfamilies in the same genome that can fuel evolution of new retrotransposon clades via recombination.
Affiliation(s)
- Jingxuan Chen
  - Institute of Bioinformatics, University of Georgia, 120 E. Green St., Athens, GA, USA
- David J Garfinkel
  - Department of Biochemistry and Molecular Biology, University of Georgia, 120 E. Green St., Athens, GA, USA
- Casey M Bergman
  - Institute of Bioinformatics, University of Georgia, 120 E. Green St., Athens, GA, USA
  - Department of Genetics, University of Georgia, 120 E. Green St., Athens, GA, USA
3
Groza C, Chen X, Wheeler TJ, Bourque G, Goubert C. A unified framework to analyze transposable element insertion polymorphisms using graph genomes. Nat Commun 2024; 15:8915. [PMID: 39414821 PMCID: PMC11484939 DOI: 10.1038/s41467-024-53294-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 10/02/2024] [Indexed: 10/18/2024]
Abstract
Transposable elements are ubiquitous mobile DNA sequences generating insertion polymorphisms, contributing to genomic diversity. We present GraffiTE, a flexible pipeline to analyze polymorphic mobile element insertions. By integrating state-of-the-art structural variant detection algorithms and graph genomes, GraffiTE identifies polymorphic mobile elements from genomic assemblies or long-read sequencing data, and genotypes these variants using short or long read sets. Benchmarking on simulated and real datasets reports high precision and recall rates. GraffiTE is designed to allow non-expert users to perform comprehensive analyses, including in models with limited transposable element knowledge, and is compatible with various sequencing technologies. Here, we demonstrate the versatility of GraffiTE by analyzing human, Drosophila melanogaster, maize, and Cannabis sativa pangenome data. These analyses reveal the landscapes of polymorphic mobile elements and their frequency variations across individuals, strains, and cultivars.
Affiliation(s)
- Cristian Groza
  - Quantitative Life Sciences, McGill University, Montréal, QC, Canada
- Xun Chen
  - Institute for the Advanced Study of Human Biology (ASHBi), Kyoto University, Kyoto, Japan
- Travis J Wheeler
  - R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ, USA
- Guillaume Bourque
  - Institute for the Advanced Study of Human Biology (ASHBi), Kyoto University, Kyoto, Japan
  - Canadian Centre for Computational Genomics, McGill University, Montréal, QC, Canada
  - Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, QC, Canada
  - Human Genetics, McGill University, Montréal, QC, Canada
- Clément Goubert
  - Human Genetics, McGill University, Montréal, QC, Canada
  - R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ, USA
4
Hannon-Hatfield JA, Chen J, Bergman CM, Garfinkel DJ. Evolution of a Restriction Factor by Domestication of a Yeast Retrotransposon. Mol Biol Evol 2024; 41:msae050. [PMID: 38442736 PMCID: PMC10951436 DOI: 10.1093/molbev/msae050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Revised: 02/13/2024] [Accepted: 02/23/2024] [Indexed: 03/07/2024]
Abstract
Transposable elements drive genome evolution in all branches of life. Transposable element insertions are often deleterious to their hosts and necessitate evolution of control mechanisms to limit their spread. The long terminal repeat retrotransposon Ty1 prime (Ty1'), a subfamily of the Ty1 family, is present in many Saccharomyces cerevisiae strains, but little is known about what controls its copy number. Here, we provide evidence that a novel gene from an exapted Ty1' sequence, domesticated restriction of Ty1' relic 2 (DRT2), encodes a restriction factor that inhibits Ty1' movement. DRT2 arose through domestication of a Ty1' GAG gene and contains the C-terminal domain of capsid, which in the related Ty1 canonical subfamily functions as a self-encoded restriction factor. Bioinformatic analysis reveals the widespread nature of DRT2, its evolutionary history, and pronounced structural variation at the Ty1' relic 2 locus. Ty1' retromobility analyses demonstrate DRT2 restriction factor functionality, and northern blot and RNA-seq analysis indicate that DRT2 is transcribed in multiple strains. Velocity cosedimentation profiles indicate an association between Drt2 and Ty1' virus-like particles or assembly complexes. Chimeric Ty1' elements containing DRT2 retain retromobility, suggesting an ancestral role of productive Gag C-terminal domain of capsid functionality is present in the sequence. Unlike Ty1 canonical, Ty1' retromobility increases with copy number, suggesting that C-terminal domain of capsid-based restriction is not limited to the Ty1 canonical subfamily self-encoded restriction factor and drove the endogenization of DRT2. The discovery of an exapted Ty1' restriction factor provides insight into the evolution of the Ty1 family, evolutionary hot-spots, and host-transposable element interactions.
Affiliation(s)
- J Adam Hannon-Hatfield
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
- Jingxuan Chen
  - Institute of Bioinformatics, University of Georgia, Athens, GA, USA
- Casey M Bergman
  - Institute of Bioinformatics, University of Georgia, Athens, GA, USA
  - Department of Genetics, University of Georgia, Athens, GA, USA
- David J Garfinkel
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
5
Chen J, Garfinkel DJ, Bergman CM. Horizontal transfer and recombination fuel Ty4 retrotransposon evolution in Saccharomyces. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.12.20.572574. [PMID: 38187645 PMCID: PMC10769310 DOI: 10.1101/2023.12.20.572574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Horizontal transposon transfer (HTT) plays an important role in the evolution of eukaryotic genomes; however, the detailed evolutionary history and impact of most HTT events remain to be elucidated. To better understand the process of HTT in closely related microbial eukaryotes, we studied Ty4 retrotransposon subfamily content and sequence evolution across the genus Saccharomyces using short- and long-read whole genome sequence data, including new PacBio genome assemblies for two S. mikatae strains. We find evidence for multiple independent HTT events introducing the Tsu4 subfamily into specific lineages of S. paradoxus, S. cerevisiae, S. eubayanus, S. kudriavzevii and the ancestor of the S. mikatae/S. jurei species pair. In both S. mikatae and S. kudriavzevii, we identified novel Ty4 clades that were independently generated through recombination between resident and horizontally transferred subfamilies. Our results reveal that recurrent HTT and lineage-specific extinction events lead to a complex pattern of Ty4 subfamily content across the genus Saccharomyces. Moreover, our results demonstrate how HTT can lead to coexistence of related retrotransposon subfamilies in the same genome that can fuel evolution of new retrotransposon clades via recombination.
Affiliation(s)
- Jingxuan Chen
  - Institute of Bioinformatics, University of Georgia, Athens, GA, USA
- David J. Garfinkel
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
- Casey M. Bergman
  - Institute of Bioinformatics, University of Georgia, Athens, GA, USA
  - Department of Genetics, University of Georgia, Athens, GA, USA
6
Wang Y, McNeil P, Abdulazeez R, Pascual M, Johnston SE, Keightley PD, Obbard DJ. Variation in mutation, recombination, and transposition rates in Drosophila melanogaster and Drosophila simulans. Genome Res 2023; 33:587-598. [PMID: 37037625 PMCID: PMC10234296 DOI: 10.1101/gr.277383.122] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 10/04/2022] [Accepted: 03/28/2023] [Indexed: 04/12/2023]
Abstract
The rates of mutation, recombination, and transposition are core parameters in models of evolution. They impact genetic diversity, responses to ongoing selection, and levels of genetic load. However, even for key evolutionary model species such as Drosophila melanogaster and Drosophila simulans, few estimates of these parameters are available, and we have little idea of how rates vary between individuals, sexes, or populations. Knowledge of this variation is fundamental for parameterizing models of genome evolution. Here, we provide direct estimates of mutation, recombination, and transposition rates and their variation in a West African and a European population of D. melanogaster and a European population of D. simulans. Across 89 flies, we observe 58 single-nucleotide mutations, 286 crossovers, and 89 transposable element (TE) insertions. Compared to the European D. melanogaster, we find the West African population has a lower mutation rate (1.67 × 10⁻⁹ site⁻¹ gen⁻¹ vs. 4.86 × 10⁻⁹ site⁻¹ gen⁻¹) and a lower transposition rate (8.99 × 10⁻⁵ copy⁻¹ gen⁻¹ vs. 23.36 × 10⁻⁵ copy⁻¹ gen⁻¹), but a higher recombination rate (3.44 cM/Mb vs. 2.06 cM/Mb). The European D. simulans population has a similar mutation rate to European D. melanogaster, but a significantly higher recombination rate and a lower, but not significantly different, transposition rate. Overall, we find paternal-derived mutations are more frequent than maternal ones in both species. Our study quantifies the variation in rates of mutation, recombination, and transposition among different populations and sexes, and our direct estimates of these parameters in D. melanogaster and D. simulans will benefit future studies in population and evolutionary genetics.
Affiliation(s)
- Yiguan Wang
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh EH9 3FL, United Kingdom
- Paul McNeil
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh EH9 3FL, United Kingdom
- Marta Pascual
  - Departament de Genètica, Microbiologia i Estadística and IRBio, Universitat de Barcelona, 08028 Barcelona, Spain
- Susan E Johnston
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh EH9 3FL, United Kingdom
- Peter D Keightley
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh EH9 3FL, United Kingdom
- Darren J Obbard
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh EH9 3FL, United Kingdom