1
Hemstrom W, Grummer JA, Luikart G, Christie MR. Next-generation data filtering in the genomics era. Nat Rev Genet 2024:10.1038/s41576-024-00738-6. [PMID: 38877133 DOI: 10.1038/s41576-024-00738-6] [Accepted: 04/25/2024] [Indexed: 06/16/2024]
Abstract
Genomic data are ubiquitous across disciplines, from agriculture to biodiversity, ecology, evolution and human health. However, these datasets often contain noise or errors and are missing information that can affect the accuracy and reliability of subsequent computational analyses and conclusions. A key step in genomic data analysis is filtering - removing sequencing bases, reads, genetic variants and/or individuals from a dataset - to improve data quality for downstream analyses. Researchers are confronted with a multitude of choices when filtering genomic data; they must choose which filters to apply and select appropriate thresholds. To help usher in the next generation of genomic data filtering, we review and suggest best practices to improve the implementation, reproducibility and reporting standards for filter types and thresholds commonly applied to genomic datasets. We focus mainly on filters for minor allele frequency, missing data per individual or per locus, linkage disequilibrium and Hardy-Weinberg deviations. Using simulated and empirical datasets, we illustrate the large effects of different filtering thresholds on common population genetics statistics, such as Tajima's D value, population differentiation (FST), nucleotide diversity (π) and effective population size (Ne).
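The per-individual and per-locus filters named in this abstract (missing data, minor allele frequency, Hardy-Weinberg deviation) can be illustrated with a minimal sketch. It is not the authors' pipeline. Assumptions: genotypes are coded as 0, 1 or 2 copies of an alternate allele, with `None` for missing calls; the function names and default thresholds are hypothetical placeholders, not values recommended by the review; and the Hardy-Weinberg check uses a 1-df chi-square approximation rather than an exact test.

```python
import math
from typing import List, Optional, Tuple

# Genotype coding (assumption): 0, 1 or 2 copies of the alternate allele; None = missing.
Genotype = Optional[int]


def missing_rate(values: List[Genotype]) -> float:
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def minor_allele_frequency(calls: List[int]) -> float:
    """Minor allele frequency from non-missing diploid genotype calls."""
    alt = sum(calls) / (2 * len(calls))
    return min(alt, 1.0 - alt)


def hwe_chi2_pvalue(calls: List[int]) -> float:
    """1-df chi-square test of Hardy-Weinberg proportions (an approximation)."""
    n = len(calls)
    p = sum(calls) / (2 * n)  # alternate allele frequency
    if p in (0.0, 1.0):
        return 1.0  # monomorphic locus: nothing to test
    observed = [calls.count(0), calls.count(1), calls.count(2)]
    expected = [n * (1 - p) ** 2, n * 2 * p * (1 - p), n * p ** 2]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def filter_genotypes(geno: List[List[Genotype]], max_ind_missing: float = 0.2,
                     max_locus_missing: float = 0.2, min_maf: float = 0.05,
                     min_hwe_p: float = 1e-6) -> Tuple[List[int], List[int]]:
    """Return (kept individual indices, kept locus indices).

    Rows are individuals and columns are loci. Individuals are filtered first;
    loci are then evaluated on the retained individuals only, so changing the
    order of the filters can change which loci survive.
    """
    keep_ind = [i for i, row in enumerate(geno) if missing_rate(row) <= max_ind_missing]
    keep_loc: List[int] = []
    if not keep_ind:
        return keep_ind, keep_loc
    for j in range(len(geno[0])):
        column = [geno[i][j] for i in keep_ind]
        if missing_rate(column) > max_locus_missing:
            continue
        calls = [g for g in column if g is not None]
        if not calls or minor_allele_frequency(calls) < min_maf:
            continue
        if hwe_chi2_pvalue(calls) < min_hwe_p:
            continue
        keep_loc.append(j)
    return keep_ind, keep_loc
```

Re-running `filter_genotypes` with different `min_maf` or missing-data thresholds, then recomputing a summary statistic on the retained loci, is a small-scale version of the threshold sensitivity the abstract describes.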
Affiliation(s)
- William Hemstrom
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Jared A Grummer
  - Flathead Lake Biological Station, Wildlife Biology Program and Division of Biological Sciences, University of Montana, Missoula, MT, USA
- Gordon Luikart
  - Flathead Lake Biological Station, Wildlife Biology Program and Division of Biological Sciences, University of Montana, Missoula, MT, USA
- Mark R Christie
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
  - Department of Forestry and Natural Resources, Purdue University, West Lafayette, IN, USA

2
Kemppainen P, Schembri R, Momigliano P. Boundary Effects Cause False Signals of Range Expansions in Population Genomic Data. Mol Biol Evol 2024; 41:msae091. [PMID: 38743590 PMCID: PMC11135943 DOI: 10.1093/molbev/msae091] [Received: 12/06/2023] [Revised: 04/25/2024] [Accepted: 05/01/2024] [Indexed: 05/16/2024]
Abstract
Studying range expansions is central for understanding genetic variation through space and time as well as for identifying refugia and biological invasions. Range expansions are characterized by serial founder events causing clines of decreasing genetic diversity away from the center of origin and asymmetries in the two-dimensional allele frequency spectra. These asymmetries, summarized by the directionality index (ψ), are sensitive to range expansions and persist for longer than clines in genetic diversity. In continuous and finite meta-populations, genetic drift tends to be stronger at the edges of the species distribution in equilibrium populations and populations undergoing range expansions alike. Such boundary effects are expected to affect geographic patterns in genetic diversity and ψ. Here we demonstrate that boundary effects cause high false positive rates in equilibrium meta-populations when testing for range expansions. In the simulations, the absolute value of ψ (|ψ|) in equilibrium data sets was proportional to the fixation index (FST). By fitting signatures of range expansions as a function of ɛ |ψ|/FST and geographic clines in ψ, strong evidence for range expansions could be detected in data from a recent rapid invasion of the cane toad, Rhinella marina, in Australia, but not in 28 previously published empirical data sets from Australian scincid lizards that were significant for the standard range expansion tests. Thus, while clinal variation in ψ is still the most sensitive statistic to range expansions, to detect true signatures of range expansions in natural populations, its magnitude needs to be considered in relation to the overall levels of genetic structuring in the data.
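The comparison of |ψ| against FST in this abstract can be sketched for a single pair of populations. The abstract does not state which estimators were used, so this is a simplified sketch under assumptions: derived alleles are correctly polarized, both samples are downsampled to the same number of haplotypes `n`, ψ is taken as the mean difference in derived allele frequency over sites where the derived allele occurs in both samples, and FST uses a Hudson-type ratio of averages. The published ψ estimator's exact normalization and sign convention should be checked before any real use; all function names are hypothetical.

```python
from typing import List


def directionality_psi(counts1: List[int], counts2: List[int], n: int) -> float:
    """Simplified directionality index: mean (f1 - f2) over shared derived alleles.

    counts1/counts2 hold derived-allele counts per site, with both samples
    assumed to be downsampled to the same n haplotypes beforehand. Positive
    values here mean derived alleles are, on average, more frequent in
    population 1 (the sign convention is an assumption of this sketch).
    """
    diffs = [(c1 - c2) / n for c1, c2 in zip(counts1, counts2) if c1 > 0 and c2 > 0]
    if not diffs:
        raise ValueError("no sites carry the derived allele in both samples")
    return sum(diffs) / len(diffs)


def hudson_fst(counts1: List[int], counts2: List[int], n1: int, n2: int) -> float:
    """Hudson-type FST as a ratio of averages over sites (one common estimator)."""
    num = den = 0.0
    for c1, c2 in zip(counts1, counts2):
        p1, p2 = c1 / n1, c2 / n2
        # Squared frequency difference, corrected for within-sample sampling variance.
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        # Between-population heterozygosity.
        den += p1 * (1 - p2) + p2 * (1 - p1)
    if den == 0:
        raise ValueError("no variation across the two samples")
    return num / den


def psi_to_fst_ratio(counts1: List[int], counts2: List[int], n: int) -> float:
    """|psi| / FST for two samples of n haplotypes each.

    Ill-defined when the FST estimate is near zero or negative.
    """
    return abs(directionality_psi(counts1, counts2, n)) / hudson_fst(counts1, counts2, n, n)
```

Because the abstract reports that |ψ| scales with FST in equilibrium simulations, a large raw |ψ| between strongly differentiated populations is not on its own evidence of expansion; scaling by FST is one way to express that.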
Affiliation(s)
- Petri Kemppainen
  - School of Biological Sciences and Swire Institute of Marine Science, Faculty of Science, The University of Hong Kong, Hong Kong, SAR, People's Republic of China
- Rhiannon Schembri
  - School of Natural Sciences, Faculty of Science and Engineering, Macquarie University, Sydney, Australia
  - Division of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australia
- Paolo Momigliano
  - School of Biological Sciences and Swire Institute of Marine Science, Faculty of Science, The University of Hong Kong, Hong Kong, SAR, People's Republic of China

3
Springer AL, Gompert Z. Considerable genetic diversity and structure despite narrow endemism and limited ecological specialization in the Hayden's ringlet, Coenonympha haydenii. Mol Ecol 2024; 33:e17310. [PMID: 38441401 DOI: 10.1111/mec.17310] [Received: 10/03/2023] [Revised: 11/26/2023] [Accepted: 02/15/2024] [Indexed: 03/26/2024]
Abstract
Understanding the processes that underlie the development of population genetic structure is central to the study of evolution. Patterns of genetic structure, in turn, can reveal signatures of isolation by distance (IBD), barriers to gene flow, or even the genesis of speciation. However, it is unclear how severe range restriction might impact the processes that dominate the development of genetic structure. In narrow endemic species, is population structure likely to be adaptive in nature, or rather the result of genetic drift? In this study, we investigated patterns of genetic diversity and structure in the narrow endemic Hayden's ringlet butterfly. Specifically, we asked to what degree genetic structure in the Hayden's ringlet can be explained by IBD, isolation by resistance (IBR) (in the form of geographic or ecological barriers to migration between populations), and isolation by environment (in the form of differences in host plant availability and preference). We employed a genotyping-by-sequencing (GBS) approach coupled with host preference assays, Bayesian modelling, and population genomic analyses to answer these questions. Our results suggest that despite their restricted range, levels of genetic diversity in the Hayden's ringlet are comparable to those seen in more widespread butterfly species. Hayden's ringlets showed a strong preference for feeding on grasses relative to sedges, but neither larval preference nor potential host availability at sampling sites correlated with genetic structure. We conclude that geography, in the form of IBR and simple IBD, was the major driver of contemporary patterns of differentiation in this narrow endemic species.
Affiliation(s)
- Amy L Springer
  - Department of Biology, Utah State University, Logan, Utah, USA
- Zachariah Gompert
  - Department of Biology, Utah State University, Logan, Utah, USA
  - Ecology Center, Utah State University, Logan, Utah, USA

4
Dallaire X, Bouchard R, Hénault P, Ulmo-Diaz G, Normandeau E, Mérot C, Bernatchez L, Moore JS. Widespread Deviant Patterns of Heterozygosity in Whole-Genome Sequencing Due to Autopolyploidy, Repeated Elements, and Duplication. Genome Biol Evol 2023; 15:evad229. [PMID: 38085037 PMCID: PMC10752349 DOI: 10.1093/gbe/evad229] [Accepted: 11/30/2023] [Indexed: 12/28/2023]
Abstract
Most population genomic tools rely on accurate single nucleotide polymorphism (SNP) calling and filtering to meet their underlying assumptions. However, genomic complexity, resulting from structural variants, paralogous sequences, and repetitive elements, presents significant challenges in assembling contiguous reference genomes. Consequently, short-read resequencing studies can encounter mismapping issues, leading to SNPs that deviate from Mendelian expected patterns of heterozygosity and allelic ratio. In this study, we employed the ngsParalog software to identify such deviant SNPs in whole-genome sequencing (WGS) data with low (1.5×) to intermediate (4.8×) coverage for four species: Arctic Char (Salvelinus alpinus), Lake Whitefish (Coregonus clupeaformis), Atlantic Salmon (Salmo salar), and the American Eel (Anguilla rostrata). The analyses revealed that deviant SNPs accounted for 22% to 62% of all SNPs in salmonid datasets and approximately 11% in the American Eel dataset. These deviant SNPs were particularly concentrated within repetitive elements and genomic regions that had recently undergone rediploidization in salmonids. Additionally, narrow peaks of elevated coverage were ubiquitous along all four reference genomes, encompassed most deviant SNPs, and could be partially associated with transposons and tandem repeats. Including these deviant SNPs in genomic analyses led to highly distorted site frequency spectra, underestimated pairwise FST values, and overestimated nucleotide diversity. Considering the widespread occurrence of deviant SNPs arising from a variety of sources, their important impact in estimating population parameters, and the availability of effective tools to identify them, we propose that excluding deviant SNPs from WGS datasets is required to improve genomic inferences for a wide range of taxa and sequencing depths.
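The two symptoms this abstract uses to define deviant SNPs, excess heterozygosity and a skewed allelic ratio in heterozygotes, can be shown with a simple heuristic. This is not ngsParalog, which works from genotype likelihoods and is better suited to the low-coverage data described here; the sketch assumes confidently called genotypes plus per-heterozygote read counts, and its names and thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SiteCalls:
    genotypes: List[int]               # 0/1/2 alternate-allele counts per individual (no missing)
    het_reads: List[Tuple[int, int]]   # (ref_reads, alt_reads) for each heterozygous individual


def heterozygosity_ratio(genotypes: List[int]) -> float:
    """Observed heterozygosity divided by the Hardy-Weinberg expectation 2p(1 - p)."""
    n = len(genotypes)
    p = sum(genotypes) / (2 * n)
    expected = 2 * p * (1 - p)
    if expected == 0:
        return 0.0  # monomorphic site
    observed = sum(g == 1 for g in genotypes) / n
    return observed / expected


def allele_ratio_deviation(het_reads: List[Tuple[int, int]]) -> float:
    """Distance of the pooled alternate-read fraction in heterozygotes from 0.5."""
    ref = sum(r for r, _ in het_reads)
    alt = sum(a for _, a in het_reads)
    total = ref + alt
    if total == 0:
        return 0.0
    return abs(alt / total - 0.5)


def is_deviant(site: SiteCalls, max_het_ratio: float = 1.5,
               max_ratio_dev: float = 0.2) -> bool:
    """Flag a site showing excess heterozygosity or skewed read ratios.

    Collapsed paralogs often look like this: every individual appears
    heterozygous, and reads from the duplicated copy shift the allele ratio.
    """
    return (heterozygosity_ratio(site.genotypes) > max_het_ratio
            or allele_ratio_deviation(site.het_reads) > max_ratio_dev)
```

Under Hardy-Weinberg proportions the heterozygosity ratio is about 1, and it reaches its maximum of 2 when every individual is heterozygous at p = 0.5, which is why a threshold between those values is used here.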
Affiliation(s)
- Xavier Dallaire
  - Institut de biologie intégrative et des systèmes, Université Laval, Québec, Canada
  - Centre d'Études Nordiques, Université Laval, Québec, Canada
- Raphael Bouchard
  - Institut de biologie intégrative et des systèmes, Université Laval, Québec, Canada
  - Ressources Aquatique Québec, Université de Rimouski, Rimouski, Canada
- Philippe Hénault
  - Institut de biologie intégrative et des systèmes, Université Laval, Québec, Canada
  - Ressources Aquatique Québec, Université de Rimouski, Rimouski, Canada
- Gabriela Ulmo-Diaz
  - Institut de biologie intégrative et des systèmes, Université Laval, Québec, Canada
  - Ressources Aquatique Québec, Université de Rimouski, Rimouski, Canada
- Eric Normandeau
  - Institut de biologie intégrative et des systèmes, Université Laval, Québec, Canada
  - Ressources Aquatique Québec, Université de Rimouski, Rimouski, Canada
  - Plateforme de bio-informatique de l’IBIS, Université Laval, Québec, Canada
- Claire Mérot
  - CNRS, UMR 6553 ECOBIO, Université de Rennes, Rennes, France
- Louis Bernatchez
  - Institut de biologie intégrative et des systèmes, Université Laval, Québec, Canada
  - Ressources Aquatique Québec, Université de Rimouski, Rimouski, Canada
- Jean-Sébastien Moore
  - Institut de biologie intégrative et des systèmes, Université Laval, Québec, Canada
  - Centre d'Études Nordiques, Université Laval, Québec, Canada
  - Ressources Aquatique Québec, Université de Rimouski, Rimouski, Canada

5
Freedman MG, Kronforst MR. Migration genetics take flight: genetic and genomic insights into monarch butterfly migration. Curr Opin Insect Sci 2023; 59:101079. [PMID: 37385346 PMCID: PMC10592233 DOI: 10.1016/j.cois.2023.101079] [Received: 03/30/2023] [Revised: 06/20/2023] [Accepted: 06/23/2023] [Indexed: 07/01/2023]
Abstract
Monarch butterflies have emerged as a model system in migration genetics. Despite inherent challenges associated with studying the integrative phenotypes that characterize migration, recent research has highlighted genes and transcriptional networks underlying aspects of the monarch's migratory syndrome. Circadian clock genes and the vitamin A synthesis pathway regulate reproductive diapause initiation, while diapause termination appears to involve calcium and insulin signaling. Comparative approaches have highlighted genes that distinguish migratory and nonmigratory monarch populations, as well as genes associated with natural variation in propensity to initiate diapause. Population genetic techniques demonstrate that seasonal migration can collapse patterns of spatial structure at continental scales, whereas loss of migration can drive differentiation between even nearby populations. Finally, population genetics can be applied to reconstruct the monarch's evolutionary history and search for contemporary demographic changes, which can provide relevant context for understanding recently observed declines in overwintering North American monarch numbers.

6
Hopper KR. Reduced-representation libraries in insect genetics. Curr Opin Insect Sci 2023; 59:101084. [PMID: 37442341 DOI: 10.1016/j.cois.2023.101084] [Received: 12/18/2022] [Revised: 05/04/2023] [Accepted: 07/06/2023] [Indexed: 07/15/2023]
Abstract
Genotyping-by-sequencing of reduced-representation libraries has ushered in an era where genome-wide data can be obtained for any species. Here, I review research on this topic during the last two years, report meta-analysis of the results, and discuss analysis methods and issues. Scanning the literature from 2021 to 2022 identified 21 papers, the majority of which were on population differences, including local adaptation and migration, but several papers were on genetic maps and their use in assembly scaffolding or analysis of quantitative trait loci, on the origin of incursions of pest insects, or on infection rates of a pathogen in a disease vector. The research reviewed includes 33 species from 25 families and 11 orders. Meta-analysis showed that less than 16%, and most often less than 1%, of the genome was implicated in local adaptation and that the number of adaptive loci correlated with genetic divergence among populations.
Affiliation(s)
- Keith R Hopper
  - Beneficial Insect Introductions Research Unit, ARS, USDA, Newark, DE, United States

7
Steele C, Ragonese IG, Majewska AA. Extent and impacts of winter breeding in the North American monarch butterfly. Curr Opin Insect Sci 2023; 59:101077. [PMID: 37336490 DOI: 10.1016/j.cois.2023.101077] [Received: 03/26/2023] [Revised: 06/09/2023] [Accepted: 06/13/2023] [Indexed: 06/21/2023]
Abstract
Since the 1960s, scientists have observed the North American monarch butterfly (Danaus plexippus) continuing reproductive activities past the fall migration and into the winter months when the climate is mild. Recent work suggests that small populations of winter breeding monarchs are present in western and southeastern USA, as well as northwestern Mexico, with new winter breeding populations forming in areas where non-native milkweeds are planted. The year-round presence of milkweed plants and temperatures suitable for immature monarch development are vital factors allowing for winter breeding. Non-native milkweeds, in conjunction with novel barriers to migration, are likely contributing to the rise in winter breeding behavior. Warmer climates are already impacting milkweed phenology and range, possibly favoring winter breeding behavior. Similar pressures but different implications are expected for eastern and western winter breeding monarchs given the differences in the migration ecology, milkweed species, and climate changes in the two regions.
Affiliation(s)
- Christen Steele
  - Department of Ecology and Evolutionary Biology, Tulane University, 1430 Annunciation St, New Orleans, LA 70130, USA
- Isabella G Ragonese
  - Odum School of Ecology, University of Georgia, 140 E Green Street, Athens, GA 30602, USA
- Ania A Majewska
  - Department of Physiology and Pharmacology, College of Veterinary Medicine, University of Georgia, 501 D.W. Brooks Drive, Athens, GA 30602, USA

8
Yang LH. Complexity, humility, and action: a current perspective on monarchs in Western North America. Curr Opin Insect Sci 2023; 59:101078. [PMID: 37380104 DOI: 10.1016/j.cois.2023.101078] [Received: 03/26/2023] [Revised: 06/16/2023] [Accepted: 06/19/2023] [Indexed: 06/30/2023]
Abstract
Recent studies have continued to shed light on the ecology of monarch butterflies (Danaus plexippus) in western North America. These studies have documented a declining overwintering population over several decades, punctuated by unexpected variability in recent years. Understanding this variability will require grappling with the spatial and temporal heterogeneity of resources and risks presented to western monarchs throughout their annual life cycle. Recent changes in the western monarch population further illustrate how interacting global change drivers can create complex causes and consequences in this system. The complexity of this system should inspire humility. However, even recognizing the limits of our current understanding, there is enough scientific common ground to take some conservation actions now.
Affiliation(s)
- Louie H Yang
  - Department of Entomology and Nematology, University of California, Davis, CA 95616, USA

9
Boyle JH, Strickler S, Twyford AD, Ricono A, Powell A, Zhang J, Xu H, Smith R, Dalgleish HJ, Jander G, Agrawal AA, Puzey JR. Temporal matches between monarch butterfly and milkweed population changes over the past 25,000 years. Curr Biol 2023; 33:3702-3710.e5. [PMID: 37607548 DOI: 10.1016/j.cub.2023.07.057] [Received: 11/09/2022] [Revised: 04/13/2023] [Accepted: 07/26/2023] [Indexed: 08/24/2023]
Abstract
In intimate ecological interactions, the interdependency of species may result in correlated demographic histories. For species of conservation concern, understanding the long-term dynamics of such interactions may shed light on the drivers of population decline. Here, we address the demographic history of the monarch butterfly, Danaus plexippus, and its dominant host plant, the common milkweed Asclepias syriaca (A. syriaca), using broad-scale sampling and genomic inference. Because genetic resources for milkweed have lagged behind those for monarchs, we first release a chromosome-level genome assembly and annotation for common milkweed. Next, we show that despite its enormous geographic range across eastern North America, A. syriaca is best characterized as a single, roughly panmictic population. Using approximate Bayesian computation with random forests (ABC-RF), a machine learning method for reconstructing demographic histories, we show that both monarchs and milkweed experienced population expansion during the most recent recession of North American glaciers 10,000-20,000 years ago. Our data also identify concurrent population expansions in both species during the large-scale clearing of eastern forests (∼200 years ago). Finally, we find no evidence that either species experienced a reduction in effective population size over the past 75 years. Thus, the well-documented decline of monarch abundance over the past 40 years is not visible in our genomic dataset, reflecting a possible mismatch of the overwintering census population to effective population size in this species.
Affiliation(s)
- John H Boyle
  - Biology Department, College of William & Mary, 540 Landrum Dr., Williamsburg, VA 23185, USA
  - Biology Department, University of Mary, 7500 University Dr., Bismarck, ND 58504, USA
- Susan Strickler
  - Boyce Thompson Institute, 533 Tower Rd., Ithaca, NY 14853, USA
  - Chicago Botanic Garden, Plant Science and Conservation, 1000 Lake Cook Rd., Glencoe, IL 60022, USA
  - Northwestern University, Plant Biology and Conservation Program, 2145 Sheridan Rd., Evanston, IL 60208, USA
- Alex D Twyford
  - Institute of Ecology and Evolution, University of Edinburgh, Charlotte Auerbach Rd., Edinburgh EH9 3FL, UK
  - Royal Botanic Garden Edinburgh, Edinburgh EH3 5NZ, UK
- Angela Ricono
  - Biology Department, College of William & Mary, 540 Landrum Dr., Williamsburg, VA 23185, USA
- Adrian Powell
  - Boyce Thompson Institute, 533 Tower Rd., Ithaca, NY 14853, USA
- Jing Zhang
  - Boyce Thompson Institute, 533 Tower Rd., Ithaca, NY 14853, USA
- Hongxing Xu
  - Boyce Thompson Institute, 533 Tower Rd., Ithaca, NY 14853, USA
  - College of Life Sciences, Shaanxi Normal University, South Chang'an Rd., Xi'an 710062, China
- Ronald Smith
  - Data Science Program, College of William & Mary, 540 Landrum Dr., Williamsburg, VA 23185, USA
- Harmony J Dalgleish
  - Biology Department, College of William & Mary, 540 Landrum Dr., Williamsburg, VA 23185, USA
- Georg Jander
  - Boyce Thompson Institute, 533 Tower Rd., Ithaca, NY 14853, USA
- Anurag A Agrawal
  - Department of Ecology and Evolutionary Biology, Cornell University, Corson Hall, Ithaca, NY 14853, USA
- Joshua R Puzey
  - Biology Department, College of William & Mary, 540 Landrum Dr., Williamsburg, VA 23185, USA