1
Fischer D, Tapio M, Bitz O, Iso-Touru T, Kause A, Tapio I. Fine-tuning GBS data with comparison of reference and mock genome approaches for advancing genomic selection in less studied farmed species. BMC Genomics 2025; 26:111. [PMID: 39910437 PMCID: PMC11796084 DOI: 10.1186/s12864-025-11296-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Accepted: 01/27/2025] [Indexed: 02/07/2025] Open
Abstract
BACKGROUND: Diversifying animal cultivation demands efficient genotyping for enabling genomic selection, but non-model species lack efficient genotyping solutions. The aim of this study was to optimize a genotyping-by-sequencing (GBS) double-digest RAD-sequencing (ddRAD) pipeline. Bovine data was used to automate the bioinformatic analysis. The application of the optimization was demonstrated on non-model European whitefish data.
RESULTS: DdRAD data generation was designed for a reliable estimation of relatedness and is scalable to up to 384 samples. The GBS sequencing yielded approximately one million reads for each of the around 100 assessed samples. Optimizing various strategies to create a de-novo reference genome for variant calling (mock reference) showed that using three samples outperformed other building strategies with single or very large number of samples. Adjustments to most pipeline tuning parameters had limited impact on high-quality data, except for the identity criterion for merging mock reference genome clusters. For each species, over 15k GBS variants based on the mock reference were obtained and showed comparable results with the ones called using an existing reference genome. Repeatability analysis showed high concordance over replicates, particularly in bovine while in European whitefish data repeatability did not exceed earlier observations.
CONCLUSIONS: The proposed cost-effective ddRAD strategy, coupled with an efficient bioinformatics workflow, enables broad adoption of ddRAD GBS across diverse farmed species. While beneficial, a reference genome is not obligatory. The integration of Snakemake streamlines the pipeline usage on computer clusters and supports customization. This user-friendly solution facilitates genotyping for both model and non-model species.
Affiliation(s)
- Daniel Fischer
- Applied Statistical Methods, Natural Resources, Natural Resources Institute Finland (Luke), Jokioinen, 31600, Finland.
- Miika Tapio
- Genomics and Breeding, Production Systems, Natural Resources Institute Finland (Luke), Jokioinen, 31600, Finland
- Oliver Bitz
- Genomics and Breeding, Production Systems, Natural Resources Institute Finland (Luke), Jokioinen, 31600, Finland
- Terhi Iso-Touru
- Genomics and Breeding, Production Systems, Natural Resources Institute Finland (Luke), Jokioinen, 31600, Finland
- Antti Kause
- Genomics and Breeding, Production Systems, Natural Resources Institute Finland (Luke), Jokioinen, 31600, Finland
- Ilma Tapio
- Genomics and Breeding, Production Systems, Natural Resources Institute Finland (Luke), Jokioinen, 31600, Finland

2
Hemstrom W, Grummer JA, Luikart G, Christie MR. Next-generation data filtering in the genomics era. Nat Rev Genet 2024; 25:750-767. [PMID: 38877133 DOI: 10.1038/s41576-024-00738-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/25/2024] [Indexed: 06/16/2024]
Abstract
Genomic data are ubiquitous across disciplines, from agriculture to biodiversity, ecology, evolution and human health. However, these datasets often contain noise or errors and are missing information that can affect the accuracy and reliability of subsequent computational analyses and conclusions. A key step in genomic data analysis is filtering - removing sequencing bases, reads, genetic variants and/or individuals from a dataset - to improve data quality for downstream analyses. Researchers are confronted with a multitude of choices when filtering genomic data; they must choose which filters to apply and select appropriate thresholds. To help usher in the next generation of genomic data filtering, we review and suggest best practices to improve the implementation, reproducibility and reporting standards for filter types and thresholds commonly applied to genomic datasets. We focus mainly on filters for minor allele frequency, missing data per individual or per locus, linkage disequilibrium and Hardy-Weinberg deviations. Using simulated and empirical datasets, we illustrate the large effects of different filtering thresholds on common population genetics statistics, such as Tajima's D value, population differentiation (FST), nucleotide diversity (π) and effective population size (Ne).
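The two locus-level filters named in this abstract (minor allele frequency and missing data per locus) can be illustrated with a minimal sketch. This is not code from the review: the function names, the 0/1/2 genotype coding, and the default thresholds are assumptions chosen only for illustration.

```python
from typing import List, Optional, Sequence

# Assumed coding: 0, 1, 2 = copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def missing_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of individuals with no genotype call at one locus."""
    return sum(g is None for g in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """MAF at one locus, computed from non-missing diploid calls only."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def filter_loci(loci: Sequence[Sequence[Genotype]],
                min_maf: float = 0.05,
                max_missing: float = 0.2) -> List[int]:
    """Return indices of loci that pass both (hypothetical) thresholds."""
    return [
        i for i, calls in enumerate(loci)
        if missing_rate(calls) <= max_missing
        and minor_allele_frequency(calls) >= min_maf
    ]


# Toy data: four loci genotyped in five individuals.
loci = [
    [0, 1, 2, 1, 0],        # alternate allele frequency 0.4, nothing missing
    [0, 0, 0, 0, 1],        # alternate allele frequency 0.1
    [0, None, None, 1, 2],  # 40% of calls missing
    [2, 2, 2, 2, 2],        # monomorphic, MAF 0
]
kept = filter_loci(loci, min_maf=0.05, max_missing=0.2)
```

Raising `min_maf` to 0.2 in this toy example would also drop the second locus, which is the kind of threshold sensitivity the abstract describes.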
Affiliation(s)
- William Hemstrom
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA.
- Jared A Grummer
- Flathead Lake Biological Station, Wildlife Biology Program and Division of Biological Sciences, University of Montana, Missoula, MT, USA
- Gordon Luikart
- Flathead Lake Biological Station, Wildlife Biology Program and Division of Biological Sciences, University of Montana, Missoula, MT, USA
- Mark R Christie
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA.
- Department of Forestry and Natural Resources, Purdue University, West Lafayette, IN, USA.

3
de Oliveira JCP, Cabanne GS, Santos FR. Phylogenomics of the gray-breasted sabrewing (Campylopterus largipennis) species complex in the Amazonia and Cerrado biomes. Genet Mol Biol 2024; 47:e20230331. [PMID: 39133262 PMCID: PMC11308382 DOI: 10.1590/1678-4685-gmb-2023-0331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Accepted: 05/29/2024] [Indexed: 08/13/2024] Open
Abstract
The Neotropics are one of the most biodiverse regions of the world, where environmental dynamics, climate and geology resulted in a complex diversity of fauna and flora. In such complex and heterogeneous environments, widely distributed species require deep investigation about their biogeographic history. The gray-breasted sabrewing hummingbird Campylopterus largipennis is a species complex that occurs in forest and open ecosystems of South America, including also high-altitude grasslands. It has been recently split into four distinct species distributed in Amazonia (rainforest) and Cerrado (savanna) biomes with boundaries marked by ecological barriers. Here, we investigated the evolutionary dynamics of population lineages within this neotropical taxon to elucidate its biogeographical history and current lineage diversity. We used a reduced-representation sequencing approach to perform fine-scale population genomic analyses of samples distributed throughout Amazonia and Cerrado localities, representing all four recently recognized species. We found a deep genetic structure separating species from both biomes, and a more recent divergence between species within each biome and from distinct habitats. The population dynamics through time was shown to be concordant with known vicariant events, isolation by distance, and altitudinal breaks, where the Amazon River and the Espinhaço Mountain Range worked as important barriers associated to speciation.
Affiliation(s)
- Jean Carlo Pedroso de Oliveira
- Universidade Federal de Minas Gerais, Instituto de Ciências Biológicas, Departamento de Genética, Ecologia e Evolução, Belo Horizonte, MG, Brazil
- Gustavo Sebastián Cabanne
- División de Ornitología, Museo Argentino de Ciencias Naturales “Bernardino Rivadavia” (MACN - CONICET), Buenos Aires, Argentina
- Fabrício Rodrigues Santos
- Universidade Federal de Minas Gerais, Instituto de Ciências Biológicas, Departamento de Genética, Ecologia e Evolução, Belo Horizonte, MG, Brazil

4
Doublet M, Degalez F, Lagarrigue S, Lagoutte L, Gueret E, Allais S, Lecerf F. Variant calling and genotyping accuracy of ddRAD-seq: Comparison with 20X WGS in layers. PLoS One 2024; 19:e0298565. [PMID: 39058708 PMCID: PMC11280156 DOI: 10.1371/journal.pone.0298565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Accepted: 05/23/2024] [Indexed: 07/28/2024] Open
Abstract
Whole Genome Sequencing (WGS) remains a costly or unsuitable method for routine genotyping of laying hens. Until now, breeding companies have been using or developing SNP chips. Nevertheless, alternative methods based on sequencing have been developed. Among these, reduced representation sequencing approaches can offer sequencing quality and cost-effectiveness by reducing the genomic regions covered by sequencing. The aim of this study was to evaluate the ability of double digested Restriction site Associated DNA sequencing (ddRAD-seq) to identify and genotype SNPs in laying hens, by comparison with a presumed reliable WGS approach. Firstly, the sensitivity and precision of variant calling and the genotyping reliability of ddRAD-seq were determined. Next, the SNP Call Rate (CRSNP) and mean depth of sequencing per SNP (DPSNP) were compared between both methods. Finally, the effect of multiple combinations of thresholds for these parameters on genotyping reliability and amount of remaining SNPs in ddRAD-seq was studied. In raw form, the ddRAD-seq identified 349,497 SNPs evenly distributed on the genome with a CRSNP of 0.55, a DPSNP of 11X and a mean genotyping reliability rate per SNP of 80%. Considering genomic regions covered by expected enzymatic fragments (EFs), the sensitivity of the ddRAD-seq was estimated at 32.4% and its precision at 96.4%. The low CRSNP and DPSNP values were explained by the detection of SNPs outside the EFs theoretically generated by the ddRAD-seq protocol. Indeed, SNPs outside the EFs had significantly lower CRSNP (0.25) and DPSNP (1X) values than SNPs within the EFs (0.7 and 17X, resp.). The study demonstrated the relationship between CRSNP, DPSNP, genotyping reliability and the number of SNPs retained, to provide a decision-support tool for defining filtration thresholds. Severe quality control over ddRAD-seq data allowed a minimum of 40% of the SNPs to be retained with a CcR of 98%. Then, ddRAD-seq was defined as a suitable method for variant calling and genotyping in layers.
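The sensitivity and precision figures quoted above follow the usual definitions for comparing a call set against a reference set. The sketch below shows that arithmetic on toy data. It assumes variants are identified by hypothetical `(chromosome, position)` tuples and that the WGS truth set has already been restricted to the regions of interest; it is not the authors' pipeline.

```python
from typing import Iterable, Tuple

Site = Tuple[str, int]  # assumed variant key: (chromosome, 1-based position)


def variant_calling_metrics(test_sites: Iterable[Site],
                            truth_sites: Iterable[Site]) -> Tuple[float, float]:
    """Return (sensitivity, precision) of a call set against a truth set.

    sensitivity = true positives / all truth sites
    precision   = true positives / all called sites
    """
    test, truth = set(test_sites), set(truth_sites)
    true_positives = len(test & truth)
    sensitivity = true_positives / len(truth) if truth else 0.0
    precision = true_positives / len(test) if test else 0.0
    return sensitivity, precision


# Toy example: five truth sites, three called sites, two of them shared.
wgs_truth = [("chr1", 100), ("chr1", 250), ("chr1", 900),
             ("chr2", 40), ("chr2", 700)]
ddrad_calls = [("chr1", 100), ("chr2", 40), ("chr2", 555)]

sensitivity, precision = variant_calling_metrics(ddrad_calls, wgs_truth)
```

A call set can be highly precise while recovering only part of the truth set, which matches the pattern the abstract reports (high precision, lower sensitivity).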
Affiliation(s)
- Elise Gueret
- MGX-Montpellier GenomiX, Univ. Montpellier, CNRS, INSERM, Montpellier, France

5
Koontz AC, Schumacher EK, Spence ES, Hoban SM. Ex situ conservation of two rare oak species using microsatellite and SNP markers. Evol Appl 2024; 17:e13650. [PMID: 38524684 PMCID: PMC10960078 DOI: 10.1111/eva.13650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 12/27/2023] [Accepted: 01/14/2024] [Indexed: 03/26/2024] Open
Abstract
Plant collections held by botanic gardens and arboreta are key components of ex situ conservation. Maintaining genetic diversity in such collections allows them to be used as resources for supplementing wild populations. However, most recommended minimum sample sizes for sufficient ex situ genetic diversity are based on microsatellite markers, and it remains unknown whether these sample sizes remain valid in light of more recently developed next-generation sequencing (NGS) approaches. To address this knowledge gap, we examine how ex situ conservation status and sampling recommendations differ when derived from microsatellites and single nucleotide polymorphisms (SNPs) in garden and wild samples of two threatened oak species. For Quercus acerifolia, SNPs show lower ex situ representation of wild allelic diversity and slightly lower minimum sample size estimates than microsatellites, while results for each marker are largely similar for Q. boyntonii. The application of missing data filters tends to lead to higher ex situ representation, while the impact of different SNP calling approaches is dependent on the species being analyzed. Measures of population differentiation within species are broadly similar between markers, but larger numbers of SNP loci allow for greater resolution of population structure and clearer assignment of ex situ individuals to wild source populations. Our results offer guidance for future ex situ conservation assessments utilizing SNP data, such as the application of missing data filters and the usage of a reference genome, and illustrate that both microsatellites and SNPs remain viable options for botanic gardens and arboreta seeking to ensure the genetic diversity of their collections.
Affiliation(s)
- Emma S. Spence
- Morton Arboretum, Center for Tree Science, Lisle, Illinois, USA
- Cornell University, Department of Public and Ecosystem Health, Ithaca, New York, USA
- Sean M. Hoban
- Morton Arboretum, Center for Tree Science, Lisle, Illinois, USA

6
Pearman WS, Urban L, Alexander A. Commonly used Hardy-Weinberg equilibrium filtering schemes impact population structure inferences using RADseq data. Mol Ecol Resour 2022; 22:2599-2613. [PMID: 35593534 PMCID: PMC9541430 DOI: 10.1111/1755-0998.13646] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 06/16/2021] [Accepted: 05/13/2022] [Indexed: 11/29/2022]
Abstract
Reduced representation sequencing (RRS) is a widely used method to assay the diversity of genetic loci across the genome of an organism. The dominant class of RRS approaches assay loci associated with restriction sites within the genome (restriction site associated DNA sequencing, or RADseq). RADseq is frequently applied to non‐model organisms since it enables population genetic studies without relying on well‐characterized reference genomes. However, RADseq requires the use of many bioinformatic filters to ensure the quality of genotyping calls. These filters can have direct impacts on population genetic inference, and therefore require careful consideration. One widely used filtering approach is the removal of loci that do not conform to expectations of Hardy–Weinberg equilibrium (HWE). Despite being widely used, we show that this filtering approach is rarely described in sufficient detail to enable replication. Furthermore, through analyses of in silico and empirical data sets we show that some of the most widely used HWE filtering approaches dramatically impact inference of population structure. In particular, the removal of loci exhibiting departures from HWE after pooling across samples significantly reduces the degree of inferred population structure within a data set (despite this approach being widely used). Based on these results, we provide recommendations for best practice regarding the implementation of HWE filtering for RADseq data sets.
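The effect of pooling described above can be illustrated with a small sketch: a locus that is in HWE within each of two differentiated populations shows a strong heterozygote deficit once the samples are pooled, so a pooled HWE filter would discard exactly the loci that carry the structure signal. The sketch uses a standard one-degree-of-freedom chi-square goodness-of-fit test with invented genotype counts. It is not the authors' code, and the function name is an assumption.

```python
import math


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P-value of a chi-square goodness-of-fit test for HWE (1 df).

    Takes observed genotype counts for a biallelic locus. For one degree
    of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic locus: nothing to test
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2))


# Invented counts (AA, AB, BB): each population is in HWE on its own,
# but allele A has frequency 0.9 in one and 0.1 in the other.
pop1 = (81, 18, 1)
pop2 = (1, 18, 81)
pooled = tuple(a + b for a, b in zip(pop1, pop2))  # (82, 36, 82)

p_pop1 = hwe_chi2_pvalue(*pop1)
p_pop2 = hwe_chi2_pvalue(*pop2)
p_pooled = hwe_chi2_pvalue(*pooled)
```

Under these assumptions the per-population tests give p-values near 1, while the pooled test rejects HWE decisively.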
Affiliation(s)
- William S Pearman
- Department of Marine Science, University of Otago, Dunedin, New Zealand.
- Department of Anatomy, University of Otago, Dunedin, New Zealand
- Lara Urban
- Department of Anatomy, University of Otago, Dunedin, New Zealand
- Alana Alexander
- Department of Anatomy, University of Otago, Dunedin, New Zealand

7
Liu H, Mullan D, Zhao S, Zhang Y, Ye J, Wang Y, Zhang A, Zhao X, Liu G, Zhang C, Chan K, Lu Z, Yan G. Genomic regions controlling yield-related traits in spring wheat: a mini review and a case study for rainfed environments in Australia and China. Genomics 2022; 114:110268. [PMID: 35065191 DOI: 10.1016/j.ygeno.2022.110268] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/05/2021] [Revised: 01/11/2022] [Accepted: 01/15/2022] [Indexed: 01/17/2023]
Abstract
A genome-wide association study (GWAS) was performed in six environments to identify major or consistent alleles responsible for wheat yield traits in Australia and North China, where rainfed farming systems are adopted. A panel of 228 spring wheat varieties was genotyped by double digest restriction-site associated DNA genotyping-by-sequencing. A total of 223 significant marker-trait associations (MTAs) and 46 candidate genes for large- or consistent-effect MTAs were identified. The results were compared with previous studies based on a mini-review of 23 GWAS analyses on wheat yield. A phenomenon seldom reported in previous studies was that MTAs responsible for the trait tended to cluster together at certain chromosome segments, and many candidate genes were in the form of gene clusters. Although linkage disequilibrium (LD) might contribute to the co-segregation of the regions, it also suggested that marker-assisted selection (MAS) or a transgenic method targeting a single gene might not be as effective as MAS targeting a larger genomic region where all the underlying genes or gene clusters play important roles.
Affiliation(s)
- Hui Liu
- UWA School of Agriculture and Environment and The UWA Institute of Agriculture, The University of Western Australia, Perth, WA 6009, Australia.
- Shancen Zhao
- Beijing Genomics Institute, Shenzhen 518053, China.
- Yong Zhang
- Institute of Crop Science, Chinese Academy of Agriculture Sciences, Beijing 100081, China.
- Jun Ye
- Inner Mongolia Academy of Agricultural & Animal Husbandry Sciences / Inner Mongolia Key Laboratory of Degradation Farmland Ecological Restoration and Pollution Control / Inner Mongolia Conservation Tillage Engineering Technology Research Center, Hohhot 010070, China
- College of Agronomy, Hebei Agricultural University, State Key Laboratory of North China Crop Improvement and Regulation / Key Laboratory of Crop Growth Regulation of Hebei Province, Baoding 071001, China
- Yong Wang
- Wheat Research Institute, Gansu Academy of Agricultural Sciences, Lanzhou 730070, China
- Aimin Zhang
- College of Agronomy, Hebei Agricultural University, State Key Laboratory of North China Crop Improvement and Regulation / Key Laboratory of Crop Growth Regulation of Hebei Province, Baoding 071001, China.
- Xiaoqing Zhao
- Inner Mongolia Academy of Agricultural & Animal Husbandry Sciences / Inner Mongolia Key Laboratory of Degradation Farmland Ecological Restoration and Pollution Control / Inner Mongolia Conservation Tillage Engineering Technology Research Center, Hohhot 010070, China
- Guannan Liu
- UWA School of Agriculture and Environment and The UWA Institute of Agriculture, The University of Western Australia, Perth, WA 6009, Australia.
- Chi Zhang
- Beijing Genomics Institute, Shenzhen 518053, China.
- Kenneth Chan
- Australian Genome Research Facility, Melbourne, Vic 3000, Australia.
- Zhanyuan Lu
- Inner Mongolia Academy of Agricultural & Animal Husbandry Sciences / Inner Mongolia Key Laboratory of Degradation Farmland Ecological Restoration and Pollution Control / Inner Mongolia Conservation Tillage Engineering Technology Research Center, Hohhot 010070, China.
- Guijun Yan
- UWA School of Agriculture and Environment and The UWA Institute of Agriculture, The University of Western Australia, Perth, WA 6009, Australia.

8
Thom G, Gehara M, Smith BT, Miyaki CY, do Amaral FR. Microevolutionary dynamics show tropical valleys are deeper for montane birds of the Atlantic Forest. Nat Commun 2021; 12:6269. [PMID: 34725329 PMCID: PMC8560783 DOI: 10.1038/s41467-021-26537-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/22/2021] [Accepted: 10/08/2021] [Indexed: 11/18/2022] Open
Abstract
Tropical mountains hold more biodiversity than their temperate counterparts, and this disparity is often associated with the latitudinal climatic gradient. However, distinguishing the impact of latitude versus the background effects of species history and traits is challenging due to the evolutionary distance between tropical and temperate assemblages. Here, we test whether microevolutionary processes are linked to environmental variation across a sharp latitudinal transition in 21 montane birds of the southern Atlantic Forest in Brazil. We find that effective dispersal within populations in the tropical mountains is lower and genomic differentiation is better predicted by the current environmental complexity of the region than within the subtropical populations. The concordant response of multiple co-occurring populations is consistent with spatial climatic variability as a major process driving population differentiation. Our results provide evidence for how a narrow latitudinal gradient can shape microevolutionary processes and contribute to broader scale biodiversity patterns. There are many hypotheses for why the tropics are more biodiverse than higher latitudes. Phylogenomic analyses of 21 montane birds find that tropical birds disperse less and have more genetically structured populations than their counterparts at higher latitudes, possibly due to a larger elevational climate gradient in the tropics.
Affiliation(s)
- Gregory Thom
- Department of Ornithology, American Museum of Natural History, Central Park West at 79th Street, New York, NY, 10024, USA.
- Departamento de Genética e Biologia Evolutiva, Universidade de São Paulo, Rua do Matão, 277, Cidade Universitária, São Paulo, SP, 05508-090, Brazil.
- Marcelo Gehara
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY, 10024, USA.
- Department of Earth and Environmental Sciences, Rutgers University Newark, 195 University Ave, Newark, NJ, 07102, USA
- Brian Tilston Smith
- Department of Ornithology, American Museum of Natural History, Central Park West at 79th Street, New York, NY, 10024, USA
- Cristina Y Miyaki
- Departamento de Genética e Biologia Evolutiva, Universidade de São Paulo, Rua do Matão, 277, Cidade Universitária, São Paulo, SP, 05508-090, Brazil
- Fábio Raposo do Amaral
- Departamento de Ecologia e Biologia Evolutiva, Universidade Federal de São Paulo, Rua Prof. Artur Riedel, 275, Jardim Eldorado, Diadema, SP, CEP 09972-270, Brazil

9
He JC, Li SY, He WZ, Xian JJ, Ma XY, Wang YC, Zhang MC, Ye GX, Liang B, Xia Q, Li Q. Application of Restriction Site-Associated DNA Sequencing (RAD-Seq) for Copy Number Variation and Triploidy Detection in Human. Cytogenet Genome Res 2021; 161:406-413. [PMID: 34657031 DOI: 10.1159/000518930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2021] [Accepted: 08/06/2021] [Indexed: 11/19/2022] Open
Abstract
At present, low-pass whole-genome sequencing (WGS) is frequently used in clinical research and in the screening of copy number variations (CNVs). However, there are still some challenges in the detection of triploids. Restriction site-associated DNA sequencing (RAD-Seq) technology is a reduced-representation genome sequencing technology developed based on next-generation sequencing. Here, we verified whether RAD-Seq could be employed to detect CNVs and triploids. In this study, genomic DNA of 11 samples was extracted employing a routine method and used to build libraries. Five cell lines of known karyotypes and 6 triploid abortion tissue samples were included for RAD-Seq testing. The triploid samples were confirmed by STR analysis and also tested by low-pass WGS. The accuracy and efficiency of detecting CNVs and triploids by RAD-Seq were then assessed, compared with low-pass WGS. In our results, RAD-Seq detected 11 out of 11 (100%) chromosomal abnormalities, including 4 deletions and 1 aneuploidy in the purchased cell lines and all triploid samples. By contrast, these triploids were missed by low-pass WGS. Furthermore, RAD-Seq showed a higher resolution and more accurate allele frequency in the detection of triploids than low-pass WGS. Our study shows that, compared with low-pass WGS, RAD-Seq has relatively higher accuracy in CNV detection at a similar cost and is capable of identifying triploids. Therefore, the application of this technique in medical genetics has a significant potential value.
Affiliation(s)
- Jian-Chun He
- Key Laboratory for Major Obstetric Diseases of Guangdong Province, Key Laboratory of Reproduction and Genetics of Guangdong Higher Education Institutes, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Shao-Ying Li
- Key Laboratory for Major Obstetric Diseases of Guangdong Province, Key Laboratory of Reproduction and Genetics of Guangdong Higher Education Institutes, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Wen-Zhi He
- Key Laboratory for Major Obstetric Diseases of Guangdong Province, Key Laboratory of Reproduction and Genetics of Guangdong Higher Education Institutes, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Jia-Jia Xian
- Key Laboratory for Major Obstetric Diseases of Guangdong Province, Key Laboratory of Reproduction and Genetics of Guangdong Higher Education Institutes, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Xiao-Yan Ma
- Key Laboratory for Major Obstetric Diseases of Guangdong Province, Key Laboratory of Reproduction and Genetics of Guangdong Higher Education Institutes, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Yan-Chao Wang
- Key Laboratory for Major Obstetric Diseases of Guangdong Province, Key Laboratory of Reproduction and Genetics of Guangdong Higher Education Institutes, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Min-Cong Zhang
- Key Laboratory for Major Obstetric Diseases of Guangdong Province, Key Laboratory of Reproduction and Genetics of Guangdong Higher Education Institutes, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Guo-Xin Ye
- Key Laboratory for Major Obstetric Diseases of Guangdong Province, Key Laboratory of Reproduction and Genetics of Guangdong Higher Education Institutes, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Bo Liang
- Basecare Medical Device Co., Ltd, Suzhou, China
- Qin Xia
- Basecare Medical Device Co., Ltd, Suzhou, China
- Qing Li
- Key Laboratory for Major Obstetric Diseases of Guangdong Province, Key Laboratory of Reproduction and Genetics of Guangdong Higher Education Institutes, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China

10
Nazareno AG, Knowles LL. There Is No 'Rule of Thumb': Genomic Filter Settings for a Small Plant Population to Obtain Unbiased Gene Flow Estimates. Front Plant Sci 2021; 12:677009. [PMID: 34721447 PMCID: PMC8551369 DOI: 10.3389/fpls.2021.677009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2021] [Accepted: 06/16/2021] [Indexed: 06/13/2023]
Abstract
The application of high-density polymorphic single-nucleotide polymorphisms (SNP) markers derived from high-throughput sequencing methods has heralded plenty of biological questions about the linkages of processes operating at micro- and macroevolutionary scales. However, the effects of SNP filtering practices on population genetic inference have received much less attention. By performing sensitivity analyses, we empirically investigated how decisions about the percentage of missing data (MD) and the minor allele frequency (MAF) set in bioinformatic processing of genomic data affect direct (i.e., parentage analysis) and indirect (i.e., fine-scale spatial genetic structure - SGS) gene flow estimates. We focus specifically on these manifestations in small plant populations, and particularly, in the rare tropical plant species Dinizia jueirana-facao, where assumptions implicit to analytical procedures for accurate estimates of gene flow may not hold. Avoiding biases in dispersal estimates is essential given this species is facing extinction risks due to habitat loss, and so we also investigate the effects of forest fragmentation on the accuracy of dispersal estimates under different filtering criteria by testing for recent decrease in the scale of gene flow. Our sensitivity analyses demonstrate that gene flow estimates are robust to different settings of MAF (0.05-0.35) and MD (0-20%). Comparing the direct and indirect estimates of dispersal, we find that contemporary estimates of gene dispersal distance (σ_rt = 41.8 m) were approximately fourfold smaller than the historical estimates, supporting the hypothesis of a temporal shift in the scale of gene flow in D. jueirana-facao, which is consistent with predictions based on recent, dramatic forest fragmentation process. While we identified settings for filtering genomic data to avoid biases in gene flow estimates, we stress that there is no 'rule of thumb' for bioinformatic filtering and that relying on default program settings is not advisable. Instead, we suggest that the approach implemented here be applied independently in each separate empirical study to confirm appropriate settings to obtain unbiased population genetics estimates.
Affiliation(s)
- Alison G. Nazareno
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, United States
- Department of Genetics, Ecology and Evolution, Federal University of Minas Gerais, Belo Horizonte, Brazil
- L. Lacey Knowles
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, United States