1
Questioning inbreeding: Could outbreeding affect productivity in the North African catfish in Thailand? PLoS One 2024; 19:e0302584. [PMID: 38709757 PMCID: PMC11073742 DOI: 10.1371/journal.pone.0302584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Accepted: 04/08/2024] [Indexed: 05/08/2024]
Abstract
The North African catfish (Clarias gariepinus) is a significant species in aquaculture, which is crucial for ensuring food and nutrition security. Their high adaptability to diverse environments has led to an increase in the number of farms that are available for their production. However, long-term closed breeding adversely affects their reproductive performance, leading to a decrease in production efficiency. This is possibly caused by inbreeding depression. To investigate the root cause of this issue, the genetic diversity of captive North African catfish populations was assessed in this study. Microsatellite genotyping and mitochondrial DNA D-loop sequencing were applied to 136 catfish specimens, collected from three populations captured for breeding in Thailand. Interestingly, extremely low inbreeding coefficients were obtained within each population, and distinct genetic diversity was observed among the three populations, indicating that their genetic origins are markedly different. This suggests that outbreeding depression by genetic admixture among currently captured populations of different origins may account for the low productivity of the North African catfish in Thailand. Genetic improvement of the North African catfish populations is required by introducing new populations whose origins are clearly known. This strategy should be systematically integrated into breeding programs to establish an ideal founder stock for selective breeding.
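The inbreeding coefficient mentioned in this abstract is commonly estimated per locus as F = 1 - Ho/He, where Ho is the observed and He the expected heterozygosity. The following is a minimal sketch of that textbook estimator, not the study's actual pipeline; the function name and the toy microsatellite genotypes (allele sizes 150 and 154) are assumptions made for illustration.

```python
from collections import Counter


def inbreeding_coefficient(genotypes):
    """Estimate F = 1 - Ho/He at one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    Hypothetical helper for illustration; undefined for a monomorphic locus (He = 0).
    """
    n = len(genotypes)
    # Observed heterozygosity: share of individuals carrying two different alleles.
    h_obs = sum(a != b for a, b in genotypes) / n
    # Allele frequencies over the 2n sampled gene copies.
    counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = [c / (2 * n) for c in counts.values()]
    # Expected heterozygosity under Hardy-Weinberg proportions.
    h_exp = 1.0 - sum(p * p for p in freqs)
    return 1.0 - h_obs / h_exp


# Toy data (assumed): four individuals genotyped at one microsatellite locus.
toy_locus = [(150, 154), (150, 150), (154, 154), (150, 154)]
print(inbreeding_coefficient(toy_locus))
```

Values near zero indicate genotype proportions close to random mating, while values approaching one indicate a deficit of heterozygotes. In practice the estimate would be averaged over many loci and corrected for sample size.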
2
RIDGE, a tool tailored to detect gene flow barriers across species pairs. Mol Ecol Resour 2024; 24:e13944. [PMID: 38419376 DOI: 10.1111/1755-0998.13944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 01/19/2024] [Accepted: 02/05/2024] [Indexed: 03/02/2024]
Abstract
Characterizing the processes underlying reproductive isolation between diverging lineages is central to understanding speciation. Here, we present RIDGE (Reproductive Isolation Detection using Genomic polymorphisms), a tool tailored for quantifying gene flow barrier proportion and identifying the relevant genomic regions. RIDGE relies on an Approximate Bayesian Computation with a model-averaging approach to accommodate diverse scenarios of lineage divergence. It captures heterogeneity in effective migration rate along the genome while accounting for variation in linked selection and recombination. The barrier detection test relies on numerous summary statistics to compute a Bayes factor, offering a robust statistical framework that facilitates cross-species comparisons. Simulations revealed RIDGE's efficiency in capturing signals of ongoing migration. Model averaging proved particularly valuable in scenarios of high model uncertainty where no migration or migration homogeneity can be wrongly assumed, typically for recent divergence times <0.1 2Ne generations. Applying RIDGE to four published crow data sets, we first validated our tool by identifying a well-known large genomic region associated with mate choice patterns. Second, while we identified a significant overlap of outlier loci using RIDGE and traditional genomic scans, our results suggest that a substantial portion of previously identified outliers are likely false positives. Outlier detection relies on allele differentiation, relative measures of divergence and the count of shared polymorphisms and fixed differences. Our analyses also highlight the value of incorporating multiple summary statistics, including our newly developed outlier ones, that can be useful in challenging detection conditions.
3
Inference of the demographic histories and selective effects of human gut commensal microbiota over the course of human history. bioRxiv 2023:2023.11.09.566454. [PMID: 38014007 PMCID: PMC10680615 DOI: 10.1101/2023.11.09.566454] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 11/29/2023]
Abstract
Despite the importance of gut commensal microbiota to human health, there is little knowledge about their evolutionary histories, including their population demographic histories and their distributions of fitness effects (DFE) of new mutations. Here, we infer the demographic histories and DFEs of 27 of the most highly prevalent and abundant commensal gut microbial species in North Americans over timescales exceeding human generations using a collection of lineages inferred from a panel of healthy hosts. We find overall reductions in genetic variation among commensal gut microbes sampled from a Western population relative to an African rural population. Additionally, some species in North American microbiomes display contractions in population size and others expansions, potentially occurring at several key historical moments in human history. DFEs across species vary from highly to mildly deleterious, with accessory genes experiencing more drift compared to core genes. Within genera, DFEs tend to be more congruent, reflective of underlying phylogenetic relationships. Taken together, these findings suggest that human commensal gut microbes have distinct evolutionary histories, possibly reflecting the unique roles of individual members of the microbiome.
4
Integrating Pool-seq uncertainties into demographic inference. Mol Ecol Resour 2023; 23:1737-1755. [PMID: 37475177 DOI: 10.1111/1755-0998.13834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2022] [Revised: 06/16/2023] [Accepted: 06/30/2023] [Indexed: 07/22/2023]
Abstract
Next-generation sequencing of pooled samples (Pool-seq) is a popular method to assess genome-wide diversity patterns in natural and experimental populations. However, Pool-seq is associated with specific sources of noise, such as unequal individual contributions. Consequently, using Pool-seq for the reconstruction of evolutionary history has remained underexplored. Here we describe a novel Approximate Bayesian Computation (ABC) method to infer demographic history, explicitly modelling Pool-seq sources of error. By jointly modelling Pool-seq data, demographic history and the effects of selection due to barrier loci, we obtain estimates of demographic history parameters accounting for technical errors associated with Pool-seq. Our ABC approach is computationally efficient as it relies on simulating subsets of loci (rather than the whole genome) and on using relative summary statistics and relative model parameters. Our simulation results indicate that Pool-seq data allow us to distinguish between general scenarios of ecotype formation (single versus parallel origin) and to infer relevant demographic parameters (e.g. effective sizes and split times). We exemplify the application of our method to Pool-seq data from the rocky-shore gastropod Littorina saxatilis, sampled on a narrow geographical scale at two Swedish locations where two ecotypes (Wave and Crab) are found. Our model choice and parameter estimates show that ecotypes formed before colonization of the two locations (i.e. single origin) and are maintained despite gene flow. These results indicate that demographic modelling and inference can be successful based on pool-sequencing using ABC, contributing to the development of suitable null models that allow for a better understanding of the genetic basis of divergent adaptation.
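Several abstracts in this list rely on Approximate Bayesian Computation. The core idea can be illustrated with plain rejection sampling: draw a parameter from the prior, simulate a summary statistic, and keep the draw if the simulated summary is close to the observed one. The sketch below is a toy under stated assumptions, not the method of any listed paper: the simulator (each locus polymorphic with probability `theta`), the uniform prior, and all names are hypothetical.

```python
import random


def simulate_summary(theta, n_loci, rng):
    # Toy simulator (assumed): each locus is polymorphic with probability theta;
    # the summary statistic is the number of polymorphic loci.
    return sum(rng.random() < theta for _ in range(n_loci))


def abc_rejection(observed, n_loci, n_draws, tolerance, seed=0):
    """Return prior draws whose simulated summary lies within `tolerance` of `observed`."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.random()  # draw from a Uniform(0, 1) prior
        if abs(simulate_summary(theta, n_loci, rng) - observed) <= tolerance:
            accepted.append(theta)
    return accepted


# Accepted draws approximate the posterior of theta given 30 polymorphic loci out of 100.
posterior = abc_rejection(observed=30, n_loci=100, n_draws=20000, tolerance=2)
print(len(posterior), sum(posterior) / len(posterior))
```

Real applications replace the toy simulator with coalescent simulations, use many summary statistics, and often swap rejection for regression adjustment or random forests, as some of the abstracts here describe.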
5
Environmental and Socio-Cultural Factors Impacting the Unique Gene Pool Pattern of Mae Hong-Son Chicken. Animals (Basel) 2023; 13:1949. [PMID: 37370459 PMCID: PMC10295432 DOI: 10.3390/ani13121949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Revised: 06/08/2023] [Accepted: 06/08/2023] [Indexed: 06/29/2023]
Abstract
Understanding the genetic diversity of domestic chicken breeds under the impact of socio-cultural and ecological dynamics is vital for the conservation of natural resources. Mae Hong Son chicken is a local breed of North Thai domestic chicken widely distributed in Mae Hong Son Province, Thailand; however, its genetic characterization, origin, and diversity remain poorly understood. Here, we studied the socio-cultural, environmental, and genetic aspects of the Mae Hong Son chicken breed and investigated its diversity and allelic gene pool. We genotyped 28 microsatellite markers and analyzed mitochondrial D-loop sequencing data to evaluate genetic diversity and assessed spatial habitat suitability using maximum entropy modeling. Sequence diversity analysis revealed a total of 188 genotyped alleles, with overall nucleotide diversity of 0.014 ± 0.007, indicating that the Mae Hong Son chicken population is genetically highly diverse, with 35 (M1-M35) haplotypes clustered into haplogroups A, B, E, and F, mostly in the North ecotype. Allelic gene pool patterns showed a unique DNA fingerprint of the Mae Hong Son chicken, as compared to other breeds and red junglefowl. A genetic introgression of some parts of the gene pool of red junglefowl and other indigenous breeds was identified in the Mae Hong Son chicken, supporting the hypothesis of the origin of the Mae Hong Son chicken. During domestication in the past 200-300 years after the crossing of indigenous chickens and red junglefowl, the Mae Hong Son chicken has adapted to the highland environment and played a significant socio-cultural role in the Northern Thai community. The unique genetic fingerprint of the Mae Hong Son chicken, retaining a high level of genetic variability that includes a dynamic demographic and domestication history, as well as a range of ecological factors, might reshape the adaptation of this breed under selective pressure.
6
Inferring the demographic history of the North American firefly Photinus pyralis. J Evol Biol 2022; 35:1488-1499. [PMID: 36168726 DOI: 10.1111/jeb.14094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2022] [Revised: 06/13/2022] [Accepted: 07/11/2022] [Indexed: 11/28/2022]
Abstract
The firefly Photinus pyralis inhabits a wide range of latitudinal and ecological niches, with populations living from temperate to tropical habitats. Despite its broad distribution, its demographic history is unknown. In this study, we modelled and inferred different demographic scenarios for North American populations of P. pyralis, which were collected from Texas to New Jersey. We used a combination of ABC techniques (for multi-population/colonization analyses) and likelihood inference (dadi, StairwayPlot2, PoMo) for single-population demographic inference, which proved useful with our RAD data. We uncovered that the most ancestral North American population lies in Texas, from which the species colonized the Central region of the US and more recently the North Eastern coast. Our study confidently rejects a demographic scenario where the North Eastern populations colonized more southern populations until reaching Texas. To estimate the age of divergence of P. pyralis, which provides deeper insights into the history of the entire species, we assembled a multi-locus phylogenetic data set covering the genus Photinus. We uncovered that the phylogenetic node leading to P. pyralis lies at the end of the Miocene. Importantly, modelling the demographic history of North American P. pyralis serves as a null model of nucleotide diversity patterns in a widespread native insect species, which will serve in future studies for the detection of adaptation events in this firefly species, as well as a comparison for future studies of other North American insect taxa.
7
Coincidence of low genetic diversity and increasing population size in wild gaur populations in the Khao Phaeng Ma Non-Hunting Area, Thailand: A challenge for conservation management under human-wildlife conflict. PLoS One 2022; 17:e0273731. [PMID: 36040968 PMCID: PMC9426942 DOI: 10.1371/journal.pone.0273731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Accepted: 08/13/2022] [Indexed: 11/19/2022]
Abstract
The gaur (Bos gaurus) is found throughout mainland South and Southeast Asia but is listed as an endangered species in Thailand with a decreasing population size and a reduction in suitable habitat. While gaur have shown a population recovery from 35 to 300 individuals within 30 years in the Khao Phaeng Ma (KPM) Non-Hunting Area, this has caused conflict with villagers along the border of the protected area. At the same time, the ecotourism potential of watching gaurs has boosted the local economy. In this study, 13 mitochondrial displacement-loop sequence samples taken from gaur with GPS collars were analyzed. Three haplotypes identified in the population were defined by only two parsimony informative sites (from 9 mutational steps of nucleotide difference). One haplotype was shared among eleven individuals located in different subpopulations/herds, suggesting very low genetic diversity with few maternal lineages in the founder population. Based on the current small number of sequences, neutrality and demographic expansion test results also showed that the population was likely to contract in the near future. These findings provide insight into the genetic diversity and demography of the wild gaur population in the KPM protected area that can inform long-term sustainable management action plans.
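The haplotype counts given in this abstract (13 sequences, three haplotypes, one shared by eleven individuals) imply that the remaining two haplotypes were each seen once. Under that reading, Nei's haplotype diversity, h = n/(n-1) × (1 - Σ p_i²), can be computed directly. This is a hedged sketch of the standard formula; the haplotype labels are placeholders and the value is not claimed to match what the paper reports.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Counts inferred from the abstract: 11 + 1 + 1 = 13 sequences (labels are hypothetical).
sample = ["H1"] * 11 + ["H2"] + ["H3"]
print(round(haplotype_diversity(sample), 3))
```

A value this far below one is consistent with the abstract's conclusion of very low maternal-lineage diversity.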
8
Recombination-aware phylogeographic inference using the structured coalescent with ancestral recombination. PLoS Comput Biol 2022; 18:e1010422. [PMID: 35984849 PMCID: PMC9447913 DOI: 10.1371/journal.pcbi.1010422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Revised: 09/06/2022] [Accepted: 07/21/2022] [Indexed: 11/19/2022]
Abstract
Movement of individuals between populations or demes is often restricted, especially between geographically isolated populations. The structured coalescent provides an elegant theoretical framework for describing how movement between populations shapes the genealogical history of sampled individuals and thereby structures genetic variation within and between populations. However, in the presence of recombination an individual may inherit different regions of their genome from different parents, resulting in a mosaic of genealogical histories across the genome, which can be represented by an Ancestral Recombination Graph (ARG). In this case, different genomic regions may have different ancestral histories and so different histories of movement between populations. Recombination therefore poses an additional challenge to phylogeographic methods that aim to reconstruct the movement of individuals from genealogies, although also a potential benefit in that different loci may contain additional information about movement. Here, we introduce the Structured Coalescent with Ancestral Recombination (SCAR) model, which builds on recent approximations to the structured coalescent by incorporating recombination into the ancestry of sampled individuals. The SCAR model allows us to infer how the migration history of sampled individuals varies across the genome from ARGs, and improves estimation of key population genetic parameters such as population sizes, recombination rates and migration rates. Using the SCAR model, we explore the potential and limitations of phylogeographic inference using full ARGs. We then apply the SCAR model to lineages of the recombining fungus Aspergillus flavus sampled across the United States to explore patterns of recombination and migration across the genome.
Phylogeographic methods are widely used to reconstruct the historical movement of individuals between different populations. When applied to infectious pathogens, these methods are often used to reconstruct the origin or source of novel pathogen lineages. Most existing phylogeographic methods reconstruct movement based on a single phylogenetic tree, which is assumed to reflect the genetic ancestry of all sampled individuals. However, in populations undergoing recombination, genetic material can be exchanged between lineages such that individuals may inherit different regions of their genome from different ancestors. In this case, phylogenetic relationships among individuals can only be captured by a reticulated network rather than any single tree. Ancestral Recombination Graphs (ARGs) provide one way of capturing these reticulate relationships and we develop new models that allow for demographic inference of historical population sizes, recombination rates and migration rates between subpopulations from ARGs. By accounting for recombination, our models not only allow for accurate demographic inference, but can take full advantage of the additional information contained in ARGs about how ancestry varies across genomes to more precisely reconstruct the movement of genetic material between populations.
9
Population divergence time estimation using individual lineage label switching. G3 Genes|Genomes|Genetics 2022; 12:6528849. [PMID: 35166790 PMCID: PMC8982400 DOI: 10.1093/g3journal/jkac040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2021] [Accepted: 02/07/2022] [Indexed: 11/14/2022]
Abstract
Divergence time estimation from multilocus genetic data has become common in population genetics and phylogenetics. We present a new Bayesian inference method that treats the divergence time as a random variable. The divergence time is calculated from an assembly of splitting events on individual lineages in a genealogy. The time for such a splitting event is drawn from a hazard function of the truncated normal distribution. This allows easy integration into the standard coalescence framework used in programs such as Migrate. We explore the accuracy of the new inference method with simulated population splittings over a wide range of divergence time values and with a reanalysis of a dataset of 5 populations consisting of 3 present-day populations (Africans, Europeans, Asians) and 2 archaic samples (Altai and Ust'-Ishim). Evaluations of simple divergence models without subsequent gene flow show high accuracy, whereas the accuracy of the results of isolation with migration models depends on the magnitude of the immigration rate. High immigration rates lead to a time of the most recent common ancestor of the sample that, looking backward in time, predates the divergence time. Even with many independent loci, accurate estimation of the divergence time with high immigration rates becomes problematic. Our comparison to other software tools reveals that our lineage-switching method, implemented in Migrate, is comparable to IMa2p. The software Migrate can run large numbers of sequence loci (>1,000) on computer clusters in parallel.
10
11
Abstract
Shared phylogenetic breaks often are associated with clear geographic barriers but some common phylogeographic breaks may lack obvious underlying mechanisms. A phylogenetic break involving multiple taxa was found in the Baja California Peninsula that was associated with a past sea barrier. However, geological evidence is lacking for this barrier’s past existence, and despite its current absence, the genetic breaks have persisted. This work explores the relationships between the current climatic niches for matrilineages of 11 vertebrate species as a possible explanation for the current geographic partitioning of matrilineages. We evaluated the climatic occupancy of each matrilineage through ecological niche models, background similarity, niche overlap, niche divergence, and Mantel tests. We found disparities in the climatic occupancy between north and south matrilineage of each taxon. Northern matrilineages are associated with lower temperatures and winter rains, while southern matrilineages reside in areas with higher temperatures and summer rains.
12
Reduced genetic variability in a captive-bred population of the endangered Hume's pheasant (Syrmaticus humiae, Hume 1881) revealed by microsatellite genotyping and D-loop sequencing. PLoS One 2021; 16:e0256573. [PMID: 34449789 PMCID: PMC8396778 DOI: 10.1371/journal.pone.0256573] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/07/2021] [Accepted: 08/09/2021] [Indexed: 11/18/2022]
Abstract
Captive breeding programs are crucial to ensure the survival of endangered species and ultimately to reintroduce individuals into the wild. However, captive-bred populations can also deteriorate due to inbreeding depression and reduction of genetic variability. We genotyped a captive population of 82 individuals of the endangered Hume's pheasant (Syrmaticus humiae, Hume 1881) at the Doi Tung Wildlife Breeding Center to assess the genetic consequences associated with captive breeding. Analysis of microsatellite loci and mitochondrial D-loop sequences revealed significantly reduced genetic differentiation and a shallow population structure. Despite the low genetic variability, no bottleneck was observed, but 12 microsatellite loci were informative in reflecting probable inbreeding. These findings provide a valuable source of knowledge to maximize genetic variability and enhance the success of future conservation plans for captive and wild populations of Hume's pheasant.
13
High-Level Gene Flow Restricts Genetic Differentiation in Dairy Cattle Populations in Thailand: Insights from Large-Scale Mt D-Loop Sequencing. Animals (Basel) 2021; 11:ani11061680. [PMID: 34199963 PMCID: PMC8227385 DOI: 10.3390/ani11061680] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/08/2021] [Revised: 05/25/2021] [Accepted: 05/31/2021] [Indexed: 12/11/2022]
Abstract
Domestication and artificial selection lead to the development of genetically divergent cattle breeds or hybrids that exhibit specific patterns of genetic diversity and population structure. Recently developed mitochondrial markers have allowed investigation of cattle diversity worldwide; however, an extensive study on the population-level genetic diversity and demography of dairy cattle in Thailand is still needed. Mitochondrial D-loop sequences were obtained from 179 individuals (hybrids of Bos taurus and B. indicus) sampled from nine different provinces. Fifty-one haplotypes, of which most were classified in haplogroup "I", were found across all nine populations. All sampled populations showed severely reduced degrees of genetic differentiation, and low nucleotide diversity was observed in populations from central Thailand. Populations that originated from adjacent geographical areas tended to show high gene flow, as revealed by patterns of weak network structuring. Mismatch distribution analysis was suggestive of a stable population, with the recent occurrence of a slight expansion event. The results provide insights into the origins and the genetic relationships among local Thai cattle breeds and will be useful for guiding management of cattle breeding in Thailand.
14
Abstract
The patterns of genetic variation within and among individuals and populations can be used to make inferences about the evolutionary forces that generated those patterns. Numerous population genetic approaches have been developed in order to infer evolutionary history. Here, we present the "Two-Two (TT)" and the "Two-Two-outgroup (TTo)" methods, two closely related approaches for estimating divergence time based on coalescent theory. They rely on sequence data from two haploid genomes (or a single diploid individual) from each of two populations. Under a simple population-divergence model, we derive the probabilities of the possible sample configurations. These probabilities form a set of equations that can be solved to obtain estimates of the model parameters, including population split times, directly from the sequence data. This transparent and computationally efficient approach to inferring population divergence time makes it possible to estimate time scaled in generations (assuming a mutation rate), and not as a compound parameter of genetic drift. Using simulations under a range of demographic scenarios, we show that the method is relatively robust to migration and that the TTo method can alleviate biases that can arise from drastic ancestral population size changes. We illustrate the utility of the approaches with some examples, including estimating split times for pairs of human populations as well as providing further evidence for the complex relationship among Neandertals and Denisovans and their ancestors.
15
Negative selection on complex traits limits phenotype prediction accuracy between populations. Am J Hum Genet 2021; 108:620-631. [PMID: 33691092 DOI: 10.1016/j.ajhg.2021.02.013] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 09/15/2020] [Accepted: 02/17/2021] [Indexed: 12/22/2022]
Abstract
Phenotype prediction is a key goal for medical genetics. Unfortunately, most genome-wide association studies are done in European populations, which reduces the accuracy of predictions via polygenic scores in non-European populations. Here, we use population genetic models to show that human demographic history and negative selection on complex traits can result in population-specific genetic architectures. For traits where alleles with the largest effect on the trait are under the strongest negative selection, approximately half of the heritability can be accounted for by variants in Europe that are absent from Africa, leading to poor performance in phenotype prediction across these populations. Further, under such a model, individuals in the tails of the genetic risk distribution may not be identified via polygenic scores generated in another population. We empirically test these predictions by building a model to stratify heritability between European-specific and shared variants and applying it to 37 traits and diseases in the UK Biobank. Across these phenotypes, ∼30% of the heritability comes from European-specific variants. We conclude that genetic association studies need to include more diverse populations to enable the utility of phenotype prediction in all populations.
16
Abstract
A key challenge in understanding how organisms adapt to their environments is to identify the mutations and genes that make it possible. By comparing patterns of sequence variation to neutral predictions across genomes, the targets of positive selection can be located. We applied this logic to house mice that invaded Gough Island (GI), an unusual population that shows phenotypic and ecological hallmarks of selection. We used massively parallel short-read sequencing to survey the genomes of 14 GI mice. We computed a set of summary statistics to capture diverse aspects of variation across these genome sequences, used approximate Bayesian computation to reconstruct a null demographic model, and then applied machine learning to estimate the posterior probability of positive selection in each region of the genome. Using a conservative threshold, 1,463 5-kb windows show strong evidence for positive selection in GI mice but not in a mainland reference population of German mice. Disproportionate shares of these selection windows contain genes that harbor derived nonsynonymous mutations with large frequency differences. Over-represented gene ontologies in selection windows emphasize neurological themes. Inspection of genomic regions harboring many selection windows with high posterior probabilities pointed to genes with known effects on exploratory behavior and body size as potential targets. Some genes in these regions contain candidate adaptive variants, including missense mutations and/or putative regulatory mutations. Our results provide a genomic portrait of adaptation to island conditions and position GI mice as a powerful system for understanding the genetic component of natural selection.
17
DILS: Demographic inferences with linked selection by using ABC. Mol Ecol Resour 2021; 21:2629-2644. [PMID: 33448666 DOI: 10.1111/1755-0998.13323] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 06/18/2020] [Revised: 12/09/2020] [Accepted: 12/21/2020] [Indexed: 01/21/2023]
Abstract
We present DILS, a deployable statistical analysis platform for conducting demographic inferences with linked selection from population genomic data using an Approximate Bayesian Computation framework. DILS takes as input single-population or two-population data sets (multilocus fasta sequences) and performs three types of analyses in a hierarchical manner, identifying: (a) the best demographic model to study the importance of gene flow and population size change on the genetic patterns of polymorphism and divergence, (b) the best genomic model to determine whether the effective size Ne and migration rate N.m are heterogeneously distributed along the genome (implying linked selection) and (c) loci in genomic regions most associated with barriers to gene flow. Also available via a Web interface, DILS aims to facilitate collaborative research in speciation genomics. Here, we show the performance and limitations of DILS by using simulations and finally apply the method to published data on a divergence continuum composed of 28 pairs of Mytilus mussel populations/species.
18
A Revised Model of Anatomically Modern Human Expansions Out of Africa through a Machine Learning Approximate Bayesian Computation Approach. Genes (Basel) 2020; 11:genes11121510. [PMID: 33339234 PMCID: PMC7766041 DOI: 10.3390/genes11121510] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/05/2020] [Revised: 12/11/2020] [Accepted: 12/14/2020] [Indexed: 01/25/2023]
Abstract
There is wide consensus that Africa is the birthplace of anatomically modern humans (AMH), but the dispersal pattern and the main routes followed by our ancestors to colonize the world are still matters of debate. It is still an open question whether AMH left Africa through a single process, dispersing almost simultaneously over Asia and Europe, or in two main waves, first through the Arabian Peninsula into southern Asia and Australo-Melanesia, and later through a northern route crossing the Levant. The development of new methodologies for inferring population history and the availability of worldwide high-coverage whole-genome sequences have not resolved this debate. In this work, we test the two main out-of-Africa hypotheses through an Approximate Bayesian Computation approach, based on the Random-Forest algorithm. We evaluated the ability of the method to discriminate between the alternative models of AMH out-of-Africa, using simulated data. Having established that the models are distinguishable, we compared simulated data with real genomic variation, from modern and archaic populations. This analysis showed that a model of multiple dispersals is four times as likely as the alternative single-dispersal model. According to our estimates, the two dispersal processes may be placed, respectively, around 74,000 and around 46,000 years ago.
|
19
|
Genetic management of a water monitor lizard (Varanus salvator macromaculatus) population at Bang Kachao Peninsula as a consequence of urbanization with Varanus Farm Kamphaeng Saen as the first captive research establishment. J Zool Syst Evol Res 2020. [DOI: 10.1111/jzs.12436] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/27/2022]
|
20
|
Distinguishing among complex evolutionary models using unphased whole-genome data through random forest approximate Bayesian computation. Mol Ecol Resour 2020; 21:2614-2628. [PMID: 33000507 DOI: 10.1111/1755-0998.13263] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/13/2019] [Revised: 08/28/2020] [Accepted: 09/07/2020] [Indexed: 01/25/2023]
Abstract
Inferring past demographic histories is crucial in population genetics, and the number of complete genomes now available should in principle facilitate this inference. In practice, however, the available inferential methods suffer from severe limitations. Although hundreds of complete genomes can be simultaneously analysed, complex demographic processes can easily exceed computational constraints, and the procedures to evaluate the reliability of the estimates further increase the computational effort. Here we present an approximate Bayesian computation framework based on the random forest algorithm (ABC-RF), to infer complex past population processes using complete genomes. To this aim, we propose to summarize the data by the full genomic distribution of the four mutually exclusive categories of segregating sites (FDSS), a statistic fast to compute from unphased genome data and that does not require the ancestral state of alleles to be known. We constructed an efficient ABC pipeline and tested how accurately it allows one to recognize the true model among models of increasing complexity, using simulated data and taking into account different sampling strategies in terms of number of individuals analysed, number and size of the genetic loci considered. We also compared the FDSS with the unfolded and folded site frequency spectrum (SFS), and for these statistics we highlighted the experimental conditions maximizing the inferential power of the ABC-RF procedure. We finally analysed real data sets, testing models on the dispersal of anatomically modern humans out of Africa and exploring the evolutionary relationships of the three species of orangutan inhabiting Borneo and Sumatra.
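A minimal sketch of how per-locus counts of the four site categories might be tallied for a pair of populations. It assumes the categories are private polymorphisms in each population, shared polymorphisms and fixed differences; that reading, the function names and the input layout are assumptions, not details given in the abstract. The classification does not depend on which allele is ancestral, in line with the abstract's point that the ancestral state need not be known.

```python
from collections import Counter

# Assumed category labels; "invariant" sites are classified but not reported.
CATEGORIES = ("private_1", "private_2", "shared", "fixed")


def classify_site(count1, n1, count2, n2):
    """Classify one biallelic site from the counts of an arbitrarily
    labelled allele among n1 and n2 sampled chromosomes."""
    poly1 = 0 < count1 < n1
    poly2 = 0 < count2 < n2
    if poly1 and poly2:
        return "shared"
    if poly1:
        return "private_1"
    if poly2:
        return "private_2"
    # Both populations are monomorphic: fixed for different alleles or not.
    if (count1 == 0) != (count2 == 0):
        return "fixed"
    return "invariant"


def category_counts_per_locus(loci, n1, n2):
    """loci: list of loci, each a list of (count1, count2) pairs per site."""
    per_locus = []
    for sites in loci:
        counts = Counter(classify_site(c1, n1, c2, n2) for c1, c2 in sites)
        per_locus.append(tuple(counts[c] for c in CATEGORIES))
    return per_locus


loci = [
    [(1, 0), (0, 4), (2, 3)],  # private_1, fixed, shared
    [(4, 4), (0, 2)],          # invariant, private_2
]
print(category_counts_per_locus(loci, n1=4, n2=4))
```

The distribution of these per-locus tuples across the genome is what a statistic of this kind would summarize.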
|
21
|
Versatile simulations of admixture and accurate local ancestry inference with mixnmatch and ancestryinfer. Mol Ecol Resour 2020; 20:1141-1151. [PMID: 32324964 PMCID: PMC7384932 DOI: 10.1111/1755-0998.13175] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 12/02/2019] [Revised: 03/09/2020] [Accepted: 04/15/2020] [Indexed: 12/13/2022]
Abstract
It has become clear that hybridization between species is much more common than previously recognized. As a result, we now know that the genomes of many modern species, including our own, are a patchwork of regions derived from past hybridization events. Increasingly researchers are interested in disentangling which regions of the genome originated from each parental species using local ancestry inference methods. Due to the diverse effects of admixture, this interest is shared across disparate fields, from human genetics to research in ecology and evolutionary biology. However, local ancestry inference methods are sensitive to a range of biological and technical parameters which can impact accuracy. Here we present paired simulation and ancestry inference pipelines, mixnmatch and ancestryinfer, to help researchers plan and execute local ancestry inference studies. mixnmatch can simulate arbitrarily complex demographic histories in the parental and hybrid populations, selection on hybrids, and technical variables such as coverage and contamination. ancestryinfer takes as input sequencing reads from simulated or real individuals, and implements an efficient local ancestry inference pipeline. We perform a series of simulations with mixnmatch to pinpoint factors that influence accuracy in local ancestry inference and highlight useful features of the two pipelines. mixnmatch is a powerful tool for simulations of hybridization while ancestryinfer facilitates local ancestry inference on real or simulated data.
|
22
|
Polymorphism Data Assist Estimation of the Nonsynonymous over Synonymous Fixation Rate Ratio ω for Closely Related Species. Mol Biol Evol 2020; 37:260-279. [PMID: 31504782 PMCID: PMC6984366 DOI: 10.1093/molbev/msz203] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/24/2022] Open
Abstract
The ratio of nonsynonymous over synonymous sequence divergence, dN/dS, is a widely used estimate of the nonsynonymous over synonymous fixation rate ratio ω, which measures the extent to which natural selection modulates protein sequence evolution. Its computation is based on a phylogenetic approach and computes sequence divergence of protein-coding DNA between species, traditionally using a single representative DNA sequence per species. This approach ignores the presence of polymorphisms and relies on the indirect assumption that new mutations fix instantaneously, an assumption which is generally violated and reasonable only for distantly related species. The violation of the underlying assumption leads to a time-dependence of sequence divergence, and biased estimates of ω in particular for closely related species, where the contribution of ancestral and lineage-specific polymorphisms to sequence divergence is substantial. We here use a time-dependent Poisson random field model to derive an analytical expression of dN/dS as a function of divergence time and sample size. We then extend our framework to the estimation of the proportion of adaptive protein evolution α. This mathematical treatment enables us to show that the joint usage of polymorphism and divergence data can assist the inference of selection for closely related species. Moreover, our analytical results provide the basis for a protocol for the estimation of ω and α for closely related species. We illustrate the performance of this protocol by studying a population data set of four corvid species, which involves the estimation of ω and α at different time-scales and for several choices of sample sizes.
|
23
|
Subsets of NLR genes show differential signatures of adaptation during colonization of new habitats. New Phytol 2019; 224:367-379. [PMID: 31230368 DOI: 10.1111/nph.16017] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 05/03/2019] [Accepted: 06/14/2019] [Indexed: 06/09/2023]
Abstract
Nucleotide binding site, leucine-rich repeat receptors (NLRs) are canonical resistance (R) genes in plants, fungi and animals, functioning as central (helper) and peripheral (sensor) genes in a signalling network. We investigate NLR evolution during the colonization of novel habitats in a model tomato species, Solanum chilense. We used R-gene enrichment sequencing to obtain polymorphism data at NLRs of 140 plants sampled across 14 populations covering the whole species range. We inferred the past demographic history of habitat colonization by resequencing whole genomes from three S. chilense plants from three key populations and performing approximate Bayesian computation using data from the 14 populations. Using these parameters, we simulated the genetic differentiation statistics distribution expected under neutral NLR evolution and identified small subsets of outlier NLRs exhibiting signatures of selection across populations. NLRs under selection between habitats are more often helper genes, whereas those showing signatures of adaptation in single populations are more often sensor-NLRs. Thus, centrality in the NLR network does not constrain NLR evolvability, and new mutations in central genes in the network are key for R-gene adaptation during colonization of different habitats.
|
24
|
Complex population evolutionary history of four cold-tolerant Notopterygium herb species in the Qinghai-Tibetan Plateau and adjacent areas. Heredity (Edinb) 2019; 123:242-263. [PMID: 30742051 PMCID: PMC6781143 DOI: 10.1038/s41437-019-0186-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 08/25/2018] [Revised: 01/20/2019] [Accepted: 01/22/2019] [Indexed: 01/24/2023] Open
Abstract
Historical geological and climatic events are the most important drivers of population expansions/contractions, range shifts, and interspecific divergence in plants. However, the species divergence and spatiotemporal population dynamics of alpine cold-tolerant herbal plants in the high-altitude Qinghai-Tibetan Plateau (QTP) and adjacent areas remain poorly understood. In this study, we investigated the population evolutionary history of four endangered Notopterygium herb species in the QTP and adjacent regions. We sequenced 10 nuclear loci, 2 mitochondrial DNA regions, and 4 chloroplast DNA regions in a total of 72 natural populations from the 4 species, and tested the hypothesis that the population history of these alpine herbs was markedly affected by the Miocene-Pliocene QTP uplifts and Quaternary climatic oscillations. We found that the four Notopterygium species had generally low levels of nucleotide variability within populations. Molecular dating and isolation-with-migration analyses suggested that Notopterygium species diverged ~1.74-7.82 million years ago and their differentiation was significantly associated with recent uplifts of the eastern margin of the QTP. In addition, ecological niche modeling and population history analysis showed that N. incisum and N. franchetii underwent considerable demographic expansions during the last glacial period of the Pleistocene, whereas a demographic contraction and an expansion occurred for N. forrestii and N. oviforme during the antepenultimate interglacial period and penultimate glacial period, respectively. These findings highlight the importance of geological and climatic changes during the Miocene-Pliocene and Pleistocene as causes of species divergence and changes in population structure within cold-tolerant herbs in the QTP biodiversity hotspot.
|
25
|
Efficiently inferring the demographic history of many populations with allele count data. J Am Stat Assoc 2019; 115:1472-1487. [PMID: 33012903 PMCID: PMC7531012 DOI: 10.1080/01621459.2019.1635482] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 03/29/2018] [Revised: 04/14/2019] [Accepted: 06/08/2019] [Indexed: 01/06/2023]
Abstract
The sample frequency spectrum (SFS), or histogram of allele counts, is an important summary statistic in evolutionary biology, and is often used to infer the history of population size changes, migrations, and other demographic events affecting a set of populations. The expected multipopulation SFS under a given demographic model can be efficiently computed when the populations in the model are related by a tree, scaling to hundreds of populations. Admixture, back-migration, and introgression are common natural processes that violate the assumption of a tree-like population history, however, and until now the expected SFS could be computed for only a handful of populations when the demographic history is not a tree. In this article, we present a new method for efficiently computing the expected SFS and linear functionals of it, for demographies described by general directed acyclic graphs. This method can scale to more populations than previously possible for complex demographic histories including admixture. We apply our method to an 8-population SFS to estimate the timing and strength of a proposed "basal Eurasian" admixture event in human history. We implement and release our method in a new open-source software package momi2.
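As a concrete reading of "histogram of allele counts", the sketch below tabulates an observed single-population SFS from a small 0/1 haplotype matrix. It illustrates the summary statistic only, not momi2 or its method for the expected SFS; the function name and input layout are assumptions.

```python
def sample_frequency_spectrum(haplotypes):
    """Observed SFS from a list of equal-length 0/1 haplotypes.

    Entry k-1 of the result is the number of sites at which the allele
    coded 1 appears in exactly k of the n sampled haplotypes (k = 1..n-1).
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    sfs = [0] * (n - 1)
    for j in range(n_sites):
        k = sum(h[j] for h in haplotypes)
        if 0 < k < n:  # skip sites that are monomorphic in the sample
            sfs[k - 1] += 1
    return sfs


haplotypes = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
print(sample_frequency_spectrum(haplotypes))  # [2, 0, 1]
```

Here two sites carry the allele once and one site carries it three times; the fourth site is monomorphic and is ignored.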
|
26
|
Local persistence of Mann's soft-haired mouse Abrothrix manni (Rodentia, Sigmodontinae) during Quaternary glaciations in southern Chile. PeerJ 2018; 6:e6130. [PMID: 30588409 PMCID: PMC6302793 DOI: 10.7717/peerj.6130] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 09/24/2018] [Accepted: 11/17/2018] [Indexed: 12/05/2022] Open
Abstract
Quaternary climatic oscillations have impacted Patagonian sigmodontine fauna, leaving traceable genetic footprints. In southern Chile, changes in the landscape included transitions to different vegetation formations as well as the extension of ice sheets. In this study, we focus on the Valdivian forest endemic and recently described sigmodontine species Abrothrix manni. We aim to assess the genetic structure of this species, testing for the existence of intraspecific lineages, and inferring the recent demographic history of the species. Analyses were based on the first 801 bp of the mitochondrial gene Cytochrome-b from 49 individuals of A. manni collected at 10 localities that cover most of its geographic distribution. Genealogical analyses recovered two main intraspecific lineages that are geographically segregated and present an intermediate site of secondary contact. Historical demography shows a signal of recent population decrease. Based on these results, we propose that the current genetic diversity of A. manni differentiated in at least two distinct refugial areas in southern Chile. This scenario, in addition to being unique among those uncovered for the Valdivian forest rodents studied so far, is noteworthy because of the reduced geographic scale inhabited by the species.
|
27
|
The Wright-Fisher site frequency spectrum as a perturbation of the coalescent's. Theor Popul Biol 2018; 124:81-92. [PMID: 30308178 DOI: 10.1016/j.tpb.2018.09.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2018] [Revised: 09/22/2018] [Accepted: 09/28/2018] [Indexed: 11/24/2022]
Abstract
The first terms of the Wright-Fisher (WF) site frequency spectrum that follow the coalescent approximation are determined precisely, with a view to understanding the accuracy of the coalescent approximation for large samples. The perturbing terms show that the probability of a single mutant in the sample (singleton probability) is elevated in WF but the rest of the frequency spectrum is lowered. A part of the perturbation can be attributed to a mismatch in rates of merger between WF and the coalescent. The rest of it can be attributed to the difference in the way WF and the coalescent partition children between parents. In particular, the number of children of a parent is approximately Poisson under WF and approximately geometric under the coalescent. Whereas the mismatch in rates raises the probability of singletons under WF, its offspring distribution being approximately Poisson lowers it. The two effects are of opposite sense everywhere except at the tail of the frequency spectrum. The WF frequency spectrum begins to depart from that of the coalescent only for sample sizes that are comparable to the population size. These conclusions are confirmed by a separate analysis that assumes the sample size n to be equal to the population size N. Partly thanks to the canceling effects, the total variation distance of WF minus coalescent is 0.12∕logN for a population sized sample with n=N, which is only 1% for N=2×10⁴. The coalescent remains a good approximation for the site frequency spectrum of large samples.
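The quoted figure can be checked with a line of arithmetic. The sketch assumes that "log" denotes the natural logarithm, which is the reading consistent with the stated value of about 1%; the function name is hypothetical.

```python
import math


def tv_distance(population_size):
    """Total variation distance 0.12 / log N for a sample with n = N
    (natural logarithm assumed)."""
    return 0.12 / math.log(population_size)


tv = tv_distance(2 * 10**4)
print(f"{tv:.4f}")  # 0.0121, i.e. roughly 1%
```

With a base-10 logarithm the same expression would give about 2.8%, which does not match the abstract.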
|
28
|
Evaluating genomic signatures of "the large X-effect" during complex speciation. Mol Ecol 2018; 27:3822-3830. [PMID: 29940087 PMCID: PMC6705125 DOI: 10.1111/mec.14777] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Received: 04/14/2018] [Revised: 05/31/2018] [Accepted: 06/07/2018] [Indexed: 12/16/2022]
Abstract
The ubiquity of the "two rules of speciation" (Haldane's rule and the large X-effect) implies a general, special role for sex chromosomes in the evolution of intrinsic postzygotic reproductive isolation. The recent proliferation of genome-scale analyses has revealed two further general observations: (a) complex speciation involving some form of gene flow is not uncommon, and (b) sex chromosomes in male- and in female-heterogametic taxa tend to show elevated differentiation relative to autosomes. Together, these observations are consistent with speciation histories in which population genetic differentiation at autosomal loci is reduced by gene flow while natural selection against hybrid incompatibilities renders sex chromosomes relatively refractory to gene flow. Here, I summarize multilocus population genetic and population genomic evidence for greater differentiation on the X (or Z) vs. the autosomes and consider the possible causes. I review common population genetic circumstances involving no selection and/or no interspecific gene flow that are nevertheless expected to elevate differentiation on sex chromosomes relative to autosomes. I then review theory for why large X-effects exist for hybrid incompatibilities and, more generally, for loci mediating local adaptation. The observed levels of sex chromosome vs. autosomal differentiation, in many cases, appear consistent with simple explanations requiring neither large X-effects nor gene flow. Discerning signatures of large X-effects during complex speciation will therefore require analyses that go beyond chromosome-scale summaries of population genetic differentiation, explicitly test for differential introgression, and/or integrate experimental genetic data.
|
29
|
Evidence for Introgression Among Three Species of the Anastrepha fraterculus Group, a Radiating Species Complex of Fruit Flies. Front Genet 2018; 9:359. [PMID: 30250479 PMCID: PMC6139333 DOI: 10.3389/fgene.2018.00359] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 05/12/2018] [Accepted: 08/21/2018] [Indexed: 12/13/2022] Open
Abstract
Introgression should no longer be considered as rare a phenomenon as once thought, since several studies have recently documented gene flow between closely related and radiating species. Here, we investigated evolutionary relationships among three closely related species of fruit flies of the Anastrepha fraterculus group (Anastrepha fraterculus, A. obliqua and A. sororcula). We sequenced a set of 20 genes and implemented a combined populational and phylogenetic inference with a model selection approach by an ABC framework in order to elucidate the demographic history of these species. The phylogenetic histories inferred from most genes showed a great deal of discordance and substantial shared polymorphic variation. The analysis of several population and speciation models reveals that this shared variation is better explained by introgression than by convergence through parallel mutation or by incomplete lineage sorting. Our results consistently showed these species evolving under an isolation with migration model, experiencing a continuous and asymmetrical pattern of gene flow involving all species pairs, even though A. fraterculus and A. sororcula remained more closely related to each other than to A. obliqua. This suggests that these species have been exchanging genes since they split from their common ancestor ∼2.6 MYA. We also found strong evidence for recent population expansion that appears to be a consequence of anthropic activities affecting host crops of fruit flies. These findings suggest that the introgression found here may have been driven by genetic drift and not necessarily by selection, which has implications for tracking and managing fruit flies.
|
30
|
Geometry of the Sample Frequency Spectrum and the Perils of Demographic Inference. Genetics 2018; 210:665-682. [PMID: 30064984 DOI: 10.1534/genetics.118.300733] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 03/29/2018] [Accepted: 07/30/2018] [Indexed: 11/18/2022] Open
Abstract
The sample frequency spectrum (SFS), which describes the distribution of mutant alleles in a sample of DNA sequences, is a widely used summary statistic in population genetics. The expected SFS has a strong dependence on the historical population demography and this property is exploited by popular statistical methods to infer complex demographic histories from DNA sequence data. Most, if not all, of these inference methods exhibit pathological behavior, however. Specifically, they often display runaway behavior in optimization, where the inferred population sizes and epoch durations can degenerate to zero or diverge to infinity, and show undesirable sensitivity to perturbations in the data. The goal of this article is to provide theoretical insights into why such problems arise. To this end, we characterize the geometry of the expected SFS for piecewise-constant demographies and use our results to show that the aforementioned pathological behavior of popular inference methods is intrinsic to the geometry of the expected SFS. We provide explicit descriptions and visualizations for a toy model, and generalize our intuition to arbitrary sample sizes using tools from convex and algebraic geometry. We also develop a universal characterization result which shows that the expected SFS of a sample of size n under an arbitrary population history can be recapitulated by a piecewise-constant demography with only [Formula: see text] epochs, where [Formula: see text] is between [Formula: see text] and [Formula: see text] The set of expected SFS for piecewise-constant demographies with fewer than [Formula: see text] epochs is open and nonconvex, which causes the above phenomena for inference from data.
|
31
|
The divergence history of European blue mussel species reconstructed from Approximate Bayesian Computation: the effects of sequencing techniques and sampling strategies. PeerJ 2018; 6:e5198. [PMID: 30083438 PMCID: PMC6071616 DOI: 10.7717/peerj.5198] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 02/01/2018] [Accepted: 06/19/2018] [Indexed: 01/25/2023] Open
Abstract
Genome-scale diversity data are increasingly available in a variety of biological systems, and can be used to reconstruct the past evolutionary history of species divergence. However, extracting the full demographic information from these data is not trivial, and requires inferential methods that account for the diversity of coalescent histories throughout the genome. Here, we evaluate the potential and limitations of one such approach. We reexamine a well-known system of mussel sister species, using the joint site frequency spectrum (jSFS) of synonymous mutations computed either from exome capture or RNA-seq, in an Approximate Bayesian Computation (ABC) framework. We first assess the best sampling strategy (number of individuals, loci, and bins in the jSFS), and show that model selection is robust to variation in the number of individuals and loci. In contrast, different binning choices when summarizing the jSFS strongly affect the results: including classes of low and high frequency shared polymorphisms can more effectively reveal recent migration events. We then take advantage of the flexibility of ABC to compare more realistic models of speciation, including variation in migration rates through time (i.e., periodic connectivity) and across genes (i.e., genome-wide heterogeneity in migration rates). We show that these models were consistently selected as the most probable, suggesting that mussels have experienced a complex history of gene flow during divergence and that the species boundary is semi-permeable. Our work provides a comprehensive evaluation of ABC demographic inference in mussels based on the coding jSFS, and supplies guidelines for employing different sequencing techniques and sampling strategies. We emphasize, perhaps surprisingly, that inferences are less limited by the volume of data than by the way in which they are analyzed.
|
32
|
Abstract
The increasing availability of population-level allele frequency data across one or more related populations necessitates the development of methods that can efficiently estimate population genetics parameters, such as the strength of selection acting on the population(s), from such data. Existing methods for this problem in the setting of the Wright-Fisher diffusion model are primarily likelihood-based, and rely on numerical approximation for likelihood computation and on bootstrapping for assessment of variability in the resulting estimates, requiring extensive computation. Recent work has provided a method for obtaining exact samples from general Wright-Fisher diffusion processes, enabling the development of methods for Bayesian estimation in this setting. We develop and implement a Bayesian method for estimating the strength of selection based on the Wright-Fisher diffusion for data sampled at a single time point. The method utilizes the latest algorithms for exact sampling to devise a Markov chain Monte Carlo procedure to draw samples from the joint posterior distribution of the selection coefficient and the allele frequencies. We demonstrate that when assumptions about the initial allele frequencies are accurate the method performs well for both simulated data and for an empirical data set on hypoxia in flies, where we find evidence for strong positive selection in a region of chromosome 2L previously identified. We discuss possible extensions of our method to the more general settings commonly encountered in practice, highlighting the advantages of Bayesian approaches to inference in this setting.
|
33
|
Conservation Genetics of the Cheetah: Genetic History and Implications for Conservation. Cheetahs: Biology and Conservation 2018. [PMCID: PMC7149701 DOI: 10.1016/b978-0-12-804088-1.00006-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/14/2022]
Abstract
From allozymes in 1983 to whole genomes in 2015, genetic studies of the cheetah have been extensive. In this chapter we provide an overview of the available literature. Overall, patterns of genetic variation provided evidence of low variability and suggest this loss occurred thousands of years ago. Differences between published subspecies were supported genetically. At a local scale, populations were generally considered panmictic with minor genetic structure. Although cheetahs have persisted despite low genetic variability, important questions arise from these findings: Does the cheetah have the ability to adapt to and evolve with future changes in environmental and infectious pressure? How would cheetahs cope with further loss of genetic diversity? Connectivity in the wild should be maintained via prevention of habitat loss, while management of small isolated populations may require reestablishing gene flow. Genetics could assist captive-breeding decisions and provide forensic evidence as to the geographical origin of illegally traded animals.
|
34
|
Phylogeography, Population Structure, and Conservation of the Javan Gibbon (Hylobates moloch). Int J Primatol 2017. [DOI: 10.1007/s10764-017-0005-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
|
35
|
Porcine Y-chromosome variation is consistent with the occurrence of paternal gene flow from non-Asian to Asian populations. Heredity (Edinb) 2017; 120:63-76. [PMID: 29234173 DOI: 10.1038/s41437-017-0002-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 05/11/2017] [Accepted: 06/21/2017] [Indexed: 11/09/2022] Open
Abstract
Pigs (Sus scrofa) originated in Southeast Asia and expanded to Europe and North Africa approximately 1 MYA. Analyses of porcine Y-chromosome variation have shown the existence of two main haplogroups that are highly divergent, a result that is consistent with previous mitochondrial and autosomal data showing that the Asian and non-Asian pig populations remained geographically isolated until recently. Paradoxically, one of these Y-chromosome haplogroups is extensively shared by pigs and wild boars from Asia and Europe, an observation that is difficult to reconcile with a scenario of prolonged geographic isolation. To shed light on this issue, we genotyped 33 Y-linked SNPs and one indel in a worldwide sample of pigs and wild boars and sequenced a total of 9903 nucleotide sites from seven loci distributed along the Y-chromosome. Notably, the nucleotide diversity per site at the Y-linked loci (0.0015 in Asian pigs) displayed the same order of magnitude as that described for autosomal loci (~0.0023), a finding compatible with a process of sustained and intense isolation. We performed an approximate Bayesian computation analysis focused on the paternal diversity of wild boars and local pig breeds in which we compared three demographic models: two isolation models (I models) differing in the time of isolation and a model of isolation with recent unidirectional migration (IM model). Our results suggest that the most likely explanation for the extensive sharing of one Y-chromosome haplogroup between non-Asian and Asian populations is a recent and unidirectional (non-Asian > Asian) paternal migration event.
|
36
|
multi-dice: r package for comparative population genomic inference under hierarchical co-demographic models of independent single-population size changes. Mol Ecol Resour 2017; 17:e212-e224. [PMID: 28449263 PMCID: PMC5724483 DOI: 10.1111/1755-0998.12686] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 01/06/2017] [Revised: 03/14/2017] [Accepted: 04/14/2017] [Indexed: 01/25/2023]
Abstract
Population genetic data from multiple taxa can address comparative phylogeographic questions about community-scale response to environmental shifts, and a useful strategy to this end is to employ hierarchical co-demographic models that directly test multi-taxa hypotheses within a single, unified analysis. This approach has been applied to classical phylogeographic data sets such as mitochondrial barcodes as well as reduced-genome polymorphism data sets that can yield 10,000s of SNPs, produced by emergent technologies such as RAD-seq and GBS. A strategy for the latter had been accomplished by adapting the site frequency spectrum to a novel summarization of population genomic data across multiple taxa called the aggregate site frequency spectrum (aSFS), which potentially can be deployed under various inferential frameworks including approximate Bayesian computation, random forest and composite likelihood optimization. Here, we introduce the R package multi-dice, a wrapper program that exploits existing simulation software for flexible execution of hierarchical model-based inference using the aSFS, which is derived from reduced genome data, as well as mitochondrial data. We validate several novel software features such as applying alternative inferential frameworks, enforcing a minimal threshold of time surrounding co-demographic pulses and specifying flexible hyperprior distributions. In sum, multi-dice provides comparative analysis within the familiar R environment while allowing a high degree of user customization, and will thus serve as a tool for comparative phylogeography and population genomics.
|
37
|
Exact Calculation of the Joint Allele Frequency Spectrum for Isolation with Migration Models. Genetics 2017; 207:241-253. [PMID: 28696217 PMCID: PMC5586375 DOI: 10.1534/genetics.116.194019] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 07/20/2016] [Accepted: 06/30/2017] [Indexed: 12/26/2022] Open
Abstract
Population genomic datasets collected over the past decade have spurred interest in developing methods that can utilize massive numbers of loci for inference of demographic and selective histories of populations. The allele frequency spectrum (AFS) provides a convenient statistic for such analysis, and, accordingly, much attention has been paid to predicting theoretical expectations of the AFS under a number of different models. However, to date, exact solutions for the joint AFS of two or more populations under models of migration and divergence have not been found. Here, we present a novel Markov chain representation of the coalescent on the state space of the joint AFS that allows for rapid, exact calculation of the joint AFS under isolation with migration (IM) models. In turn, we show how our Markov chain method, in the context of composite likelihood estimation, can be used for accurate inference of parameters of the IM model using SNP data. Lastly, we apply our method to recent whole genome datasets from African Drosophila melanogaster.
|
38
|
INSIGHT INTO SPECIATION FROM HISTORICAL DEMOGRAPHY IN THE PHYTOPHAGOUS BEETLE GENUS OPHRAELLA. Evolution 2017; 53:1846-1856. [DOI: 10.1111/j.1558-5646.1999.tb04567.x] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 01/28/1999] [Accepted: 05/21/1999] [Indexed: 11/27/2022]
|
39
|
The emergence of the hyperinvasive vine, Mikania micrantha (Asteraceae), via admixture and founder events inferred from population transcriptomics. Mol Ecol 2017; 26:3405-3423. [PMID: 28370790 DOI: 10.1111/mec.14124] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 10/10/2016] [Revised: 03/20/2017] [Accepted: 03/21/2017] [Indexed: 01/14/2023]
Abstract
Biological invasions that involve well-documented rapid adaptations to new environments provide unequalled opportunities for testing evolutionary hypotheses. Mikania micrantha Kunth (Asteraceae), a perennial herbaceous vine native to tropical Central and South America, successfully invaded tropical Asia in the early 20th century. It is regarded as one of the most aggressive weeds in the world. To elucidate the molecular and evolutionary processes underlying this invasion, we extensively sampled this weed throughout its invaded range in South-East and South Asia and surveyed its genetic structure using variants detected from population transcriptomics. Clustering results suggest that more than one source population contributed to this invasion. Computer simulations using genomewide genetic variation support a scenario of admixture and founder events during invasion. The genes differentially expressed between native and invasive populations were found to be involved in oxidative and high light intensity stress responses, pointing to a possible ecological mechanism of adaptation. Our results provide a foundation for further detailed mechanistic and population studies of this ecologically and economically important invasion. This line of research promises to provide new mitigation strategies for invasive species as well as insights into mechanisms of adaptation.
|
40
|
Inference of Gene Flow in the Process of Speciation: An Efficient Maximum-Likelihood Method for the Isolation-with-Initial-Migration Model. Genetics 2017; 205:1597-1618. [PMID: 28193727 PMCID: PMC5378116 DOI: 10.1534/genetics.116.188060] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 05/11/2016] [Accepted: 01/25/2017] [Indexed: 12/03/2022] Open
Abstract
The isolation-with-migration (IM) model is commonly used to make inferences about gene flow during speciation, using polymorphism data. However, it has been reported that the parameter estimates obtained by fitting the IM model are very sensitive to the model's assumptions, including the assumption of constant gene flow until the present. This article is concerned with the isolation-with-initial-migration (IIM) model, which drops precisely this assumption. In the IIM model, one ancestral population divides into two descendant subpopulations, between which there is an initial period of gene flow and a subsequent period of isolation. We derive a very fast method of fitting an extended version of the IIM model, which also allows for asymmetric gene flow and unequal population sizes. This is a maximum-likelihood method, applicable to data on the number of segregating sites between pairs of DNA sequences from a large number of independent loci. In addition to obtaining parameter estimates, our method can also be used, by means of likelihood-ratio tests, to distinguish between alternative models representing the following divergence scenarios: (a) divergence with potentially asymmetric gene flow until the present, (b) divergence with potentially asymmetric gene flow until some point in the past and in isolation since then, and (c) divergence in complete isolation. We illustrate the procedure on pairs of Drosophila sequences from ∼30,000 loci. The computing time needed to fit the most complex version of the model to this data set is only a couple of minutes. The R code to fit the IIM model can be found in the supplementary files of this article.
|
41
|
Abstract
A wide range of studies in population genetics have employed the sample frequency spectrum (SFS), a summary statistic which describes the distribution of mutant alleles at a polymorphic site in a sample of DNA sequences and provides a highly efficient dimensional reduction of large-scale population genomic variation data. Recently, there has been much interest in analyzing the joint SFS data from multiple populations to infer parameters of complex demographic histories, including variable population sizes, population split times, migration rates, admixture proportions, and so on. SFS-based inference methods require accurate computation of the expected SFS under a given demographic model. Although much methodological progress has been made, existing methods suffer from numerical instability and high computational complexity when multiple populations are involved and the sample size is large. In this paper, we present new analytic formulas and algorithms that enable accurate, efficient computation of the expected joint SFS for thousands of individuals sampled from hundreds of populations related by a complex demographic model with arbitrary population size histories (including piecewise-exponential growth). Our results are implemented in a new software package called momi (MOran Models for Inference). Through an empirical study we demonstrate our improvements to numerical stability and computational complexity.
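As an illustration of the summary statistic this abstract describes, the sketch below tabulates an unfolded sample frequency spectrum from a small 0/1 matrix of sequences. It is not the momi implementation and does not compute an expected SFS under any demographic model; the function name `sample_frequency_spectrum` and the input layout (one row per sampled sequence, one column per site, 1 marking the mutant allele) are assumptions made for this example only.

```python
from collections import Counter
from typing import List, Sequence


def sample_frequency_spectrum(haplotypes: Sequence[Sequence[int]]) -> List[int]:
    """Tabulate the unfolded SFS of a 0/1 haplotype matrix.

    Entry i-1 of the result is the number of sites at which exactly i of
    the n sampled sequences carry the mutant allele, for i = 1 .. n-1.
    Sites where no sequence or every sequence carries the mutant allele
    are not polymorphic in the sample and are left out.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    num_sites = len(haplotypes[0])
    if any(len(row) != num_sites for row in haplotypes):
        raise ValueError("all sequences must cover the same sites")

    # Count mutant alleles per site, then count sites per allele count.
    mutant_counts = Counter(
        sum(row[site] for row in haplotypes) for site in range(num_sites)
    )
    return [mutant_counts.get(i, 0) for i in range(1, n)]


# Hypothetical toy data: 4 sequences, 5 sites.
toy = [
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
print(sample_frequency_spectrum(toy))  # [2, 1, 0]
```

In the toy data, two sites have one mutant copy, one site has two, and the remaining two sites are monomorphic in the sample, so five sites reduce to a vector of length n - 1 = 3.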
|
42
|
Shedding Light on the Grey Zone of Speciation along a Continuum of Genomic Divergence. PLoS Biol 2016; 14:e2000234. [PMID: 28027292 PMCID: PMC5189939 DOI: 10.1371/journal.pbio.2000234] [Citation(s) in RCA: 244] [Impact Index Per Article: 30.5] [Received: 06/06/2016] [Accepted: 11/21/2016] [Indexed: 12/24/2022] Open
Abstract
Speciation results from the progressive accumulation of mutations that decrease the probability of mating between parental populations or reduce the fitness of hybrids (the so-called species barriers). The speciation genomic literature, however, is mainly a collection of case studies, each with its own approach and specificities, such that a global view of the gradual process of evolution from one to two species is currently lacking. Of primary importance is the prevalence of gene flow between diverging entities, which is central in most species concepts and has been widely discussed in recent years. Here, we explore the continuum of speciation thanks to a comparative analysis of genomic data from 61 pairs of populations/species of animals with variable levels of divergence. Gene flow between diverging gene pools is assessed under an approximate Bayesian computation (ABC) framework. We show that the intermediate "grey zone" of speciation, in which taxonomy is often controversial, spans from 0.5% to 2% of net synonymous divergence, irrespective of species life history traits or ecology. Thanks to appropriate modeling of among-locus variation in genetic drift and introgression rate, we clarify the status of the majority of ambiguous cases and uncover a number of cryptic species. Our analysis also reveals the high incidence in animals of semi-isolated species (when some but not all loci are affected by barriers to gene flow) and highlights the intrinsic difficulty, both statistical and conceptual, of delineating species in the grey zone of speciation.

Isolated populations accumulate genetic differences across their genomes as they diverge, whereas gene flow between populations counteracts divergence and tends to restore genetic homogeneity. Speciation proceeds by the accumulation at specific loci of mutations that reduce the fitness of hybrids, therefore preventing gene flow (the so-called species barriers). Importantly, species barriers are expected to act locally within the genome, leading to the prediction of a mosaic pattern of genetic differentiation between populations at intermediate levels of divergence (the genic view of speciation). At the same time, linked selection also contributes to speed up differentiation in low-recombining and gene-dense regions. We used a modelling approach that accounts for both sources of genomic heterogeneity and explored a wide continuum of genomic divergence made by 61 pairs of species/populations in animals. Our analysis provides a unifying picture of the relationship between molecular divergence and ability to exchange genes. We show that the "grey zone" of speciation (the intermediate state in which species definition is controversial) spans from 0.5% to 2% of molecular divergence, with these thresholds being independent of species life history traits and ecology. Semi-isolated species, between which alleles can be exchanged at some but not all loci, are numerous, with the earliest species barriers being detected at divergences as low as 0.075%. These results have important implications regarding taxonomy, conservation biology, and the management of biodiversity.
|
43
|
A Cost-Effective Approach to Sequence Hundreds of Complete Mitochondrial Genomes. PLoS One 2016; 11:e0160958. [PMID: 27505419 PMCID: PMC4978415 DOI: 10.1371/journal.pone.0160958] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 05/20/2016] [Accepted: 07/27/2016] [Indexed: 12/11/2022] Open
Abstract
We present a cost-effective approach to sequence whole mitochondrial genomes for hundreds of individuals. Our approach uses small reaction volumes and unmodified (non-phosphorylated) barcoded adaptors to minimize reagent costs. We demonstrate our approach by sequencing 383 Fundulus sp. mitochondrial genomes (192 F. heteroclitus and 191 F. majalis). Prior to sequencing, we amplified the mitochondrial genomes using 4–5 custom-made, overlapping primer pairs, and sequencing was performed on an Illumina HiSeq 2500 platform. After removing low quality and short sequences, 2.9 million and 2.8 million reads were generated for F. heteroclitus and F. majalis respectively. Individual genomes were assembled for each species by mapping barcoded reads to a reference genome. For F. majalis, the reference genome was built de novo. On average, individual consensus sequences had high coverage: 61-fold for F. heteroclitus and 57-fold for F. majalis. The approach discussed in this paper is optimized for sequencing mitochondrial genomes on an Illumina platform. However, with the proper modifications, this approach could be easily applied to other small genomes and sequencing platforms.
|
44
|
The non-equilibrium allele frequency spectrum in a Poisson random field framework. Theor Popul Biol 2016; 111:51-64. [PMID: 27378747 DOI: 10.1016/j.tpb.2016.06.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 12/15/2015] [Revised: 06/21/2016] [Accepted: 06/23/2016] [Indexed: 11/25/2022]
Abstract
In population genetic studies, the allele frequency spectrum (AFS) efficiently summarizes genome-wide polymorphism data and shapes a variety of allele frequency-based summary statistics. While existing theory typically features equilibrium conditions, emerging methodology requires an analytical understanding of the build-up of the allele frequencies over time. In this work, we use the framework of Poisson random fields to derive new representations of the non-equilibrium AFS for the case of a Wright-Fisher population model with selection. In our approach, the AFS is a scaling-limit of the expectation of a Poisson stochastic integral and the representation of the non-equilibrium AFS arises in terms of a fixation time probability distribution. The known duality between the Wright-Fisher diffusion process and a birth and death process generalizing Kingman's coalescent yields an additional representation. The results carry over to the setting of a random sample drawn from the population and provide the non-equilibrium behavior of sample statistics. Our findings are consistent with and extend a previous approach where the non-equilibrium AFS solves a partial differential forward equation with a non-traditional boundary condition. Moreover, we provide a bridge to previous coalescent-based work, and hence tie several frameworks together. Since frequency-based summary statistics are widely used in population genetics, for example, to identify candidate loci of adaptive evolution, to infer the demographic history of a population, or to improve our understanding of the underlying mechanics of speciation events, the presented results are potentially useful for a broad range of topics.
|
45
|
A genomic perspective on hybridization and speciation. Mol Ecol 2016; 25:2337-60. [PMID: 26836441 PMCID: PMC4915564 DOI: 10.1111/mec.13557] [Citation(s) in RCA: 292] [Impact Index Per Article: 36.5] [Received: 11/12/2015] [Revised: 01/18/2016] [Accepted: 01/25/2016] [Indexed: 12/13/2022]
Abstract
Hybridization among diverging lineages is common in nature. Genomic data provide a special opportunity to characterize the history of hybridization and the genetic basis of speciation. We review existing methods and empirical studies to identify recent advances in the genomics of hybridization, as well as issues that need to be addressed. Notable progress has been made in the development of methods for detecting hybridization and inferring individual ancestries. However, few approaches reconstruct the magnitude and timing of gene flow, estimate the fitness of hybrids or incorporate knowledge of recombination rate. Empirical studies indicate that the genomic consequences of hybridization are complex, including a highly heterogeneous landscape of differentiation. Inferred characteristics of hybridization differ substantially among species groups. Loci showing unusual patterns - which may contribute to reproductive barriers - are usually scattered throughout the genome, with potential enrichment in sex chromosomes and regions of reduced recombination. We caution against the growing trend of interpreting genomic variation in summary statistics across genomes as evidence of differential gene flow. We argue that converting genomic patterns into useful inferences about hybridization will ultimately require models and methods that directly incorporate key ingredients of speciation, including the dynamic nature of gene flow, selection acting in hybrid populations and recombination rate variation.
|
46
|
Community trees: Identifying codiversification in the Páramo dipteran community. Evolution 2016; 70:1080-93. [PMID: 27061575 DOI: 10.1111/evo.12916] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 07/13/2015] [Revised: 03/10/2016] [Accepted: 03/15/2016] [Indexed: 01/23/2023]
Abstract
Groups of codistributed species that responded in a concerted manner to environmental events are expected to share patterns of evolutionary diversification. However, the identification of such groups has largely been based on qualitative, post hoc analyses. We develop here two methods (posterior predictive simulation [PPS], Kuhner-Felsenstein [K-F] analysis of variance [ANOVA]) for the analysis of codistributed species that, given a group of species with a shared pattern of diversification, allow empiricists to identify those taxa that do not codiversify (i.e., "outlier" species). The identification of outlier species makes it possible to jointly estimate the evolutionary history of co-diversifying taxa. To evaluate the approaches presented here, we collected data from Páramo dipterans, identified outlier species, and estimated a "community tree" from species that are identified as having codiversified. Our results demonstrate that dipteran communities from different Páramo habitats in the same mountain range are more closely related than communities in other ranges. We also conduct simulation testing to evaluate this approach. Results suggest that our approach provides a useful addition to comparative phylogeographic methods, while identifying aspects of the analysis that require careful interpretation. In particular, both the PPS and K-F ANOVA perform acceptably when there are one or two outlier species, but less so as the number of outliers increases. This is likely a function of the corresponding degradation of the signal of community divergence; without a strong signal from a codiversifying community, there is no dominant pattern from which to detect an outlier species. For this reason, both the magnitude of K-F distance distribution and outside knowledge about the phylogeographic history of each putative member of the community should be considered when interpreting the results.
|
47
|
Genomic signature of successful colonization of Eurasia by the allopolyploid shepherd's purse (Capsella bursa-pastoris). Mol Ecol 2016; 25:616-29. [PMID: 26607306 DOI: 10.1111/mec.13491] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 04/21/2015] [Revised: 11/18/2015] [Accepted: 11/19/2015] [Indexed: 12/29/2022]
Abstract
Polyploidization is a dominant feature of flowering plant evolution. However, detailed genomic analyses of the interpopulation diversification of polyploids following genome duplication are still in their infancy, mainly because of methodological limits, both in terms of sequencing and computational analyses. The shepherd's purse (Capsella bursa-pastoris) is one of the most common weed species in the world. It is highly self-fertilizing, and recent genomic data indicate that it is an allopolyploid, resulting from hybridization between the ancestors of the diploid species Capsella grandiflora and Capsella orientalis. Here, we investigated the genomic diversity of C. bursa-pastoris, its population structure and demographic history, following allopolyploidization in Eurasia. To that end, we genotyped 261 C. bursa-pastoris accessions spread across Europe, the Middle East and Asia, using genotyping-by-sequencing, leading to a total of 4274 SNPs after quality control. Bayesian clustering analyses revealed three distinct genetic clusters in Eurasia: one cluster grouping samples from Western Europe and Southeastern Siberia, the second one centred on Eastern Asia and the third one in the Middle East. Approximate Bayesian computation (ABC) supported the hypothesis that C. bursa-pastoris underwent a typical colonization history involving low gene flow among colonizing populations, likely starting from the Middle East towards Europe and followed by successive human-mediated expansions into Eastern Asia. Altogether, these findings bring new insights into the recent multistage colonization history of the allotetraploid C. bursa-pastoris and highlight ABC and genotyping-by-sequencing data as promising but still challenging tools to infer demographic histories of selfing allopolyploids.
|
48
|
Wing patterning genes and coevolution of Müllerian mimicry in Heliconius butterflies: Support from phylogeography, cophylogeny, and divergence times. Evolution 2015; 69:3082-96. [DOI: 10.1111/evo.12812] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 02/05/2015] [Revised: 10/09/2015] [Accepted: 10/26/2015] [Indexed: 11/30/2022]
|
49
|
The aggregate site frequency spectrum for comparative population genomic inference. Mol Ecol 2015; 24:6223-40. [PMID: 26769405 PMCID: PMC4717917 DOI: 10.1111/mec.13447] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 05/13/2015] [Revised: 10/26/2015] [Accepted: 10/28/2015] [Indexed: 12/11/2022]
Abstract
Understanding how assemblages of species responded to past climate change is a central goal of comparative phylogeography and comparative population genomics, an endeavour that has increasing potential to integrate with community ecology. New sequencing technology now provides the potential to perform complex demographic inference at unprecedented resolution across assemblages of nonmodel species. To this end, we introduce the aggregate site frequency spectrum (aSFS), an expansion of the site frequency spectrum to use single nucleotide polymorphism (SNP) data sets collected from multiple, co-distributed species for assemblage-level demographic inference. We describe how the aSFS is constructed over an arbitrary number of independent population samples and then demonstrate how the aSFS can differentiate various multispecies demographic histories under a wide range of sampling configurations while allowing effective population sizes and expansion magnitudes to vary independently. We subsequently couple the aSFS with a hierarchical approximate Bayesian computation (hABC) framework to estimate degree of temporal synchronicity in expansion times across taxa, including an empirical demonstration with a data set consisting of five populations of the threespine stickleback (Gasterosteus aculeatus). Corroborating what is generally understood about the recent postglacial origins of these populations, the joint aSFS/hABC analysis strongly suggests that the stickleback data are most consistent with synchronous expansion after the Last Glacial Maximum (posterior probability = 0.99). The aSFS will have general application for multilevel statistical frameworks to test models involving assemblages and/or communities, and as large-scale SNP data from nonmodel species become routine, the aSFS expands the potential for powerful next-generation comparative population genomic inference.
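The abstract states that the aSFS is built over several independent population samples but does not spell out the construction. The sketch below shows one plausible reading, offered as an assumption rather than as the authors' procedure: each taxon's SFS is converted to proportions, and within every allele frequency class the proportions are sorted in descending order across taxa, so that taxon identity is dropped. The function name `aggregate_sfs` and the requirement that all taxa share the same sample size are illustrative choices made here.

```python
from typing import List, Sequence


def aggregate_sfs(spectra: Sequence[Sequence[int]]) -> List[List[float]]:
    """Combine per-taxon SFS vectors into an aggregate, taxon-agnostic table.

    Assumed construction (not taken from the abstract): normalise each
    taxon's SFS to proportions, then sort each frequency class across
    taxa from largest to smallest. Row k of the result holds the sorted
    proportions for frequency class k + 1.
    """
    if not spectra:
        raise ValueError("need at least one SFS")
    num_classes = len(spectra[0])
    if any(len(sfs) != num_classes for sfs in spectra):
        raise ValueError("all SFS vectors must have the same length")

    proportions = []
    for sfs in spectra:
        total = sum(sfs)
        if total == 0:
            raise ValueError("an SFS with no polymorphic sites cannot be normalised")
        proportions.append([count / total for count in sfs])

    # Transpose to frequency classes, then sort within each class.
    return [sorted(column, reverse=True) for column in zip(*proportions)]


# Hypothetical SFS vectors for two taxa sampled at the same depth.
taxon_a = [6, 3, 1]
taxon_b = [2, 2, 4]
print(aggregate_sfs([taxon_a, taxon_b]))
```

Under this reading, swapping the order of the input taxa leaves the output unchanged, which is the property that lets a single summary be compared across assemblages regardless of which species contributed which row.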
|
50
|
Inference of Super-exponential Human Population Growth via Efficient Computation of the Site Frequency Spectrum for Generalized Models. Genetics 2015; 202:235-45. [PMID: 26450922 PMCID: PMC4701087 DOI: 10.1534/genetics.115.180570] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 07/07/2015] [Accepted: 09/28/2015] [Indexed: 01/08/2023] Open
Abstract
The site frequency spectrum (SFS) and other genetic summary statistics are at the heart of many population genetic studies. Previous studies have shown that human populations have undergone a recent epoch of fast growth in effective population size. These studies assumed that growth is exponential, and the ensuing models leave an excess amount of extremely rare variants. This suggests that human populations might have experienced a recent growth with speed faster than exponential. Recent studies have introduced a generalized growth model where the growth speed can be faster or slower than exponential. However, only simulation approaches were available for obtaining summary statistics under such generalized models. In this study, we provide expressions to accurately and efficiently evaluate the SFS and other summary statistics under generalized models, which we further implement in a publicly available software. Investigating the power to infer deviation of growth from being exponential, we observed that adequate sample sizes facilitate accurate inference; e.g., a sample of 3000 individuals with the amount of data expected from exome sequencing allows observing and accurately estimating growth with speed deviating by ≥10% from that of exponential. Applying our inference framework to data from the NHLBI Exome Sequencing Project, we found that a model with a generalized growth epoch fits the observed SFS significantly better than the equivalent model with exponential growth (P-value = 3.85 × 10^-6). The estimated growth speed significantly deviates from exponential (P-value ≪ 10^-12), with the best-fit estimate being of growth speed 12% faster than exponential.
|