1
Comparison of Bayesian Coalescent Skyline Plot Models for Inferring Demographic Histories. Mol Biol Evol 2024; 41:msae073. [PMID: 38630635 PMCID: PMC11068272 DOI: 10.1093/molbev/msae073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 02/16/2024] [Accepted: 04/01/2024] [Indexed: 04/19/2024] Open
Abstract
Bayesian coalescent skyline plot models are widely used to infer demographic histories. The first (non-Bayesian) coalescent skyline plot model assumed a known genealogy as data, while subsequent models and implementations jointly inferred the genealogy and demographic history from sequence data, including heterochronous samples. Overall, there exist multiple different Bayesian coalescent skyline plot models which mainly differ in two key aspects: (i) how changes in population size are modeled through independent or autocorrelated prior distributions, and (ii) how many change-points in the demographic history are used, where they occur, and whether the number is pre-specified or inferred. The specific impact of each of these choices on the inferred demographic history is not known for two reasons: first, not all models are implemented in the same software, and second, each model implementation makes specific choices that the biologist cannot influence. To facilitate a detailed evaluation of Bayesian coalescent skyline plot models, we implemented all currently described models in a flexible design into the software RevBayes. Furthermore, we evaluated models and choices on an empirical dataset of horses supplemented by a small simulation study. We find that estimated demographic histories can be grouped broadly into two groups depending on how change-points in the demographic history are specified (either independent of or at coalescent events). Our simulations suggest that models using change-points at coalescent events produce spurious variation near the present, while most models using independent change-points tend to over-smooth the inferred demographic history.
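As a rough illustration of the non-Bayesian starting point mentioned in this abstract, the classic skyline plot turns a known genealogy into a piecewise-constant population-size estimate with one change-point at every coalescent event: an interval of duration γ_k during which k lineages exist yields the estimate γ_k·k(k−1)/2. The sketch below assumes all samples are taken at the present (no heterochronous sampling); the function name and the example times are hypothetical, and it is not the RevBayes implementation described in the paper.

```python
def classic_skyline(coalescent_times):
    """Return (interval_start, interval_end, size_estimate) triples.

    coalescent_times: times of the n-1 coalescent events, measured backwards
    from the present, for n samples all taken at time 0 (an assumption).
    """
    times = sorted(coalescent_times)
    n = len(times) + 1          # number of sampled lineages
    estimates = []
    previous = 0.0
    for i, t in enumerate(times):
        k = n - i               # lineages present during this interval
        duration = t - previous
        # Classic skyline estimate: interval length times the number of lineage pairs.
        estimates.append((previous, t, duration * k * (k - 1) / 2))
        previous = t
    return estimates


# Hypothetical genealogy of four samples with coalescences at 0.1, 0.3 and 1.0.
for start, end, size in classic_skyline([0.1, 0.3, 1.0]):
    print(f"{start:.1f}-{end:.1f}: {size:.2f}")
```

Because every coalescent event opens a new interval here, estimates from short intervals are noisy, which is in line with the abstract's remark about spurious variation in models that place change-points at coalescent events.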
2
Tree-based QTL mapping with expected local genetic relatedness matrices. Am J Hum Genet 2023; 110:2077-2091. [PMID: 38065072 PMCID: PMC10716520 DOI: 10.1016/j.ajhg.2023.10.017] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/08/2023] [Revised: 10/26/2023] [Accepted: 10/27/2023] [Indexed: 12/18/2023] Open
Abstract
Understanding the genetic basis of complex phenotypes is a central pursuit of genetics. Genome-wide association studies (GWASs) are a powerful way to find genetic loci associated with phenotypes. GWASs are widely and successfully used, but they face challenges related to the fact that variants are tested for association with a phenotype independently, whereas in reality variants at different sites are correlated because of their shared evolutionary history. One way to model this shared history is through the ancestral recombination graph (ARG), which encodes a series of local coalescent trees. Recent computational and methodological breakthroughs have made it feasible to estimate approximate ARGs from large-scale samples. Here, we explore the potential of an ARG-based approach to quantitative-trait locus (QTL) mapping, echoing existing variance-components approaches. We propose a framework that relies on the conditional expectation of a local genetic relatedness matrix (local eGRM) given the ARG. Simulations show that our method is especially beneficial for finding QTLs in the presence of allelic heterogeneity. By framing QTL mapping in terms of the estimated ARG, we can also facilitate the detection of QTLs in understudied populations. We use local eGRM to analyze two chromosomes containing known body size loci in a sample of Native Hawaiians. Our investigations can provide intuition about the benefits of using estimated ARGs in population- and statistical-genetic methods in general.
3
Diversification processes in Gerp's mouse lemur demonstrate the importance of rivers and altitude as biogeographic barriers in Madagascar's humid rainforests. Ecol Evol 2023; 13:e10254. [PMID: 37408627 PMCID: PMC10318617 DOI: 10.1002/ece3.10254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 05/23/2023] [Accepted: 06/21/2023] [Indexed: 07/07/2023] Open
Abstract
Madagascar exhibits exceptionally high levels of biodiversity and endemism. Models to explain the diversification and distribution of species in Madagascar stress the importance of historical variability in climate conditions which may have led to the formation of geographic barriers by changing water and habitat availability. The relative importance of these models for the diversification of the various forest-adapted taxa of Madagascar has yet to be understood. Here, we reconstructed the phylogeographic history of Gerp's mouse lemur (Microcebus gerpi) to identify relevant mechanisms and drivers of diversification in Madagascar's humid rainforests. We used restriction site associated DNA (RAD) markers and applied population genomic and coalescent-based techniques to estimate genetic diversity, population structure, gene flow and divergence times among M. gerpi populations and its two sister species M. jollyae and M. marohita. Genomic results were complemented with ecological niche models to better understand the relative barrier function of rivers and altitude. We show that M. gerpi diversified during the late Pleistocene. The inferred ecological niche, patterns of gene flow and genetic differentiation in M. gerpi suggest that the potential for rivers to act as biogeographic barriers depended on both size and elevation of headwaters. Populations on opposite sides of the largest river in the area with headwaters that extend far into the highlands show particularly high genetic differentiation, whereas rivers with lower elevation headwaters have weaker barrier functions, indicated by higher migration rates and admixture. We conclude that M. gerpi likely diversified through repeated cycles of dispersal punctuated by isolation to refugia as a result of paleoclimatic fluctuations during the Pleistocene. We argue that this diversification scenario serves as a model of diversification for other rainforest taxa that are similarly limited by geographic factors. 
In addition, we highlight conservation implications for this critically endangered species, which faces extreme habitat loss and fragmentation.
4
Assessing the impact of recombination on the estimation of isolation-with-migration models using genomic data: a simulation study. Genomics Inform 2023; 21:e27. [PMID: 37415456 PMCID: PMC10326538 DOI: 10.5808/gi.23016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Revised: 05/20/2023] [Accepted: 05/22/2023] [Indexed: 07/08/2023] Open
Abstract
Recombination events complicate the evolutionary history of populations and species and have a significant impact on the inference of isolation-with-migration (IM) models. However, several existing methods have been developed, assuming no recombination within a locus and free recombination between loci. In this study, we investigated the effect of recombination on the estimation of IM models using genomic data. We conducted a simulation study to evaluate the consistency of the parameter estimators with up to 1,000 loci and analyze true gene trees to examine the sources of errors in estimating the IM model parameters. The results showed that the presence of recombination led to biased estimates of the IM model parameters, with population sizes being more overestimated and migration rates being more underestimated as the number of loci increased. The magnitude of the biases tended to increase with the recombination rates when using 100 or more loci. On the other hand, the estimation of splitting times remained consistent as the number of loci increased. In the absence of recombination, the estimators of the IM model parameters remained consistent.
5
Recombination smooths the time signal disrupted by latency in within-host HIV phylogenies. Virus Evol 2023; 9:vead032. [PMID: 37397911 PMCID: PMC10313349 DOI: 10.1093/ve/vead032] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/22/2022] [Revised: 04/07/2023] [Accepted: 05/15/2023] [Indexed: 07/04/2023] Open
Abstract
Within-host Human immunodeficiency virus (HIV) evolution involves several features that may disrupt standard phylogenetic reconstruction. One important feature is reactivation of latently integrated provirus, which has the potential to disrupt the temporal signal, leading to variation in the branch lengths and apparent evolutionary rates in a tree. Yet, real within-host HIV phylogenies tend to show clear, ladder-like trees structured by the time of sampling. Another important feature is recombination, which violates the fundamental assumption that evolutionary history can be represented by a single bifurcating tree. Thus, recombination complicates the within-host HIV dynamic by mixing genomes and creating evolutionary loop structures that cannot be represented in a bifurcating tree. In this paper, we develop a coalescent-based simulator of within-host HIV evolution that includes latency, recombination, and effective population size dynamics that allows us to study the relationship between the true, complex genealogy of within-host HIV evolution, encoded as an ancestral recombination graph (ARG), and the observed phylogenetic tree. To compare our ARG results to the familiar phylogeny format, we calculate the expected bifurcating tree after decomposing the ARG into all unique site trees, their combined distance matrix, and the overall corresponding bifurcating tree. While latency and recombination separately disrupt the phylogenetic signal, remarkably, we find that recombination recovers the temporal signal of within-host HIV evolution caused by latency by mixing fragments of old, latent genomes into the contemporary population. In effect, recombination averages over extant heterogeneity, whether it stems from mixed time signals or population bottlenecks. Furthermore, we establish that the signals of latency and recombination can be observed in phylogenetic trees despite being an incorrect representation of the true evolutionary history. 
Using an approximate Bayesian computation method, we develop a set of statistical probes to tune our simulation model to nine longitudinally sampled within-host HIV phylogenies. Because ARGs are exceedingly difficult to infer from real HIV data, our simulation system allows investigating effects of latency, recombination, and population size bottlenecks by matching decomposed ARGs to real data as observed in standard phylogenies.
6
Dating the origin and spread of specialization on human hosts in Aedes aegypti mosquitoes. eLife 2023; 12:83524. [PMID: 36897062 PMCID: PMC10038657 DOI: 10.7554/elife.83524] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 09/16/2022] [Accepted: 03/10/2023] [Indexed: 03/11/2023] Open
Abstract
The globally invasive mosquito subspecies Aedes aegypti aegypti is an effective vector of human arboviruses, in part because it specializes in biting humans and breeding in human habitats. Recent work suggests that specialization first arose as an adaptation to long, hot dry seasons in the West African Sahel, where Ae. aegypti relies on human-stored water for breeding. Here, we use whole-genome cross-coalescent analysis to date the emergence of human-specialist populations and thus further probe the climate hypothesis. Importantly, we take advantage of the known migration of specialists out of Africa during the Atlantic Slave Trade to calibrate the coalescent clock and thus obtain a more precise estimate of the older evolutionary event than would otherwise be possible. We find that human-specialist mosquitoes diverged rapidly from ecological generalists approximately 5000 years ago, at the end of the African Humid Period, a time when the Sahara dried and water stored by humans became a uniquely stable, aquatic niche in the Sahel. We also use population genomic analyses to date a previously observed influx of human-specialist alleles into major West African cities. The characteristic length of tracts of human-specialist ancestry present on a generalist genetic background in Kumasi and Ouagadougou suggests the change in behavior occurred during rapid urbanization over the last 20-40 years. Taken together, we show that the timing and ecological context of two previously observed shifts towards human biting in Ae. aegypti differ; climate was likely the original driver, but urbanization has become increasingly important in recent decades.
7
On the origin and structure of haplotype blocks. Mol Ecol 2023; 32:1441-1457. [PMID: 36433653 PMCID: PMC10946714 DOI: 10.1111/mec.16793] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/13/2022] [Revised: 11/16/2022] [Accepted: 11/18/2022] [Indexed: 11/27/2022]
Abstract
The term "haplotype block" is commonly used in the developing field of haplotype-based inference methods. We argue that the term should be defined based on the structure of the Ancestral Recombination Graph (ARG), which contains complete information on the ancestry of a sample. We use simulated examples to demonstrate key features of the relationship between haplotype blocks and ancestral structure, emphasizing the stochasticity of the processes that generate them. Even the simplest cases of neutrality or of a "hard" selective sweep produce a rich structure, often missed by commonly used statistics. We highlight a number of novel methods for inferring haplotype structure, based on the full ARG, or on a sequence of trees, and illustrate how they can be used to define haplotype blocks using an empirical data set. While the advent of new, computationally efficient methods makes it possible to apply these concepts broadly, they (and additional new methods) could benefit from adding features to explore haplotype blocks, as we define them. Understanding and applying the concept of the haplotype block will be essential to fully exploit long and linked-read sequencing technologies.
8
Comparative Epidemiology of Rabbit Haemorrhagic Disease Virus Strains from Viral Sequence Data. Viruses 2022; 15:21. [PMID: 36680062 PMCID: PMC9865945 DOI: 10.3390/v15010021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Revised: 12/16/2022] [Accepted: 12/16/2022] [Indexed: 12/24/2022] Open
Abstract
Since their introduction in 1859, European rabbits (Oryctolagus cuniculus) have had a devastating impact on agricultural production and biodiversity in Australia, with competition and land degradation by rabbits being one of the key threats to agricultural and biodiversity values in Australia. Biocontrol agents, with the most important being the rabbit haemorrhagic disease virus 1 (RHDV1), constitute the most important landscape-scale control strategies for rabbits in Australia. Monitoring field strain dynamics is complex and labour-intensive. Here, using phylodynamic models to analyse the available RHDV molecular data, we aimed to: investigate the epidemiology of various strains, use molecular data to date the emergence of new variants, and evaluate whether different strains are outcompeting one another. We determined that the two main pathogenic lagovirus variants in Australia (RHDV1 and RHDV2) have had similar dynamics since their release, although over different timeframes (substantially shorter for RHDV2). We also found a strong geographic difference in their activities and evidence of overall competition between the two viruses.
9
Genomic insights into recent species divergence in Nicotiana benthamiana and natural variation in Rdr1 gene controlling viral susceptibility. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2022; 111:7-18. [PMID: 35535507 PMCID: PMC9543217 DOI: 10.1111/tpj.15801] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/10/2022] [Revised: 05/05/2022] [Accepted: 05/07/2022] [Indexed: 05/31/2023]
Abstract
One of the most commonly encountered and frequently cited laboratory organisms worldwide is classified taxonomically as Nicotiana benthamiana (Solanaceae), an accession of which, typically referred to as LAB, is renowned for its unique susceptibility to a wide range of plant viruses and hence capacity to be transformed using a variety of methods. This susceptibility is the result of an insertion and consequent loss of function in the RNA-dependent RNA polymerase 1 (Rdr1) gene. However, the origin and age of LAB and the evolution of N. benthamiana across its wide distribution in Australia remain relatively underexplored. Here, we have used multispecies coalescent methods on genome-wide single nucleotide polymorphisms (SNPs) to assess species limits, phylogenetic relationships and divergence times within N. benthamiana. Our results show that the previous taxonomic concept of this species in fact comprises five geographically, morphologically and genetically distinct species, one of which includes LAB. We provide clear evidence that LAB is closely related to accessions collected further north in the Northern Territory; this species split much earlier, c. 1.1 million years ago, from their common ancestor than the other four in this clade and is morphologically the most distinctive. We also found that the Rdr1 gene insertion is variable among accessions from the northern portions of the Northern Territory. Furthermore, this long-isolated species typically grows in sheltered sites in subtropical/tropical monsoon areas of northern Australia, contradicting the previously advanced hypothesis that this species is an extremophile that has traded viral resistance for precocious development.
10
The Probability of Joint Monophyly of Samples of Gene Lineages for All Species in an Arbitrary Species Tree. J Comput Biol 2022; 29:679-703. [PMID: 35544237 DOI: 10.1089/cmb.2021.0647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
Monophyly is a feature of a set of genetic lineages in which every lineage in the set is more closely related to all other members of the set than it is to any lineage outside the set. Multiple sets of lineages that are separately monophyletic are said to be reciprocally monophyletic, or jointly monophyletic. The prevalence of reciprocal monophyly, or joint monophyly (JM), has been used to evaluate phylogenetic and phylogeographic hypotheses, as well as to delimit species. These applications often make use of a probability of JM under models of gene lineage evolution. Studies in coalescent theory have computed this JM probability for small numbers of separate groups in arbitrary species trees and for arbitrary numbers of separate groups in trivial species trees. In this study, generalizing existing results on monophyly probabilities under the multispecies coalescent, we derive the probability of JM for arbitrary numbers of separate groups in arbitrary species trees. We illustrate how our result collapses to previously examined cases. We also study the effect of tree height, sample size, and number of species on the probability of JM. We obtain relatively simple lower and upper bounds on the JM probability. Our results expand the scope of JM calculations beyond small numbers of species, subsuming past formulas that have been used in simpler cases.
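The definition of joint monophyly given at the start of this abstract can be checked on a single rooted gene tree with a few lines of code. The sketch below only tests the topological condition on one tree; it does not compute the probability derived in the paper. The nested-tuple tree encoding and all names are assumptions made for illustration.

```python
def leaf_set(tree):
    """Collect leaf labels of a rooted tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree):
    """Yield the leaf set of every subtree, including single leaves and the root."""
    yield leaf_set(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def jointly_monophyletic(tree, groups):
    """True if every group of lineages forms its own clade in the gene tree."""
    all_clades = set(clades(tree))
    return all(frozenset(group) in all_clades for group in groups)


# Hypothetical gene trees: lineages a1, a2 from species A and b1, b2 from species B.
gene_tree = (("a1", "a2"), ("b1", "b2"))
mixed_tree = (("a1", "b1"), ("a2", "b2"))
groups = [{"a1", "a2"}, {"b1", "b2"}]
print(jointly_monophyletic(gene_tree, groups))
print(jointly_monophyletic(mixed_tree, groups))
```

In the first tree each species' lineages form a clade, so the groups are jointly monophyletic; in the second they are interleaved and the check fails.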
11
A dynamic ancestral graph model and GPU-based simulation of a community based on metagenomic sampling. Mol Ecol Resour 2022; 22:2429-2442. [PMID: 35348284 DOI: 10.1111/1755-0998.13613] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2021] [Revised: 03/06/2022] [Accepted: 03/23/2022] [Indexed: 11/29/2022]
Abstract
In this paper we present an ancestral graph model of the evolution of a guild in an ecological community. The model is based on a metagenomic sampling design in that a random sample is taken at the community, as opposed to the taxon, level and species are discovered by genetic sequencing. The specific implementation of the model envisions an ecological guild that was founded by colonization at some point in the past that then potentially undergoes diversification by natural selection. Within the graph, species emerge and evolve through the diversification process and their densities in the graph are dynamic and governed by both ecological drift and random genetic drift, as well as differential viability. We employ the 3% sequence divergence rule at a marker locus to identify Operational Taxonomic Units (OTUs). We then explore approaches to see if there are indirect signals of the diversification process, including population genetic and ecological approaches. In terms of population genetics, we study the joint site frequency spectrum of OTUs, as well as its associated statistics. In terms of ecology, we study the species (or OTU) abundance distribution. For both we observe deviations from neutrality, which indicates that there may be signals of diversifying selection in metagenomic studies under certain conditions. The model is available as a GPU-based computer program in C/C++ and using OpenCL, with the long-term goal of adding functionality iteratively to model large-scale eco-evolutionary processes for metagenomic data.
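The 3% sequence divergence rule mentioned above can be illustrated with a small clustering sketch. The abstract does not say which clustering procedure the authors use, so the greedy seed-based assignment below is an assumption, as are the function names and the toy sequences; it is written in Python for brevity rather than in the C/C++ and OpenCL of the published program.

```python
def divergence(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_otus(sequences, threshold=0.03):
    """Assign each sequence to the first OTU whose seed is within the threshold."""
    seeds = []      # one representative sequence per OTU
    otus = []       # lists of member sequences, parallel to seeds
    for seq in sequences:
        for i, seed in enumerate(seeds):
            if divergence(seq, seed) <= threshold:
                otus[i].append(seq)
                break
        else:
            # No existing seed is close enough, so this sequence founds a new OTU.
            seeds.append(seq)
            otus.append([seq])
    return otus


# Toy marker sequences of length 100 (hypothetical data).
base = "A" * 100
close = "C" + "A" * 99          # 1% divergent from base
distant = "C" * 10 + "A" * 90   # 10% divergent from base
print([len(otu) for otu in greedy_otus([base, close, distant])])
```

Greedy assignment depends on input order, so a different ordering of the same sequences can give different OTUs; that is a property of this sketch, not a claim about the paper's method.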
12
Abstract
Current methods of identifying positively selected regions in the genome are limited in two key ways: the underlying models cannot account for the timing of adaptive events and the comparison between models of selective sweeps and sequence data is generally made via simple summaries of genetic diversity. Here, we develop a tractable method of describing the effect of positive selection on the genealogical histories in the surrounding genome, explicitly modeling both the timing and context of an adaptive event. In addition, our framework allows us to go beyond analyzing polymorphism data via the site frequency spectrum or summaries thereof and instead leverage information contained in patterns of linked variants. Tests on both simulations and a human data example, as well as a comparison to SweepFinder2, show that even with very small sample sizes, our analytic framework has higher power to identify old selective sweeps and to correctly infer both the time and strength of selection. Finally, we derived the marginal distribution of genealogical branch lengths at a locus affected by selection acting at a linked site. This provides a much-needed link between our analytic understanding of the effects of sweeps on sequence variation and recent advances in simulation and heuristic inference procedures that allow researchers to examine the sequence of genealogical histories along the genome.
13
Genealogical structure changes as range expansions transition from pushed to pulled. Proc Natl Acad Sci U S A 2021; 118:2026746118. [PMID: 34413189 DOI: 10.1073/pnas.2026746118] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/18/2022] Open
Abstract
Range expansions accelerate evolution through multiple mechanisms, including gene surfing and genetic drift. The inference and control of these evolutionary processes ultimately rely on the information contained in genealogical trees. Currently, there are two opposing views on how range expansions shape genealogies. In invasion biology, expansions are typically approximated by a series of population bottlenecks producing genealogies with only pairwise mergers between lineages, a process known as the Kingman coalescent. Conversely, traveling wave models predict a coalescent with multiple mergers, known as the Bolthausen-Sznitman coalescent. Here, we unify these two approaches and show that expansions can generate an entire spectrum of coalescent topologies. Specifically, we show that tree topology is controlled by growth dynamics at the front and exhibits large differences between pulled and pushed expansions. These differences are explained by the fluctuations in the total number of descendants left by the early founders. High growth cooperativity leads to a narrow distribution of reproductive values and the Kingman coalescent. Conversely, low growth cooperativity results in a broad distribution, whose exponent controls the merger sizes in the genealogies. These broad distributions and non-Kingman tree topologies emerge due to the fluctuations in the front shape and position and do not occur in quasi-deterministic simulations. Overall, our results show that range expansions provide a robust mechanism for generating different types of multiple mergers, which could be similar to those observed in populations with strong selection or high fecundity. Thus, caution should be exercised in making inferences about the origin of non-Kingman genealogies.
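To make the "only pairwise mergers" property of the Kingman coalescent concrete, the following minimal sketch simulates a genealogy backwards in time: with k lineages the waiting time to the next merger is exponential with rate k(k−1)/2 in coalescent time units, and exactly two lineages are joined at each event. It is a textbook-style illustration with hypothetical names, not the spatial range-expansion simulation used in the paper, and it does not model multiple-merger coalescents such as Bolthausen-Sznitman.

```python
import random


def simulate_kingman(n, rng=None):
    """Simulate the merger events of a Kingman coalescent for n sampled lineages.

    Returns a list of (time, merged_lineage) pairs, where each lineage is a
    tuple of the sample indices it is ancestral to.
    """
    rng = rng or random.Random()
    lineages = [(i,) for i in range(n)]
    time = 0.0
    events = []
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time to the next merger with k lineages present.
        time += rng.expovariate(k * (k - 1) / 2)
        # Exactly two distinct lineages merge at each event.
        i, j = rng.sample(range(k), 2)
        merged = lineages[i] + lineages[j]
        lineages = [l for idx, l in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
        events.append((time, merged))
    return events


# Seeded generator so repeated runs give the same genealogy.
for t, lineage in simulate_kingman(5, random.Random(42)):
    print(f"{t:.3f}", sorted(lineage))
```

A sample of n lineages always produces n−1 merger events under this process; a multiple-merger coalescent would need fewer because several lineages can join at once.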
14
SEQUENTIAL IMPORTANCE SAMPLING FOR MULTIRESOLUTION KINGMAN-TAJIMA COALESCENT COUNTING. Ann Appl Stat 2021; 14:727-751. [PMID: 33995755 DOI: 10.1214/19-aoas1313] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/26/2022]
Abstract
Statistical inference of evolutionary parameters from molecular sequence data relies on coalescent models to account for the shared genealogical ancestry of the samples. However, inferential algorithms do not scale to available data sets. A strategy to improve computational efficiency is to rely on simpler coalescent and mutation models, resulting in smaller hidden state spaces. An estimate of the cardinality of the state-space of genealogical trees at different resolutions is essential to decide the best modeling strategy for a given dataset. To our knowledge, there is neither an exact nor approximate method to determine these cardinalities. We propose a sequential importance sampling algorithm to estimate the cardinality of the sample space of genealogical trees under different coalescent resolutions. Our sampling scheme proceeds sequentially across the set of combinatorial constraints imposed by the data, which in this work are completely linked sequences of DNA at a non recombining segment. We analyze the cardinality of different genealogical tree spaces on simulations to study the settings that favor coarser resolutions. We apply our method to estimate the cardinality of genealogical tree spaces from mtDNA data from the 1000 genomes and a sample from a Melanesian population at the β-globin locus.
15
Abstract
Hybridization in plants may result in hybrid speciation or introgression and, thus, is now widely understood to be an important mechanism of species diversity on an evolutionary timescale. Hybridization is particularly common in ferns, as is polyploidy, which often results from hybrid crosses. Nevertheless, hybrid speciation as an evolutionary process in fern lineages remains poorly understood. Here, we employ flow cytometry, phylogeny, genomewide single nucleotide polymorphism data sets, and admixture and coalescent modeling to show that the scaly tree fern, Gymnosphaera metteniana is a naturally occurring allotetraploid species derived from hybridization between the diploids, G. denticulata and G. gigantea. Moreover, we detected ongoing gene flow between the hybrid species and its progenitors, and we found that G. gigantea and G. metteniana inhabit distinct niches, whereas climatic niches of G. denticulata and G. metteniana largely overlap. Taken together, these results suggest that either some degree of intrinsic genetic isolation between the hybrid species and its parental progenitors or ecological isolation over short distances may be playing an important role in the evolution of reproductive barriers. Historical climate change may have facilitated the origin of G. metteniana, with the timing of hybridization coinciding with a period of intensification of the East Asian monsoon during the Pliocene and Pleistocene periods in southern China. Our study of allotetraploid G. metteniana represents the first genomic-level documentation of hybrid speciation in scaly tree ferns and, thus, provides a new perspective on evolution in the lineage.
16
Ultraconserved Elements Improve the Resolution of Difficult Nodes within the Rapid Radiation of Neotropical Sigmodontine Rodents (Cricetidae: Sigmodontinae). Syst Biol 2021; 70:1090-1100. [PMID: 33787920 DOI: 10.1093/sysbio/syab023] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/07/2019] [Revised: 03/23/2021] [Accepted: 03/29/2021] [Indexed: 11/14/2022] Open
Abstract
Sigmodontine rodents (Cricetidae, Sigmodontinae) represent the second largest muroid subfamily and the most species-rich group of New World mammals, encompassing more than 410 living species and ca. 87 genera. Despite recent advances in clarifying sigmodontine phylogenetic relationships, the relationships among the 12 main groups of genera (i.e., tribes) remain poorly resolved, in particular among those forming the large clade Oryzomyalia. This pattern has been interpreted as a consequence of a rapid radiation upon the group's entrance into South America. Here, we attempted to resolve phylogenetic relationships within Sigmodontinae using target capture and high-throughput sequencing of ultraconserved elements (UCEs). We enriched and sequenced UCEs for 56 individuals and collected data from four already available genomes. Analyses of distinct data sets, based on the capture of 4,634 loci, resulted in a highly resolved phylogeny consistent across different methods. Coalescent species-tree based approaches, concatenated matrices, and Bayesian analyses recovered similar topologies that were congruent at the resolution of difficult nodes. We recovered good support for the intertribal relationships within Oryzomyalia; for instance, the tribe Oryzomyini appears as the sister taxon of the remaining oryzomyalid tribes. The estimates of divergence times agree with results of previous studies. We inferred the crown age of the sigmodontine rodents at the end of Middle Miocene, while the main lineages of Oryzomyalia appear to have radiated in a short interval during the Late Miocene. Thus, the collection of a genomic scale data set with a wide taxonomic sampling provided resolution for the first time of the relationships among the main lineages of Sigmodontinae. We expect the phylogeny presented here will become the backbone for future systematic and evolutionary studies of the group.
|
17
|
Abstract
A labeled gene tree topology that is more probable than the labeled gene tree topology matching a species tree is called "anomalous." Species trees that can generate such anomalous gene trees are said to be in the "anomaly zone." Here, probabilities of "unranked" and "ranked" gene tree topologies under the multispecies coalescent are considered. A ranked tree depicts not only the topological relationship among gene lineages, as an unranked tree does, but also the sequence in which the lineages coalesce. In this article, we study how the parameters of a species tree simulated under a constant-rate birth-death process can affect the probability that the species tree lies in the anomaly zone. We find that with more than five taxa, it is possible for species trees to have both anomalous unranked and ranked gene trees. The probability of being in either type of anomaly zone increases with more taxa. The probability of anomalous gene trees also increases with higher speciation rates. We observe that the probabilities of unranked anomaly zones are higher and grow much faster than those of ranked anomaly zones as the speciation rate increases. Our simulation shows that the most probable ranked gene tree is likely to have the same unranked topology as the species tree. We design the software PRANC, which computes probabilities of ranked gene tree topologies given a species tree under the coalescent model.
|
18
|
The Impacts of Low Diversity Sequence Data on Phylodynamic Inference during an Emerging Epidemic. Viruses 2021; 13:v13010079. [PMID: 33430050 PMCID: PMC7826997 DOI: 10.3390/v13010079] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/11/2020] [Revised: 01/05/2021] [Accepted: 01/05/2021] [Indexed: 01/06/2023] Open
Abstract
Phylodynamic inference is a pivotal tool in understanding transmission dynamics of viral outbreaks. These analyses are strongly guided by the input of an epidemiological model as well as sequence data that must contain sufficient intersequence variability in order to be informative. These criteria, however, may not be met during the early stages of an outbreak. Here we investigate the impact of low diversity sequence data on phylodynamic inference using the birth–death and coalescent exponential models. Our simulation study shows that estimating the molecular evolutionary rate requires sufficient sequence diversity and is an essential first step for any phylodynamic inference. Following this, the birth–death model outperforms the coalescent exponential model in estimating epidemiological parameters when faced with low diversity sequence data, because it explicitly exploits the sampling times. In contrast, the coalescent model requires additional samples, and therefore more variability in the sequence data, before accurate estimates can be obtained. These findings were also supported by our empirical data analyses of Australian and New Zealand cluster outbreaks of SARS-CoV-2. Overall, the birth–death model is more robust when applied to datasets with low sequence diversity, given that sampling is specified, and this should be considered for future viral outbreak investigations.
|
19
|
Abstract
Genealogical tree modeling is essential for estimating evolutionary parameters in population genetics and phylogenetics. Recent mathematical results concerning ranked genealogies without leaf labels unlock opportunities in the analysis of evolutionary trees. In particular, comparisons between ranked genealogies facilitate the study of evolutionary processes of different organisms sampled at multiple time periods. We propose metrics on ranked tree shapes and ranked genealogies for lineages isochronously and heterochronously sampled. Our proposed tree metrics make it possible to conduct statistical analyses of ranked tree shapes and timed ranked tree shapes or ranked genealogies. Such analyses allow us to assess differences in tree distributions, quantify estimation uncertainty, and summarize tree distributions. We show the utility of our metrics via simulations and an application in infectious diseases.
|
20
|
Abstract
Natural highly fecund populations abound. These range from viruses to gadids. Many highly fecund populations are economically important. Highly fecund populations provide an important contrast to the low-fecundity organisms that have traditionally been applied in evolutionary studies. A key question regarding high fecundity is whether large numbers of offspring are produced on a regular basis, by few individuals each time, in a sweepstakes mode of reproduction. Such reproduction characteristics are not incorporated into the classical Wright-Fisher model, the standard reference model of population genetics, or similar types of models, in which each individual can produce only small numbers of offspring relative to the population size. The expected genomic footprints of population genetic models of sweepstakes reproduction are very different from those of the Wright-Fisher model. A key, immediate issue involves identifying the footprints of sweepstakes reproduction in genomic data. Whole-genome sequencing data can be used to distinguish the patterns made by sweepstakes reproduction from the patterns made by population growth in a population evolving according to the Wright-Fisher model (or similar models). If the hypothesis of sweepstakes reproduction cannot be rejected, then models of sweepstakes reproduction and associated multiple-merger coalescents will become at least as relevant as the Wright-Fisher model (or similar models) and the Kingman coalescent, the cornerstones of mathematical population genetics, in further discussions of evolutionary genomics of highly fecund populations.
|
21
|
Evidence of increasing diversification of emerging Severe Acute Respiratory Syndrome Coronavirus 2 strains. J Med Virol 2020; 92:2165-2172. [PMID: 32410229 PMCID: PMC7273070 DOI: 10.1002/jmv.26018] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 04/21/2020] [Revised: 05/06/2020] [Accepted: 05/13/2020] [Indexed: 12/28/2022]
Abstract
On 30th January 2020, an outbreak of atypical pneumonia caused by a novel betacoronavirus, named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was declared a public health emergency of international concern by the World Health Organization. For this reason, a detailed evolutionary analysis of SARS-CoV-2 strains currently circulating in different geographic regions of the world was performed. A compositional analysis as well as a Bayesian coalescent analysis of complete genome sequences of SARS-CoV-2 strains recently isolated in Europe, North America, South America, and Asia was performed. The results of these studies revealed a diversification of SARS-CoV-2 strains into three different genetic clades. Co-circulation of different clades in different countries, as well as different genetic lineages within different clades, was observed. The time of the most recent common ancestor was established to be around 1st November 2019. A mean rate of evolution of 6.57 × 10⁻⁴ substitutions per site per year was found. A significant migration rate per genetic lineage per year from Europe to South America was also observed. The results of these studies revealed an increasing diversification of SARS-CoV-2 strains. High evolutionary rates and fast population growth characterize the population dynamics of SARS-CoV-2 strains.
|
22
|
Abstract
Many evolutionary biologists collect genetic data from natural populations and then need to investigate the relationship among these populations to compare different biogeographic hypotheses. MIGRATE, a useful tool for exploring relationships between populations and comparing hypotheses, has existed since 1998. Throughout the years, it has steadily improved in both the quality of algorithms used and in the efficiency of carrying out those calculations, thus allowing for a larger number of loci to be evaluated. This efficiency has been enhanced, as MIGRATE has been developed to perform many of its calculations concurrently when running on a computer cluster. The program is based on the coalescence theory and uses Bayesian inference to estimate posterior probability densities of all the parameters of a user‐specified population model. Complex models, which include migration and colonization parameters, can be specified. These models can be evaluated using marginal likelihoods, thus allowing a user to compare the merits of different hypotheses. The three presented protocols will help novice users to develop sophisticated analysis techniques useful for their research projects. © 2019 The Authors. Basic Protocol 1: First steps with MIGRATE. Basic Protocol 2: Population model specification. Basic Protocol 3: Prior distribution specification. Basic Protocol 4: Model selection. Support Protocol 1: Installing the program MIGRATE. Support Protocol 2: Installation of parallel MIGRATE.
|
23
|
Horseshoe-based Bayesian nonparametric estimation of effective population size trajectories. Biometrics 2020; 76:677-690. [PMID: 32277713 DOI: 10.1111/biom.13276] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/13/2018] [Revised: 04/26/2019] [Accepted: 07/09/2019] [Indexed: 11/26/2022]
Abstract
Phylodynamics is an area of population genetics that uses genetic sequence data to estimate past population dynamics. Modern state-of-the-art Bayesian nonparametric methods for recovering population size trajectories of unknown form use either change-point models or Gaussian process priors. Change-point models suffer from computational issues when the number of change-points is unknown and needs to be estimated. Gaussian process-based methods lack local adaptivity and cannot accurately recover trajectories that exhibit features such as abrupt changes in trend or varying levels of smoothness. We propose a novel, locally adaptive approach to Bayesian nonparametric phylodynamic inference that has the flexibility to accommodate a large class of functional behaviors. Local adaptivity results from modeling the log-transformed effective population size a priori as a horseshoe Markov random field, a recently proposed statistical model that blends together the best properties of the change-point and Gaussian process modeling paradigms. We use simulated data to assess model performance, and find that our proposed method results in reduced bias and increased precision when compared to contemporary methods. We also use our models to reconstruct past changes in genetic diversity of human hepatitis C virus in Egypt and to estimate population size changes of ancient and modern steppe bison. These analyses show that our new method captures features of the population size trajectories that were missed by the state-of-the-art methods.
|
24
|
mstree: A Multispecies Coalescent Approach for Estimating Ancestral Population Size and Divergence Time during Speciation with Gene Flow. Genome Biol Evol 2020; 12:715-719. [PMID: 32365209 PMCID: PMC7259675 DOI: 10.1093/gbe/evaa087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/27/2020] [Indexed: 11/28/2022] Open
Abstract
Gene flow between species may cause variations in the branch lengths and topologies of gene trees that go beyond the variations expected from ancestral processes. These additional variations make it difficult to estimate parameters during speciation with gene flow, as their pattern differs with the relationship between isolation and migration. As far as we know, most methods rely on an assumption about the relationship between isolation and migration imposed by a given model, such as the isolation-with-migration model, when estimating parameters during speciation with gene flow. In this article, we develop a multispecies coalescent approach, called mstree, which does not rely on any assumption about the relationship between isolation and migration when estimating parameters. mstree is available at https://github.com/liujunfengtop/MStree/ and uses some mathematical inequalities among several factors, which include the species divergence time, the ancestral population size, and the number of gene trees, to estimate parameters during speciation with gene flow. Using simulations, we show that the estimated values of ancestral population sizes and species divergence times are close to the true values when analyzing simulated data sets generated under the isolation-with-initial-migration model, the secondary contact model, and the isolation-with-migration model. Therefore, our method is able to estimate ancestral population sizes and speciation times in the presence of different modes of gene flow and may be helpful for testing different theories of speciation.
|
25
|
Inference of Historical Population-Size Changes with Allele-Frequency Data. G3-GENES GENOMES GENETICS 2020; 10:211-223. [PMID: 31699776 PMCID: PMC6945023 DOI: 10.1534/g3.119.400854] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/27/2023]
Abstract
With up to millions of nearly neutral polymorphisms now being routinely sampled in population-genomic surveys, it is possible to estimate the site-frequency spectrum of such sites with high precision. Each frequency class reflects a mixture of potentially unique demographic histories, which can be revealed using theory for the probability distributions of the starting and ending points of branch segments over all possible coalescence trees. Such distributions are completely independent of past population history, which only influences the segment lengths, providing the basis for estimating average population sizes separating tree-wide coalescence events. The history of population-size change experienced by a sample of polymorphisms can then be dissected in a model-flexible fashion, and extension of this theory allows estimation of the mean and full distribution of long-term effective population sizes and ages of alleles of specific frequencies. Here, we outline the basic theory underlying the conceptual approach, develop and test an efficient statistical procedure for parameter estimation, and apply this to multiple population-genomic datasets for the microcrustacean Daphnia pulex.
|
26
|
Phylogenomic Relationships of Diploids and the Origins of Allotetraploids in Dactylorhiza (Orchidaceae). Syst Biol 2020; 69:91-109. [PMID: 31127939 PMCID: PMC6902629 DOI: 10.1093/sysbio/syz035] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 10/19/2018] [Revised: 05/12/2019] [Accepted: 05/17/2019] [Indexed: 12/04/2022] Open
Abstract
Disentangling phylogenetic relationships proves challenging for groups that have evolved recently, especially if there is ongoing reticulation. Although they are in most cases immediately isolated from diploid relatives, sets of sibling allopolyploids often hybridize with each other, thereby increasing the complexity of an already challenging situation. Dactylorhiza (Orchidaceae: Orchidinae) is a genus much affected by allopolyploid speciation and reticulate phylogenetic relationships. Here, we use genetic variation at tens of thousands of genomic positions to unravel the convoluted evolutionary history of Dactylorhiza. We first investigate circumscription and relationships of diploid species in the genus using coalescent and maximum likelihood methods, and then group 16 allotetraploids by maximum affiliation to their putative parental diploids, implementing a method based on genotype likelihoods. The direction of hybrid crosses is inferred for each allotetraploid using information from maternally inherited plastid RADseq loci. Starting from age estimates of parental taxa, the relative ages of these allotetraploid entities are inferred by quantifying their genetic similarity to the diploids and numbers of private alleles compared with sibling allotetraploids. Whereas northwestern Europe is dominated by young allotetraploids of postglacial origins, comparatively older allotetraploids are distributed further south, where climatic conditions remained relatively stable during the Pleistocene glaciations. Our bioinformatics approach should prove effective for the study of other naturally occurring, nonmodel, polyploid plant complexes.
|
27
|
A practical introduction to sequentially Markovian coalescent methods for estimating demographic history from genomic data. Ecol Evol 2020; 10:579-589. [PMID: 31988743 PMCID: PMC6972798 DOI: 10.1002/ece3.5888] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 05/28/2019] [Revised: 10/11/2019] [Accepted: 11/12/2019] [Indexed: 12/31/2022] Open
Abstract
A common goal of population genomics and molecular ecology is to reconstruct the demographic history of a species of interest. A pair of powerful tools based on the sequentially Markovian coalescent have been developed to infer past population sizes using genome sequences. These methods are most useful when sequences are available for only a limited number of genomes and when the aim is to study ancient demographic events. The results of these analyses can be difficult to interpret accurately, because doing so requires some understanding of their theoretical basis and of their sensitivity to confounding factors. In this practical review, we explain some of the key concepts underpinning the pairwise and multiple sequentially Markovian coalescent methods (PSMC and MSMC, respectively). We relate these concepts to the use and interpretation of these methods, and we explain how the choice of different parameter values by the user can affect the accuracy and precision of the inferences. Based on our survey of 100 PSMC studies and 30 MSMC studies, we describe how the two methods are used in practice. Readers of this article will become familiar with the principles, practice, and interpretation of the sequentially Markovian coalescent for inferring demographic history.
|
28
|
Properties of Samples With Segregating Polymerase Chain Reaction (PCR) Dropout Mutations Within a Species. Evol Bioinform Online 2019; 15:1176934319883612. [PMID: 31723319 PMCID: PMC6831972 DOI: 10.1177/1176934319883612] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/06/2019] [Accepted: 09/25/2019] [Indexed: 11/17/2022] Open
Abstract
In polymerase chain reaction (PCR)-based DNA sequencing studies, there is the possibility that mutations at the binding sites of primers result in no primer binding and therefore no amplification. In this article, we call such mutations PCR dropouts and present a coalescent-based theory of the distribution of segregating PCR dropout mutations within a species. We show that dropout mutations typically occur along branch sections that are at or near the base of a coalescent tree, if at all. Given that a dropout mutation occurs along a branch section near the base of a tree, there is a good chance that it causes the alleles of a large fraction of a species to go unamplified, which distorts the tree shape. Expected coalescence times and distributions of pairwise sequence differences in the presence of PCR dropout mutations are derived under the assumptions of both neutrality and background selection. These expectations differ from when PCR dropout mutations are absent and may form the basis of inferential approaches to detect the presence of dropout mutations, as well as the development of unbiased estimators of statistics associated with population-level genetic variation.
|
29
|
Bayesian Estimation of Population Size Changes by Sampling Tajima's Trees. Genetics 2019; 213:967-986. [PMID: 31511299 PMCID: PMC6827370 DOI: 10.1534/genetics.119.302373] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/28/2019] [Accepted: 09/06/2019] [Indexed: 11/30/2022] Open
Abstract
The large state space of gene genealogies is a major hurdle for inference methods based on Kingman's coalescent. Here, we present a new Bayesian approach for inferring past population sizes, which relies on a lower-resolution coalescent process that we refer to as "Tajima's coalescent." Tajima's coalescent has a drastically smaller state space than the standard Kingman coalescent and is hence a computationally more efficient model. We provide a new algorithm for efficient and exact likelihood calculations for data without recombination, which exploits a directed acyclic graph and a correspondingly tailored Markov chain Monte Carlo method. We compare the performance of our Bayesian Estimation of population size changes by Sampling Tajima's Trees (BESTT) with a popular implementation of coalescent-based inference in BEAST using simulated and human data. We empirically demonstrate that BESTT can accurately infer effective population sizes, and that it provides an efficient alternative to Kingman's coalescent. The algorithms described here are implemented in the R package phylodyn, which is available for download at https://github.com/JuliaPalacios/phylodyn.
|
30
|
Bayesian Estimation of Past Population Dynamics in BEAST 1.10 Using the Skygrid Coalescent Model. Mol Biol Evol 2019; 36:2620-2628. [PMID: 31364710 PMCID: PMC6805224 DOI: 10.1093/molbev/msz172] [Citation(s) in RCA: 69] [Impact Index Per Article: 13.8] [Received: 04/08/2019] [Revised: 06/24/2019] [Accepted: 07/12/2019] [Indexed: 12/24/2022] Open
Abstract
Inferring past population dynamics over time from heterochronous molecular sequence data is often achieved using the Bayesian Skygrid model, a nonparametric coalescent model that estimates the effective population size over time. Available in BEAST, a cross-platform program for Bayesian analysis of molecular sequences using Markov chain Monte Carlo, this coalescent model is often estimated in conjunction with a molecular clock model to produce time-stamped phylogenetic trees. We here provide a practical guide to using BEAST and its accompanying applications for the purpose of drawing inference under these models. We focus on best practices, potential pitfalls, and recommendations that can be generalized to other software packages for Bayesian inference. This protocol shows how to use TempEst, BEAUti, and BEAST 1.10 (http://beast.community/; last accessed July 29, 2019), LogCombiner as well as Tracer in a complete workflow.
|
31
|
The effect of undetected recombination on genealogy sampling and inference under an isolation-with-migration model. Mol Ecol Resour 2019; 19:1593-1609. [PMID: 31479562 DOI: 10.1111/1755-0998.13083] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/18/2019] [Revised: 07/22/2019] [Accepted: 07/24/2019] [Indexed: 11/30/2022]
Abstract
Many methods for fitting demographic models to data sets of aligned sequences rely upon an assumption that the data have a branching coalescent history without recombination within regions or loci. To mitigate the effects of the failure of this assumption, a common approach is to filter data and sample regions that pass the four-gamete criterion for recombination, an approach that allows the data to be analyzed but that is expected to detect only a minority of recombination events. A series of empirical tests of this approach was conducted using computer simulations with and without recombination for a variety of isolation-with-migration (IM) models for two and three populations. Only the IMa3 program was used, but the general results should apply to related genealogy-sampling-based methods for IM models or subsets of IM models. It was found that the details of sampling intervals that pass a four-gamete filter have a moderate effect, and that schemes that use the longest intervals, or that use overlapping intervals, gave poorer results. A simple approach of using a random nonoverlapping interval returned the smallest difference between results with and without recombination, with the mean difference between parameter estimates usually less than 20% of the true value (usually much less). However, the posterior probability distributions for migration rates were flatter with recombination, suggesting that filtering based on the four-gamete criterion, while necessary for methods like these, leads to reduced resolution on migration. A distinct, alternative approach of using a finite-sites mutation model and not filtering the data performed quite poorly.
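The four-gamete criterion mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the IMa3 filtering procedure; it only assumes that haplotypes are given as equal-length strings over two alleles per site (here "0" and "1"), and the function name `passes_four_gamete` is hypothetical. Under these assumptions, a region fails the criterion when any pair of sites shows all four allele combinations.

```python
from itertools import combinations


def passes_four_gamete(haplotypes):
    """Return True if no pair of sites shows all four gametes.

    Assumes every haplotype is a string of the same length and every
    site is biallelic (an assumption of this sketch, not of the source).
    """
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct two-site allele combinations seen in the sample.
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            # All four combinations present: the pair is incompatible with
            # a single tree without recombination or recurrent mutation.
            return False
    return True
```

A region for which the function returns True shows no evidence of recombination by this test, which, as the abstract notes, does not mean that no recombination occurred.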
|
32
|
Inference of complex population histories using whole-genome sequences from multiple populations. Proc Natl Acad Sci U S A 2019; 116:17115-17120. [PMID: 31387977 PMCID: PMC6708337 DOI: 10.1073/pnas.1905060116] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Indexed: 11/18/2022] Open
Abstract
There has been much interest in analyzing genome-scale DNA sequence data to infer population histories, but inference methods developed hitherto are limited in model complexity and computational scalability. Here we present an efficient, flexible statistical method, diCal2, that can use whole-genome sequence data from multiple populations to infer complex demographic models involving population size changes, population splits, admixture, and migration. Applying our method to data from Australian, East Asian, European, and Papuan populations, we find that the population ancestral to Australians and Papuans started separating from East Asians and Europeans about 100,000 y ago, and that the separation of East Asians and Europeans started about 50,000 y ago, with pervasive gene flow between all pairs of populations.
|
33
|
Abstract
A variety of methods based on coalescent theory have been developed to infer demographic history from gene sequences sampled from natural populations. The 'skyline plot' and related approaches are commonly employed as flexible prior distributions for phylogenetic trees in the Bayesian analysis of pathogen gene sequences. In this work we extend the classic and generalized skyline plot methods to phylogenies that contain one or more multifurcations (i.e. hard polytomies). We use the theory of Λ-coalescents (specifically, Beta(2 − α, α)-coalescents) to develop the 'multifurcating skyline plot', which estimates a piecewise constant function of effective population size through time, conditional on a time-scaled multifurcating phylogeny. We implement a smoothing procedure and extend the method to serially sampled (heterochronous) data, but we do not address here the problem of estimating trees with multifurcations from gene sequence alignments. We validate our estimator on simulated data using maximum likelihood and find that parameters of the Beta(2 − α, α)-coalescent process can be estimated accurately. Furthermore, we apply the multifurcating skyline plot to simulated trees generated by tracking transmissions in an individual-based model of epidemic superspreading. We find that high levels of superspreading are consistent with the high-variance assumptions underlying Λ-coalescents and that the estimated parameters of the Λ-coalescent model contain information about the degree of superspreading.
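For orientation, the classic skyline plot that this work extends can be sketched in a few lines of Python. This is only the bifurcating Kingman case with isochronous sampling, not the multifurcating estimator of the abstract: while k lineages persist for an interval of length w, the piecewise constant estimate is w · k(k − 1)/2, in the same time units as the tree. The function name `classic_skyline` and the input format (a list of coalescent times measured backward from the present) are assumptions of the sketch.

```python
def classic_skyline(coalescent_times):
    """Classic skyline estimates for a bifurcating, isochronous genealogy.

    coalescent_times: times of the n - 1 coalescent events, measured
    backward from the present (all samples are taken at time 0).
    Returns a list of (interval_start, interval_end, estimate) tuples.
    """
    times = sorted(coalescent_times)
    n = len(times) + 1  # number of sampled lineages
    estimates = []
    start = 0.0
    for idx, end in enumerate(times):
        k = n - idx  # lineages present during this interval
        width = end - start
        # Method-of-moments estimate: interval length times k choose 2.
        estimates.append((start, end, width * k * (k - 1) / 2))
        start = end
    return estimates
```

With three samples and coalescent events at times 1.0 and 3.0, for instance, the first interval has three lineages and the second has two, so each interval receives its own estimate.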
|
34
|
Abstract
The likelihood ratio statistic, with its asymptotic χ² distribution at regular model points, is often used for hypothesis testing. However, the asymptotic distribution can differ at model singularities and boundaries, suggesting the use of a χ² might be problematic nearby. Indeed, its poor behavior for testing near singularities and boundaries is apparent in simulations, and can lead to conservative or anti-conservative tests. Here we develop a new distribution designed for use in hypothesis testing near singularities and boundaries, which asymptotically agrees with that of the likelihood ratio statistic. For two example trinomial models, arising in the context of inference of evolutionary trees, we show the new distributions outperform a χ².
|
35
|
DNA barcoding of the rodent genus Oligoryzomys (Cricetidae: Sigmodontinae): mitogenomic-anchored database and identification of nuclear mitochondrial translocations (Numts). Mitochondrial DNA A DNA Mapp Seq Anal 2019; 30:702-712. [PMID: 31208245 DOI: 10.1080/24701394.2019.1622692] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 10/26/2022]
Abstract
DNA barcoding has become a standard method for species identification in taxonomically complex groups. An important step of the barcoding process is the construction of a library of voucher-based material that was properly identified by independent methods and is free of inaccurate identifications and paralogs. We provide here a cytochrome oxidase I (mt-Co1) DNA barcode database for species of the genus Oligoryzomys, based on type material and karyotyped specimens, and anchored on the mitochondrial genome of one species of Oligoryzomys, O. stramineus. To evaluate the taxonomic determination of new COI sequences, we assessed intra- and interspecific genetic distances (barcode gap), applied the Generalized Mixed Yule Coalescent (GMYC) method for lineage delimitation, and identified diagnostic nucleotides for each species of Oligoryzomys. Phylogenetic analyses of Oligoryzomys were performed on 2 datasets including 14 of the 23 recognized species of this genus: a mt-Co1 only matrix, and a concatenated matrix including mt-Co1, cytochrome b (mt-Cytb), and intron 7 of the nuclear fibrinogen beta chain gene (i7Fgb). We recovered nuclear-mitochondrial translocated (Numts) pseudogenes in our samples and identified several published sequences that are cases of Numts. We analyzed the rates of non-synonymous and synonymous substitution, which were higher in Numts than in mtDNA sequences. GMYC delimitations and DNA barcode gap results highlight the need for further work that integrates molecular, karyotypic, and morphological analyses, as well as additional sampling, to tackle persistent problems in the taxonomy of Oligoryzomys.
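The barcode gap mentioned in this abstract can be made concrete with a small Python sketch. It rests on assumptions that are not taken from the study: sequences are pre-aligned strings of equal length, distance is the uncorrected p-distance, and the gap is summarized as the smallest interspecific distance minus the largest intraspecific distance. The names `p_distance` and `barcode_gap` are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def barcode_gap(records):
    """Smallest interspecific minus largest intraspecific p-distance.

    records: list of (species_label, aligned_sequence) pairs; the sample
    must contain at least one within-species and one between-species pair.
    A positive value indicates a gap between the two distance classes.
    """
    intra, inter = [], []
    for (sp_a, seq_a), (sp_b, seq_b) in combinations(records, 2):
        target = intra if sp_a == sp_b else inter
        target.append(p_distance(seq_a, seq_b))
    if not intra or not inter:
        raise ValueError("need both intra- and interspecific comparisons")
    return min(inter) - max(intra)
```

In practice a model-corrected distance would often replace the p-distance; the sketch uses the simplest choice only to keep the comparison logic visible.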
|
36
|
Recombinant Strains of Human Parechovirus in Rural Areas in the North of Brazil. Viruses 2019; 11:v11060488. [PMID: 31146371 PMCID: PMC6630568 DOI: 10.3390/v11060488] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/07/2019] [Revised: 04/30/2019] [Accepted: 05/03/2019] [Indexed: 02/07/2023] Open
Abstract
We characterized the 24 nearly full-length genomes of human parechoviruses (PeV) from children in the north of Brazil. The initial phylogenetic analysis indicated that 17 strains belonged to genotype 1, 5 to genotype 4, and 1 to genotype 17. A more detailed analysis revealed a high frequency of recombinant strains (58%): A total of 14 of our PeV-As were chimeric, with four distinct recombination patterns identified. Five strains were composed of genotypes 1 and 5 (Rec1/5); five strains shared a complex mosaic pattern formed by genotypes 4, 5, and 17 (Rec4/17/5); two strains were composed of genotypes 1 and 17 (Rec1/17); and two strains were composed of genotype 1 and an undetermined strain (Rec1/und). Coalescent analysis based on the Vp1 gene, which is free of recombination, indicated that the recombinant strains most likely arose in this region approximately 30 years ago. They are present in high frequencies and are circulating in different small and isolated cities in the state of Tocantins. Further studies will be needed to establish whether the detected recombinant strains have been replacing parental strains or if they are co-circulating in distinct frequencies in Tocantins.
|
37
|
The Effect of Consanguinity on Between-Individual Identity-by-Descent Sharing. Genetics 2019; 212:305-316. [PMID: 30926583 PMCID: PMC6499533 DOI: 10.1534/genetics.119.302136] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 09/25/2018] [Accepted: 03/22/2019] [Indexed: 11/18/2022] Open
Abstract
Consanguineous unions increase the rate at which identical genomic segments are paired within individuals to produce runs of homozygosity (ROH). The extent to which such unions affect identity-by-descent (IBD) genomic sharing between rather than within individuals in a population, however, is not immediately evident from within-individual ROH levels. Using the fact that the time to the most recent common ancestor [Formula: see text] for a pair of genomes at a specific locus is inversely related to the extent of IBD sharing between the genomes in the neighborhood of the locus, we study IBD sharing for a pair of genomes sampled either within the same individual or in different individuals. We develop a coalescent model for a set of mating pairs in a diploid population, treating the fraction of consanguineous unions as a parameter. Considering mating models that include unions between sibs, first cousins, and nth cousins, we determine the effect of the consanguinity rate on the mean [Formula: see text] for pairs of lineages sampled either within the same individual or in different individuals. The results indicate that consanguinity not only increases ROH sharing between the two genomes within an individual, it also increases IBD sharing between individuals in the population, the magnitude of the effect increasing with the kinship coefficient of the type of consanguineous union. Considering computations of ROH and between-individual IBD in Jewish populations whose consanguinity rates have been estimated from demographic data, we find that, in accord with the theoretical results, increases in consanguinity and ROH levels inflate levels of IBD sharing between individuals in a population. The results contribute more generally to the interpretation of runs of homozygosity, IBD sharing between individuals, and the relationship between ROH and IBD.
|
38
|
Abstract
The fractional coalescent is a generalization of Kingman’s n-coalescent. It facilitates the development of the theory of population genetic processes that deviate from Poisson-distributed waiting times. It also marks the use of methods developed in fractional calculus in population genetics. The fractional coalescent is an extension of Cannings’ model, where the variance of the number of offspring per parent is a random variable. The distribution of the number of offspring depends on a parameter α, which is a potential measure of the environmental heterogeneity that is commonly ignored in current inferences. An approach to the coalescent, the fractional coalescent (f-coalescent), is introduced. The derivation is based on the discrete-time Cannings population model in which the variance of the number of offspring depends on the parameter α. This additional parameter α affects the variability of the patterns of the waiting times; values of α<1 lead to an increase of short time intervals, but occasionally allow for very long time intervals. When α=1, the f-coalescent and Kingman’s n-coalescent are equivalent. The distribution of the time to the most recent common ancestor and the probability that n genes descend from m ancestral genes in a time interval of length T for the f-coalescent are derived. The f-coalescent has been implemented in the population genetic model inference software Migrate. Simulation studies suggest that it is possible to accurately estimate α values from data that were generated with known α values and that the f-coalescent can detect potential environmental heterogeneity within a population. Bayes factor comparisons of simulated data with α<1 and real data (H1N1 influenza and malaria parasites) showed an improved model fit of the f-coalescent over the n-coalescent. The development of the f-coalescent and its inclusion into the inference program Migrate facilitates testing for deviations from the n-coalescent.
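For orientation, the baseline that the f-coalescent generalizes can be sketched in a few lines. The example below covers only the standard Kingman n-coalescent (the α=1 case named in the abstract), where the waiting time while k lineages remain is exponential with rate k(k-1)/2 in coalescent time units. It does not implement the f-coalescent or anything from Migrate; the function names and the time scaling are assumptions of this sketch.

```python
import random


def expected_tmrca(n):
    """Exact E[T_MRCA] under Kingman's n-coalescent, in coalescent time units.

    Sums the mean waiting times 2 / (k * (k - 1)) for k = n, ..., 2,
    which telescopes to 2 * (1 - 1/n).
    """
    return sum(2.0 / (k * (k - 1)) for k in range(2, n + 1))


def simulate_tmrca(n, rng):
    """Draw one T_MRCA by summing exponential waiting times from n lineages down to 1."""
    total = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2.0  # pairwise coalescence rate with k lineages
        total += rng.expovariate(rate)
    return total


def mean_simulated_tmrca(n, replicates, seed=1):
    """Average T_MRCA over independent replicates, using a seeded generator."""
    rng = random.Random(seed)
    return sum(simulate_tmrca(n, rng) for _ in range(replicates)) / replicates
```

Comparing `mean_simulated_tmrca(10, 20000)` with `expected_tmrca(10)` gives a quick sanity check of the exponential waiting-time assumption that the f-coalescent relaxes.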
|
39
|
Coalescent-Based Analyses of Genomic Sequence Data Provide a Robust Resolution of Phylogenetic Relationships among Major Groups of Gibbons. Mol Biol Evol 2019; 35:159-179. [PMID: 29087487 PMCID: PMC5850733 DOI: 10.1093/molbev/msx277] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Indexed: 12/20/2022] Open
Abstract
The phylogenetic relationships among extant gibbon species remain unresolved despite numerous efforts using morphological, behavioral, and genetic data and the sequencing of whole genomes. A major challenge in reconstructing the gibbon phylogeny is the radiative speciation process, which resulted in extremely short internal branches in the species phylogeny and extensive incomplete lineage sorting with extensive gene-tree heterogeneity across the genome. Here, we analyze two genomic-scale data sets, with ∼10,000 putative noncoding and exonic loci, respectively, to estimate the species tree for the major groups of gibbons. We used the Bayesian full-likelihood method bpp under the multispecies coalescent model, which naturally accommodates incomplete lineage sorting and uncertainties in the gene trees. For comparison, we included three heuristic coalescent-based methods (mp-est, SVDQuartets, and astral) as well as concatenation. From both data sets, we infer the phylogeny for the four extant gibbon genera to be (Hylobates, (Nomascus, (Hoolock, Symphalangus))). We used simulation guided by the real data to evaluate the accuracy of the methods used. Astral, while not as efficient as bpp, performed well in estimation of the species tree even in the presence of excessive incomplete lineage sorting. Concatenation, mp-est and SVDQuartets were unreliable when the species tree contains very short internal branches. A likelihood ratio test of gene flow suggests a small amount of migration from Hylobates moloch to H. pileatus, while cross-genera migration is absent or rare. Our results highlight the utility of coalescent-based methods in addressing challenging species tree problems characterized by short internal branches and rampant gene tree-species tree discordance.
|
40
|
Tree-sequence recording in SLiM opens new horizons for forward-time simulation of whole genomes. Mol Ecol Resour 2019; 19:552-566. [PMID: 30565882 PMCID: PMC6393187 DOI: 10.1111/1755-0998.12968] [Citation(s) in RCA: 91] [Impact Index Per Article: 18.2] [Received: 09/10/2018] [Revised: 11/06/2018] [Accepted: 11/09/2018] [Indexed: 12/12/2022]
Abstract
There is an increasing demand for evolutionary models to incorporate relatively realistic dynamics, ranging from selection at many genomic sites to complex demography, population structure, and ecological interactions. Such models can generally be implemented as individual-based forward simulations, but the large computational overhead of these models often makes simulation of whole chromosome sequences in large populations infeasible. This situation presents an important obstacle to the field that requires conceptual advances to overcome. The recently developed tree-sequence recording method (Kelleher, Thornton, Ashander, & Ralph, 2018), which stores the genealogical history of all genomes in the simulated population, could provide such an advance. This method has several benefits: (1) it allows neutral mutations to be omitted entirely from forward-time simulations and added later, thereby dramatically improving computational efficiency; (2) it allows neutral burn-in to be constructed extremely efficiently after the fact, using "recapitation"; (3) it allows direct examination and analysis of the genealogical trees along the genome; and (4) it provides a compact representation of a population's genealogy that can be analysed in Python using the msprime package. We have implemented the tree-sequence recording method in SLiM 3 (a free, open-source evolutionary simulation software package) and extended it to allow the recording of non-neutral mutations, greatly broadening the utility of this method. To demonstrate the versatility and performance of this approach, we showcase several practical applications that would have been beyond the reach of previously existing methods, opening up new horizons for the modelling and exploration of evolutionary processes.
|
41
|
Phylogeography indicates incomplete genetic divergence among phenotypically differentiated montane forest populations of Atlapetes albinucha (Aves, Passerellidae). Zookeys 2019:125-148. [PMID: 30598618 PMCID: PMC6306474 DOI: 10.3897/zookeys.809.28743] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/01/2018] [Accepted: 09/19/2018] [Indexed: 11/12/2022] Open
Abstract
The White-naped Brushfinch (Atlapetes albinucha) comprises up to eight allopatric subspecies mainly identified by the color of the underparts (gray vs. yellow belly). Yellow and gray bellied forms were long considered two different species (A. albinucha and A. gutturalis), but they are presently considered as one polytypic species. Previous studies in the genus Atlapetes have shown that the phylogeny, based on molecular data, is not congruent with characters such as coloration, ecology, or distributional patterns. The phylogeography of A. albinucha was analyzed using two mitochondrial DNA regions from samples including 24 different localities throughout montane areas from eastern Mexico to Colombia. Phylogeographic analyses using Bayesian inference, maximum likelihood and haplotype network revealed incomplete geographic structure. The genetic diversity pattern is congruent with a recent process of expansion, which is also supported by Ecological Niche Models (ENM) constructed for the species and projected into three past scenarios. Overall, the results revealed an incomplete genetic divergence among populations of A. albinucha in spite of the species’ ample range, which contrasts with previous results of phylogeographic patterns in other Neotropical montane forest bird species, suggesting idiosyncratic evolutionary histories for different taxa throughout the region.
|
42
|
Reconstructing the History of Polygenic Scores Using Coalescent Trees. Genetics 2019; 211:235-262. [PMID: 30389808 PMCID: PMC6325695 DOI: 10.1534/genetics.118.301687] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 08/10/2018] [Accepted: 10/23/2018] [Indexed: 11/18/2022] Open
Abstract
Genome-wide association studies (GWAS) have revealed that many traits are highly polygenic, in that their within-population variance is governed, in part, by small-effect variants at many genetic loci. Standard population-genetic methods for inferring evolutionary history are ill-suited for polygenic traits: when there are many variants of small effect, signatures of natural selection are spread across the genome and are subtle at any one locus. In the last several years, various methods have emerged for detecting the action of natural selection on polygenic scores, sums of genotypes weighted by GWAS effect sizes. However, most existing methods do not reveal the timing or strength of selection. Here, we present a set of methods for estimating the historical time course of a population-mean polygenic score using local coalescent trees at GWAS loci. These time courses are estimated by using coalescent theory to relate the branch lengths of trees to allele-frequency change. The resulting time course can be tested for evidence of natural selection. We present theory and simulations supporting our procedures, as well as estimated time courses of polygenic scores for human height. Because of its grounding in coalescent theory, the framework presented here can be extended to a variety of demographic scenarios, and its usefulness will increase as both GWAS and ancestral-recombination-graph inference continue to progress.
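The abstract defines a polygenic score as a sum of genotypes weighted by GWAS effect sizes. A minimal sketch of that definition follows; it assumes diploid genotypes coded as effect-allele counts (0, 1, or 2) and purely additive effects, and the function names are hypothetical. It does not reproduce the tree-based estimators of the paper, which infer how the population-mean score changed through time.

```python
def polygenic_score(genotypes, effect_sizes):
    """Score for one individual: sum of effect-allele counts times effect sizes."""
    if len(genotypes) != len(effect_sizes):
        raise ValueError("one effect size is required per locus")
    return sum(g * b for g, b in zip(genotypes, effect_sizes))


def population_mean_score(allele_freqs, effect_sizes):
    """Mean score in a diploid population with effect-allele frequencies p.

    The expected allele count at a locus is 2 * p, so the mean score is
    the sum over loci of 2 * p * effect size (an assumption of additivity).
    """
    if len(allele_freqs) != len(effect_sizes):
        raise ValueError("one effect size is required per locus")
    return sum(2.0 * p * b for p, b in zip(allele_freqs, effect_sizes))
```

Under these assumptions, a time course of the population-mean score follows directly from a time course of allele frequencies at the GWAS loci, which is the quantity the abstract says is estimated from coalescent tree branch lengths.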
|
43
|
A Phylogenomic Perspective on Evolution and Discordance in the Alpine-Arctic Plant Clade Micranthes (Saxifragaceae). FRONTIERS IN PLANT SCIENCE 2019; 10:1773. [PMID: 32117341 PMCID: PMC7020907 DOI: 10.3389/fpls.2019.01773] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 10/30/2019] [Accepted: 12/19/2019] [Indexed: 05/20/2023]
Abstract
The increased availability of large phylogenomic datasets is often accompanied by difficulties in disentangling and harnessing the data. These difficulties may be enhanced for species resulting from reticulate evolution and/or rapid radiations producing large-scale discordance. As a result, there is a need for methods to investigate discordance, and in turn, use this conflict to inform and aid in downstream analyses. Therefore, we drew upon multiple analytical tools to investigate the evolution of Micranthes (Saxifragaceae), a clade of primarily arctic-alpine herbs impacted by reticulate and rapid radiations. To elucidate the evolution of Micranthes we sought near-complete taxon sampling with multiple accessions per species and assembled extensive nuclear (518 putatively single copy loci) and plastid (95 loci) datasets. In addition to a robust phylogeny for Micranthes, this research shows that genetic discordance presents a valuable opportunity to develop hypotheses about its underlying causes, such as hybridization, polyploidization, and range shifts. Specifically, we present a multi-step approach that incorporates multiple checkpoints for paralogy, including reciprocally blasting targeted genes against transcriptomes, running paralogy checks during the assembly step, and grouping genes into gene families to look for duplications. We demonstrate that a thorough assessment of discordance can be a source of evidence for evolutionary processes that were not adequately captured by a bifurcating tree model, and can help to clarify processes that have structured the evolution of Micranthes.
|
44
|
Bridging trees for posterior inference on ancestral recombination graphs. Proc Math Phys Eng Sci 2018; 474:20180568. [PMID: 30602937 PMCID: PMC6304023 DOI: 10.1098/rspa.2018.0568] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 08/21/2018] [Accepted: 11/01/2018] [Indexed: 11/08/2023] Open
Abstract
We present a new Markov chain Monte Carlo algorithm, implemented in the software Arbores, for inferring the history of a sample of DNA sequences. Our principal innovation is a bridging procedure, previously applied only for simple stochastic processes, in which the local computations within a bridge can proceed independently of the rest of the DNA sequence, facilitating large-scale parallelization.
|
45
|
Abstract
Phylogeny estimation is difficult for closely related populations and species, especially if they have been exchanging genes. We present a hierarchical Bayesian, Markov-chain Monte Carlo method with a state space that includes all possible phylogenies in a full Isolation-with-Migration model framework. The method is based on a new type of genealogy augmentation called a "hidden genealogy" that enables efficient updating of the phylogeny. This is the first likelihood-based method to fully incorporate directional gene flow and genetic drift for estimation of a species or population phylogeny. Application to human hunter-gatherer populations from Africa revealed a clear phylogenetic history, with strong support for gene exchange with an unsampled ghost population, and relatively ancient divergence between a ghost population and modern human populations, consistent with human/archaic divergence. In contrast, a study of five chimpanzee populations reveals a clear phylogeny with several pairs of populations having exchanged DNA, but does not support a history with an unsampled ghost population.
|
46
|
Selection-Like Biases Emerge in Population Models with Recurrent Jackpot Events. Genetics 2018; 210:1053-1073. [PMID: 30171032 PMCID: PMC6218241 DOI: 10.1534/genetics.118.301516] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 01/16/2018] [Accepted: 08/26/2018] [Indexed: 11/18/2022] Open
Abstract
Evolutionary dynamics driven out of equilibrium by growth, expansion, or adaptation often generate a characteristically skewed distribution of descendant numbers: the earliest, the most advanced, or the fittest ancestors have exceptionally large numbers of descendants, which Luria and Delbrück called "jackpot" events. Here, I show that recurrent jackpot events generate a deterministic median bias favoring majority alleles, which is akin to positive frequency-dependent selection (proportional to the log ratio of the frequencies of mutant and wild-type alleles). This fictitious selection force results from the fact that majority alleles tend to sample deeper into the tail of the descendant distribution. The flip side of this sampling effect is the rare occurrence of large frequency hikes in favor of minority alleles, which ensures that the allele frequency dynamics remains neutral in expectation, unless genuine selection is present. The resulting picture of a selection-like bias compensated by rare big jumps allows for an intuitive understanding of allele frequency trajectories and enables the exact calculation of transition densities for a range of important scenarios, including population-size variations and different forms of natural selection. As a general signature of evolution by rare events, fictitious selection hampers the establishment of new beneficial mutations, counteracts balancing selection, and confounds methods to infer selection from data over limited timescales.
|
47
|
Appropriate Assignment of Fossil Calibration Information Minimizes the Difference between Phylogenetic and Pedigree Mutation Rates in Humans. Life (Basel) 2018; 8:life8040049. [PMID: 30360410 PMCID: PMC6316143 DOI: 10.3390/life8040049] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/13/2018] [Revised: 10/18/2018] [Accepted: 10/18/2018] [Indexed: 12/24/2022] Open
Abstract
Studies that measured mutation rates in human populations using pedigrees have reported values that differ significantly from rates estimated from the phylogenetic comparison of humans and chimpanzees. Consequently, exchanges between mutation rate values across different timescales lead to conflicting divergence time estimates. It has been argued that this variation of mutation rate estimates across hominoid evolution is in part caused by incorrect assignment of calibration information to the mean coalescent time among loci, instead of the true genetic isolation (speciation) time between humans and chimpanzees. In this study, we investigated the feasibility of estimating the human pedigree mutation rate using phylogenetic data from the genomes of great apes. We found that, when calibration information was correctly assigned to the human-chimpanzee speciation time (and not to the coalescent time), estimates of phylogenetic mutation rates were statistically equivalent to the estimates previously reported using studies of human pedigrees. We conclude that, within the range of biologically realistic ancestral generation times, part of the difference between whole-genome phylogenetic and pedigree mutation rates is due to inappropriate assignment of fossil calibration information to the mean coalescent time instead of the speciation time. Although our results focus on the human-chimpanzee divergence, our findings are general, and relevant to the inference of the timescale of the tree of life.
|
48
|
Genomic surveillance of Neisseria gonorrhoeae to investigate the distribution and evolution of antimicrobial-resistance determinants and lineages. Microb Genom 2018; 4:e000205. [PMID: 30063202 PMCID: PMC6159555 DOI: 10.1099/mgen.0.000205] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 05/16/2018] [Accepted: 07/09/2018] [Indexed: 12/24/2022] Open
Abstract
The first extensively drug resistant (XDR) Neisseria gonorrhoeae strain with high resistance to the extended-spectrum cephalosporin ceftriaxone was identified in 2009 in Japan, but no other strain with this antimicrobial-resistance profile has been reported since. However, surveillance to date has been based on phenotypic methods and sequence typing, not genome sequencing. Therefore, little is known about the local population structure at the genomic level, and how resistance determinants and lineages are distributed and evolve. We analysed the whole-genome sequence data and the antimicrobial-susceptibility testing results of 204 strains sampled in a region where the first XDR ceftriaxone-resistant N. gonorrhoeae was isolated, complemented with 67 additional genomes from other time frames and locations within Japan. Strains resistant to ceftriaxone were not found, but we discovered a sequence type (ST)7363 sub-lineage susceptible to ceftriaxone and cefixime in which the mosaic penA allele responsible for reduced susceptibility had reverted to a susceptible allele by recombination. Approximately 85 % of isolates showed resistance to fluoroquinolones (ciprofloxacin) explained by linked amino acid substitutions at positions 91 and 95 of GyrA with 99 % sensitivity and 100 % specificity. Approximately 10 % showed resistance to macrolides (azithromycin), for which genetic determinants are less clear. Furthermore, we revealed different evolutionary paths of the two major lineages: single acquisition of penA X in the ST7363-associated lineage, followed by multiple independent acquisitions of the penA X and XXXIV in the ST1901-associated lineage. Our study provides a detailed picture of the distribution of resistance determinants and disentangles the evolution of the two major lineages spreading worldwide.
|
49
|
Global demographic history of human populations inferred from whole mitochondrial genomes. ROYAL SOCIETY OPEN SCIENCE 2018; 5:180543. [PMID: 30225046 PMCID: PMC6124094 DOI: 10.1098/rsos.180543] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 04/03/2018] [Accepted: 07/25/2018] [Indexed: 06/08/2023]
Abstract
The Neolithic transition has led to marked increases in census population sizes across the world, as recorded by a rich archaeological record. However, previous attempts to detect such changes using genetic markers, especially mitochondrial DNA (mtDNA), have mostly been unsuccessful. We use complete mtDNA genomes from over 1700 individuals, from the 1000 Genomes Project Phase 3, to explore changes in population sizes in five populations for each of four major geographical regions, using a sophisticated coalescent-based Bayesian method (extended Bayesian skyline plots) and mutation rates calibrated with ancient DNA. Despite the power and sophistication of our analysis, we fail to find size changes that correspond to the Neolithic transitions of the study populations. However, we do detect a number of size changes, which tend to be replicated in most populations within each region. These changes are mostly much older than the Neolithic transition and could reflect either population expansion or changes in population structure. Given the amount of migration and population mixing that occurred after these ancient signals were generated, we caution that modern populations will often carry ghost signals of demographic events that occurred far away from their current location.
|
50
|
From cacti to carnivores: Improved phylotranscriptomic sampling and hierarchical homology inference provide further insight into the evolution of Caryophyllales. AMERICAN JOURNAL OF BOTANY 2018; 105:446-462. [PMID: 29738076 DOI: 10.1002/ajb2.1069] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 10/13/2017] [Accepted: 01/04/2018] [Indexed: 05/27/2023]
Abstract
PREMISE OF THE STUDY: The Caryophyllales contain ~12,500 species and are known for their cosmopolitan distribution, convergence of trait evolution, and extreme adaptations. Some relationships within the Caryophyllales, like those of many large plant clades, remain unclear, and phylogenetic studies often recover alternative hypotheses. We explore the utility of broad and dense transcriptome sampling across the order for resolving evolutionary relationships in Caryophyllales. METHODS: We generated 84 transcriptomes and combined these with 224 publicly available transcriptomes to perform a phylogenomic analysis of Caryophyllales. To overcome the computational challenge of ortholog detection in such a large data set, we developed an approach for clustering gene families that allowed us to analyze >300 transcriptomes and genomes. We then inferred the species relationships using multiple methods and performed gene-tree conflict analyses. KEY RESULTS: Our phylogenetic analyses resolved many clades with strong support, but also showed significant gene-tree discordance. This discordance is not only a common feature of phylogenomic studies, but also represents an opportunity to understand processes that have structured phylogenies. We also found taxon sampling influences species-tree inference, highlighting the importance of more focused studies with additional taxon sampling. CONCLUSIONS: Transcriptomes are useful both for species-tree inference and for uncovering evolutionary complexity within lineages. Through analyses of gene-tree conflict and multiple methods of species-tree inference, we demonstrate that phylogenomic data can provide unparalleled insight into the evolutionary history of Caryophyllales. We also discuss a method for overcoming computational challenges associated with homolog clustering in large data sets.
|