1
Ortega-Del Vecchyo D, Lohmueller KE, Novembre J. Haplotype-based inference of the distribution of fitness effects. Genetics 2022; 220:6501446. [PMID: 35100400 PMCID: PMC8982047 DOI: 10.1093/genetics/iyac002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/13/2021] [Accepted: 12/18/2021] [Indexed: 11/13/2022]
Abstract
Recent genome sequencing studies with large sample sizes in humans have discovered a vast quantity of low-frequency variants, providing an important source of information to analyze how selection is acting on human genetic variation. In order to estimate the strength of natural selection acting on low-frequency variants, we have developed a likelihood-based method that uses the lengths of pairwise identity-by-state between haplotypes carrying low-frequency variants. We show that in some non-equilibrium populations (such as those that have had recent population expansions) it is possible to distinguish between positive and negative selection acting on a set of variants. With our new framework, one can infer a fixed selection intensity acting on a set of variants at a particular frequency, or a distribution of selection coefficients for standing variants and new mutations. We apply our method to the phased haplotypes of individuals in the UK10K dataset.
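The summary statistic named above, the length of pairwise identity-by-state (IBS) between haplotypes that carry a focal low-frequency variant, can be illustrated with a minimal sketch. It is not the authors' likelihood method. The function names, the string encoding of haplotypes, and the segment convention (distance between the outermost matching sites, stopping at the first mismatch or at the end of the data) are all assumptions made for illustration.

```python
from itertools import combinations

def ibs_length(hap_a, hap_b, positions, focal):
    """Length of the IBS segment shared by two haplotypes around site index `focal`.

    Extends left and right from the focal site until the next mismatching site
    (or the end of the data) and returns the distance between the outermost
    matching sites. This segment convention is an assumption; others exist.
    """
    if hap_a[focal] != hap_b[focal]:
        return 0.0
    left = focal
    while left > 0 and hap_a[left - 1] == hap_b[left - 1]:
        left -= 1
    right = focal
    while right < len(positions) - 1 and hap_a[right + 1] == hap_b[right + 1]:
        right += 1
    return float(positions[right] - positions[left])

def carrier_ibs_lengths(haplotypes, positions, focal, derived="1"):
    """Pairwise IBS lengths among haplotypes carrying the derived allele at `focal`."""
    carriers = [h for h in haplotypes if h[focal] == derived]
    return [ibs_length(a, b, positions, focal) for a, b in combinations(carriers, 2)]

# Hypothetical toy data: 4 haplotypes at 5 sites, derived allele "1" at site index 2.
positions = [100, 200, 300, 400, 500]
haplotypes = ["00110", "10110", "00111", "01000"]
print(carrier_ibs_lengths(haplotypes, positions, focal=2))  # [300.0, 300.0, 200.0]
```

Under this kind of summary, carriers of a young variant tend to share long segments, so the distribution of these lengths carries information about allele age; how the paper turns that into a likelihood for selection is not reproduced here.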
Affiliation(s)
- Diego Ortega-Del Vecchyo
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Juriquilla, Querétaro, 76230, México
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, California, 90095, United States of America
- Kirk E Lohmueller
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, California, 90095, United States of America
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, 90095, United States of America
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, 90095, United States of America
- John Novembre
- Department of Human Genetics, University of Chicago, Chicago, Illinois, 60637, United States of America
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, 60637, United States of America
2
Lyu W, Dai X, Beaumont M, Yu F, He Z. Inferring the timing and strength of natural selection and gene migration in the evolution of chicken from ancient DNA data. Mol Ecol Resour 2021; 22:1362-1379. [PMID: 34783162 DOI: 10.1111/1755-0998.13553] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/22/2021] [Revised: 09/10/2021] [Accepted: 09/28/2021] [Indexed: 11/29/2022]
Abstract
With the rapid growth of the number of sequenced ancient genomes, there has been increasing interest in using this new information to study past and present adaptation. Such an additional temporal component promises improved power for the estimation of natural selection. Over the last decade, statistical approaches for detection and quantification of natural selection from ancient DNA (aDNA) data have been developed. However, most existing methods do not allow us to estimate the timing of natural selection along with its strength, which is key to understanding the evolution and persistence of organismal diversity. Additionally, most methods ignore the fact that natural populations are almost always structured, which can result in overestimation of the effect of natural selection. To address these issues, we introduce a novel Bayesian framework for the inference of natural selection and gene migration from aDNA data with Markov chain Monte Carlo techniques, co-estimating both the timing and strength of natural selection and gene migration. Such an advance enables us to infer drivers of natural selection and gene migration by correlating genetic evolution with potential causes such as changes in the ecological context in which an organism has evolved. The performance of our procedure is evaluated through extensive simulations, and its utility is shown with an application to ancient chicken samples.
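Co-estimating the timing and strength of selection and migration presupposes a generative model in which both processes switch on at particular times. The sketch below simulates one such model forward in time. It is not the authors' inference framework: discrete generations, constant population size, genic selection with fitness 1 + s, one-way migration from a source population with a fixed allele frequency, and all parameter names (`t_sel`, `t_mig`, `q_source`, ...) are illustrative assumptions.

```python
import numpy as np

def simulate_frequency(p0, n_copies, generations, s, t_sel, m, t_mig, q_source, seed=None):
    """Forward Wright-Fisher allele-frequency trajectory with switch-on times.

    Selection (genic, fitness 1+s for the focal allele) acts from generation
    `t_sel` onward; one-way migration at rate `m` from a source population with
    fixed frequency `q_source` acts from generation `t_mig` onward. Drift is
    binomial resampling of `n_copies` gene copies each generation.
    """
    rng = np.random.default_rng(seed)
    traj = np.empty(generations + 1)
    p = p0
    traj[0] = p
    for g in range(generations):
        if g >= t_sel:
            p = p * (1 + s) / (1 + s * p)         # selection
        if g >= t_mig:
            p = (1 - m) * p + m * q_source        # a fraction m of genes replaced by migrants
        p = rng.binomial(n_copies, p) / n_copies  # genetic drift
        traj[g + 1] = p
    return traj

# Hypothetical parameter values, chosen only to exercise both switch-on times.
traj = simulate_frequency(p0=0.1, n_copies=1000, generations=200, s=0.05, t_sel=50,
                          m=0.01, t_mig=100, q_source=0.5, seed=1)
```

Inference of the kind described above would work in the opposite direction, asking which values of s, t_sel, m and t_mig make the sampled ancient genotypes probable under such a model.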
Affiliation(s)
- Wenyang Lyu
- School of Mathematics, University of Bristol, Bristol, BS8 1UG, United Kingdom
- Xiaoyang Dai
- School of Biological Sciences, University of Bristol, Bristol, BS8 1TQ, United Kingdom; The Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, E1 2AT, United Kingdom
- Mark Beaumont
- School of Biological Sciences, University of Bristol, Bristol, BS8 1TQ, United Kingdom
- Feng Yu
- School of Mathematics, University of Bristol, Bristol, BS8 1UG, United Kingdom
- Zhangyi He
- MRC Toxicology Unit, University of Cambridge, Cambridge, CB2 1QR, United Kingdom; Cancer Research UK Beatson Institute, Glasgow, G61 1BD, United Kingdom
3
Gonçalves FB, Łatuszyński K, Roberts GO. Barker’s algorithm for Bayesian inference with intractable likelihoods. Braz J Probab Stat 2017. [DOI: 10.1214/17-bjps374] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/19/2022]
4
Griffiths RC, Jenkins PA, Spanò D. Wright-Fisher diffusion bridges. Theor Popul Biol 2017; 122:67-77. [PMID: 28993198 DOI: 10.1016/j.tpb.2017.09.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/27/2017] [Accepted: 09/26/2017] [Indexed: 10/18/2022]
Abstract
The trajectory of the frequency of an allele which begins at x at time 0 and is known to have frequency z at time T can be modelled by the bridge process of the Wright-Fisher diffusion. Bridges with x = z = 0 are particularly interesting because they model the trajectory of the frequency of an allele which appears at some time and is then lost by random drift or mutation after a time T. The coalescent genealogy, back in time, of a population in a neutral Wright-Fisher diffusion process is well understood. In this paper we obtain a new interpretation of the coalescent genealogy of the population in a bridge from a time t∈(0,T). In a bridge with allele frequencies of 0 at times 0 and T, the population coalesces in two directions, from t to 0 and from t to T, such that there is just one lineage of the allele under consideration at times 0 and T. The genealogy in Wright-Fisher diffusion bridges with selection is more complex than in the neutral model, but it still has the property that the population branches and coalesces in two directions from time t∈(0,T). The density of the frequency of an allele at time t is expressed in a way that shows coalescence in the two directions. A new algorithm for exact simulation of a neutral Wright-Fisher bridge is derived; it follows from knowing the density of the frequency in a bridge and from exact simulation of the Wright-Fisher diffusion. The genealogy of the neutral Wright-Fisher bridge is also modelled by branching Pólya urns, extending a representation in a Wright-Fisher diffusion. This new representation relates Wright-Fisher bridges to classical urn models in a Bayesian setting.
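As general background (not the paper's new genealogical representation), the density of a bridge at an intermediate time follows from the Markov property and Bayes' rule. Writing p(t; x, y) for the transition density of the diffusion from x to y over time t, the frequency at time t of a bridge from x at time 0 to z at time T has density

$$
p^{\mathrm{bridge}}_{x \to z,\,T}(t; y) \;=\; \frac{p(t; x, y)\; p(T - t; y, z)}{p(T; x, z)}, \qquad 0 < t < T .
$$

The numerator reflects the two directions mentioned above: a forward factor from time 0 to t and a backward factor from t to T. When an endpoint lies on the boundary, as in the x = z = 0 case, this expression has to be interpreted with care (for example as a limit), since the transition densities can degenerate there.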
Affiliation(s)
- Paul A Jenkins
- Department of Statistics, University of Warwick, United Kingdom; Department of Computer Science, University of Warwick, United Kingdom
- Dario Spanò
- Department of Statistics, University of Warwick, United Kingdom
5
6
Zhao L, Lascoux M, Waxman D. An informational transition in conditioned Markov chains: Applied to genetics and evolution. J Theor Biol 2016; 402:158-70. [PMID: 27105672 DOI: 10.1016/j.jtbi.2016.04.021] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/01/2015] [Revised: 02/19/2016] [Accepted: 04/17/2016] [Indexed: 11/18/2022]
Abstract
In this work we assume that we have some knowledge about the state of a population at two known times, when the dynamics is governed by a Markov chain such as a Wright-Fisher model. Such knowledge could be obtained, for example, from observations made on ancient and contemporary DNA, or during laboratory experiments involving long term evolution. A natural assumption is that the behaviour of the population, between observations, is related to (or constrained by) what was actually observed. The present work shows that this assumption has limited validity. When the time interval between observations is larger than a characteristic value, which is a property of the population under consideration, there is a range of intermediate times where the behaviour of the population has reduced or no dependence on what was observed and an equilibrium-like distribution applies. Thus, for example, if the frequency of an allele is observed at two different times, then for a large enough time interval between observations, the population has reduced or no dependence on the two observed frequencies for a range of intermediate times. Given observations of a population at two times, we provide a general theoretical analysis of the behaviour of the population at all intermediate times, and determine an expression for the characteristic time interval, beyond which the observations do not constrain the population's behaviour over a range of intermediate times. The findings of this work relate to what can be meaningfully inferred about a population at intermediate times, given knowledge of terminal states.
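The qualitative claim above can be checked numerically on a small example. The sketch computes, for a finite Markov chain, the distribution of the state at the midpoint given both endpoints, P(X_t = j | X_0 = i, X_T = k) ∝ P^t[i, j] · P^(T-t)[j, k], and measures its total-variation distance from the chain's stationary distribution as T grows. It does not compute the paper's expression for the characteristic time. The haploid Wright-Fisher chain with symmetric mutation (included so that an equilibrium exists), the population size, the mutation rates, and all function names are assumptions.

```python
import numpy as np
from math import comb

def wf_matrix(n, u, v):
    """Haploid Wright-Fisher transition matrix on counts 0..n of allele A.

    Mutation A->a at rate u and a->A at rate v, then binomial sampling of n
    gene copies. With u, v > 0 the chain has a unique stationary distribution.
    """
    x = np.arange(n + 1) / n
    p = x * (1 - u) + (1 - x) * v
    j = np.arange(n + 1)
    coef = np.array([comb(n, k) for k in j], dtype=float)
    return coef * p[:, None] ** j * (1 - p[:, None]) ** (n - j)

def conditional_at(P, i, k, t, T):
    """P(X_t = j | X_0 = i, X_T = k) for every state j (Markov property + Bayes)."""
    fwd = np.linalg.matrix_power(P, t)[i, :]      # P(X_t = j | X_0 = i)
    bwd = np.linalg.matrix_power(P, T - t)[:, k]  # P(X_T = k | X_t = j)
    w = fwd * bwd
    return w / w.sum()                            # w.sum() equals P(X_T = k | X_0 = i)

def stationary(P):
    """Left eigenvector of P for eigenvalue 1, normalised to a probability vector."""
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    return pi / pi.sum()

def tv_distance(a, b):
    return 0.5 * np.abs(a - b).sum()

n, u, v = 20, 0.01, 0.01  # hypothetical population size and mutation rates
P = wf_matrix(n, u, v)
pi = stationary(P)
for T in [4, 20, 100, 400, 2000]:
    mid = conditional_at(P, i=0, k=0, t=T // 2, T=T)
    print(T, round(tv_distance(mid, pi), 4))
```

For short intervals the midpoint distribution stays close to the observed endpoint frequencies; for long intervals it approaches the equilibrium distribution, which is the loss of dependence on the observations that the abstract describes.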
Affiliation(s)
- Lei Zhao
- Centre for Computational Systems Biology, Fudan University, 220 Handan Road, Shanghai 200433, PR China
- Martin Lascoux
- Centre for Computational Systems Biology, Fudan University, 220 Handan Road, Shanghai 200433, PR China; Evolutionary Biology Center, Department of Ecology and Genetics, Uppsala University, Uppsala 75236, Sweden
- David Waxman
- Centre for Computational Systems Biology, Fudan University, 220 Handan Road, Shanghai 200433, PR China
7
Bayesian Inference of Natural Selection from Allele Frequency Time Series. Genetics 2016; 203:493-511. [PMID: 27010022 DOI: 10.1534/genetics.116.187278] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.9] [Received: 01/21/2016] [Accepted: 03/11/2016] [Indexed: 12/21/2022]
Abstract
The advent of accessible ancient DNA technology now allows the direct ascertainment of allele frequencies in ancestral populations, thereby enabling the use of allele frequency time series to detect and estimate natural selection. Such direct observations of allele frequency dynamics are expected to be more powerful than inferences made using patterns of linked neutral variation obtained from modern individuals. We developed a Bayesian method to make use of allele frequency time series data and infer the parameters of general diploid selection, along with allele age, in nonequilibrium populations. We introduce a novel path augmentation approach, in which we use Markov chain Monte Carlo to integrate over the space of allele frequency trajectories consistent with the observed data. Using simulations, we show that this approach has good power to estimate selection coefficients and allele age. Moreover, when applying our approach to data on horse coat color, we find that ignoring a relevant demographic history can significantly bias the results of inference. Our approach is made available in a C++ software package.
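The path-augmentation MCMC described above is not reproduced here. As a simpler, swapped-in illustration of the same idea (integrating over the hidden allele-frequency trajectories that are consistent with sampled counts), the sketch below uses the forward algorithm of a hidden Markov model on a small discrete Wright-Fisher population and evaluates the likelihood of a genic selection coefficient on a grid. Haploid genic selection, constant population size, a uniform prior on the initial count, a flat prior over the grid, the function names, and the toy data are all assumptions; the paper itself handles general diploid selection, allele age, and nonequilibrium demography.

```python
import numpy as np
from math import comb

def wf_matrix(n, s):
    """Haploid Wright-Fisher transition matrix on allele counts 0..n.

    Genic selection with fitness 1+s for the focal allele, followed by
    binomial sampling of n gene copies (constant population size).
    """
    x = np.arange(n + 1) / n
    p = x * (1 + s) / (1 + s * x)  # post-selection frequency
    j = np.arange(n + 1)
    coef = np.array([comb(n, k) for k in j], dtype=float)
    return coef * p[:, None] ** j * (1 - p[:, None]) ** (n - j)

def log_likelihood(s, times, sample_sizes, counts, n):
    """Forward algorithm: log P(observed counts | s), summing over hidden trajectories."""
    P = wf_matrix(n, s)
    x = np.arange(n + 1) / n
    alpha = np.full(n + 1, 1.0 / (n + 1))  # uniform prior on the initial count (assumption)
    loglik, prev_t = 0.0, times[0]
    for t, m, c in zip(times, sample_sizes, counts):
        alpha = alpha @ np.linalg.matrix_power(P, t - prev_t)  # evolve between samples
        alpha = alpha * comb(m, c) * x**c * (1 - x) ** (m - c)  # binomial sample of m chromosomes
        total = alpha.sum()
        loglik += np.log(total)
        alpha = alpha / total  # rescale to avoid underflow
        prev_t = t
    return loglik

# Hypothetical toy data: sampling generations, chromosomes sampled, derived-allele counts.
times, sizes, counts = [0, 20, 40, 60], [50, 50, 50, 50], [5, 15, 30, 42]
grid = np.linspace(-0.05, 0.15, 21)
lls = np.array([log_likelihood(s, times, sizes, counts, n=100) for s in grid])
posterior = np.exp(lls - lls.max())
posterior /= posterior.sum()  # flat prior over the grid
s_hat = grid[np.argmax(posterior)]
print(s_hat)
```

Exact summation over states is only feasible for small populations; sampling trajectories with MCMC, as in the abstract, avoids that restriction.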
8
Zhao L, Lascoux M, Waxman D. Exact simulation of conditioned Wright-Fisher models. J Theor Biol 2014; 363:419-26. [PMID: 25173081 DOI: 10.1016/j.jtbi.2014.08.027] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 07/23/2014] [Accepted: 08/17/2014] [Indexed: 11/19/2022]
Abstract
Forward and backward simulations play an increasing role in population genetics, in particular when inferring the relative importance of evolutionary forces. It is therefore important to develop fast and accurate simulation methods for general population genetics models. Here we present an exact simulation method that generates trajectories of an allele's frequency in a finite population, as described by a general Wright-Fisher model. The method generates conditioned trajectories that start from a known frequency at a known time, and which achieve a specific final frequency at a known final time. The simulation method applies irrespective of the smallness of the probability of the transition between the initial and final states, because it is not based on rejection of trajectories. We illustrate the method on several different populations where a Wright-Fisher model (or related) applies, namely (i) a locus with 2 alleles, that is subject to selection and mutation; (ii) a locus with 3 alleles, that is subject to selection; (iii) a locus in a metapopulation consisting of two subpopulations of finite size, that are subject to selection and migration. The simulation method allows the generation of conditioned trajectories that can be used for the purposes of visualisation, the estimation of summary statistics, and the development/testing of new inferential methods. The simulated trajectories provide a very simple approach to estimating quantities that cannot easily be expressed in terms of the transition matrix, and can be applied to finite Markov chains other than the Wright-Fisher model.
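One standard way to draw conditioned trajectories without rejection, for any finite Markov chain, is to reweight each forward step by the probability of still reaching the final state in the remaining time. The sketch below applies that idea to case (i) above (two alleles with selection and mutation). It is a minimal illustration under stated assumptions, not necessarily the algorithm of the paper: the haploid fitness model, the parameter values, and the function names are hypothetical.

```python
import numpy as np
from math import comb

def wf_matrix(n, s=0.0, u=0.0, v=0.0):
    """Haploid Wright-Fisher transition matrix on counts 0..n of allele A.

    Genic selection (fitness 1+s for A), then mutation A->a (u) and a->A (v),
    then binomial sampling of n gene copies.
    """
    x = np.arange(n + 1) / n
    p = x * (1 + s) / (1 + s * x)
    p = p * (1 - u) + (1 - p) * v
    j = np.arange(n + 1)
    coef = np.array([comb(n, k) for k in j], dtype=float)
    return coef * p[:, None] ** j * (1 - p[:, None]) ** (n - j)

def sample_bridge(P, start, end, T, rng):
    """Sample X_0..X_T of the chain P conditioned on X_0 = start and X_T = end.

    No rejection step: each transition is reweighted by the probability of
    still reaching `end` at time T from the proposed next state.
    """
    # h[r][j] = P(X_{t+r} = end | X_t = j), built backwards from r = 0.
    h = [np.zeros(P.shape[0])]
    h[0][end] = 1.0
    for _ in range(T):
        h.append(P @ h[-1])
    if h[T][start] == 0.0:
        raise ValueError("`end` cannot be reached from `start` in T steps")
    path = [start]
    for t in range(T):
        w = P[path[-1], :] * h[T - t - 1]  # P(next = j) * P(reach end | next = j)
        path.append(int(rng.choice(len(w), p=w / w.sum())))
    return path

rng = np.random.default_rng(0)
P = wf_matrix(30, s=0.02, u=1e-3, v=1e-3)  # hypothetical parameters
path = sample_bridge(P, start=3, end=27, T=60, rng=rng)
```

Because every step only renormalises positive weights, the cost does not depend on how improbable the transition from `start` to `end` is, which matches the property highlighted in the abstract.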
Affiliation(s)
- Lei Zhao
- Centre for Computational Systems Biology, Fudan University, 220 Handan Road, Shanghai 200433, People's Republic of China
- Martin Lascoux
- Centre for Computational Systems Biology, Fudan University, 220 Handan Road, Shanghai 200433, People's Republic of China; Department of Ecology and Genetics, Evolutionary Biology Center, Uppsala University, Uppsala 75236, Sweden
- David Waxman
- Centre for Computational Systems Biology, Fudan University, 220 Handan Road, Shanghai 200433, People's Republic of China
9
Schraiber JG. A path integral formulation of the Wright-Fisher process with genic selection. Theor Popul Biol 2014; 92:30-5. [PMID: 24269333 PMCID: PMC3932315 DOI: 10.1016/j.tpb.2013.11.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/28/2013] [Accepted: 11/05/2013] [Indexed: 10/26/2022]
Abstract
The Wright-Fisher process with selection is an important tool in population genetics theory. Traditional analysis of this process relies on the diffusion approximation. The diffusion approximation is usually studied in a partial differential equations framework. In this paper, I introduce a path integral formalism to study the Wright-Fisher process with selection and use that formalism to obtain a simple perturbation series to approximate the transition density. The perturbation series can be understood in terms of Feynman diagrams, which have a simple probabilistic interpretation in terms of selective events. The perturbation series proves to be an accurate approximation of the transition density for weak selection and is shown to be arbitrarily accurate for any selection coefficient.
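For reference, the partial-differential-equation framework mentioned above can be written down in one common convention: time measured in units of 2N generations for a diploid population of size N, genic selection with scaled coefficient γ = 2Ns, and f(x, t) the transition density of the allele frequency x. Scaling conventions differ between papers, so these constants are an assumption and not necessarily the author's. The forward Kolmogorov equation is

$$
\frac{\partial f(x,t)}{\partial t} \;=\; \frac{1}{2}\,\frac{\partial^{2}}{\partial x^{2}}\Bigl[x(1-x)\,f(x,t)\Bigr] \;-\; \frac{\partial}{\partial x}\Bigl[\gamma\,x(1-x)\,f(x,t)\Bigr].
$$

Setting γ = 0 recovers the neutral equation; the γ term is the part that a weak-selection expansion treats as a correction to the neutral solution.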
10