1
Park K, Bae Y. Operator model for evolutionary dynamics. Biosystems 2024; 237:105130. [PMID: 38309419 DOI: 10.1016/j.biosystems.2024.105130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 01/24/2024] [Accepted: 01/25/2024] [Indexed: 02/05/2024]
Abstract
Drift, selection, and mutation are integral evolutionary factors. In this article, an operator model is proposed that represents these evolutionary factors intuitively as mathematical operators and ultimately offers an unconventional methodology for understanding evolutionary dynamics. Specifically, drift, selection, and mutation are each interpreted as an operator, in essence a random matrix, that acts on a vector containing the population distribution. The simulation results from the operator model coincided with previous theoretical results for the beneficial mutation accumulation rate in the concurrent and successional regimes for the asexually reproducing case. Furthermore, beneficial mutation accumulation in the strong drift regime for the asexually reproducing case was observed in simulations that allow interactions among mutations with diverse selection coefficients. Lastly, methods to justify, reinforce, apply, and expand the operator model are discussed to scrutinize its implications. With its unique characteristics, the operator model is expected to broaden perspectives and to offer an effective methodology for understanding the evolutionary process.
Affiliation(s)
- Kangbien Park
  - Department of Physics, College of Natural Science, Yonsei University, Seoul, 03722, Republic of Korea
- Yonghee Bae
  - Department of Physics, College of Natural Science, Yonsei University, Seoul, 03722, Republic of Korea
2
Huang X, Rymbekova A, Dolgova O, Lao O, Kuhlwilm M. Harnessing deep learning for population genetic inference. Nat Rev Genet 2024; 25:61-78. [PMID: 37666948 DOI: 10.1038/s41576-023-00636-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Accepted: 07/11/2023] [Indexed: 09/06/2023]
Abstract
In population genetics, the emergence of large-scale genomic data for various species and populations has provided new opportunities to understand the evolutionary forces that drive genetic diversity using statistical inference. However, the era of population genomics presents new challenges in analysing the massive amounts of genomes and variants. Deep learning has demonstrated state-of-the-art performance for numerous applications involving large-scale data. Recently, deep learning approaches have gained popularity in population genetics; facilitated by the advent of massive genomic data sets, powerful computational hardware and complex deep learning architectures, they have been used to identify population structure, infer demographic history and investigate natural selection. Here, we introduce common deep learning architectures and provide comprehensive guidelines for implementing deep learning models for population genetic inference. We also discuss current challenges and future directions for applying deep learning in population genetics, focusing on efficiency, robustness and interpretability.
Affiliation(s)
- Xin Huang
  - Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria
  - Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna, Austria
- Aigerim Rymbekova
  - Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria
  - Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna, Austria
- Olga Dolgova
  - Integrative Genomics Laboratory, CIC bioGUNE - Centro de Investigación Cooperativa en Biociencias, Derio, Biscaya, Spain
- Oscar Lao
  - Institute of Evolutionary Biology, CSIC-Universitat Pompeu Fabra, Barcelona, Spain
- Martin Kuhlwilm
  - Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria
  - Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna, Austria
3
Stoeckel S, Becheler R, Bocharova E, Barloy D. GenAPoPop 1.0: A user-friendly software to analyse genetic diversity and structure from partially clonal and selfed autopolyploid organisms. Mol Ecol Resour 2024; 24:e13886. [PMID: 37902131 DOI: 10.1111/1755-0998.13886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Revised: 10/05/2023] [Accepted: 10/16/2023] [Indexed: 10/31/2023]
Abstract
Autopolyploidy is quite common in most clades of eukaryotes. The emergence of sequence-based genotyping methods with individual and marker tags now enables confident allele dosage, overcoming the main obstacle to the democratization of population genetic approaches when studying the ecology and evolution of autopolyploid populations and species. Reproductive modes, including clonality, selfing and allogamy, have deep consequences for the ecology and evolution of populations and species. Analysing genetic diversity and its dynamics over generations is one efficient way to infer the relative importance of clonality, selfing and allogamy in populations. GenAPoPop is a user-friendly solution to compute the specific corpus of population genetic indices, including indices of genotypic diversity, needed to analyse partially clonal, selfed and allogamous polysomic populations genotyped with confident allele dosage. It also easily provides the posterior probabilities of quantitative reproductive modes in autopolyploid populations genotyped at two time steps and a graphical representation of the minimum spanning trees of the genetic distances between polyploid individuals, facilitating the interpretation of the genetic coancestry between individuals in hierarchically structured populations. GenAPoPop complements the previously existing solutions, including SPAGEDI and POLYGENE, for using genotype data to study the ecology and evolution of autopolyploid populations. It was specially developed with a simple graphical interface and workflow, and comes with a simulator to facilitate practical courses and teaching of population genetics for autopolyploid populations.
Affiliation(s)
- Solenn Stoeckel
  - IGEPP, INRAE, Institut Agro, Université de Rennes, Le Rheu, France
  - DECOD (Ecosystem Dynamics and Sustainability), Institut Agro, IFREMER, INRAE, Rennes, France
- Ronan Becheler
  - IGEPP, INRAE, Institut Agro, Université de Rennes, Le Rheu, France
  - DECOD (Ecosystem Dynamics and Sustainability), Institut Agro, IFREMER, INRAE, Rennes, France
- Ekaterina Bocharova
  - Evolutionary Developmental Biology laboratory, Koltzov Institute of Developmental Biology of Russian Academy of Sciences (IDB RAS), Moscow, Russia
- Dominique Barloy
  - DECOD (Ecosystem Dynamics and Sustainability), Institut Agro, IFREMER, INRAE, Rennes, France
4
Savageau MA. Phenotype Design Space Provides a Mechanistic Framework Relating Molecular Parameters to Phenotype Diversity Available for Selection. J Mol Evol 2023; 91:687-710. [PMID: 37620617 PMCID: PMC10598110 DOI: 10.1007/s00239-023-10127-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/13/2023] [Accepted: 07/27/2023] [Indexed: 08/26/2023]
Abstract
Two long-standing challenges in theoretical population genetics and evolution are predicting the distribution of phenotype diversity generated by mutation and available for selection, and determining the interaction of mutation, selection and drift to characterize evolutionary equilibria and dynamics. More fundamental for enabling such predictions is the current inability to causally link genotype to phenotype. There are three major mechanistic mappings required for such a linking - genetic sequence to kinetic parameters of the molecular processes, kinetic parameters to biochemical system phenotypes, and biochemical phenotypes to organismal phenotypes. This article introduces a theoretical framework, the Phenotype Design Space (PDS) framework, for addressing these challenges by focusing on the mapping of kinetic parameters to biochemical system phenotypes. It provides a quantitative theory whose key features include (1) a mathematically rigorous definition of phenotype based on biochemical kinetics, (2) enumeration of the full phenotypic repertoire, and (3) functional characterization of each phenotype independent of its context-dependent selection or fitness contributions. This framework is built on Design Space methods that relate system phenotypes to genetically determined parameters and environmentally determined variables. It also has the potential to automate prediction of phenotype-specific mutation rate constants and equilibrium distributions of phenotype diversity in microbial populations undergoing steady-state exponential growth, which provides an ideal reference to which more realistic cases can be compared. Although the framework is quite general and flexible, the details will undoubtedly differ for different functions, organisms and contexts. Here a hypothetical case study involving a small molecular system, a primordial circadian clock, is used to introduce this framework and to illustrate its use in a particular case. 
The framework is built on fundamental biochemical kinetics. Thus, the foundation is based on linear algebra and reasonable physical assumptions, which provide numerous opportunities for experimental testing and further elaboration to deal with complex multicellular organisms that are currently beyond its scope. The discussion provides a comparison of results from the PDS framework with those from other approaches in theoretical population genetics.
Affiliation(s)
- Michael A Savageau
  - Department of Microbiology & Molecular Genetics, University of California, 228 Briggs, Davis, CA, 95616, USA
  - Department of Biomedical Engineering, University of California, One Shields Avenue, Davis, CA, 95616, USA
5
Lappo E, Denton KK, Feldman MW. Conformity and anti-conformity in a finite population. J Theor Biol 2023; 562:111429. [PMID: 36746297 DOI: 10.1016/j.jtbi.2023.111429] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/27/2022] [Revised: 01/23/2023] [Accepted: 01/26/2023] [Indexed: 02/07/2023]
Abstract
Conformist and anti-conformist cultural transmission have been studied both empirically, in several species, and theoretically, with population genetic models. Building upon standard, infinite-population models (IPMs) of conformity, we introduce finite-population models (FPMs) and study them via simulation and a diffusion approximation. In previous IPMs of conformity, offspring observe the variants of n adult role models, where n is often three. Numerical simulations show that while the short-term behavior of the FPM with n=3 role models is well approximated by the IPM, stable polymorphic equilibria of the IPM become effective equilibria of the FPM at which the variation persists prior to fixation or loss, and which produce plateaus in curves for fixation probabilities and expected times to absorption. In the FPM with n=5 role models, the population may switch between two effective equilibria, which is not possible in the IPM, or may cycle between frequencies that are not effective equilibria, which is possible in the IPM. In all observed cases of 'equilibrium switching' and 'cycling' in the FPM, model parameters exceed O(1/N), required for the diffusion approximation, resulting in an over-estimation of the actual times to absorption. However, in those cases with n=5 role models that have one effective equilibrium and stable fixation states, even if conformity coefficients exceed O(1/N), the diffusion approximation matches closely the numerical simulations of the FPM. This suggests that the robustness of the diffusion approximation depends not only on the magnitudes of coefficients, but also on the qualitative behavior of the conformity model.
Affiliation(s)
- Egor Lappo
  - Department of Biology, Stanford University, Stanford, CA 94305, USA
- Kaleda K Denton
  - Department of Biology, Stanford University, Stanford, CA 94305, USA
- Marcus W Feldman
  - Department of Biology, Stanford University, Stanford, CA 94305, USA
6
Steinmetz B, Meyer I, Shnerb NM. Evolution in fluctuating environments: A generic modular approach. Evolution 2022; 76:2739-2757. [PMID: 36097355 PMCID: PMC9828023 DOI: 10.1111/evo.14616] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/13/2022] [Accepted: 07/23/2022] [Indexed: 01/22/2023]
Abstract
Evolutionary processes take place in fluctuating environments, where carrying capacities and selective forces vary over time. The fate of a mutant type and the persistence time of polymorphic states were studied in some specific cases of varying environments, but a generic methodology is still lacking. Here, we present such a general analytic framework. We first identify a set of elementary building blocks, a few basic demographic processes like logistic or exponential growth, competition at equilibrium, sudden decline, and so on. For each of these elementary blocks, we evaluate the mean and the variance of the changes in the frequency of the mutant population. Finally, we show how to find the relevant terms of the diffusion equation for each arbitrary combination of these blocks. Armed with this technique one may calculate easily the quantities that govern the evolutionary dynamics, like the chance of ultimate fixation, the time to absorption, and the time to fixation.
Affiliation(s)
- Bnaya Steinmetz
  - Department of Physics, Bar-Ilan University, Ramat-Gan, IL52900, Israel
- Immanuel Meyer
  - Department of Physics, Bar-Ilan University, Ramat-Gan, IL52900, Israel
- Nadav M. Shnerb
  - Department of Physics, Bar-Ilan University, Ramat-Gan, IL52900, Israel
7
Bräutigam C, Smerlak M. Diffusion approximations in population genetics and the rate of Muller's ratchet. J Theor Biol 2022; 550:111236. [PMID: 35926567 DOI: 10.1016/j.jtbi.2022.111236] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/25/2021] [Revised: 07/13/2022] [Accepted: 07/25/2022] [Indexed: 10/16/2022]
Abstract
The Wright-Fisher binomial model of allele frequency change is often approximated by a scaling limit in which selection, mutation and drift all decrease at the same 1/N rate. This construction restricts the applicability of the resulting 'Wright-Fisher diffusion equation' to the weak selection, weak mutation regime of evolution. We argue that diffusion approximations of the Wright-Fisher model can be used more generally, for instance in cases where genetic drift is much weaker than selection. One important example of this regime is Muller's ratchet phenomenon, whereby deleterious mutations slowly but irreversibly accumulate through rare stochastic fluctuations. Using a modified diffusion equation we derive improved analytical estimates for the mean click time of the ratchet.
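For orientation, the Wright-Fisher binomial model named in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: the haploid two-allele setting, the order of events (selection, then mutation, then drift), the symmetric mutation probability `u`, and all function and parameter names are assumptions made for the example.

```python
import random


def expected_frequency(p, s, u):
    """Deterministic part of one generation (assumed order: selection, then mutation).

    p: current frequency of allele A
    s: selection coefficient of A relative to the other allele
    u: symmetric per-generation mutation probability (an assumption of this sketch)
    """
    p_sel = p * (1.0 + s) / (1.0 + s * p)          # selection reweights allele A
    return p_sel * (1.0 - u) + (1.0 - p_sel) * u   # symmetric mutation A <-> a


def wright_fisher_step(count, n, s, u, rng):
    """Drift: draw the next count of allele A as Binomial(n, p')."""
    p_next = expected_frequency(count / n, s, u)
    # Sum of n Bernoulli trials, which is a binomial draw using only the stdlib.
    return sum(1 for _ in range(n) if rng.random() < p_next)


def simulate(count, n, s, u, generations, seed=0):
    """Return the trajectory of allele counts, starting from `count` copies."""
    rng = random.Random(seed)
    trajectory = [count]
    for _ in range(generations):
        count = wright_fisher_step(count, n, s, u, rng)
        trajectory.append(count)
    return trajectory
```

With `u = 0` the counts `0` and `n` are absorbing, so every run eventually ends in loss or fixation; a diffusion approximation replaces this discrete chain with a continuous process in the allele frequency.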
Affiliation(s)
- Camila Bräutigam
  - Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
- Matteo Smerlak
  - Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
8
Saubin M, Louet C, Bousset L, Fabre F, Frey P, Fudal I, Grognard F, Hamelin F, Mailleret L, Stoeckel S, Touzeau S, Petre B, Halkett F. Improving sustainable crop protection using population genetics concepts. Mol Ecol 2022; 32:2461-2471. [PMID: 35906846 DOI: 10.1111/mec.16634] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/29/2021] [Revised: 06/23/2022] [Accepted: 06/24/2022] [Indexed: 10/16/2022]
Abstract
Growing genetically resistant plants allows pathogen populations to be controlled and reduces the use of pesticides. However, pathogens can quickly overcome such resistance. In this context, how can we achieve sustainable crop protection? This crucial question has remained largely unanswered despite decades of intense debate and research effort. In this study, we used a bibliographic analysis to show that the research field of resistance durability has evolved into three subfields: (i) 'plant breeding' (generating new genetic material), (ii) 'molecular interactions' (exploring the molecular dialogue governing plant-pathogen interactions) and (iii) 'epidemiology and evolution' (explaining and forecasting of pathogen population dynamics resulting from selection pressure(s) exerted by resistant plants). We argue that this triple split of the field impedes integrated research progress and ultimately compromises the sustainable management of genetic resistance. After identifying a gap among the three subfields, we argue that the theoretical framework of population genetics could bridge this gap. Indeed, population genetics formally explains the evolution of all heritable traits, and allows genetic changes to be tracked along with variation in population dynamics. This provides an integrated view of pathogen adaptation, in particular via evolutionary-epidemiological feedbacks. In this Opinion Note, we detail examples illustrating how such a framework can better inform best practices for developing and managing genetically resistant cultivars.
Affiliation(s)
- Clémentine Louet
  - Université de Lorraine, INRAE, IAM, Nancy, France
  - Université Paris Saclay, INRAE, BIOGER, Thiverval-Grignon, France
- Lydia Bousset
  - INRAE, Agrocampus Ouest, Université de Rennes, IGEPP, Le Rheu, France
- Frédéric Fabre
  - INRAE, Bordeaux Sciences Agro, SAVE, F-33882 Villenave d'Ornon, France
- Pascal Frey
  - Université de Lorraine, INRAE, IAM, Nancy, France
- Isabelle Fudal
  - Université Paris Saclay, INRAE, BIOGER, Thiverval-Grignon, France
- Frédéric Grognard
  - Université Côte d'Azur, Inria, INRAE, CNRS, Sorbonne Université, Biocore team, Sophia Antipolis, France
- Frédéric Hamelin
  - INRAE, Agrocampus Ouest, Université de Rennes, IGEPP, Le Rheu, France
- Ludovic Mailleret
  - Université Côte d'Azur, Inria, INRAE, CNRS, Sorbonne Université, Biocore team, Sophia Antipolis, France
  - Université Côte d'Azur, INRAE, CNRS, ISA, Sophia Antipolis, France
- Solenn Stoeckel
  - INRAE, Agrocampus Ouest, Université de Rennes, IGEPP, Le Rheu, France
- Suzanne Touzeau
  - Université Côte d'Azur, Inria, INRAE, CNRS, Sorbonne Université, Biocore team, Sophia Antipolis, France
9
Pedigree in the biparental Moran model. J Math Biol 2022; 84:51. [DOI: 10.1007/s00285-022-01752-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2021] [Revised: 12/07/2021] [Accepted: 04/05/2022] [Indexed: 11/25/2022]
10
Roffé AJ. Drift as constitutive: conclusions from a formal reconstruction of population genetics. History and Philosophy of the Life Sciences 2019; 41:55. [PMID: 31749015 DOI: 10.1007/s40656-019-0294-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/04/2019] [Accepted: 11/12/2019] [Indexed: 06/10/2023]
Abstract
This article elaborates on McShea and Brandon's idea that drift is unlike the rest of the evolutionary factors because it is constitutive rather than imposed on the evolutionary process. I show that the way they spelled out this idea renders it inadequate and is the reason why it received some (good) objections. I propose a different way in which their point could be understood, that rests on two general distinctions. The first is a distinction between the underlying mathematical apparatus used to formulate a theory and a concept proposed by that theory. With the aid of a formal reconstruction of a population genetic model, I show that drift belongs to the first category. That is, that drift is constitutive of population genetics in the same sense that multiplication is constitutive in classical mechanics, or that circle is constitutive in Ptolemaic astronomy. The second distinction is between eliminating a concept from a theory and setting its value to zero. I will show that even though drift can be set to zero just like the rest of the evolutionary factors (as others have noted in their criticism of McShea and Brandon), eliminating drift is much harder than eliminating those other factors, since it would require changing the entire mathematical apparatus of standard population genetic theory. I conclude by drawing some other implications from the proposed formal reconstruction.
Affiliation(s)
- Ariel Jonathan Roffé
  - Centro de Estudios de Filosofía e Historia de la Ciencia (CEFHIC-UNQ-CONICET), Universidad de Buenos Aires (UBA), Universidad Tres de Febrero (UNTREF), Roque Sáenz Peña 352, B1876BXD, Bernal, Buenos Aires, Argentina
11
Tataru P, Simonsen M, Bataillon T, Hobolth A. Statistical Inference in the Wright-Fisher Model Using Allele Frequency Data. Syst Biol 2018; 66:e30-e46. [PMID: 28173553 PMCID: PMC5837693 DOI: 10.1093/sysbio/syw056] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 12/04/2015] [Revised: 05/31/2016] [Accepted: 06/06/2016] [Indexed: 11/14/2022] Open
Abstract
The Wright–Fisher model provides an elegant mathematical framework for understanding allele frequency data. In particular, the model can be used to infer the demographic history of species and identify loci under selection. A crucial quantity for inference under the Wright–Fisher model is the distribution of allele frequencies (DAF). Despite the apparent simplicity of the model, the calculation of the DAF is challenging. We review and discuss strategies for approximating the DAF, and how these are used in methods that perform inference from allele frequency data. Various evolutionary forces can be incorporated in the Wright–Fisher model, and we consider these in turn. We begin our review with the basic bi-allelic Wright–Fisher model where random genetic drift is the only evolutionary force. We then consider mutation, migration, and selection. In particular, we compare diffusion-based and moment-based methods in terms of accuracy, computational efficiency, and analytical tractability. We conclude with a brief overview of the multi-allelic process with a general mutation model.
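To make the distribution of allele frequencies (DAF) concrete, here is a small sketch for the simplest case the abstract mentions, the bi-allelic model with drift only. It builds the exact binomial transition matrix and propagates a starting distribution forward. The function names and the use of `n` for the number of gene copies are assumptions of this example, not notation from the review.

```python
from math import comb


def transition_matrix(n):
    """P[i][j] = probability of j copies next generation given i copies now."""
    matrix = []
    for i in range(n + 1):
        p = i / n
        # Binomial(n, p) probabilities for every possible next count j.
        row = [comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(n + 1)]
        matrix.append(row)
    return matrix


def propagate(distribution, matrix, generations):
    """Push a distribution over allele counts forward by repeated vector-matrix products."""
    size = len(distribution)
    for _ in range(generations):
        distribution = [
            sum(distribution[i] * matrix[i][j] for i in range(size))
            for j in range(size)
        ]
    return distribution


def allele_count_distribution(n, start_count, generations):
    """Distribution of allele counts after `generations`, starting from `start_count` copies."""
    start = [0.0] * (n + 1)
    start[start_count] = 1.0
    return propagate(start, transition_matrix(n), generations)
```

Each generation costs on the order of `n**2` operations here, which is one reason exact computation becomes impractical for large populations and approximations such as the diffusion-based and moment-based methods compared in the review are of interest.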
Affiliation(s)
- Paula Tataru
  - Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Maria Simonsen
  - Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Thomas Bataillon
  - Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Asger Hobolth
  - Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
12
Jia C, Zhang MQ, Qian H. Emergent Lévy behavior in single-cell stochastic gene expression. Phys Rev E 2018; 96:040402. [PMID: 29347590 DOI: 10.1103/physreve.96.040402] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 03/31/2017] [Indexed: 11/07/2022]
Abstract
Single-cell gene expression is inherently stochastic; its emergent behavior can be defined in terms of the chemical master equation describing the evolution of the mRNA and protein copy numbers as the latter tends to infinity. We establish two types of "macroscopic limits": the Kurtz limit is consistent with the classical chemical kinetics, while the Lévy limit provides a theoretical foundation for an empirical equation proposed in N. Friedman et al., Phys. Rev. Lett. 97, 168302 (2006) [DOI: 10.1103/PhysRevLett.97.168302]. Furthermore, we clarify the biochemical implications and ranges of applicability for various macroscopic limits and calculate a comprehensive analytic expression for the protein concentration distribution in autoregulatory gene networks. The relationship between our work and modern population genetics is discussed.
Affiliation(s)
- Chen Jia
  - Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas 75080, USA
- Michael Q Zhang
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, Texas 75080, USA
  - MOE Key Lab and Division of Bioinformatics, CSSB, TNLIST, Tsinghua University, Beijing 100084, China
- Hong Qian
  - Department of Applied Mathematics, University of Washington, Seattle, Washington 98195, USA
13
Jenkins PA, Fearnhead P, Song YS. Tractable diffusion and coalescent processes for weakly correlated loci. Electron J Probab 2016; 20. [PMID: 27375350 DOI: 10.1214/ejp.v20-3564] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
Abstract
Widely used models in genetics include the Wright-Fisher diffusion and its moment dual, Kingman's coalescent. Each has a multilocus extension but under neither extension is the sampling distribution available in closed-form, and their computation is extremely difficult. In this paper we derive two new multilocus population genetic models, one a diffusion and the other a coalescent process, which are much simpler than the standard models, but which capture their key properties for large recombination rates. The diffusion model is based on a central limit theorem for density dependent population processes, and we show that the sampling distribution is a linear combination of moments of Gaussian distributions and hence available in closed-form. The coalescent process is based on a probabilistic coupling of the ancestral recombination graph to a simpler genealogical process which exposes the leading dynamics of the former. We further demonstrate that when we consider the sampling distribution as an asymptotic expansion in inverse powers of the recombination parameter, the sampling distributions of the new models agree with the standard ones up to the first two orders.
Affiliation(s)
- Paul A Jenkins
  - Department of Statistics, University of Warwick, Coventry CV4 7AL, UK
- Paul Fearnhead
  - Department of Mathematics and Statistics, Lancaster University, Lancaster LA1 4YF, UK
- Yun S Song
  - Department of Statistics and Computer Science Division, University of California, Berkeley, Berkeley, CA 94720, USA
14
Effect of spatial constraints on Hardy-Weinberg equilibrium. Sci Rep 2016; 6:19297. [PMID: 26771073 PMCID: PMC4725899 DOI: 10.1038/srep19297] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/11/2015] [Accepted: 12/07/2015] [Indexed: 11/18/2022] Open
Abstract
Panmixia is a key issue in maintaining genetic diversity, which facilitates evolutionary potential during environmental changes. Additionally, conservation biologists suggest the importance of avoiding small or subdivided populations, which are prone to losing genetic diversity. In this paper, computer simulations were performed to study the genetic drift of neutral alleles in random mating populations with or without spatial constraints, the constraint being imposed by randomly choosing a mate among the closest neighbours. The results demonstrated that the number of generations required for the neutral allele to become homozygous (Th) varied proportionally to the population size and also strongly correlated with spatial constraints. The average Th for populations of the same size with spatial constraints was approximately one-and-a-half times longer than without constraints. With spatial constraints, homozygous population clusters formed, which reduced local diversity but preserved global diversity. Therefore, panmixia might be harmful in preserving the genetic diversity of an entire population. The results also suggested that gene flow or gene exchange among subdivided populations must be carefully managed to restrict disease transmission or death during transportation and to monitor genetic diversity. The application of this concept to similar systems, such as information transfer among peers, is also discussed.
15
Haldane A, Manhart M, Morozov AV. Biophysical fitness landscapes for transcription factor binding sites. PLoS Comput Biol 2014; 10:e1003683. [PMID: 25010228 PMCID: PMC4091707 DOI: 10.1371/journal.pcbi.1003683] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 12/03/2013] [Accepted: 05/11/2014] [Indexed: 11/18/2022] Open
Abstract
Phenotypic states and evolutionary trajectories available to cell populations are ultimately dictated by complex interactions among DNA, RNA, proteins, and other molecular species. Here we study how evolution of gene regulation in a single-cell eukaryote S. cerevisiae is affected by interactions between transcription factors (TFs) and their cognate DNA sites. Our study is informed by a comprehensive collection of genomic binding sites and high-throughput in vitro measurements of TF-DNA binding interactions. Using an evolutionary model for monomorphic populations evolving on a fitness landscape, we infer fitness as a function of TF-DNA binding to show that the shape of the inferred fitness functions is in broad agreement with a simple functional form inspired by a thermodynamic model of two-state TF-DNA binding. However, the effective parameters of the model are not always consistent with physical values, indicating selection pressures beyond the biophysical constraints imposed by TF-DNA interactions. We find little statistical support for the fitness landscape in which each position in the binding site evolves independently, indicating that epistasis is common in the evolution of gene regulation. Finally, by correlating TF-DNA binding energies with biological properties of the sites or the genes they regulate, we are able to rule out several scenarios of site-specific selection, under which binding sites of the same TF would experience different selection pressures depending on their position in the genome. These findings support the existence of universal fitness landscapes which shape evolution of all sites for a given TF, and whose properties are determined in part by the physics of protein-DNA interactions.
Specialized proteins called transcription factors turn genes on and off by binding to short stretches of DNA in their regulatory regions. Precise gene regulation is essential for cellular survival and proliferation, and its evolution and maintenance under mutational pressure are central issues in biology. Here we discuss how evolution of gene regulation is shaped by the need to maintain favorable binding energies between transcription factors and their genomic binding sites. We show that, surprisingly, transcription factor binding is not affected by many biological properties, such as the essentiality of the gene it regulates. Rather, all sites for a given factor appear to evolve under a universal set of constraints, which can be rationalized in terms of a simple model inspired by transcription factor – DNA binding thermodynamics.
Affiliation(s)
- Allan Haldane
  - Department of Physics and Astronomy, Rutgers University, Piscataway, New Jersey, United States of America
- Michael Manhart
  - Department of Physics and Astronomy, Rutgers University, Piscataway, New Jersey, United States of America
- Alexandre V. Morozov
  - Department of Physics and Astronomy, Rutgers University, Piscataway, New Jersey, United States of America
  - BioMaPS Institute for Quantitative Biology, Rutgers University, Piscataway, New Jersey, United States of America
|
16
|
Monk T, Green P, Paulin M. Martingales and fixation probabilities of evolutionary graphs. Proc Math Phys Eng Sci 2014. [DOI: 10.1098/rspa.2013.0730] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Indexed: 11/12/2022] Open
Abstract
Evolutionary graph theory is the study of birth–death processes that are constrained by population structure. A principal problem in evolutionary graph theory is to obtain the probability that some initial population of mutants will fixate on a graph, and to determine how that fixation probability depends on the structure of that graph. A fluctuating mutant population on a graph can be considered as a random walk. Martingales exploit symmetry in the steps of a random walk to yield exact analytical expressions for fixation probabilities. They do not require simplifying assumptions such as large population sizes or weak selection. In this paper, we show how martingales can be used to obtain fixation probabilities for symmetric evolutionary graphs. We obtain simpler expressions for the fixation probabilities of star graphs and complete bipartite graphs than have been previously reported and show that these graphs do not amplify selection for advantageous mutations under all conditions.
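The martingale argument mentioned in this abstract can be illustrated in the simplest setting, a well-mixed Moran process, which is an assumption made here for illustration and not one of the star or bipartite graphs treated in the paper. If a mutant has relative fitness r and X is the mutant count, then r^(-X) is a martingale, and optional stopping gives the fixation probability (1 - r^(-i)) / (1 - r^(-N)) for i initial mutants in a population of size N. The sketch below checks that closed form against the standard birth-death sum; the function names are hypothetical.

```python
from fractions import Fraction


def moran_fixation_closed_form(r, n, i=1):
    """Fixation probability of i mutants with relative fitness r in a
    well-mixed Moran population of size n (martingale / optional-stopping result)."""
    r = Fraction(r)
    if r == 1:
        # Neutral case: the mutant count itself is the martingale.
        return Fraction(i, n)
    return (1 - r ** (-i)) / (1 - r ** (-n))


def moran_fixation_birth_death(r, n, i=1):
    """Same quantity from the general birth-death formula, using only the
    ratio gamma = P(step down) / P(step up) = 1/r at every interior state."""
    gamma = 1 / Fraction(r)

    def partial_sum(m):
        # sum_{k=0}^{m-1} gamma^k
        return sum(gamma ** k for k in range(m))

    return partial_sum(i) / partial_sum(n)


if __name__ == "__main__":
    for r in (Fraction(1, 2), 1, 2, 3):
        for n in (3, 5, 10):
            a = moran_fixation_closed_form(r, n)
            b = moran_fixation_birth_death(r, n)
            print(f"r={r}, N={n}: closed form={a}, birth-death sum={b}")
```

Exact rational arithmetic is used so that the two expressions can be compared for equality rather than to within a floating-point tolerance.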
Affiliation(s)
- T. Monk
  - Department of Zoology, University of Otago, 340 Great King St., Dunedin 9054, New Zealand
- P. Green
  - Landcare Research, 764 Cumberland Street, Dunedin 9016, New Zealand
- M. Paulin
  - Department of Zoology, University of Otago, 340 Great King St., Dunedin 9054, New Zealand
|
17
|
Cornuet JM, Pudlo P, Veyssier J, Dehne-Garcia A, Gautier M, Leblois R, Marin JM, Estoup A. DIYABC v2.0: a software to make approximate Bayesian computation inferences about population history using single nucleotide polymorphism, DNA sequence and microsatellite data. Bioinformatics 2014; 30:1187-1189. [PMID: 24389659 DOI: 10.1093/bioinformatics/btt763] [Citation(s) in RCA: 640] [Impact Index Per Article: 58.2] [Received: 07/31/2013] [Accepted: 12/25/2013] [Indexed: 12/30/2022]
Abstract
MOTIVATION: DIYABC is a software package for a comprehensive analysis of population history using approximate Bayesian computation on DNA polymorphism data. Version 2.0 implements a number of new features and analytical methods. It allows (i) the analysis of single nucleotide polymorphism data at a large number of loci, apart from microsatellite and DNA sequence data, (ii) efficient Bayesian model choice using linear discriminant analysis on summary statistics and (iii) the serial launching of multiple post-processing analyses. DIYABC v2.0 also includes a user-friendly graphical interface with various new options. It can be run on three operating systems: GNU/Linux, Microsoft Windows and Apple OS X. AVAILABILITY: Freely available with a detailed notice document and example projects to academic users at http://www1.montpellier.inra.fr/CBGP/diyabc CONTACT: estoup@supagro.inra.fr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jean-Marie Cornuet
  - Inra, UMR1062 cbgp, Montpellier, France, Université Montpellier 2, UMR CNRS 5149, I3M, Montpellier, France, Institut de Biologie Computationnelle (IBC), 34095 Montpellier, France and CNRS-UM2, Institut de Biologie Computationnelle, LIRMM, Montpellier, France
- Pierre Pudlo
  - Inra, UMR1062 cbgp, Montpellier, France, Université Montpellier 2, UMR CNRS 5149, I3M, Montpellier, France, Institut de Biologie Computationnelle (IBC), 34095 Montpellier, France and CNRS-UM2, Institut de Biologie Computationnelle, LIRMM, Montpellier, France
- Julien Veyssier
  - Inra, UMR1062 cbgp, Montpellier, France, Université Montpellier 2, UMR CNRS 5149, I3M, Montpellier, France, Institut de Biologie Computationnelle (IBC), 34095 Montpellier, France and CNRS-UM2, Institut de Biologie Computationnelle, LIRMM, Montpellier, France
- Alexandre Dehne-Garcia
  - Inra, UMR1062 cbgp, Montpellier, France, Université Montpellier 2, UMR CNRS 5149, I3M, Montpellier, France, Institut de Biologie Computationnelle (IBC), 34095 Montpellier, France and CNRS-UM2, Institut de Biologie Computationnelle, LIRMM, Montpellier, France
- Mathieu Gautier
  - Inra, UMR1062 cbgp, Montpellier, France, Université Montpellier 2, UMR CNRS 5149, I3M, Montpellier, France, Institut de Biologie Computationnelle (IBC), 34095 Montpellier, France and CNRS-UM2, Institut de Biologie Computationnelle, LIRMM, Montpellier, France
- Raphaël Leblois
  - Inra, UMR1062 cbgp, Montpellier, France, Université Montpellier 2, UMR CNRS 5149, I3M, Montpellier, France, Institut de Biologie Computationnelle (IBC), 34095 Montpellier, France and CNRS-UM2, Institut de Biologie Computationnelle, LIRMM, Montpellier, France
- Jean-Michel Marin
  - Inra, UMR1062 cbgp, Montpellier, France, Université Montpellier 2, UMR CNRS 5149, I3M, Montpellier, France, Institut de Biologie Computationnelle (IBC), 34095 Montpellier, France and CNRS-UM2, Institut de Biologie Computationnelle, LIRMM, Montpellier, France
- Arnaud Estoup
  - Inra, UMR1062 cbgp, Montpellier, France, Université Montpellier 2, UMR CNRS 5149, I3M, Montpellier, France, Institut de Biologie Computationnelle (IBC), 34095 Montpellier, France and CNRS-UM2, Institut de Biologie Computationnelle, LIRMM, Montpellier, France
|
18
|
Monteiro NM, Vieira MN, Lyons DO. Operational sex ratio, reproductive costs, and the potential for intrasexual competition. Biol J Linn Soc Lond 2013. [DOI: 10.1111/bij.12126] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 10/26/2022]
Affiliation(s)
- Maria N. Vieira
  - Departamento de Biologia; Faculdade de Ciências da Universidade do Porto; rua do Campo Alegre; 4169-007; Porto; Portugal
- David O. Lyons
  - National Parks & Wildlife Service; Custom House, Druid Lane, Flood Street; Galway; Ireland
|
19
|
Genomic prediction in animals and plants: simulation of data, validation, reporting, and benchmarking. Genetics 2012; 193:347-65. [PMID: 23222650 DOI: 10.1534/genetics.112.147983] [Citation(s) in RCA: 254] [Impact Index Per Article: 19.5] [Indexed: 12/26/2022] Open
Abstract
The genomic prediction of phenotypes and breeding values in animals and plants has developed rapidly into its own research field. Results of genomic prediction studies are often difficult to compare because data simulation varies, real or simulated data are not fully described, and not all relevant results are reported. In addition, some new methods have been compared only in limited genetic architectures, leading to potentially misleading conclusions. In this article we review simulation procedures, discuss validation and reporting of results, and apply benchmark procedures for a variety of genomic prediction methods in simulated and real example data. Plant and animal breeding programs are being transformed by the use of genomic data, which are becoming widely available and cost-effective to predict genetic merit. A large number of genomic prediction studies have been published using both simulated and real data. The relative novelty of this area of research has made the development of scientific conventions difficult with regard to description of the real data, simulation of genomes, validation and reporting of results, and forward in time methods. In this review article we discuss the generation of simulated genotype and phenotype data, using approaches such as the coalescent and forward in time simulation. We outline ways to validate simulated data and genomic prediction results, including cross-validation. The accuracy and bias of genomic prediction are highlighted as performance indicators that should be reported. We suggest that a measure of relatedness between the reference and validation individuals be reported, as its impact on the accuracy of genomic prediction is substantial. A large number of methods were compared in example simulated and real (pine and wheat) data sets, all of which are publicly available. 
In our limited simulations, most methods performed similarly in traits with a large number of quantitative trait loci (QTL), whereas in traits with fewer QTL variable selection did have some advantages. In the real data sets examined here all methods had very similar accuracies. We conclude that no single method can serve as a benchmark for genomic prediction. We recommend comparing accuracy and bias of new methods to results from genomic best linear prediction and a variable selection approach (e.g., BayesB), because, together, these methods are appropriate for a range of genetic architectures. An accompanying article in this issue provides a comprehensive review of genomic prediction methods and discusses a selection of topics related to application of genomic prediction in plants and animals.
|
20
|
Manhart M, Haldane A, Morozov AV. A universal scaling law determines time reversibility and steady state of substitutions under selection. Theor Popul Biol 2012; 82:66-76. [PMID: 22838027 PMCID: PMC3613437 DOI: 10.1016/j.tpb.2012.03.007] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 11/24/2022]
Abstract
Monomorphic loci evolve through a series of substitutions on a fitness landscape. Understanding how mutation, selection, and genetic drift drive this process, and uncovering the structure of the fitness landscape from genomic data are two major goals of evolutionary theory. Population genetics models of the substitution process have traditionally focused on the weak-selection regime, which is accurately described by diffusion theory. Predictions in this regime can be considered universal in the sense that many population models exhibit equivalent behavior in the diffusion limit. However, a growing number of experimental studies suggest that strong selection plays a key role in some systems, and thus there is a need to understand universal properties of models without a priori assumptions about selection strength. Here we study time reversibility in a general substitution model of a monomorphic haploid population. We show that for any time-reversible population model, such as the Moran process, substitution rates obey an exact scaling law. For several other irreversible models, such as the simple Wright–Fisher process and its extensions, the scaling law is accurate up to selection strengths that are well outside the diffusion regime. Time reversibility gives rise to a power-law expression for the steady-state distribution of populations on an arbitrary fitness landscape. The steady-state behavior is dominated by weak selection and is thus adequately described by the diffusion approximation, which guarantees universality of the steady-state formula and its applicability to the problem of reconstructing fitness landscapes from DNA or protein sequence data.
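As a small numerical illustration of the kind of scaling law described here, the sketch below uses the textbook fixation probability of a single mutant in a haploid Moran process and checks that the ratio of forward to backward fixation probabilities between two fitness values is an exact power of the fitness ratio. The exponent N - 1 follows from that textbook formula; it is not quoted in the abstract. Symmetric mutation rates are assumed, so that the ratio of substitution rates reduces to this ratio of fixation probabilities. The function names are hypothetical and this is not the paper's derivation.

```python
from fractions import Fraction


def moran_fixation(r, n):
    """Fixation probability of one mutant with relative fitness r
    in a haploid Moran population of size n."""
    r = Fraction(r)
    if r == 1:
        return Fraction(1, n)
    return (1 - 1 / r) / (1 - r ** (-n))


def fixation_ratio(f_from, f_to, n):
    """Ratio of the fixation probability for a substitution f_from -> f_to
    to that of the reverse substitution f_to -> f_from."""
    forward = moran_fixation(Fraction(f_to) / Fraction(f_from), n)
    backward = moran_fixation(Fraction(f_from) / Fraction(f_to), n)
    return forward / backward


if __name__ == "__main__":
    n = 5
    for f_from, f_to in ((1, 3), (2, 3), (5, 4)):
        ratio = fixation_ratio(f_from, f_to, n)
        power = (Fraction(f_to) / Fraction(f_from)) ** (n - 1)
        print(f"{f_from} -> {f_to}: ratio={ratio}, (f_to/f_from)^(N-1)={power}")
```

Under these assumptions the ratio depends on the two fitness values only through their quotient, which is what makes a detailed-balance (time-reversible) steady state of power-law form possible.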
Affiliation(s)
- Michael Manhart
  - Department of Physics and Astronomy, Rutgers University, 136 Frelinghuysen Road, Piscataway, NJ USA 08854
- Allan Haldane
  - Department of Physics and Astronomy, Rutgers University, 136 Frelinghuysen Road, Piscataway, NJ USA 08854
- Alexandre V. Morozov
  - Department of Physics and Astronomy, Rutgers University, 136 Frelinghuysen Road, Piscataway, NJ USA 08854
  - BioMaPS Institute for Quantitative Biology, Rutgers University, 110 Frelinghuysen Road, Piscataway, NJ USA 08854
|
21
|
Houchmandzadeh B, Vallade M. Alternative to the diffusion equation in population genetics. Phys Rev E Stat Nonlin Soft Matter Phys 2010; 82:051913. [PMID: 21230506 DOI: 10.1103/physreve.82.051913] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 07/26/2010] [Revised: 09/07/2010] [Indexed: 05/27/2023]
Abstract
Since its inception by Kimura in 1955 [M. Kimura, Proc. Natl. Acad. Sci. U.S.A. 41, 144 (1955)], the diffusion equation has become a standard technique of population genetics. The diffusion equation is however only an approximation, valid in the limit of large populations and small selection. Moreover, useful quantities such as the fixation probabilities are not easily extracted from it and need the concomitant use of a forward and backward equation. We show here that the partial differential equation governing the probability generating function can be used as an alternative to the diffusion equation with none of its drawbacks: it does not involve any approximation, it has well-defined initial and boundary conditions, and its solutions are finite polynomials. We apply this technique to derive analytical results for the Moran process with selection, which encompasses the Kimura diffusion equation.
Affiliation(s)
- Bahram Houchmandzadeh
  - Laboratoire de Spectrométrie Physique, CNRS and Grenoble Université, BP 87, 38402 St. Martin d'Hères Cedex, France
|
22
|
Affiliation(s)
- Jacques Ninio
- Laboratoire de Physique Statistique de l'Ecole Normale Supérieure, UMR 8550 of the CNRS, UPMC Université Paris 06 and Université Paris Diderot, Paris, France.
|
23
|
Parsons TL, Quince C, Plotkin JB. Some consequences of demographic stochasticity in population genetics. Genetics 2010; 185:1345-54. [PMID: 20457879 PMCID: PMC2927761 DOI: 10.1534/genetics.110.115030] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Received: 02/01/2010] [Accepted: 05/05/2010] [Indexed: 11/18/2022] Open
Abstract
Much of population genetics is based on the diffusion limit of the Wright-Fisher model, which assumes a fixed population size. This assumption is violated in most natural populations, particularly for microbes. Here we study a more realistic model that decouples birth and death events and allows for a stochastically varying population size. Under this model, classical quantities such as the probability of and time before fixation of a mutant allele can differ dramatically from their Wright-Fisher expectations. Moreover, inferences about natural selection based on Wright-Fisher assumptions can yield erroneous and even contradictory conclusions: at small population densities one allele will appear superior, whereas at large densities the other allele will dominate. Consequently, competition assays in laboratory conditions may not reflect the outcome of long-term evolution in the field. These results highlight the importance of incorporating demographic stochasticity into basic models of population genetics.
Affiliation(s)
- Todd L. Parsons
  - Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104 and Department of Civil Engineering, University of Glasgow, Glasgow G12 8LT, United Kingdom
- Christopher Quince
  - Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104 and Department of Civil Engineering, University of Glasgow, Glasgow G12 8LT, United Kingdom
- Joshua B. Plotkin
  - Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104 and Department of Civil Engineering, University of Glasgow, Glasgow G12 8LT, United Kingdom
|
24
|
|
25
|
Drummond CS, Xue HJ, Yoder JB, Pellmyr O. Host-associated divergence and incipient speciation in the yucca moth Prodoxus coloradensis (Lepidoptera: Prodoxidae) on three species of host plants. Heredity (Edinb) 2009; 105:183-96. [PMID: 20010961 DOI: 10.1038/hdy.2009.154] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/09/2022] Open
Abstract
A wide range of evolutionary processes have been implicated in the diversification of yuccas and yucca moths, which exhibit ecological relationships that extend from obligate plant-pollinator mutualisms to commensalist herbivory. Prodoxus coloradensis (Lepidoptera: Prodoxidae) is a yucca moth, which feeds on the flowering stalks of three Yucca species as larvae, but does not provide pollination service. To test for evidence of host-associated speciation, we examined the genetic structure of P. coloradensis using mitochondrial (cytochrome oxidase I) and nuclear (elongation factor 1 alpha) DNA sequence data. Multilocus coalescent simulations indicate that moths on different host plant species are characterized by recent divergence and low levels of effective migration, with large effective population sizes and considerable retention of shared ancestral polymorphism. Although geographical distance explains a proportion of the mitochondrial and nuclear DNA variation among moths on different species of Yucca, the effect of host specificity on genetic distance remains significant after accounting for spatial isolation. The results of this study indicate that differentiation within P. coloradensis is consistent with the evolution of incipient species affiliated with different host plants, potentially influenced by sex-biased dispersal and female philopatry.
Affiliation(s)
- C S Drummond
  - Department of Biological Sciences, University of Idaho, Moscow, ID 83844, USA.
|
26
|
Bloomquist EW, Suchard MA. Unifying vertical and nonvertical evolution: a stochastic ARG-based framework. Syst Biol 2009; 59:27-41. [PMID: 20525618 DOI: 10.1093/sysbio/syp076] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Indexed: 11/13/2022] Open
Abstract
Evolutionary biologists have introduced numerous statistical approaches to explore nonvertical evolution, such as horizontal gene transfer, recombination, and genomic reassortment, through collections of Markov-dependent gene trees. These tree collections allow for inference of nonvertical evolution, but only indirectly, making findings difficult to interpret and models difficult to generalize. An alternative approach to explore nonvertical evolution relies on phylogenetic networks. These networks provide a framework to model nonvertical evolution but leave unanswered questions such as the statistical significance of specific nonvertical events. In this paper, we begin to correct the shortcomings of both approaches by introducing the "stochastic model for reassortment and transfer events" (SMARTIE) drawing upon ancestral recombination graphs (ARGs). ARGs are directed graphs that allow for formal probabilistic inference on vertical speciation events and nonvertical evolutionary events. We apply SMARTIE to phylogenetic data. Because of this, we can typically infer a single most probable ARG, avoiding coarse population dynamic summary statistics. In addition, a focus on phylogenetic data suggests novel probability distributions on ARGs. To make inference with our model, we develop a reversible jump Markov chain Monte Carlo sampler to approximate the posterior distribution of SMARTIE. Using the BEAST phylogenetic software as a foundation, the sampler employs a parallel computing approach that allows for inference on large-scale data sets. To demonstrate SMARTIE, we explore 2 separate phylogenetic applications, one involving pathogenic Leptospirochete and the other Saccharomyces.
Affiliation(s)
- Erik W Bloomquist
  - Department of Biostatistics, UCLA School of Public Health, Los Angeles, CA 90095, USA
|
27
|
Buzbas EO, Joyce P. Maximum likelihood estimates under k-allele models with selection can be numerically unstable. Ann Appl Stat 2009. [DOI: 10.1214/09-aoas237] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
|
28
|
Carvajal-Rodríguez A. Simulation of genomes: a review. Curr Genomics 2008; 9:155-9. [PMID: 19440512 PMCID: PMC2679650 DOI: 10.2174/138920208784340759] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.6] [Received: 02/17/2008] [Revised: 03/18/2008] [Accepted: 03/26/2008] [Indexed: 11/22/2022] Open
Abstract
Population genetics plays an increasing role in human genetic research, linking empirical observations with hypotheses about sequence variation due to historical and evolutionary causes. In addition, data sets are increasing in size, with genome-wide data becoming commonplace in many empirical studies. As more information becomes available, it becomes clear that the simplest hypotheses are not consistent with the data. Simulations will provide the key tool for testing complex hypotheses against real data, by generating simulated data under the hypothetical historical and evolutionary conditions to be compared. Undoubtedly, developing tools for simulating large sequences that at the same time allow the simulation of natural selection, recombination and complex demographic patterns will be of great interest for better understanding the trace left on the DNA by different interacting evolutionary forces. Simulation tools will also be essential to evaluate the sampling properties of any statistics used in genome-wide association studies and to compare the performance of methods applied at genome-wide scales. Several simulation tools have been developed recently. Here, we review some of the currently existing simulators which allow for efficient simulation of large sequences under complex evolutionary scenarios. In addition, we point out future directions in this field, which is already a key part of current research in evolutionary biology and seems likely to be a primary tool in future research in genomic and post-genomic biology.
|
29
|
Carvajal-Rodríguez A. GENOMEPOP: a program to simulate genomes in populations. BMC Bioinformatics 2008; 9:223. [PMID: 18447924 PMCID: PMC2386491 DOI: 10.1186/1471-2105-9-223] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.9] [Received: 02/05/2008] [Accepted: 04/30/2008] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND: There are several situations in population biology research where simulating DNA sequences is useful. Simulation of biological populations under different evolutionary genetic models can be undertaken using backward or forward strategies. Backward simulations, also called coalescent-based simulations, are computationally efficient. The reason is that they are based on the history of lineages with surviving offspring in the current population. In contrast, forward simulations are less efficient because the entire population is simulated from past to present. However, the coalescent framework imposes some limitations that forward simulation does not. Hence, there is an increasing interest in forward population genetic simulation and efficient new tools have been developed recently. Software tools that allow efficient simulation of large DNA fragments under complex evolutionary models will be very helpful when trying to better understand the trace left on the DNA by the different interacting evolutionary forces. Here I will introduce GenomePop, a forward simulation program that fulfills the above requirements. The use of the program is demonstrated by studying the impact of intracodon recombination on global and site-specific dN/dS estimation. RESULTS: I have developed algorithms and written software to efficiently simulate, forward in time, different Markovian nucleotide or codon models of DNA mutation. Such models can be combined with recombination, at inter- and intra-codon levels, fitness-based selection and complex demographic scenarios. CONCLUSION: GenomePop has many interesting characteristics for simulating SNPs or DNA sequences under complex evolutionary and demographic models. These features make it unique with respect to other simulation tools: namely, the possibility of forward simulation under General Time Reversible (GTR) mutation or GTRxMG94 codon models with intra-codon recombination, arbitrary, user-defined, migration patterns, diploid or haploid models, constant or variable population sizes, etc. It also allows simulation of fitness-based selection under different distributions of mutational effects. Under the 2-allele model it allows the simulation of recombination hot-spots, the definition of different frequencies in different populations, etc. GenomePop can also manage large DNA fragments. In addition, it has a scaling option to save computation time when simulating large sequences and population sizes under complex demographic and evolutionary situations. These and many other features are detailed in its web page [1].
|
30
|
Lawton-Rauh A. Demographic processes shaping genetic variation. Curr Opin Plant Biol 2008; 11:103-109. [PMID: 18353707 DOI: 10.1016/j.pbi.2008.02.009] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Received: 02/02/2008] [Revised: 02/14/2008] [Accepted: 02/15/2008] [Indexed: 05/26/2023]
Abstract
Demographic processes modulate genome-wide levels and patterns of genetic variation via impacting effective population size independently of natural selection. Such processes include the perturbation of population distributions from external events shaping habitat landscape and internal factors shaping the probability of contemporaneous alleles in a population (coalescence). Several patterns have recently emerged: spatial and temporal heterogeneity in population structure have different influences on the persistence of new mutations and genetic variation, multi-locus analyses indicate that gene flow continues to occur during speciation and the incorporation of demographic processes into models of molecular evolution and association genetics approaches has improved statistical power to detect deviations from neutral-equilibrium expectations and decreased false positive rates.
Affiliation(s)
- Amy Lawton-Rauh
  - Department of Genetics and Biochemistry, Clemson University, 100 Jordan Hall, Clemson, SC 29634-0318, USA.
|
31
|
Blythe RA. The propagation of a cultural or biological trait by neutral genetic drift in a subdivided population. Theor Popul Biol 2007; 71:454-72. [PMID: 17337025 DOI: 10.1016/j.tpb.2007.01.006] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 06/26/2006] [Revised: 01/17/2007] [Accepted: 01/17/2007] [Indexed: 11/26/2022]
Abstract
We study fixation probabilities and times as a consequence of neutral genetic drift in subdivided populations, motivated by a model of the cultural evolutionary process of language change that is described by the same mathematics as the biological process. We focus on the growth of fixation times with the number of subpopulations, and variation of fixation probabilities and times with initial distributions of mutants. A general formula for the fixation probability for arbitrary initial condition is derived by extending a duality relation between forwards- and backwards-time properties of the model from a panmictic to a subdivided population. From this we obtain new formulae (formally exact in the limit of extremely weak migration) for the mean fixation time from an arbitrary initial condition for Wright's island model, presenting two cases as examples. For more general models of population subdivision, formulae are introduced for an arbitrary number of mutants that are randomly located, and a single mutant whose position is known. These formulae contain parameters that typically have to be obtained numerically, a procedure we follow for two contrasting clustered models. These data suggest that variation of fixation time with the initial condition is slight, but depends strongly on the nature of subdivision. In particular, we demonstrate conditions under which the fixation time remains finite even in the limit of an infinite number of demes. In many cases (except this last, where fixation in a finite time is seen) the time to fixation is shown to be in precise agreement with predictions from formulae for the asymptotic effective population size.
Affiliation(s)
- R A Blythe
- School of Physics, University of Edinburgh, Mayfield Road, Edinburgh EH9 3JZ, UK.
|
32
|
Peng B, Amos CI, Kimmel M. Forward-time simulations of human populations with complex diseases. PLoS Genet 2007; 3:e47. [PMID: 17381243 PMCID: PMC1829403 DOI: 10.1371/journal.pgen.0030047] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Received: 08/14/2006] [Accepted: 02/15/2007] [Indexed: 11/30/2022] Open
Abstract
Due to the increasing power of personal computers, as well as the availability of flexible forward-time simulation programs like simuPOP, it is now possible to simulate the evolution of complex human diseases using a forward-time approach. This approach is potentially more powerful than the coalescent approach since it allows simulations of more than one disease susceptibility locus using almost arbitrary genetic and demographic models. However, the application of such simulations has been deterred by the lack of a suitable simulation framework. For example, it is not clear when and how to introduce disease mutants, especially those under purifying selection, to an evolving population, and how to control the disease allele frequencies at the last generation. In this paper, we introduce a forward-time simulation framework that allows us to generate large multi-generation populations with complex diseases caused by unlinked disease susceptibility loci, according to specified demographic and evolutionary properties. Unrelated individuals, small or large pedigrees can be drawn from the resulting population and provide samples for a wide range of study designs and ascertainment methods. We demonstrate our simulation framework using three examples that map genes associated with affection status, a quantitative trait, and the age of onset of a hypothetical cancer, respectively. Nonadditive fitness models, population structure, and gene–gene interactions are simulated. Case-control, sibpair, and large pedigree samples are drawn from the simulated populations and are examined by a variety of gene-mapping methods.
Complex diseases such as hypertension and diabetes are usually caused by multiple disease-susceptibility genes, environment factors, and interactions between them. Simulating populations or samples with complex diseases is an effective approach to study the likely genetic architecture of these diseases and to develop more effective gene-mapping methods. Compared to traditional backward-time (coalescent) methods, population-based, forward-time simulations are more suitable for this task because they can simulate almost arbitrary demographic and genetic features. Forward-time simulations also allow the researcher to perform head-to-head comparisons among gene-mapping methods based on different study designs and ascertainment methods. Unfortunately, evolving a population generation by generation is a random process, so the fates of disease alleles are unpredictable and there is no effective way to control the disease allele frequency at the present generation. In this paper, the authors propose a simulation method that avoids these problems and makes forward-time population simulation a practical solution for the simulation of complex diseases.
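The difficulty described here, that the final disease allele frequency cannot be controlled in a naive forward-time run, can be seen in a toy example. The sketch below is a plain-Python haploid Wright-Fisher model with purifying selection; it is not the authors' framework and does not use simuPOP, and the function name `forward_simulate` and every parameter value are assumptions chosen for illustration.

```python
import random


def forward_simulate(pop_size, init_freq, selection, generations, rng):
    """Evolve one haploid population forward in time.

    Returns the disease allele frequency in the last generation.
    Assumes 0 <= selection < 1 and a constant population size.
    """
    freq = init_freq
    for _ in range(generations):
        # Selection: the disease allele has relative fitness 1 - selection.
        weighted = freq * (1.0 - selection)
        expected = weighted / (weighted + (1.0 - freq))
        # Drift: each offspring carries the disease allele with probability `expected`.
        count = sum(1 for _ in range(pop_size) if rng.random() < expected)
        freq = count / pop_size
    return freq


# Twenty replicate populations started from identical conditions.
rng = random.Random(1)
final_freqs = [forward_simulate(200, 0.1, 0.02, 50, rng) for _ in range(20)]
print(sorted(final_freqs))
```

Replicates that start from the same frequency end at different values, and some may lose the allele entirely, which is the unpredictability the summary refers to.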
Affiliation(s)
- Bo Peng
- Department of Epidemiology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas, United States of America.
|