452
Beaumont MA, Nielsen R, Robert C, Hey J, Gaggiotti O, Knowles L, Estoup A, Panchal M, Corander J, Hickerson M, Sisson SA, Fagundes N, Chikhi L, Beerli P, Vitalis R, Cornuet JM, Huelsenbeck J, Foll M, Yang Z, Rousset F, Balding D, Excoffier L. In defence of model-based inference in phylogeography. Mol Ecol 2010; 19:436-446. [PMID: 29284924 DOI: 10.1111/j.1365-294x.2009.04515.x] [Citation(s) in RCA: 92] [Impact Index Per Article: 6.1] [Indexed: 11/28/2022]
Abstract
Recent papers have promoted the view that model-based methods in general, and those based on Approximate Bayesian Computation (ABC) in particular, are flawed in a number of ways, and are therefore inappropriate for the analysis of phylogeographic data. These papers further argue that Nested Clade Phylogeographic Analysis (NCPA) offers the best approach in statistical phylogeography. In order to remove the confusion and misconceptions introduced by these papers, we justify and explain the reasoning behind model-based inference. We argue that ABC is a statistically valid approach, alongside other computational statistical techniques that have been successfully used to infer parameters and compare models in population genetics. We also examine the NCPA method and highlight numerous deficiencies, whether it is used with single or multiple loci. We further show that the ages of clades are carelessly used to infer ages of demographic events, and that these ages are estimated under a simple model of panmixia and population stationarity but are then used under different and unspecified models to test hypotheses, a usage that invalidates these testing procedures. We conclude by encouraging researchers to study and use model-based inference in population genetics.
Affiliation(s)
- Mark A Beaumont
- School of Animal and Microbial Sciences, University of Reading, Whiteknights, PO Box 228, Reading, RG6 6AJ, UK
- Rasmus Nielsen
- Integrative Biology, UC Berkeley, 3060 Valley Life Sciences Bldg #3140, Berkeley, CA 94720-3140, USA
- Jody Hey
- Department of Genetics, Rutgers University, 604 Allison Road, Piscataway, NJ 08854, USA
- Oscar Gaggiotti
- Laboratoire d'Ecologie Alpine, UMR CNRS 5553, Université Joseph Fourier, BP 53, 38041 Grenoble, France
- Lacey Knowles
- Department of Ecology and Evolutionary Biology, Museum of Zoology, University of Michigan, Ann Arbor, MI 48109-1079, USA
- Arnaud Estoup
- INRA UMR Centre de Biologie et de Gestion des Populations (INRA/IRD/Cirad/Montpellier SupAgro), Campus international de Baillarguet, Montferrier-sur-Lez, France
- Mahesh Panchal
- Max Planck Institute for Evolutionary Biology, August-Thienemann-Str. 2, 24306 Plön, Germany
- Jukka Corander
- Department of Mathematics and Statistics, University of Helsinki, Finland
- Mike Hickerson
- Biology Department, Queens College, City University of New York, 65-30 Kissena Boulevard, Flushing, NY 11367-1597, USA
- Scott A Sisson
- School of Mathematics and Statistics, University of New South Wales, Sydney, Australia
- Nelson Fagundes
- Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
- Lounès Chikhi
- Université Paul Sabatier-UMR EDB 5174 118, 31062 Toulouse Cedex 09, France
- Peter Beerli
- Department of Scientific Computing, Florida State University, Tallahassee, FL 32306, USA
- Renaud Vitalis
- CNRS-INRA, CBGP, Campus International de Baillarguet, CS 30016, 34988 Montferrier-sur-Lez, France
- Jean-Marie Cornuet
- INRA UMR Centre de Biologie et de Gestion des Populations (INRA/IRD/Cirad/Montpellier SupAgro), Campus international de Baillarguet, Montferrier-sur-Lez, France
- John Huelsenbeck
- Integrative Biology, UC Berkeley, 3060 Valley Life Sciences Bldg #3140, Berkeley, CA 94720-3140, USA
- Matthieu Foll
- CMPG, Institute of Ecology and Evolution, University of Berne, 3012 Berne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Ziheng Yang
- Department of Biology, University College London, Gower Street, London WC1E 6BT, UK
- Francois Rousset
- Institut des Sciences de l'Évolution, Université Montpellier 2, CNRS, Place Eugène Bataillon, CC065, Montpellier, Cedex 5, France
- David Balding
- Institute of Genetics, University College London, 2nd Floor, Kathleen Lonsdale Building, 5 Gower Place, London WC1E 6BT, UK
- Laurent Excoffier
- CMPG, Institute of Ecology and Evolution, University of Berne, 3012 Berne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland

456
ABC: a useful Bayesian tool for the analysis of population data. Infection, Genetics and Evolution 2009; 10:826-33. [PMID: 19879976 DOI: 10.1016/j.meegid.2009.10.010] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Received: 06/24/2009] [Revised: 10/20/2009] [Accepted: 10/21/2009] [Indexed: 11/20/2022]
Abstract
Approximate Bayesian computation (ABC) is a recently developed technique for solving problems in Bayesian inference. Although typically less accurate than, for example, the frequently used Markov chain Monte Carlo (MCMC) methods, ABC methods have greater flexibility because they do not require the specification of a likelihood function. For this reason, considerable amounts of data can be analysed and more complex models can be used, potentially providing a better fit of the model to the data. Since its first applications in the late 1990s, its usage has been steadily increasing. The framework was originally developed to solve problems in population genetics. However, as its efficiency was recognized, its popularity increased and it came to be used in fields as diverse as phylogenetics, ecology, conservation, molecular evolution and epidemiology. While the ABC algorithm is still being actively studied and modifications to it are being proposed, the statistical approach has already reached a level of maturity, as demonstrated by the number of related computer packages under development. As improved ABC algorithms are proposed, the use of this method can only increase. In this paper we describe the context that led to the development of ABC, focusing on the field of infectious disease epidemiology. We then describe its current usage in that field and present its most recent developments.
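The likelihood-free idea summarized in this abstract can be illustrated with a minimal rejection-sampling sketch. It is not taken from the paper: the toy Normal simulator, the uniform prior on [-5, 5], the sample mean as summary statistic, and all function names are assumptions chosen only to keep the example self-contained.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Toy simulator (assumed): n draws from Normal(theta, 1).

    It stands in for any model that can be simulated from
    even when its likelihood cannot be written down.
    """
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summary(data):
    """Summary statistic (assumed): the sample mean."""
    return statistics.fmean(data)


def abc_rejection(observed, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary lies close to the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # draw from the (assumed) uniform prior
        s_sim = summary(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) <= tolerance:  # no likelihood is evaluated
            accepted.append(theta)
    return accepted


rng = random.Random(1)
observed = simulate(2.0, 50, rng)  # pseudo-observed data with true theta = 2
posterior = abc_rejection(observed, n_draws=5000, tolerance=0.1, rng=rng)
print(len(posterior), statistics.fmean(posterior))
```

The accepted values approximate a sample from the posterior. Shrinking `tolerance` improves the approximation but lowers the acceptance rate, which is one way to see the accuracy versus flexibility trade-off the abstract mentions.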

457
Abstract
The joint analysis of spatial and genetic data is rapidly becoming the norm in population genetics. More and more studies explicitly describe and quantify the spatial organization of genetic variation and try to relate it to underlying ecological processes. As it has become increasingly difficult to keep abreast with the latest methodological developments, we review the statistical toolbox available to analyse population genetic data in a spatially explicit framework. We mostly focus on statistical concepts but also discuss practical aspects of the analytical methods, highlighting not only the potential of various approaches but also methodological pitfalls.
Affiliation(s)
- Gilles Guillot
- Department of Informatics and Mathematical Modelling, Technical University of Denmark, Copenhagen, Denmark.

459
Lopes JS, Balding D, Beaumont MA. PopABC: a program to infer historical demographic parameters. Bioinformatics 2009; 25:2747-9. [PMID: 19679678 DOI: 10.1093/bioinformatics/btp487] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.1] [Indexed: 11/14/2022]
Abstract
PopABC is a computer package for inferring the pattern of demographic divergence of closely related populations and species. The software performs coalescent simulation in the framework of approximate Bayesian computation (ABC). PopABC can also be used to perform Bayesian model choice to discriminate between different demographic scenarios. The program can be used either for research or for education and teaching purposes. Availability and implementation: Source code and binaries are freely available at http://www.reading.ac.uk/~sar05sal/software.htm. The program was implemented in C and can run on UNIX, MacOSX and Windows operating systems.
Affiliation(s)
- Joao S Lopes
- School of Biological Sciences, University of Reading, Whiteknights, PO Box 228, Reading RG6 6AJ, UK.

461
Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics 2009; 182:1207-18. [PMID: 19506307 DOI: 10.1534/genetics.109.102509] [Citation(s) in RCA: 199] [Impact Index Per Article: 12.4] [Indexed: 11/18/2022] Open
Abstract
Approximate Bayesian computation (ABC) techniques permit inferences in complex demographic models, but are computationally inefficient. A Markov chain Monte Carlo (MCMC) approach has been proposed (Marjoram et al. 2003), but it suffers from computational problems and poor mixing. We propose several methodological developments to overcome the shortcomings of this MCMC approach and hence realize substantial computational advances over standard ABC. The principal idea is to relax the tolerance within MCMC to permit good mixing, but retain a good approximation to the posterior by a combination of subsampling the output and regression adjustment. We also propose to use a partial least-squares (PLS) transformation to choose informative statistics. The accuracy of our approach is examined in the case of the divergence of two populations with and without migration. In that case, our ABC-MCMC approach needs considerably less computation time than conventional ABC to reach the same accuracy. We then apply our method to a more complex case with the estimation of divergence times and migration rates between three African populations.
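As a rough illustration of the likelihood-free MCMC scheme this abstract builds on, the sketch below runs a random-walk chain in which a proposal is accepted only if data simulated under it fall within a tolerance of the observed summary. It covers only the basic chain plus burn-in removal and thinning; the regression adjustment and PLS transformation proposed by the authors are not implemented. The toy Normal simulator, the uniform prior on [-5, 5], the observed summary value, and all names and tuning values are assumptions made for the example.

```python
import random
import statistics


def simulate_summary(theta, n, rng):
    """Toy simulator (assumed): sample mean of n draws from Normal(theta, 1)."""
    return statistics.fmean([rng.gauss(theta, 1.0) for _ in range(n)])


def abc_mcmc(s_obs, n, n_steps, tolerance, step_sd, start, rng, lo=-5.0, hi=5.0):
    """Likelihood-free random-walk chain with a uniform prior on [lo, hi]."""
    theta = start
    chain = []
    for _ in range(n_steps):
        proposal = rng.gauss(theta, step_sd)  # symmetric random-walk proposal
        # With a uniform prior and a symmetric proposal, the Metropolis-Hastings
        # ratio is 1 inside the prior support and 0 outside it, so the move is
        # accepted whenever the simulated summary is within the tolerance.
        if lo <= proposal <= hi:
            if abs(simulate_summary(proposal, n, rng) - s_obs) <= tolerance:
                theta = proposal
        chain.append(theta)  # a rejected move repeats the current value
    return chain


rng = random.Random(7)
s_obs = 2.0  # hypothetical observed summary statistic
chain = abc_mcmc(s_obs, n=50, n_steps=5000, tolerance=0.3,
                 step_sd=0.3, start=2.0, rng=rng)
kept = chain[1000::10]  # drop burn-in, then subsample (thin) the output
print(statistics.fmean(kept))
```

The chain is started inside the high-posterior region on purpose: with a small tolerance, a chain started in the tails rarely accepts a move, which is the poor-mixing problem the abstract describes and the motivation for relaxing the tolerance.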

462
Davison D, Pritchard JK, Coop G. An approximate likelihood for genetic data under a model with recombination and population splitting. Theor Popul Biol 2009; 75:331-45. [PMID: 19362099 PMCID: PMC3108256 DOI: 10.1016/j.tpb.2009.04.001] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 01/26/2009] [Revised: 03/26/2009] [Accepted: 04/02/2009] [Indexed: 10/20/2022]
Abstract
We describe a new approximate likelihood for population genetic data under a model in which a single ancestral population has split into two daughter populations. The approximate likelihood is based on the 'Product of Approximate Conditionals' likelihood and 'copying model' of Li and Stephens [Li, N., Stephens, M., 2003. Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165 (4), 2213-2233]. The approach developed here may be used for efficient approximate likelihood-based analyses of unlinked data. However, our copying model also considers the effects of recombination. Hence, a more important application is to loosely-linked haplotype data, for which efficient statistical models explicitly featuring non-equilibrium population structure have so far been unavailable. Thus, in addition to the information in allele frequency differences about the timing of the population split, the method can also extract information from the lengths of haplotypes shared between the populations. Extracting such information poses a number of challenges, which make parameter estimation difficult. We discuss how the approach could be extended to identify haplotypes introduced by migrants.
Affiliation(s)
- D Davison
- Committee on Evolutionary Biology, University of Chicago, USA.

463
Origins and genetic diversity of pygmy hunter-gatherers from Western Central Africa. Curr Biol 2009; 19:312-8. [PMID: 19200724 DOI: 10.1016/j.cub.2008.12.049] [Citation(s) in RCA: 123] [Impact Index Per Article: 7.7] [Received: 07/11/2008] [Revised: 12/23/2008] [Accepted: 12/24/2008] [Indexed: 11/27/2022]
Abstract
Central Africa is currently peopled by numerous sedentary agriculturalist populations neighboring the largest group of mobile hunter-gatherers, the Pygmies [1-3]. Although archeological remains attest to Homo sapiens' presence in the Congo Basin for at least 30,000 years, the demographic history of these groups, including divergence and admixture, remains largely unknown [4-6]. Moreover, it is still debated whether common history or convergent adaptation to a forest environment resulted in the short stature characterizing the pygmies [2, 7]. We genotyped 604 individuals at 28 autosomal tetranucleotide microsatellite loci in 12 nonpygmy and 9 neighboring pygmy populations. We found a high level of genetic heterogeneity among Western Central African pygmies, as well as evidence of heterogeneous levels of asymmetrical gene flow from nonpygmies to pygmies, consistent with the variable sociocultural barriers against intermarriages. Using approximate Bayesian computation (ABC) methods [8], we compared several historical scenarios. The most likely scenario points toward a unique ancestral pygmy population that diversified approximately 2800 years ago, contemporaneously with the Neolithic expansion of nonpygmy agriculturalists [9, 10]. Our results show that recent isolation, genetic drift, and heterogeneous admixture enabled a rapid and substantial genetic differentiation among Western Central African pygmies. Such an admixture pattern is consistent with the various sociocultural behaviors related to intermarriages between pygmies and nonpygmies.