1
Fenton EF, Rice DP, Novembre J, Desai MM. Detecting deviations from Kingman coalescence using 2-site frequency spectra. Genetics 2025; 229:iyaf023. [PMID: 39919046 PMCID: PMC12005255 DOI: 10.1093/genetics/iyaf023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2024] [Accepted: 01/24/2025] [Indexed: 02/09/2025] Open
Abstract
Demographic inference methods in population genetics typically assume that the ancestry of a sample can be modeled by the Kingman coalescent. A defining feature of this stochastic process is that it generates genealogies that are binary trees: no more than 2 ancestral lineages may coalesce at the same time. However, this assumption breaks down under several scenarios. For example, pervasive natural selection and extreme variation in offspring number can both generate genealogies with "multiple-merger" events in which more than 2 lineages coalesce instantaneously. Therefore, detecting violations of the Kingman assumptions (e.g. due to multiple mergers) is important both for understanding which forces have shaped the diversity of a population and for avoiding fitting misspecified models to data. Current methods to detect deviations from Kingman coalescence in genomic data rely primarily on the site frequency spectrum (SFS). However, the signatures of some non-Kingman processes (e.g. multiple mergers) in the SFS are also consistent with a Kingman coalescent with a time-varying population size. Here, we present a new statistical test for determining whether the Kingman coalescent with any population size history is consistent with population data. Our approach is based on information contained in the 2-site joint frequency spectrum (2-SFS) for pairs of linked sites, which has a different dependence on the topologies of genealogies than the SFS. Our statistical test is global in the sense that it can detect when the genome-wide genetic diversity is inconsistent with the Kingman model, rather than detecting outlier regions, as in selection scan methods. We validate this test using simulations and then apply it to demonstrate that genomic diversity data from Drosophila melanogaster is inconsistent with the Kingman coalescent.
Affiliation(s)
- Eliot F Fenton
  - Department of Physics, Harvard University, Cambridge, MA 02138, USA
- Daniel P Rice
  - Media Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - SecureBio, Cambridge, MA 02138, USA
- John Novembre
  - Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
  - Department of Ecology & Evolution, University of Chicago, Chicago, IL 60637, USA
- Michael M Desai
  - Department of Physics, Harvard University, Cambridge, MA 02138, USA
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
2
Avalos-Pacheco A, Cronjäger MC, Jenkins PA, Hein J. An almost infinite sites model. Theor Popul Biol 2024; 160:49-61. [PMID: 39454763 DOI: 10.1016/j.tpb.2024.10.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2024] [Revised: 09/11/2024] [Accepted: 10/11/2024] [Indexed: 10/28/2024]
Abstract
MOTIVATION: A main challenge in molecular evolution is to find computationally efficient mutation models with flexible assumptions that properly reflect genetic variation. The infinite sites model assumes that each mutation event occurs at a site never previously mutant, i.e. it does not allow recurrent mutations. This is reasonable for low mutation rates and makes statistical inference much more tractable. However, recurrent mutations are common enough to be observable from genetic variation data, even in species with low per-site mutation rates such as humans. The finite sites model, on the other hand, allows for recurrent mutations but is computationally unfeasible to work with in most cases. In this work, we bridge these two approaches by developing a novel molecular evolution model, the almost infinite sites model (AISM), that both admits recurrent mutations and is tractable. We provide a recursive characterization of the likelihood of our proposed model under complete linkage and outline a parsimonious approximation scheme for computing it.
RESULTS: We show the usefulness of our model in simulated and human mitochondrial data. Our results show that the AISM, in combination with a constraint on the total number of mutation events, can recover accurate approximations to the maximum likelihood estimator of the mutation rate.
AVAILABILITY AND IMPLEMENTATION: An implementation of our model is freely available along with code for reproducing our computational experiments at https://github.com/Cronjaeger/almost-infinite-sites-recursions.
Affiliation(s)
- Alejandra Avalos-Pacheco
  - Institute of Applied Statistics, Johannes Kepler University Linz, 4040 Linz, Austria
  - Harvard-MIT Center for Regulatory Science, Harvard University, 210 Longwood Ave, Boston, MA 02155, United States of America
- Mathias C Cronjäger
  - Department of Statistics, University of Oxford, 24-29 St Giles', Oxford OX1 3LB, United Kingdom
  - Novo Nordisk, 2880 Bagsværd, Denmark
- Paul A Jenkins
  - Department of Statistics, University of Warwick, Coventry, CV4 7AL, United Kingdom
  - Department of Computer Science, University of Warwick, Coventry, CV4 7AL, United Kingdom
  - The Alan Turing Institute, British Library, 96 Euston Road, London NW1 2DB, United Kingdom
- Jotun Hein
  - Department of Statistics, University of Oxford, 24-29 St Giles', Oxford OX1 3LB, United Kingdom
3
Ramos-Onsins SE, Marmorini G, Achaz G, Ferretti L. A General Framework for Neutrality Tests Based on the Site Frequency Spectrum. Genes (Basel) 2023; 14:1714. [PMID: 37761854 PMCID: PMC10531300 DOI: 10.3390/genes14091714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 08/25/2023] [Accepted: 08/26/2023] [Indexed: 09/29/2023] Open
Abstract
One of the main necessities for population geneticists is the availability of sensitive statistical tools that make it possible to accept or reject the standard Wright-Fisher model of neutral evolution. A number of statistical tests have been developed to detect specific deviations from the null frequency spectrum in different directions (e.g., Tajima's D, Fu and Li's F and D tests, Fay and Wu's H). A general framework exists to generate all neutrality tests that are linear functions of the frequency spectrum. In this framework, it is possible to develop a family of optimal tests with almost maximum power against a specific alternative evolutionary scenario. In this paper we provide a thorough discussion of the structure and properties of linear and nonlinear neutrality tests. First, we present the general framework for linear tests and emphasise the importance of the property of scalability with the sample size (that is, the interpretation of the tests should not depend on the sample size), which, if missing, can lead to errors in interpreting the data. After summarising the motivation and structure of linear optimal tests, we present a more general framework for the optimisation of linear tests, leading to a new family of tunable neutrality tests. In a further generalisation, we extend the framework to nonlinear neutrality tests and we derive nonlinear optimal tests for polynomials of any degree in the frequency spectrum.
Affiliation(s)
- Giacomo Marmorini
  - Department of Physics and Mathematics, Aoyama Gakuin University, Sagamihara 252-5258, Japan
  - Department of Physics, Nihon University, Tokyo 156-8550, Japan
- Guillaume Achaz
  - Institut de Systématique, Evolution, Biodiversité, UMR 7205, MNHN and Centre Interdisciplinaire de Recherche en Biologie, UMR 7241, Collège de France, 75231 Paris, France
- Luca Ferretti
  - Pandemic Sciences Institute and Big Data Institute, Nuffield Department of Medicine, University of Oxford, Oxford OX1 3AZ, UK
4
Genealogical structure changes as range expansions transition from pushed to pulled. Proc Natl Acad Sci U S A 2021; 118:2026746118. [PMID: 34413189 DOI: 10.1073/pnas.2026746118] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/18/2022] Open
Abstract
Range expansions accelerate evolution through multiple mechanisms, including gene surfing and genetic drift. The inference and control of these evolutionary processes ultimately rely on the information contained in genealogical trees. Currently, there are two opposing views on how range expansions shape genealogies. In invasion biology, expansions are typically approximated by a series of population bottlenecks producing genealogies with only pairwise mergers between lineages, a process known as the Kingman coalescent. Conversely, traveling wave models predict a coalescent with multiple mergers, known as the Bolthausen-Sznitman coalescent. Here, we unify these two approaches and show that expansions can generate an entire spectrum of coalescent topologies. Specifically, we show that tree topology is controlled by growth dynamics at the front and exhibits large differences between pulled and pushed expansions. These differences are explained by the fluctuations in the total number of descendants left by the early founders. High growth cooperativity leads to a narrow distribution of reproductive values and the Kingman coalescent. Conversely, low growth cooperativity results in a broad distribution, whose exponent controls the merger sizes in the genealogies. These broad distributions and non-Kingman tree topologies emerge due to the fluctuations in the front shape and position and do not occur in quasi-deterministic simulations. Overall, our results show that range expansions provide a robust mechanism for generating different types of multiple mergers, which could be similar to those observed in populations with strong selection or high fecundity. Thus, caution should be exercised in making inferences about the origin of non-Kingman genealogies.
6
Franssen SU, Durrant C, Stark O, Moser B, Downing T, Imamura H, Dujardin JC, Sanders MJ, Mauricio I, Miles MA, Schnur LF, Jaffe CL, Nasereddin A, Schallig H, Yeo M, Bhattacharyya T, Alam MZ, Berriman M, Wirth T, Schönian G, Cotton JA. Global genome diversity of the Leishmania donovani complex. eLife 2020; 9:e51243. [PMID: 32209228 PMCID: PMC7105377 DOI: 10.7554/elife.51243] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 08/21/2019] [Accepted: 02/27/2020] [Indexed: 12/30/2022] Open
Abstract
Protozoan parasites of the Leishmania donovani complex - L. donovani and L. infantum - cause the fatal disease visceral leishmaniasis. We present the first comprehensive genome-wide global study, with 151 cultured field isolates representing most of the geographical distribution. L. donovani isolates separated into five groups that largely coincide with geographical origin but vary greatly in diversity. In contrast, the majority of L. infantum samples fell into one globally-distributed group with little diversity. This picture is complicated by several hybrid lineages. Identified genetic groups vary in heterozygosity and levels of linkage, suggesting different recombination histories. We characterise chromosome-specific patterns of aneuploidy and identify extensive structural variation, including known and suspected drug resistance loci. This study reveals greater genetic diversity than suggested by geographically-focused studies, provides a resource of genomic variation for future work and sets the scene for a new understanding of the evolution and genetics of the Leishmania donovani complex.
Affiliation(s)
- Caroline Durrant
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
- Tim Downing
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
  - Dublin City University, Dublin, Ireland
- Jean-Claude Dujardin
  - Institute of Tropical Medicine, Antwerp, Belgium
  - Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Mandy J Sanders
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
- Isabel Mauricio
  - Universidade Nova de Lisboa Instituto de Higiene e Medicina, Lisboa, Portugal
- Michael A Miles
  - London School of Hygiene and Tropical Medicine, London, United Kingdom
- Lionel F Schnur
  - Kuvin Centre for the Study of Infectious and Tropical Diseases, IMRIC, Hebrew University-Hadassah, Medical School, Jerusalem, Israel
- Charles L Jaffe
  - Kuvin Centre for the Study of Infectious and Tropical Diseases, IMRIC, Hebrew University-Hadassah, Medical School, Jerusalem, Israel
- Abdelmajeed Nasereddin
  - Kuvin Centre for the Study of Infectious and Tropical Diseases, IMRIC, Hebrew University-Hadassah, Medical School, Jerusalem, Israel
- Henk Schallig
  - Amsterdam University Medical Centres – Academic Medical Centre at the University of Amsterdam, Department of Medical Microbiology – Experimental Parasitology, Amsterdam, Netherlands
- Matthew Yeo
  - London School of Hygiene and Tropical Medicine, London, United Kingdom
- Mohammad Z Alam
  - Department of Parasitology, Bangladesh Agricultural University, Mymensingh, Bangladesh
- Matthew Berriman
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
- Thierry Wirth
  - Institut de Systématique, Evolution, Biodiversité, ISYEB, Muséum national d'Histoire naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, Paris, France
  - École Pratique des Hautes Études (EPHE), Paris Sciences & Lettres (PSL), Paris, France
- James A Cotton
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
7
Giner-Delgado C, Villatoro S, Lerga-Jaso J, Gayà-Vidal M, Oliva M, Castellano D, Pantano L, Bitarello BD, Izquierdo D, Noguera I, Olalde I, Delprat A, Blancher A, Lalueza-Fox C, Esko T, O'Reilly PF, Andrés AM, Ferretti L, Puig M, Cáceres M. Evolutionary and functional impact of common polymorphic inversions in the human genome. Nat Commun 2019; 10:4222. [PMID: 31530810 PMCID: PMC6748972 DOI: 10.1038/s41467-019-12173-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 12/21/2018] [Accepted: 08/27/2019] [Indexed: 12/21/2022] Open
Abstract
Inversions are one type of structural variant linked to phenotypic differences and adaptation in multiple organisms. However, there is still very little information about polymorphic inversions in the human genome due to the difficulty of their detection. Here, we develop a new high-throughput genotyping method based on probe hybridization and amplification, and we perform a complete study of 45 common human inversions of 0.1–415 kb. Most inversions promoted by homologous recombination occur recurrently in humans and great apes and they are not tagged by SNPs. Furthermore, there is an enrichment of inversions showing signatures of positive or balancing selection, diverse functional effects, such as gene disruption and gene-expression changes, or association with phenotypic traits. Therefore, our results indicate that the genome is more dynamic than previously thought and that human inversions have important functional and evolutionary consequences, making it possible to determine for the first time their contribution to complex traits. Inversions are a little-studied type of genomic variation that could contribute to phenotypic traits. Here the authors characterize 45 common polymorphic inversions in human populations and investigate their evolutionary and functional impact.
Affiliation(s)
- Carla Giner-Delgado
  - Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, 08193, Spain
  - Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, 08193, Spain
- Sergi Villatoro
  - Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, 08193, Spain
- Jon Lerga-Jaso
  - Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, 08193, Spain
- Magdalena Gayà-Vidal
  - Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, 08193, Spain
  - CIBIO/InBIO Research Center in Biodiversity and Genetic Resources, Universidade do Porto, Vairão, Distrito do Porto, 4485-661, Portugal
- Meritxell Oliva
  - Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, 08193, Spain
- David Castellano
  - Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, 08193, Spain
- Lorena Pantano
  - Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, 08193, Spain
- Bárbara D Bitarello
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Saxony, 04103, Germany
- David Izquierdo
  - Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, 08193, Spain
- Isaac Noguera
  - Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, 08193, Spain
- Iñigo Olalde
  - Institute of Evolutionary Biology, CSIC-Universitat Pompeu Fabra, Barcelona, 08003, Spain
- Alejandra Delprat
  - Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, 08193, Spain
- Antoine Blancher
  - Laboratoire d'immunologie, CHU de Toulouse, IFB Hôpital Purpan, Toulouse, 31059, France
  - Centre de Physiopathologie Toulouse-Purpan (CPTP), Université de Toulouse, Centre National de la Recherche Scientifique (CNRS), Institut National de la Santé et de la Recherche Médicale (Inserm), Université Paul Sabatier (UPS), Toulouse, 31024, France
- Carles Lalueza-Fox
  - Institute of Evolutionary Biology, CSIC-Universitat Pompeu Fabra, Barcelona, 08003, Spain
- Tõnu Esko
  - Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, 51010, Estonia
- Paul F O'Reilly
  - Social, Genetic, and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, SE5 8AF, UK
- Aida M Andrés
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Saxony, 04103, Germany
  - UCL Genetics Institute, Department of Genetics, Evolution and Environment, University College London, London, WC1E 6BT, UK
- Luca Ferretti
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, OX3 7LF, UK
- Marta Puig
  - Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, 08193, Spain
- Mario Cáceres
  - Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, 08193, Spain
  - ICREA, Barcelona, 08010, Spain