1
Cappello L, Lo WT, Zhang JZ, Xu P, Barrow D, Chopra I, Clark AG, Wells MT, Kim J. Bayesian phylodynamic inference of population dynamics with dormancy. Proc Natl Acad Sci U S A 2025; 122:e2501394122. [PMID: 40314983 PMCID: PMC12067208 DOI: 10.1073/pnas.2501394122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2025] [Accepted: 02/24/2025] [Indexed: 05/03/2025] Open
Abstract
Many organisms employ reversible dormancy, or seedbank, in response to environmental fluctuations. This life-history strategy alters fundamental ecoevolutionary forces, leading to distinct patterns of genetic diversity. Two models of dormancy have been proposed based on the average duration of dormancy relative to coalescent timescales: weak seedbank, induced by scheduled seasonality (e.g., plants, invertebrates), and strong seedbank, where individuals stochastically switch between active and dormant states (e.g., bacteria, fungi). The weak seedbank coalescent is statistically equivalent to the Kingman coalescent with a scaled mutation rate, allowing the use of existing inference methods. In contrast, the strong seedbank coalescent differs fundamentally, as only active lineages can coalesce, while dormant lineages cannot. Additionally, dormant individuals typically mutate at a slower rate than active ones. Consequently, despite the significant role of dormancy in the ecoevolutionary dynamics of many organisms, no methods currently exist for inferring population dynamics involving dormancy and associated parameters. We present a Bayesian framework for jointly inferring a latent genealogy, seedbank parameters, and evolutionary parameters from molecular sequence data under the strong seedbank coalescent. We derive the exact probability density of genealogies sampled under the strong seedbank coalescent, characterize the corresponding likelihood function, and present efficient computational algorithms for its evaluation based on our theoretical framework. We develop a tailored Markov chain Monte Carlo sampler and implement our inference framework as a package SeedbankTree within BEAST2. Our work provides both a theoretical foundation and practical inference framework for studying the population genetic and genealogical impacts of dormancy.
Affiliation(s)
- Lorenzo Cappello
  - Departments of Economics and Business, Universitat Pompeu Fabra, Barcelona 08005, Spain
  - Data Science Center, Barcelona School of Economics, Barcelona 08005, Spain
- Wai Tung ‘Jack’ Lo
  - Department of Computational Biology, Cornell University, Ithaca, NY 14850
- Joy Z. Zhang
  - Center for Applied Mathematics, Cornell University, Ithaca, NY 14850
- Peiyu Xu
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14850
- Daniel Barrow
  - Department of Computational Biology, Cornell University, Ithaca, NY 14850
- Ishani Chopra
  - Department of Computational Biology, Cornell University, Ithaca, NY 14850
- Andrew G. Clark
  - Department of Computational Biology, Cornell University, Ithaca, NY 14850
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14850
- Martin T. Wells
  - Department of Statistics and Data Science, Cornell University, Ithaca, NY 14850
- Jaehee Kim
  - Department of Computational Biology, Cornell University, Ithaca, NY 14850
2
Helekal D, Mortimer TD, Mukherjee A, Gentile G, Le Van A, Nicholas RA, Jerse AE, Palace SG, Grad YH. Quantifying the impact of antibiotic use and genetic determinants of resistance on bacterial lineage dynamics. bioRxiv 2025:2025.02.03.636319. [PMID: 39975361 PMCID: PMC11838577 DOI: 10.1101/2025.02.03.636319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2025]
Abstract
The dynamics of antimicrobial resistance in bacterial populations are informed by the fitness impact of genetic determinants of resistance and antibiotic pressure. However, estimates of real-world fitness impact have been lacking. To address this gap, we developed a hierarchical Bayesian phylodynamic model to quantify contributions of resistance determinants to strain success in a 20-year collection of Neisseria gonorrhoeae isolates. Fitness contributions varied with antibiotic use, and genetic pathways to phenotypically identical resistance conferred distinct fitness effects. These findings were supported by in vitro and experimental infection competition. Quantifying these fitness contributions to lineage dynamics reveals opportunities for investigation into other genetic and environmental drivers of fitness. This work thus establishes a method for linking pathogen genomics and antibiotic use to define factors shaping ecological trends.
Affiliation(s)
- David Helekal
  - Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
- Tatum D. Mortimer
  - Department of Population Health, College of Veterinary Medicine, University of Georgia, Athens, GA 30602, USA
- Aditi Mukherjee
  - Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
- Gabriella Gentile
  - Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Adriana Le Van
  - Department of Microbiology and Immunology, Uniformed Services University of the Health Sciences, Bethesda, MD 20814, USA
  - Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc, Bethesda, MD 20817, USA
- Robert A. Nicholas
  - Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
  - Departments of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Ann E. Jerse
  - Department of Microbiology and Immunology, Uniformed Services University of the Health Sciences, Bethesda, MD 20814, USA
- Samantha G. Palace
  - Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
- Yonatan H. Grad
  - Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
3
Koelle K, Rasmussen DA. Phylodynamics beyond neutrality: the impact of incomplete purifying selection on viral phylogenies and inference. Philos Trans R Soc Lond B Biol Sci 2025; 380:20230314. [PMID: 39976414 PMCID: PMC11867112 DOI: 10.1098/rstb.2023.0314] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/01/2024] [Revised: 10/07/2024] [Accepted: 11/04/2024] [Indexed: 02/21/2025] Open
Abstract
Viral phylodynamics focuses on using sequence data to make inferences about the population dynamics of viral diseases. These inferences commonly include estimation of growth rates, reproduction numbers and times of most recent common ancestor. With few exceptions, existing phylodynamic inference approaches assume that all observed and ancestral viral genetic variation is fitness-neutral. This assumption is commonly violated, with a large body of analyses indicating that fitness varies substantially among genotypes circulating in viral populations. Here, we focus on fitness variation arising from deleterious mutations, asking whether incomplete purifying selection of deleterious mutations has the potential to bias phylodynamic inference. We use simulations of an exponentially growing population to explore how incomplete purifying selection distorts tree shape and shifts the distribution of mutations over trees. We find that incomplete purifying selection strongly shapes the distribution of mutations while only weakly impacting tree shape. Despite incomplete purifying selection shifting the distribution of deleterious mutations, we find little discernible bias in estimates of viral growth rates and times of the most recent common ancestor. Our results reassuringly indicate that existing phylodynamic inference approaches that assume neutrality may nevertheless yield accurate epidemiological estimates in the face of incomplete purifying selection. More work is needed to assess the robustness of these findings to alternative epidemiological parametrizations. This article is part of the theme issue '"A mathematical theory of evolution": phylogenetic models dating back 100 years'.
Affiliation(s)
- Katia Koelle
  - Department of Biology, Emory University, Atlanta, GA 30322, USA
  - Emory Center of Excellence for Influenza Research and Response (CEIRR), Atlanta, GA 30322, USA
- David A. Rasmussen
  - Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC 27607, USA
  - Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27607, USA
4
Cappello L, Lo WT, Zhang JZ, Xu P, Barrow D, Chopra I, Clark AG, Wells MT, Kim J. Bayesian phylodynamic inference of population dynamics with dormancy. bioRxiv 2025:2025.01.19.633741. [PMID: 39896623 PMCID: PMC11785064 DOI: 10.1101/2025.01.19.633741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2025]
Abstract
Many organisms employ reversible dormancy, or seedbank, in response to environmental fluctuations. This life-history strategy alters fundamental eco-evolutionary forces, leading to distinct patterns of genetic diversity. Two models of dormancy have been proposed based on the average duration of dormancy relative to coalescent timescales: weak seedbank, induced by scheduled seasonality (e.g., plants, invertebrates), and strong seedbank, where individuals stochastically switch between active and dormant states (e.g., bacteria, fungi). The weak seedbank coalescent is statistically equivalent to the Kingman coalescent with a scaled mutation rate, allowing the use of existing inference methods. In contrast, the strong seedbank coalescent differs fundamentally, as only active lineages can coalesce, while dormant lineages cannot. Additionally, dormant individuals typically mutate at a slower rate than active ones. Consequently, despite the significant role of dormancy in the eco-evolutionary dynamics of many organisms, no methods currently exist for inferring population dynamics involving dormancy and associated parameters. We present a Bayesian framework for jointly inferring a latent genealogy, seedbank parameters, and evolutionary parameters from molecular sequence data under the strong seedbank coalescent. We derive the exact probability density of genealogies sampled under the strong seedbank coalescent, characterize the corresponding likelihood function, and present efficient computational algorithms for its evaluation based on our theoretical framework. We develop a tailored Markov chain Monte Carlo sampler and implement our inference framework as a package SeedbankTree within BEAST2. Our work provides both a theoretical foundation and practical inference framework for studying the population genetic and genealogical impacts of dormancy.
Affiliation(s)
- Lorenzo Cappello
  - Departments of Economics and Business, Universitat Pompeu Fabra, Barcelona, Spain
  - Data Science Center, Barcelona School of Economics, Barcelona, Spain
- Wai Tung ‘Jack’ Lo
  - Department of Computational Biology, Cornell University, Ithaca, New York, USA
- Joy Z. Zhang
  - Center for Applied Mathematics, Cornell University, Ithaca, New York, USA
- Peiyu Xu
  - Department of Molecular Biology & Genetics, Cornell University, Ithaca, New York, USA
- Daniel Barrow
  - Department of Computational Biology, Cornell University, Ithaca, New York, USA
- Ishani Chopra
  - Department of Computational Biology, Cornell University, Ithaca, New York, USA
- Andrew G. Clark
  - Department of Computational Biology, Cornell University, Ithaca, New York, USA
  - Department of Molecular Biology & Genetics, Cornell University, Ithaca, New York, USA
- Martin T. Wells
  - Department of Statistics and Data Science, Cornell University, Ithaca, New York, USA
- Jaehee Kim
  - Department of Computational Biology, Cornell University, Ithaca, New York, USA
5
Lefrancq N, Duret L, Bouchez V, Brisse S, Parkhill J, Salje H. Learning the fitness dynamics of pathogens from phylogenies. Nature 2025; 637:683-690. [PMID: 39743587 PMCID: PMC11735385 DOI: 10.1038/s41586-024-08309-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Accepted: 10/30/2024] [Indexed: 01/04/2025]
Abstract
The dynamics of the genetic diversity of pathogens, including the emergence of lineages with increased fitness, is a foundational concept of disease ecology with key public-health implications. However, the identification of such lineages and estimation of associated fitness remain challenging, and is rarely done outside densely sampled systems [1,2]. Here we present phylowave, a scalable approach that summarizes changes in population composition in phylogenetic trees, enabling the automatic detection of lineages based on shared fitness and evolutionary relationships. We use our approach on a broad set of viruses and bacteria (SARS-CoV-2, influenza A subtype H3N2, Bordetella pertussis and Mycobacterium tuberculosis), which include both well-studied and understudied threats to human health. We show that phylowave recovers the main known circulating lineages for each pathogen and that it can detect specific amino acid changes linked to fitness changes. Furthermore, phylowave identifies previously undetected lineages with increased fitness, including three co-circulating B. pertussis lineages. Inference using phylowave is robust to uneven and limited observations. This widely applicable approach provides an avenue to monitor evolution in real time to support public-health action and explore fundamental drivers of pathogen fitness.
Affiliation(s)
- Noémie Lefrancq
  - Department of Genetics, University of Cambridge, Cambridge, UK
  - Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Loréna Duret
  - Department of Genetics, University of Cambridge, Cambridge, UK
- Valérie Bouchez
  - Biodiversity and Epidemiology of Bacterial Pathogens, Institut Pasteur, Université de Paris, Paris, France
  - National Reference Center for Whooping Cough and Other Bordetella Infections, Paris, France
- Sylvain Brisse
  - Biodiversity and Epidemiology of Bacterial Pathogens, Institut Pasteur, Université de Paris, Paris, France
  - National Reference Center for Whooping Cough and Other Bordetella Infections, Paris, France
- Julian Parkhill
  - Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- Henrik Salje
  - Department of Genetics, University of Cambridge, Cambridge, UK
6
Mendes FK, Landis MJ. PhyloJunction: A Computational Framework for Simulating, Developing, and Teaching Evolutionary Models. Syst Biol 2024; 73:1051-1060. [PMID: 39115380 DOI: 10.1093/sysbio/syae048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 06/20/2024] [Accepted: 08/05/2024] [Indexed: 12/14/2024] Open
Abstract
We introduce PhyloJunction, a computational framework designed to facilitate the prototyping, testing, and characterization of evolutionary models. PhyloJunction is distributed as an open-source Python library that can be used to implement a variety of models, thanks to its flexible graphical modeling architecture and dedicated model specification language. Model design and use are exposed to users via command-line and graphical interfaces, which integrate the steps of simulating, summarizing, and visualizing data. This article describes the features of PhyloJunction (which include, but are not limited to, a general implementation of a popular family of phylogenetic diversification models) and, moving forward, how it may be expanded to not only include new models, but to also become a platform for conducting and teaching statistical learning.
Affiliation(s)
- Fábio K Mendes
  - Department of Biology, Louisiana State University, Baton Rouge, LA, USA
- Michael J Landis
  - Department of Biology, Washington University in St. Louis, Rebstock Hall, St. Louis, MO 63130, USA
7
Magalis BR, Riva A, Marini S, Salemi M, Prosperi M. Novel insights on unraveling dynamics of transmission clusters in outbreaks using phylogeny-based methods. Infect Genet Evol 2024; 124:105661. [PMID: 39186995 DOI: 10.1016/j.meegid.2024.105661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Revised: 07/31/2024] [Accepted: 08/21/2024] [Indexed: 08/28/2024]
Abstract
Molecular data analysis is invaluable in understanding the overall behavior of a rapidly spreading virus population when epidemiological surveillance is problematic. It is also particularly beneficial in describing subgroups within the population, often identified as clades within a phylogenetic tree that represent individuals connected via direct transmission or transmission via differing risk factors in viral spread. However, transmission patterns or viral dynamics within these smaller groups should not be expected to exhibit homogeneous behavior over time. As such, standard phylogenetic approaches that identify clusters based on summary statistics would not be expected to capture dynamic clusters of transmission. We, therefore, sought to evaluate the performance of existing and adapted phylogeny-based cluster identification tools on simulated transmission clusters exhibiting dynamic transmission behavior over time. Despite the complementarity of the tools, we provide strong evidence that novel cluster identification methods are needed for reliable detection of epidemiologically linked individuals, particularly those exhibiting changing transmission dynamics during dynamic outbreak scenarios.
Affiliation(s)
- Brittany Rife Magalis
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY 40202, United States of America
- Alberto Riva
  - Bioinformatics Core, Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, FL 32610, United States of America
- Simone Marini
  - Department of Epidemiology, University of Florida, Gainesville, FL 32610, United States of America; Emerging Pathogens Institute, University of Florida, Gainesville, FL 32610, United States of America
- Marco Salemi
  - Emerging Pathogens Institute, University of Florida, Gainesville, FL 32610, United States of America; Department of Pathology, Immunology, and Laboratory Medicine, University of Florida, Gainesville, FL 32610, United States of America
- Mattia Prosperi
  - Department of Epidemiology, University of Florida, Gainesville, FL 32610, United States of America; Emerging Pathogens Institute, University of Florida, Gainesville, FL 32610, United States of America
8
Didier G, Laurin M. Testing extinction events and temporal shifts in diversification and fossilization rates through the skyline Fossilized Birth-Death (FBD) model: The example of some mid-Permian synapsid extinctions. Cladistics 2024; 40:282-306. [PMID: 38651531 DOI: 10.1111/cla.12577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2023] [Revised: 02/23/2024] [Accepted: 03/07/2024] [Indexed: 04/25/2024] Open
Abstract
In the last decade, the Fossilized Birth-Death (FBD) process has yielded interesting clues about the evolution of biodiversity through time. To facilitate such studies, we extend our method to compute the probability density of phylogenetic trees of extant and extinct taxa in which the only temporal information is provided by the fossil ages (i.e. without the divergence times) in order to deal with the piecewise constant FBD process, known as the "skyline FBD", which allows rates to change between pre-defined time intervals, as well as modelling extinction events at the bounds of these intervals. We develop approaches based on this method to assess hypotheses about the diversification process and to answer questions such as "Does a mass extinction occur at this time?" or "Is there a change in the fossilization rate between two given periods?". Our software can also yield Bayesian and maximum-likelihood estimates of the parameters of the skyline FBD model under various constraints. These approaches are applied to a simulated dataset in order to test their ability to answer the questions above. Finally, we study an updated dataset of Permo-Carboniferous synapsids to get additional insights into the dynamics of biodiversity change in three clades (Ophiacodontidae, Edaphosauridae and Sphenacodontidae) in the Pennsylvanian (Late Carboniferous) and Cisuralian (Early Permian), and to assess support for end-Sakmarian (or Artinskian) and end-Cisuralian mass extinction events discussed in previous studies.
Affiliation(s)
- Michel Laurin
  - CR2P ("Centre de Recherches sur la Paléobiodiversité et les Paléoenvironnements"; UMR 7207), CNRS/MNHN/UPMC, Sorbonne Université, Muséum National d'Histoire Naturelle, Paris, France
9
Quintero I, Lartillot N, Morlon H. Imbalanced speciation pulses sustain the radiation of mammals. Science 2024; 384:1007-1012. [PMID: 38815022 DOI: 10.1126/science.adj2793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 04/23/2024] [Indexed: 06/01/2024]
Abstract
The evolutionary histories of major clades, including mammals, often comprise changes in their diversification dynamics, but how these changes occur remains debated. We combined comprehensive phylogenetic and fossil information in a new "birth-death diffusion" model that provides a detailed characterization of variation in diversification rates in mammals. We found an early rising and sustained diversification scenario, wherein speciation rates increased before and during the Cretaceous-Paleogene (K-Pg) boundary. The K-Pg mass extinction event filtered out more slowly speciating lineages and was followed by a subsequent slowing in speciation rates rather than rebounds. These dynamics arose from an imbalanced speciation process, with separate lineages giving rise to many, less speciation-prone descendants. Diversity seems to have been brought about by these isolated, fast-speciating lineages, rather than by a few punctuated innovations.
Affiliation(s)
- Ignacio Quintero
  - Institut de Biologie de l'ENS (IBENS), Département de Biologie, École Normale Supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- Nicolas Lartillot
  - Université Claude Bernard Lyon 1, CNRS, VetAgroSup, LBBE, UMR 5558, F-69100 Villeurbanne, France
- Hélène Morlon
  - Institut de Biologie de l'ENS (IBENS), Département de Biologie, École Normale Supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
10
Khurana MP, Scheidwasser-Clow N, Penn MJ, Bhatt S, Duchêne DA. The Limits of the Constant-rate Birth-Death Prior for Phylogenetic Tree Topology Inference. Syst Biol 2024; 73:235-246. [PMID: 38153910 PMCID: PMC11129600 DOI: 10.1093/sysbio/syad075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 12/20/2023] [Accepted: 12/27/2023] [Indexed: 12/30/2023] Open
Abstract
Birth-death models are stochastic processes describing speciation and extinction through time and across taxa and are widely used in biology for inference of evolutionary timescales. Previous research has highlighted how the expected trees under the constant-rate birth-death (crBD) model tend to differ from empirical trees, for example, with respect to the amount of phylogenetic imbalance. However, our understanding of how trees differ between the crBD model and the signal in empirical data remains incomplete. In this Point of View, we aim to expose the degree to which the crBD model differs from empirically inferred phylogenies and test the limits of the model in practice. Using a wide range of topology indices to compare crBD expectations against a comprehensive dataset of 1189 empirically estimated trees, we confirm that crBD model trees frequently differ topologically compared with empirical trees. To place this in the context of standard practice in the field, we conducted a meta-analysis for a subset of the empirical studies. When comparing studies that used Bayesian methods and crBD priors with those that used other non-crBD priors and non-Bayesian methods (i.e., maximum likelihood methods), we do not find any significant differences in tree topology inferences. To scrutinize this finding for the case of highly imbalanced trees, we selected the 100 trees with the greatest imbalance from our dataset, simulated sequence data for these tree topologies under various evolutionary rates, and re-inferred the trees under maximum likelihood and using the crBD model in a Bayesian setting. We find that when the substitution rate is low, the crBD prior results in overly balanced trees, but the tendency is negligible when substitution rates are sufficiently high. 
Overall, our findings demonstrate the general robustness of crBD priors across a broad range of phylogenetic inference scenarios but also highlight that empirically observed phylogenetic imbalance is highly improbable under the crBD model, leading to systematic bias in data sets with limited information content.
Affiliation(s)
- Mark P Khurana
  - Section of Epidemiology, Department of Public Health, University of Copenhagen, 1352 Copenhagen, Denmark
- Neil Scheidwasser-Clow
  - Section of Epidemiology, Department of Public Health, University of Copenhagen, 1352 Copenhagen, Denmark
- Matthew J Penn
  - Department of Statistics, University of Oxford, OX1 3LB, Oxford, UK
- Samir Bhatt
  - Section of Epidemiology, Department of Public Health, University of Copenhagen, 1352 Copenhagen, Denmark
  - MRC Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London, SW7 2AZ, London, UK
- David A Duchêne
  - Centre for Evolutionary Hologenomics, University of Copenhagen, 1352 Copenhagen, Denmark
11
Martínez-Gómez J, Song MJ, Tribble CM, Kopperud BT, Freyman WA, Höhna S, Specht CD, Rothfels CJ. Commonly used Bayesian diversification methods lead to biologically meaningful differences in branch-specific rates on empirical phylogenies. Evol Lett 2024; 8:189-199. [PMID: 39070288 PMCID: PMC11275465 DOI: 10.1093/evlett/qrad044] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/17/2023] [Revised: 09/05/2023] [Accepted: 09/08/2023] [Indexed: 07/30/2024] Open
Abstract
Identifying along which lineages shifts in diversification rates occur is a central goal of comparative phylogenetics; these shifts may coincide with key evolutionary events such as the development of novel morphological characters, the acquisition of adaptive traits, polyploidization or other structural genomic changes, or dispersal to a new habitat and subsequent increase in environmental niche space. However, while multiple methods now exist to estimate diversification rates and identify shifts using phylogenetic topologies, the appropriate use and accuracy of these methods are hotly debated. Here we test whether five Bayesian methods produce comparable results: Bayesian Analysis of Macroevolutionary Mixtures (BAMM), two implementations of the Lineage-Specific Birth-Death-Shift model (LSBDS and PESTO), the approximate Multi-Type Birth-Death model (MTBD; implemented in BEAST2), and the Cladogenetic Diversification Rate Shift model (ClaDS2). We apply each of these methods to a set of 65 empirical time-calibrated phylogenies and compare inferences of speciation rate, extinction rate, and net diversification rate. We find that the five methods often infer different speciation, extinction, and net-diversification rates. Consequently, these different estimates may lead to different interpretations of the macroevolutionary dynamics. The different estimates can be attributed to fundamental differences among the compared models. Therefore, the inference of shifts in diversification rates is strongly method dependent. We advise biologists to apply multiple methods to test the robustness of the conclusions or to carefully select the method based on the validity of the underlying model assumptions to their particular empirical system.
Affiliation(s)
- Jesús Martínez-Gómez
  - Department of Integrative Biology and the University Herbarium, University of California, Berkeley, CA, United States
  - School of Integrative Plant Science, Section of Plant Biology and the L.H. Bailey Hortorium, Cornell University, Ithaca, NY, United States
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA, United States
- Michael J Song
  - Department of Integrative Biology and the University Herbarium, University of California, Berkeley, CA, United States
  - Department of Biology, Skyline College, San Bruno, CA, United States
- Carrie M Tribble
  - Department of Integrative Biology and the University Herbarium, University of California, Berkeley, CA, United States
  - School of Life Sciences, University of Hawai’i at Manoa, HI, United States
- Bjørn T Kopperud
  - GeoBio-Center, Ludwig-Maximilians-Universität München, Munich, Germany
  - Department of Earth and Environmental Sciences, Paleontology and Geobiology, Ludwig-Maximilians-Universität München, Munich, Germany
- Sebastian Höhna
  - GeoBio-Center, Ludwig-Maximilians-Universität München, Munich, Germany
  - Department of Earth and Environmental Sciences, Paleontology and Geobiology, Ludwig-Maximilians-Universität München, Munich, Germany
- Chelsea D Specht
  - School of Integrative Plant Science, Section of Plant Biology and the L.H. Bailey Hortorium, Cornell University, Ithaca, NY, United States
- Carl J Rothfels
  - Department of Integrative Biology and the University Herbarium, University of California, Berkeley, CA, United States
  - Department of Biology, Utah State University, Logan, UT, United States
12
Shao Y, Magee AF, Vasylyeva TI, Suchard MA. Scalable gradients enable Hamiltonian Monte Carlo sampling for phylodynamic inference under episodic birth-death-sampling models. PLoS Comput Biol 2024; 20:e1011640. [PMID: 38551979 PMCID: PMC11006205 DOI: 10.1371/journal.pcbi.1011640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 04/10/2024] [Accepted: 03/10/2024] [Indexed: 04/09/2024] Open
Abstract
Birth-death models play a key role in phylodynamic analysis for their interpretation in terms of key epidemiological parameters. In particular, models with piecewise-constant rates varying at different epochs in time, to which we refer as episodic birth-death-sampling (EBDS) models, are valuable for their reflection of changing transmission dynamics over time. A challenge, however, that persists with current time-varying model inference procedures is their lack of computational efficiency. This limitation hinders the full utilization of these models in large-scale phylodynamic analyses, especially when dealing with high-dimensional parameter vectors that exhibit strong correlations. We present here a linear-time algorithm to compute the gradient of the birth-death model sampling density with respect to all time-varying parameters, and we implement this algorithm within a gradient-based Hamiltonian Monte Carlo (HMC) sampler to alleviate the computational burden of conducting inference under a wide variety of structures of, as well as priors for, EBDS processes. We assess this approach using three different real world data examples, including the HIV epidemic in Odesa, Ukraine, seasonal influenza A/H3N2 virus dynamics in New York state, America, and Ebola outbreak in West Africa. HMC sampling exhibits a substantial efficiency boost, delivering a 10- to 200-fold increase in minimum effective sample size per unit-time, in comparison to a Metropolis-Hastings-based approach. Additionally, we show the robustness of our implementation in both allowing for flexible prior choices and in modeling the transmission dynamics of various pathogens by accurately capturing the changing trend of viral effective reproductive number.
Affiliation(s)
- Yucai Shao
- Department of Biostatistics, University of California, Los Angeles, California, United States of America
- Andrew F. Magee
- Department of Biomathematics, University of California, Los Angeles, California, United States of America
- Tetyana I. Vasylyeva
- Department of Medicine, University of California San Diego, La Jolla, California, United States of America
- Department of Population Health and Disease Prevention, University of California Irvine, Irvine, California, United States of America
- Marc A. Suchard
- Department of Biostatistics, University of California, Los Angeles, California, United States of America
- Department of Biomathematics, University of California, Los Angeles, California, United States of America
- Department of Human Genetics, University of California, Los Angeles, California, United States of America

13
Chen Z, Lemey P, Yu H. Approaches and challenges to inferring the geographical source of infectious disease outbreaks using genomic data. Lancet Microbe 2024; 5:e81-e92. [PMID: 38042165 DOI: 10.1016/s2666-5247(23)00296-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 06/13/2023] [Revised: 09/03/2023] [Accepted: 09/13/2023] [Indexed: 12/04/2023]
Abstract
Genomic data hold increasing potential in the elucidation of transmission dynamics and geographical sources of infectious disease outbreaks. Phylogeographic methods that use epidemiological and genomic data obtained from surveillance enable us to infer the history of spatial transmission that is naturally embedded in the topology of phylogenetic trees as a record of the dispersal of infectious agents between geographical locations. In this Review, we provide an overview of phylogeographic approaches widely used for reconstructing the geographical sources of outbreaks of interest. These approaches can be classified into ancestral trait or state reconstruction and structured population models, with structured population models including popular structured coalescent and birth-death models. We also describe the major challenges associated with sequencing technologies, surveillance strategies, data sharing, and analysis frameworks that became apparent during the generation of large-scale genomic data in recent years, extending beyond inference approaches. Finally, we highlight the role of genomic data in geographical source inference and clarify how this enhances understanding and molecular investigations of outbreak sources.
Affiliation(s)
- Zhiyuan Chen
- School of Public Health, Fudan University, Key Laboratory of Public Health Safety, Ministry of Education, Shanghai, China
- Philippe Lemey
- Department of Microbiology, Immunology and Transplantation, Rega Institute, Laboratory of Clinical and Evolutionary Virology, KU Leuven, Leuven, Belgium
- Hongjie Yu
- School of Public Health, Fudan University, Key Laboratory of Public Health Safety, Ministry of Education, Shanghai, China

14
Lambert S, Voznica J, Morlon H. Deep Learning from Phylogenies for Diversification Analyses. Syst Biol 2023; 72:1262-1279. [PMID: 37556735 DOI: 10.1093/sysbio/syad044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2022] [Revised: 06/20/2023] [Accepted: 08/08/2023] [Indexed: 08/11/2023]
Abstract
Birth-death (BD) models are widely used in combination with species phylogenies to study past diversification dynamics. Current inference approaches typically rely on likelihood-based methods. These methods are not generalizable, as a new likelihood formula must be established each time a new model is proposed; for some models, such a formula is not even tractable. Deep learning can bring solutions in such situations, as deep neural networks can be trained to learn the relation between simulations and parameter values as a regression problem. In this paper, we adapt a recently developed deep learning method from pathogen phylodynamics to the case of diversification inference, and we extend its applicability to the case of the inference of state-dependent diversification models from phylogenies associated with trait data. We demonstrate the accuracy and time efficiency of the approach for the time-constant homogeneous BD model and the Binary-State Speciation and Extinction model. Finally, we illustrate the use of the proposed inference machinery by reanalyzing a phylogeny of primates and their associated ecological role as seed dispersers. Deep learning inference provides at least the same accuracy as likelihood-based inference while being faster by several orders of magnitude, offering a promising new inference approach for the deployment of future models in the field.
Affiliation(s)
- Sophia Lambert
- Institut de Biologie de l'École Normale Supérieure, École Normale Supérieure, CNRS, INSERM, Université Paris Sciences et Lettres, 46 Rue d'Ulm, 75005 Paris, France
- Institute of Ecology and Evolution, Department of Biology, 5289 University of Oregon, Eugene, OR 97403, USA
- Jakub Voznica
- Institut Pasteur, Université Paris Cité, Unité Bioinformatique Evolutive, 25-28 Rue du Dr Roux, 75015 Paris, France
- Unité de Biologie Computationnelle, USR 3756 CNRS, 25-28 Rue du Dr Roux, 75015 Paris, France
- Hélène Morlon
- Institut de Biologie de l'École Normale Supérieure, École Normale Supérieure, CNRS, INSERM, Université Paris Sciences et Lettres, 46 Rue d'Ulm, 75005 Paris, France

15
Shao Y, Magee AF, Vasylyeva TI, Suchard MA. Scalable gradients enable Hamiltonian Monte Carlo sampling for phylodynamic inference under episodic birth-death-sampling models. bioRxiv 2023:2023.10.31.564882. [PMID: 37961423 PMCID: PMC10634968 DOI: 10.1101/2023.10.31.564882] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Birth-death models play a key role in phylodynamic analysis for their interpretation in terms of key epidemiological parameters. In particular, models with piecewise-constant rates varying at different epochs in time, to which we refer as episodic birth-death-sampling (EBDS) models, are valuable for their reflection of changing transmission dynamics over time. A challenge, however, that persists with current time-varying model inference procedures is their lack of computational efficiency. This limitation hinders the full utilization of these models in large-scale phylodynamic analyses, especially when dealing with high-dimensional parameter vectors that exhibit strong correlations. We present here a linear-time algorithm to compute the gradient of the birth-death model sampling density with respect to all time-varying parameters, and we implement this algorithm within a gradient-based Hamiltonian Monte Carlo (HMC) sampler to alleviate the computational burden of conducting inference under a wide variety of structures of, as well as priors for, EBDS processes. We assess this approach using three different real world data examples, including the HIV epidemic in Odesa, Ukraine, seasonal influenza A/H3N2 virus dynamics in New York state, America, and Ebola outbreak in West Africa. HMC sampling exhibits a substantial efficiency boost, delivering a 10- to 200-fold increase in minimum effective sample size per unit-time, in comparison to a Metropolis-Hastings-based approach. Additionally, we show the robustness of our implementation in both allowing for flexible prior choices and in modeling the transmission dynamics of various pathogens by accurately capturing the changing trend of viral effective reproductive number.
Affiliation(s)
- Yucai Shao
- Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California, Los Angeles, United States
- Andrew F. Magee
- Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, United States
- Tetyana I. Vasylyeva
- Department of Medicine, University of California San Diego, La Jolla, United States
- Department of Population Health and Disease Prevention, University of California Irvine, Irvine, United States
- Marc A. Suchard
- Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California, Los Angeles, United States
- Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, United States
- Department of Human Genetics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, United States

16
Barido-Sottani J, Morlon H. The ClaDS rate-heterogeneous birth-death prior for full phylogenetic inference in BEAST2. Syst Biol 2023; 72:1180-1187. [PMID: 37161619 PMCID: PMC10627560 DOI: 10.1093/sysbio/syad027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2022] [Revised: 01/16/2023] [Accepted: 04/24/2023] [Indexed: 05/11/2023]
Abstract
Bayesian phylogenetic inference requires a tree prior, which models the underlying diversification process that gives rise to the phylogeny. Existing birth-death diversification models include a wide range of features, for instance, lineage-specific variations in speciation and extinction (SSE) rates. While across-lineage variation in SSE rates is widespread in empirical datasets, few heterogeneous rate models have been implemented as tree priors for Bayesian phylogenetic inference. As a consequence, rate heterogeneity is typically ignored when reconstructing phylogenies, and rate heterogeneity is usually investigated on fixed trees. In this paper, we present a new BEAST2 package implementing the cladogenetic diversification rate shift (ClaDS) model as a tree prior. ClaDS is a birth-death diversification model designed to capture small progressive variations in birth and death rates along a phylogeny. Unlike previous implementations of ClaDS, which were designed to be used with fixed, user-chosen phylogenies, our package is implemented in the BEAST2 framework and thus allows full phylogenetic inference, where the phylogeny and model parameters are co-estimated from a molecular alignment. Our package provides all necessary components of the inference, including a new tree object and operators to propose moves to the Monte-Carlo Markov chain. It also includes a graphical interface through BEAUti. We validate our implementation of the package by comparing the produced distributions to simulated data and show an empirical example of the full inference, using a dataset of cetaceans.
Affiliation(s)
- Joëlle Barido-Sottani
- Institut de Biologie de l’ENS (IBENS), École normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- Hélène Morlon
- Institut de Biologie de l’ENS (IBENS), École normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France

17
Carnegie L, Raghwani J, Fournié G, Hill SC. Phylodynamic approaches to studying avian influenza virus. Avian Pathol 2023; 52:289-308. [PMID: 37565466 DOI: 10.1080/03079457.2023.2236568] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/02/2023] [Revised: 06/23/2023] [Accepted: 07/07/2023] [Indexed: 08/12/2023]
Abstract
Avian influenza viruses can cause severe disease in domestic and wild birds and are a pandemic threat. Phylodynamics is the study of how epidemiological, evolutionary, and immunological processes can interact to shape viral phylogenies. This review summarizes how phylodynamic methods have and could contribute to the study of avian influenza viruses. Specifically, we assess how phylodynamics can be used to examine viral spread within and between wild or domestic bird populations at various geographical scales, identify factors associated with virus dispersal, and determine the order and timing of virus lineage movement between geographic regions or poultry production systems. We discuss factors that can complicate the interpretation of phylodynamic results and identify how future methodological developments could contribute to improved control of the virus.
Affiliation(s)
- L Carnegie
- Department of Pathobiology and Population Sciences, Royal Veterinary College (RVC), Hatfield, UK
- J Raghwani
- Department of Pathobiology and Population Sciences, Royal Veterinary College (RVC), Hatfield, UK
- G Fournié
- Department of Pathobiology and Population Sciences, Royal Veterinary College (RVC), Hatfield, UK
- Université de Lyon, INRAE, VetAgro Sup, UMR EPIA, Marcy l'Etoile, France
- Université Clermont Auvergne, INRAE, VetAgro Sup, UMR EPIA, Saint Genes Champanelle, France
- S C Hill
- Department of Pathobiology and Population Sciences, Royal Veterinary College (RVC), Hatfield, UK

18
Wiens JJ. Trait-based species richness: ecology and macroevolution. Biol Rev Camb Philos Soc 2023; 98:1365-1387. [PMID: 37015839 DOI: 10.1111/brv.12957] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/07/2022] [Revised: 03/21/2023] [Accepted: 03/27/2023] [Indexed: 04/06/2023]
Abstract
Understanding the origins of species richness patterns is a fundamental goal in ecology and evolutionary biology. Much research has focused on explaining two kinds of species richness patterns: (i) spatial species richness patterns (e.g. the latitudinal diversity gradient), and (ii) clade-based species richness patterns (e.g. the predominance of angiosperm species among plants). Here, I highlight a third kind of richness pattern: trait-based species richness (e.g. the number of species with each state of a character, such as diet or body size). Trait-based richness patterns are relevant to many topics in ecology and evolution, from ecosystem function to adaptive radiation to the paradox of sex. Although many studies have described particular trait-based richness patterns, the origins of these patterns remain far less understood, and trait-based richness has not been emphasised as a general category of richness patterns. Here, I describe a conceptual framework for how trait-based richness patterns arise compared to other richness patterns. A systematic review suggests that trait-based richness patterns are most often explained by when each state originates within a group (i.e. older states generally have higher richness), and not by differences in transition rates among states or faster diversification of species with certain states. This latter result contrasts with the widespread emphasis on diversification rates in species-richness research. I show that many recent studies of spatial richness patterns are actually studies of trait-based richness patterns, potentially confounding the causes of these patterns. Finally, I describe a plethora of unanswered questions related to trait-based richness patterns.
Affiliation(s)
- John J Wiens
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, 85721-0088, USA

19
Quintero I, Landis MJ, Jetz W, Morlon H. The build-up of the present-day tropical diversity of tetrapods. Proc Natl Acad Sci U S A 2023; 120:e2220672120. [PMID: 37159475 PMCID: PMC10194011 DOI: 10.1073/pnas.2220672120] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 12/08/2022] [Accepted: 04/04/2023] [Indexed: 05/11/2023]
Abstract
The extraordinary number of species in the tropics when compared to the extra-tropics is probably the most prominent and consistent pattern in biogeography, suggesting that overarching processes regulate this diversity gradient. A major challenge to characterizing which processes are at play relies on quantifying how the frequency and determinants of tropical and extra-tropical speciation, extinction, and dispersal events shaped evolutionary radiations. We address this question by developing and applying spatiotemporal phylogenetic and paleontological models of diversification for tetrapod species incorporating paleoenvironmental variation. Our phylogenetic model results show that area, energy, or species richness did not uniformly affect speciation rates across tetrapods and dispute expectations of a latitudinal gradient in speciation rates. Instead, both neontological and fossil evidence coincide in underscoring the role of extra-tropical extinctions and the outflow of tropical species in shaping biodiversity. These diversification dynamics accurately predict present-day levels of species richness across latitudes and uncover temporal idiosyncrasies but spatial generality across the major tetrapod radiations.
Affiliation(s)
- Ignacio Quintero
- Institut de Biologie de l’ENS, Département de Biologie, École Normale Supérieure, CNRS, INSERM, Université Paris Science & Lettres, Paris 75005, France
- Michael J. Landis
- Landis Lab, Department of Biology, Washington University in St. Louis, St. Louis, MO 63130
- Walter Jetz
- Jetz Lab, Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06511
- Center for Biodiversity and Global Change, Yale University, New Haven, CT 06511
- Hélène Morlon
- Institut de Biologie de l’ENS, Département de Biologie, École Normale Supérieure, CNRS, INSERM, Université Paris Science & Lettres, Paris 75005, France

20
Cappello L, Kim J, Palacios JA. adaPop: Bayesian inference of dependent population dynamics in coalescent models. PLoS Comput Biol 2023; 19:e1010897. [PMID: 36940209 PMCID: PMC10063170 DOI: 10.1371/journal.pcbi.1010897] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Revised: 03/30/2023] [Accepted: 01/25/2023] [Indexed: 03/21/2023]
Abstract
The coalescent is a powerful statistical framework that allows us to infer past population dynamics leveraging the ancestral relationships reconstructed from sampled molecular sequence data. In many biomedical applications, such as in the study of infectious diseases, cell development, and tumorigenesis, several distinct populations share evolutionary history and therefore become dependent. The inference of such dependence is a highly important, yet challenging problem. With advances in sequencing technologies, we are well positioned to exploit the wealth of high-resolution biological data for tackling this problem. Here, we present adaPop, a probabilistic model to estimate past population dynamics of dependent populations and to quantify their degree of dependence. An essential feature of our approach is the ability to track the time-varying association between the populations while making minimal assumptions on their functional shapes via Markov random field priors. We provide nonparametric estimators, extensions of our base model that integrate multiple data sources, and fast scalable inference algorithms. We test our method using simulated data under various dependent population histories and demonstrate the utility of our model in shedding light on evolutionary histories of different variants of SARS-CoV-2.
Affiliation(s)
- Lorenzo Cappello
- Departments of Economics and Business, Universitat Pompeu Fabra, Barcelona, Spain
- Jaehee Kim
- Department of Computational Biology, Cornell University, Ithaca, New York, United States of America
- Julia A. Palacios
- Departments of Statistics and Biomedical Data Science, Stanford University, Stanford, California, United States of America

21
Dragomir D, Allman ES, Rhodes JA. Parameter Identifiability of a Multitype Pure-Birth Model of Speciation. J Comput Biol 2023; 30:277-292. [PMID: 36745414 DOI: 10.1089/cmb.2022.0330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2023]
Abstract
Diversification models describe the random growth of evolutionary trees, modeling the historical relationships of species through speciation and extinction events. One class of such models allows for independently changing traits, or types, of the species within the tree, upon which speciation and extinction rates depend. Although identifiability of parameters is necessary to justify parameter estimation with a model, it has not been formally established for these models, despite their adoption for inference. This work establishes generic identifiability up to label swapping for the parameters of one of the simpler forms of such a model, a multitype pure birth model of speciation, from an asymptotic distribution derived from a single tree observation as its depth goes to infinity. Crucially for applications to available data, no observation of types is needed at any internal points in the tree, nor even at the leaves.
Affiliation(s)
- Dakota Dragomir
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, Alaska, USA
- Elizabeth S Allman
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, Alaska, USA
- John A Rhodes
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, Alaska, USA

22
Zhang Y, Britton T, Zhou X. Monitoring real-time transmission heterogeneity from incidence data. PLoS Comput Biol 2022; 18:e1010078. [PMID: 36455043 PMCID: PMC9746975 DOI: 10.1371/journal.pcbi.1010078] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/07/2022] [Revised: 12/13/2022] [Accepted: 11/16/2022] [Indexed: 12/03/2022]
Abstract
The transmission heterogeneity of an epidemic is associated with a complex mixture of host, pathogen and environmental factors. And it may indicate superspreading events to reduce the efficiency of population-level control measures and to sustain the epidemic over a larger scale and a longer duration. Methods have been proposed to identify significant transmission heterogeneity in historic epidemics based on several data sources, such as contact history, viral genomes and spatial information, which may not be available, and more importantly ignore the temporal trend of transmission heterogeneity. Here we attempted to establish a convenient method to estimate real-time heterogeneity over an epidemic. Within the branching process framework, we introduced an instant-individual heterogenous infectiousness model to jointly characterize the variation in infectiousness both between individuals and among different times. With this model, we could simultaneously estimate the transmission heterogeneity and the reproduction number from incidence time series. We validated the model with data of both simulated and real outbreaks. Our estimates of the overall and real-time heterogeneities of the six epidemics were consistent with those presented in the literature. Additionally, our model is robust to the ubiquitous bias of under-reporting and misspecification of serial interval. By analyzing recent data from South Africa, we found evidence that the Omicron might be of more significant transmission heterogeneity than Delta. Our model based on incidence data was proved to be reliable in estimating the real-time transmission heterogeneity.
Affiliation(s)
- Yunjun Zhang
- Department of Biostatistics, School of Public Health, Peking University, Beijing, China
- Center for Statistical Science, Peking University, Beijing, China
- Tom Britton
- Department of Mathematics, Stockholm University, Stockholm, Sweden
- Xiaohua Zhou
- Department of Biostatistics, School of Public Health, Peking University, Beijing, China
- Center for Statistical Science, Peking University, Beijing, China
- Beijing International Center for Mathematical Research, Peking University, Beijing, China
- School of Mathematical Sciences, Peking University, Beijing, China

23
Hassler GW, Magee A, Zhang Z, Baele G, Lemey P, Ji X, Fourment M, Suchard MA. Data integration in Bayesian phylogenetics. Annu Rev Stat Appl 2022; 10:353-377. [PMID: 38774036 PMCID: PMC11108065 DOI: 10.1146/annurev-statistics-033021-112532] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 05/24/2024]
Abstract
Researchers studying the evolution of viral pathogens and other organisms increasingly encounter and use large and complex data sets from multiple different sources. Statistical research in Bayesian phylogenetics has risen to this challenge. Researchers use phylogenetics not only to reconstruct the evolutionary history of a group of organisms, but also to understand the processes that guide its evolution and spread through space and time. To this end, it is now the norm to integrate numerous sources of data. For example, epidemiologists studying the spread of a virus through a region incorporate data including genetic sequences (e.g. DNA), time, location (both continuous and discrete) and environmental covariates (e.g. social connectivity between regions) into a coherent statistical model. Evolutionary biologists routinely do the same with genetic sequences, location, time, fossil and modern phenotypes, and ecological covariates. These complex, hierarchical models readily accommodate both discrete and continuous data and have enormous combined discrete/continuous parameter spaces including, at a minimum, phylogenetic tree topologies and branch lengths. The increased size and complexity of these statistical models have spurred advances in computational methods to make them tractable. We discuss both the modeling and computational advances below, as well as unsolved problems and areas of active research.
Affiliation(s)
- Gabriel W Hassler
- Department of Computational Medicine, University of California, Los Angeles, USA, 90095
- Andrew Magee
- Department of Biostatistics, University of California, Los Angeles, USA, 90095
- Zhenyu Zhang
- Department of Biostatistics, University of California, Los Angeles, USA, 90095
- Guy Baele
- Department of Microbiology and Immunology, Rega Institute, KU Leuven, Leuven, Belgium, 3000
- Philippe Lemey
- Department of Microbiology and Immunology, Rega Institute, KU Leuven, Leuven, Belgium, 3000
- Xiang Ji
- Department of Mathematics, Tulane University, New Orleans, USA, 70118
- Mathieu Fourment
- Australian Institute for Microbiology and Infection, University of Technology Sydney, Ultimo NSW, Australia, 2007
- Marc A Suchard
- Department of Computational Medicine, University of California, Los Angeles, USA, 90095
- Department of Biostatistics, University of California, Los Angeles, USA, 90095
- Department of Human Genetics, University of California, Los Angeles, USA, 90095

24
Douglas J, Bouckaert R. Quantitatively defining species boundaries with more efficiency and more biological realism. Commun Biol 2022; 5:755. [PMID: 35902726 PMCID: PMC9334598 DOI: 10.1038/s42003-022-03723-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/23/2022] [Accepted: 07/12/2022] [Indexed: 11/09/2022]
Abstract
We introduce a widely applicable species delimitation method based on the multispecies coalescent model that is more efficient and more biologically realistic than existing methods. We extend a threshold-based method to allow the ancestral speciation rate to vary through time as a smooth piecewise function. Furthermore, we introduce the cutting-edge proposal kernels of StarBeast3 to this model, thus enabling rapid species delimitation on large molecular datasets and allowing the use of relaxed molecular clock models. We validate these methods with genomic sequence data and SNP data, and show they are more efficient than existing methods at achieving parameter convergence during Bayesian MCMC. Lastly, we apply these methods to two datasets (Hemidactylus and Galagidae) and find inconsistencies with the published literature. Our methods are powerful for rapid quantitative testing of species boundaries in large multilocus datasets and are implemented as an open source BEAST 2 package called SPEEDEMON. Introducing SPEEDEMON, a package for BEAST 2 that better defines species boundaries based on molecular data demonstrated on gecko and loris datasets.
Affiliation(s)
- Jordan Douglas
- School of Computer Science, The University of Auckland, Auckland, New Zealand
- Remco Bouckaert
- School of Computer Science, The University of Auckland, Auckland, New Zealand

25
Vasconcelos T, O'Meara BC, Beaulieu JM. A flexible method for estimating tip diversification rates across a range of speciation and extinction scenarios. Evolution 2022; 76:1420-1433. [PMID: 35661352 DOI: 10.1111/evo.14517] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 11/15/2021] [Accepted: 04/08/2022] [Indexed: 01/21/2023]
Abstract
Estimates of diversification rates at the tips of a phylogeny provide a flexible approach for correlation analyses with multiple traits and to map diversification rates in space while also avoiding the uncertainty of deep time rate reconstructions. Available methods for tip rate estimation make different assumptions, and thus their accuracy usually depends on the characteristics of the underlying model generating the tree. Here, we introduce MiSSE, a trait-free, state-dependent speciation and extinction approach that can be used to estimate varying speciation, extinction, net diversification, turnover, and extinction fractions at the tips of the tree. We compare the accuracy of tip rates inferred by MiSSE against similar methods and demonstrate that, due to certain characteristics of the model, the error is generally low across a broad range of speciation and extinction scenarios. MiSSE can be used alongside regular phylogenetic comparative methods in trait-related diversification hypotheses, and we also describe a simple correction to avoid pseudoreplication from sister tips in analyses of independent contrasts. Finally, we demonstrate the capabilities of MiSSE, with a renewed focus on classic comparative methods, to examine the correlation between plant height and turnover rates in eucalypts, a species-rich lineage of flowering plants.
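The tip-rate quantities named in this abstract are all functions of a speciation rate and an extinction rate. The following is a minimal sketch of how they relate, assuming the convention common in state-dependent speciation and extinction models (turnover as the sum of the two rates, extinction fraction as their ratio); the abstract does not state these definitions, and the class name `TipRates` is hypothetical, not part of MiSSE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TipRates:
    """Hypothetical container for the rates at one tip (not MiSSE's API)."""

    speciation: float  # lambda, events per lineage per unit time
    extinction: float  # mu, events per lineage per unit time

    @property
    def net_diversification(self) -> float:
        # Assumed convention: lambda - mu
        return self.speciation - self.extinction

    @property
    def turnover(self) -> float:
        # Assumed convention: lambda + mu
        return self.speciation + self.extinction

    @property
    def extinction_fraction(self) -> float:
        # Assumed convention: mu / lambda
        return self.extinction / self.speciation


rates = TipRates(speciation=0.3, extinction=0.1)
print(rates.net_diversification, rates.turnover, rates.extinction_fraction)
```

Under this convention any two of the five quantities determine the other three, which is why a model can be parameterized by turnover and extinction fraction and still report speciation and extinction.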
Affiliation(s)
- Thais Vasconcelos: Department of Biological Sciences, University of Arkansas, Fayetteville, Arkansas, 72701
- Brian C O'Meara: Department of Ecology and Evolutionary Biology, University of Tennessee, Knoxville, Tennessee, 37996
- Jeremy M Beaulieu: Department of Biological Sciences, University of Arkansas, Fayetteville, Arkansas, 72701

26
Featherstone LA, Zhang JM, Vaughan TG, Duchene S. Epidemiological inference from pathogen genomes: A review of phylodynamic models and applications. Virus Evol 2022; 8:veac045. [PMID: 35775026 PMCID: PMC9241095 DOI: 10.1093/ve/veac045] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/15/2021] [Revised: 05/23/2022] [Accepted: 06/02/2022] [Indexed: 11/24/2022] Open
Abstract
Phylodynamics requires an interdisciplinary understanding of phylogenetics, epidemiology, and statistical inference. It has also experienced more intense application than ever before amid the SARS-CoV-2 pandemic. In light of this, we present a review of phylodynamic models beginning with foundational models and assumptions. Our target audience is public health researchers, epidemiologists, and biologists seeking a working knowledge of the links between epidemiology, evolutionary models, and resulting epidemiological inference. We discuss the assumptions linking evolutionary models of pathogen population size to epidemiological models of the infected population size. We then describe statistical inference for phylodynamic models and list how output parameters can be rearranged for epidemiological interpretation. We go on to cover more sophisticated models and finish by highlighting future directions.
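As an illustration of how output parameters can be rearranged for epidemiological interpretation, the sketch below converts birth-death-sampling rates into a reproductive number, an infectious period, and a sampling proportion. It assumes the widely used parameterization in which a lineage stops being infectious either by recovery or death (rate `death_rate`) or by being sampled (rate `sampling_rate`); the function and argument names are hypothetical and are not taken from the review.

```python
from typing import Dict


def epidemiological_summary(
    birth_rate: float, death_rate: float, sampling_rate: float
) -> Dict[str, float]:
    """Rearrange birth-death-sampling rates into epidemiological quantities.

    Assumes sampling removes the lineage from the infectious pool, so the
    total rate of becoming uninfectious is death_rate + sampling_rate.
    """
    become_uninfectious = death_rate + sampling_rate
    if become_uninfectious <= 0:
        raise ValueError("death_rate + sampling_rate must be positive")
    return {
        # Expected number of secondary infections per infected individual
        "reproductive_number": birth_rate / become_uninfectious,
        # Mean duration of infectiousness
        "infectious_period": 1.0 / become_uninfectious,
        # Fraction of infections that end in a sampled sequence
        "sampling_proportion": sampling_rate / become_uninfectious,
    }


print(epidemiological_summary(birth_rate=2.0, death_rate=0.75, sampling_rate=0.25))
```

If sampled individuals remain infectious, the denominators change, so the assumption about removal on sampling should be checked against the model actually fitted.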
Affiliation(s)
- Leo A Featherstone: Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, VIC 3000, Australia
- Joshua M Zhang: Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, VIC 3000, Australia
- Timothy G Vaughan: Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland; Swiss Institute of Bioinformatics, Geneva 1015, Switzerland
- Sebastian Duchene: Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, VIC 3000, Australia

27
Morlon H, Robin S, Hartig F. Studying speciation and extinction dynamics from phylogenies: addressing identifiability issues. Trends Ecol Evol 2022; 37:497-506. [PMID: 35246322 DOI: 10.1016/j.tree.2022.02.004] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 09/17/2021] [Revised: 12/20/2021] [Accepted: 02/07/2022] [Indexed: 11/18/2022]
Abstract
A lot of what we know about past speciation and extinction dynamics is based on statistically fitting birth-death processes to phylogenies of extant species. Despite their wide use, the reliability of these tools is regularly questioned. It was recently demonstrated that vast 'congruent' sets of alternative diversification histories cannot be distinguished (i.e., are not identifiable) using extant phylogenies alone, reanimating the debate about the limits of phylogenetic diversification analysis. Here, we summarize what we know about the identifiability of the birth-death process and how identifiability issues can be addressed. We conclude that extant phylogenies, when combined with appropriate prior hypotheses and regularization techniques, can still tell us a lot about past diversification dynamics.
Affiliation(s)
- Hélène Morlon: Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM, PSL Research University, Paris, France
- Stéphane Robin: UMR MIA-Paris, AgroParisTech, INRA, Paris-Saclay University, 75005 Paris, France; Centre d'Ecologie et des Sciences de la Conservation (CESCO), Muséum National d'Histoire Naturelle, CNRS, Sorbonne University, Paris, France
- Florian Hartig: Theoretical Ecology, University of Regensburg, Regensburg, Germany

28
Cappello L, Kim J, Liu S, Palacios JA. Statistical Challenges in Tracking the Evolution of SARS-CoV-2. Stat Sci 2022; 37:162-182. [PMID: 36034090 PMCID: PMC9409356 DOI: 10.1214/22-sts853] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]
Abstract
Genomic surveillance of SARS-CoV-2 has been instrumental in tracking the spread and evolution of the virus during the pandemic. The availability of SARS-CoV-2 molecular sequences isolated from infected individuals, coupled with phylodynamic methods, has provided insights into the origin of the virus, its evolutionary rate, the timing of introductions, the patterns of transmission, and the rise of novel variants that have spread through populations. Despite enormous global efforts of governments, laboratories, and researchers to collect and sequence molecular data, many challenges remain in analyzing and interpreting the data collected. Here, we describe the models and methods currently used to monitor the spread of SARS-CoV-2, discuss long-standing and new statistical challenges, and propose a method for tracking the rise of novel variants during the epidemic.
Affiliation(s)
- Lorenzo Cappello: Departments of Economics and Business, Universitat Pompeu Fabra, 08005, Spain
- Jaehee Kim: Department of Computational Biology, Cornell University, Ithaca, New York 14853, USA
- Sifan Liu: Department of Statistics, Stanford University, Stanford, California 94305, USA
- Julia A Palacios: Departments of Statistics and Biomedical Data Sciences, Stanford University, Stanford, California 94305, USA

29
Cornuault J, Sanmartín I. A road map for phylogenetic models of species trees. Mol Phylogenet Evol 2022; 173:107483. [DOI: 10.1016/j.ympev.2022.107483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2021] [Revised: 03/09/2022] [Accepted: 04/05/2022] [Indexed: 10/18/2022]

30

31
Helekal D, Ledda A, Volz E, Wyllie D, Didelot X. Bayesian inference of clonal expansions in a dated phylogeny. Syst Biol 2021; 71:1073-1087. [PMID: 34893904 PMCID: PMC9366454 DOI: 10.1093/sysbio/syab095] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/01/2021] [Revised: 11/23/2021] [Accepted: 11/29/2021] [Indexed: 11/16/2022] Open
Abstract
Microbial population genetics models often assume that all lineages are constrained by the same population size dynamics over time. However, many neutral and selective events can invalidate this assumption and can contribute to the clonal expansion of a specific lineage relative to the rest of the population. Such differential phylodynamic properties between lineages result in asymmetries and imbalances in phylogenetic trees that are sometimes described informally but which are difficult to analyze formally. To this end, we developed a model of how clonal expansions occur and affect the branching patterns of a phylogeny. We show how the parameters of this model can be inferred from a given dated phylogeny using Bayesian statistics, which allows us to assess the probability that one or more clonal expansion events occurred. For each putative clonal expansion event, we estimate its date of emergence and subsequent phylodynamic trajectory, including its long-term evolutionary potential which is important to determine how much effort should be placed on specific control measures. We demonstrate the applicability of our methodology on simulated and real data sets. Inference under our clonal expansion model can reveal important features in the evolution and epidemiology of infectious disease pathogens. [Clonal expansion; genomic epidemiology; microbial population genomics; phylodynamics.]
Affiliation(s)
- David Helekal: Centre for Doctoral Training in Mathematics for Real-World Systems, University of Warwick, United Kingdom
- Alice Ledda: Healthcare Associated Infections and Antimicrobial Resistance Division, National Infection Service, Public Health England, United Kingdom
- Erik Volz: Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, United Kingdom
- David Wyllie: Field Service, East of England, National Infection Service, Public Health England, Cambridge, United Kingdom
- Xavier Didelot: School of Life Sciences and Department of Statistics, University of Warwick, United Kingdom

32
Helmstetter AJ, Glemin S, Käfer J, Zenil-Ferguson R, Sauquet H, de Boer H, Dagallier LPMJ, Mazet N, Reboud EL, Couvreur TLP, Condamine FL. Pulled Diversification Rates, Lineages-Through-Time Plots and Modern Macroevolutionary Modelling. Syst Biol 2021; 71:758-773. [PMID: 34613395 PMCID: PMC9016617 DOI: 10.1093/sysbio/syab083] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 02/15/2021] [Revised: 09/29/2021] [Accepted: 09/30/2021] [Indexed: 11/29/2022] Open
Abstract
Estimating time-dependent rates of speciation and extinction from dated phylogenetic trees of extant species (timetrees), and determining how and why they vary, is key to understanding how ecological and evolutionary processes shape biodiversity. Due to an increasing availability of phylogenetic trees, a growing number of process-based methods relying on the birth–death model have been developed in the last decade to address a variety of questions in macroevolution. However, this methodological progress has regularly been criticized such that one may wonder how reliable the estimations of speciation and extinction rates are. In particular, using lineages-through-time (LTT) plots, a recent study has shown that there are an infinite number of equally likely diversification scenarios that can generate any timetree. This has led to questioning whether or not diversification rates should be estimated at all. Here, we summarize, clarify, and highlight technical considerations on recent findings regarding the capacity of models to disentangle diversification histories. Using simulations, we illustrate the characteristics of newly proposed “pulled rates” and their utility. We recognize that the recent findings are a step forward in understanding the behavior of macroevolutionary modeling, but they in no way suggest we should abandon diversification modeling altogether. On the contrary, the study of macroevolution using phylogenetic trees has never been more exciting and promising than today. We still face important limitations in regard to data availability and methods, but by acknowledging them we can better target our joint efforts as a scientific community. [Birth–death models; extinction; phylogenetics; speciation.]
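For readers unfamiliar with lineages-through-time (LTT) plots, the sketch below shows how the step function behind such a plot can be tabulated from the node ages of a timetree of extant species. It is a generic illustration under the assumption of a strictly bifurcating tree; the function name `lineages_through_time` is hypothetical and is not taken from the paper or from any package.

```python
from typing import List, Tuple


def lineages_through_time(node_ages: List[float]) -> List[Tuple[float, int]]:
    """Return (age, lineage count) steps for a bifurcating timetree.

    node_ages holds the age (time before present) of every internal node,
    root included. Moving forward in time from the root, each branching
    event adds one lineage, so two lineages exist just after the root and
    the count rises by one at every younger node.
    """
    ordered = sorted(node_ages, reverse=True)  # oldest node first
    return [(age, rank + 2) for rank, age in enumerate(ordered)]


# Four extant tips imply three internal nodes.
print(lineages_through_time([3.0, 10.0, 6.0]))
# prints [(10.0, 2), (6.0, 3), (3.0, 4)]
```

Plotting the count on a log scale against time gives the familiar LTT curve; only the node ages enter the calculation, not the tree topology.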
Affiliation(s)
- Andrew J Helmstetter: Fondation pour la Recherche sur la Biodiversité - Centre for the Synthesis and Analysis of Biodiversity, 34000 Montpellier, France
- Sylvain Glemin: CNRS, Ecosystèmes Biodiversité Evolution (Université de Rennes), 35000 Rennes, France
- Jos Käfer: Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, F-69622 Villeurbanne, France
- Hervé Sauquet: National Herbarium of New South Wales, Royal Botanic Gardens and Domain Trust, Sydney, New South Wales, 2000, Australia; Evolution and Ecology Research Centre, School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, Australia
- Hugo de Boer: Natural History Museum, University of Oslo, 0318 Oslo, Norway
- Nathan Mazet: CNRS, Institut des Sciences de l'Evolution de Montpellier (Université de Montpellier), Place Eugène Bataillon, 34095 Montpellier, France
- Eliette L Reboud: CNRS, Institut des Sciences de l'Evolution de Montpellier (Université de Montpellier), Place Eugène Bataillon, 34095 Montpellier, France
- Fabien L Condamine: CNRS, Institut des Sciences de l'Evolution de Montpellier (Université de Montpellier), Place Eugène Bataillon, 34095 Montpellier, France

33
Abstract
Reconstructing the history of biodiversity has been hindered by often-separate analyses of stem and crown groups of the clades in question that are not easily understood within the same unified evolutionary framework. Here, we investigate the evolutionary history of birds by analyzing three supertrees that combine published phylogenies of both stem and crown birds. Our analyses reveal three distinct large-scale increases in the diversification rate across bird evolutionary history. The first increase, which began between 160 and 170 Ma and reached its peak between 130 and 135 Ma, corresponds to an accelerated morphological evolutionary rate associated with the locomotory systems among early stem birds. This radiation resulted in morphospace occupation that is larger and different from their close dinosaurian relatives, demonstrating the occurrence of a radiation among early stem birds. The second increase, which started ∼90 Ma and reached its peak between 65 and 55 Ma, is associated with rapid evolution of the cranial skeleton among early crown birds, driven differently from the first radiation. The third increase, which occurred after ∼40 to 45 Ma, has yet to be supported by quantitative morphological data but gains some support from the fossil record. Our analyses indicate that the bird biodiversity evolution was influenced mainly by long-term climatic changes and also by major paleobiological events such as the Cretaceous-Paleogene (K-Pg) extinction.

34
Maliet O, Morlon H. Fast and accurate estimation of species-specific diversification rates using data augmentation. Syst Biol 2021; 71:353-366. [PMID: 34228799 DOI: 10.1093/sysbio/syab055] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 11/03/2020] [Revised: 06/10/2021] [Accepted: 06/30/2021] [Indexed: 11/13/2022] Open
Abstract
Diversification rates vary across species as a response to various factors, including environmental conditions and species-specific features. Phylogenetic models that allow accounting for and quantifying this heterogeneity in diversification rates have proven particularly useful for understanding clade diversification. Recently, we introduced the cladogenetic diversification rate shift model (ClaDS), which allows inferring multiple rate changes of small magnitude across lineages. Here we present a new inference technique for this model that considerably reduces computation time through the use of data augmentation and provide an implementation of this method in Julia. In addition to drastically reducing computation time, this new inference approach provides a posterior distribution of the augmented data, that is, the tree with extinct and unsampled lineages as well as associated diversification rates. In particular, this allows extracting the distribution through time of both the mean rate and the number of lineages. We assess the statistical performance of our approach using simulations and illustrate its application on the entire bird radiation.
Affiliation(s)
- Odile Maliet: Institut de biologie de l'Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Research University, 75005 Paris, France
- Hélène Morlon: Institut de biologie de l'Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Research University, 75005 Paris, France

35
MacPherson A, Louca S, McLaughlin A, Joy JB, Pennell MW. Unifying Phylogenetic Birth-Death Models in Epidemiology and Macroevolution. Syst Biol 2021; 71:172-189. [PMID: 34165577 PMCID: PMC8972974 DOI: 10.1093/sysbio/syab049] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 03/08/2021] [Revised: 06/09/2021] [Accepted: 06/21/2021] [Indexed: 11/13/2022] Open
Abstract
Birth–death stochastic processes are the foundations of many phylogenetic models and are widely used to make inferences about epidemiological and macroevolutionary dynamics. There are a large number of birth–death model variants that have been developed; these impose different assumptions about the temporal dynamics of the parameters and about the sampling process. As each of these variants was individually derived, it has been difficult to understand the relationships between them as well as their precise biological and mathematical assumptions. Without a common mathematical foundation, deriving new models is nontrivial. Here, we unify these models into a single framework, prove that many previously developed epidemiological and macroevolutionary models are all special cases of a more general model, and illustrate the connections between these variants. This unification includes both models where the process is the same for all lineages and those in which it varies across types. We also outline a straightforward procedure for deriving likelihood functions for arbitrarily complex birth–death(-sampling) models that will hopefully allow researchers to explore a wider array of scenarios than was previously possible. By rederiving existing single-type birth–death sampling models, we clarify and synthesize the range of explicit and implicit assumptions made by these models. [Birth–death processes; epidemiology; macroevolution; phylogenetics; statistical inference.]
Affiliation(s)
- Ailene MacPherson: Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver, Canada; Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Canada
- Stilianos Louca: Department of Biology, University of Oregon, USA; Institute of Ecology and Evolution, University of Oregon, USA
- Angela McLaughlin: British Columbia Centre for Excellence in HIV/AIDS, Vancouver, Canada; Bioinformatics, University of British Columbia, Vancouver, Canada
- Jeffrey B Joy: British Columbia Centre for Excellence in HIV/AIDS, Vancouver, Canada; Bioinformatics, University of British Columbia, Vancouver, Canada; Department of Medicine, University of British Columbia, Vancouver, Canada
- Matthew W Pennell: Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver, Canada

36
Laudanno G, Haegeman B, Rabosky DL, Etienne RS. Detecting Lineage-Specific Shifts in Diversification: A Proper Likelihood Approach. Syst Biol 2021; 70:389-407. [PMID: 32617585 PMCID: PMC7875465 DOI: 10.1093/sysbio/syaa048] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/09/2019] [Revised: 06/11/2020] [Accepted: 06/23/2020] [Indexed: 11/25/2022] Open
Abstract
The branching patterns of molecular phylogenies are generally assumed to contain information on rates of the underlying speciation and extinction processes. Simple birth-death models with constant, time-varying, or diversity-dependent rates have been invoked to explain these patterns. They have one assumption in common: all lineages have the same set of diversification rates at a given point in time. It seems likely, however, that there is variability in diversification rates across subclades in a phylogenetic tree. This has inspired the construction of models that allow multiple rate regimes across the phylogeny, with instantaneous shifts between these regimes. Several methods exist for calculating the likelihood of a phylogeny under a specified mapping of diversification regimes and for performing inference on the most likely diversification history that gave rise to a particular phylogenetic tree. Here, we show that the likelihood computation of these methods is not correct. We provide a new framework to compute the likelihood correctly and show, with simulations of a single shift, that the correct likelihood indeed leads to parameter estimates that are on average in much better agreement with the generating parameters than the incorrect likelihood. Moreover, we show that our corrected likelihood can be extended to multiple rate shifts in time-dependent and diversity-dependent models. We argue that identifying shifts in diversification rates is a nontrivial model selection exercise where one has to choose whether shifts in now-extinct lineages are taken into account or not. Hence, our framework also resolves the recent debate on such unobserved shifts. [Diversification; macroevolution; phylogeny; speciation].
Affiliation(s)
- Giovanni Laudanno: Groningen Institute for Evolutionary Life Sciences, University of Groningen, Box 11103, 9700 CC, Groningen, The Netherlands
- Bart Haegeman: Centre for Biodiversity Theory and Modelling, Theoretical and Experimental Ecology Station, CNRS and Paul Sabatier University, 09200, Moulis, France
- Daniel L Rabosky: Museum of Zoology & Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
- Rampal S Etienne: Groningen Institute for Evolutionary Life Sciences, University of Groningen, Box 11103, 9700 CC, Groningen, The Netherlands

37
Dearlove B, Tovanabutra S, Owen CL, Lewitus E, Li Y, Sanders-Buell E, Bose M, O’Sullivan AM, Kijak G, Miller S, Poltavee K, Lee J, Bonar L, Harbolick E, Ahani B, Pham P, Kibuuka H, Maganga L, Nitayaphan S, Sawe FK, Kim JH, Eller LA, Vasan S, Gramzinski R, Michael NL, Robb ML, Rolland M, the RV217 Study Team. Factors influencing estimates of HIV-1 infection timing using BEAST. PLoS Comput Biol 2021; 17:e1008537. [PMID: 33524022 PMCID: PMC7877758 DOI: 10.1371/journal.pcbi.1008537] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/15/2019] [Revised: 02/11/2021] [Accepted: 11/13/2020] [Indexed: 12/15/2022] Open
Abstract
While large datasets of HIV-1 sequences are increasingly being generated, many studies rely on a single gene or fragment of the genome and few comparative studies across genes have been done. We performed genome-based and gene-specific Bayesian phylogenetic analyses to investigate how certain factors impact estimates of the infection dates in an acute HIV-1 infection cohort, RV217. In this cohort, HIV-1 diagnosis corresponded to the first RNA positive test and occurred a median of four days after the last negative test, allowing us to compare timing estimates using BEAST to a narrow window of infection. We analyzed HIV-1 sequences sampled one week, one month and six months after HIV-1 diagnosis in 39 individuals. We found that shared diversity and temporal signal were limited in acute infection, and insufficient to allow timing inferences in the shortest HIV-1 genes, thus dated phylogenies were primarily analyzed for env, gag, pol and near full-length genomes. There was no one best-fitting model across participants and genes, though relaxed molecular clocks (73% of best-fitting models) and the Bayesian skyline (49%) tended to be favored. For infections with single founders, the infection date was estimated to be around one week pre-diagnosis for env (IQR: 3–9 days) and gag (IQR: 5–9 days), whilst the genome placed it at a median of 10 days (IQR: 4–19). Multiply-founded infections proved problematic to date. Our ability to compare timing inferences to precise estimates of HIV-1 infection (within a week) highlights that molecular dating methods can be applied to within-host datasets from early infection. Nonetheless, our results also suggest caution when using uniform clock and population models or short genes with limited information content.
Molecular dating using phylogenetics allows us to estimate the date of an infection from time-stamped within-host sequences alone. There are large datasets of HIV-1 sequences, but genome and gene analyses are not often performed in parallel and rarely with the possibility to compare results against a known narrow window of infection. We showed that all but the longest genes are near-clonal in acute infection, with little information for dating purposes. For infections with single founders, we estimated the eclipse phase (the time between HIV-1 exposure and the first positive diagnostic test) to last between one and two weeks using env, gag, pol and near full-length genomes. This approach could be used to narrow the date of suspected infection in ongoing clinical trials for the prevention of HIV-1 infection.
Collapse
Affiliation(s)
- Bethany Dearlove
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, Maryland, United States of America
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America
| | - Sodsai Tovanabutra
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, Maryland, United States of America
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America
| | - Christopher L. Owen
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, Maryland, United States of America
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America
| | - Eric Lewitus
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, Maryland, United States of America
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America
| | - Yifan Li
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, Maryland, United States of America
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America
| | - Eric Sanders-Buell
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, Maryland, United States of America
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America
| | - Meera Bose
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, Maryland, United States of America
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America
| | - Anne-Marie O’Sullivan
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, Maryland, United States of America
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America
| | - Gustavo Kijak
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, Maryland, United States of America
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America
| | - Shana Miller
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, Maryland, United States of America
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America
| | - Kultida Poltavee
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, Maryland, United States of America
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America
| | - Jenica Lee
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, Maryland, United States of America
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America
- Lydia Bonar
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, Maryland, United States of America
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America
- Elizabeth Harbolick
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, Maryland, United States of America
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America
- Bahar Ahani
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, Maryland, United States of America
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America
- Phuc Pham
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, Maryland, United States of America
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America
- Hannah Kibuuka
- Makerere University Walter Reed Project, Kampala, Uganda
- Lucas Maganga
- National Institute for Medical Research-Mbeya Medical Research Centre, Mbeya, Tanzania
- Fred K. Sawe
- Kenya Medical Research Institute/U.S. Army Medical Research Directorate-Africa/Kenya-Henry Jackson Foundation MRI, Kericho, Kenya
- Leigh Anne Eller
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, Maryland, United States of America
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America
- Sandhya Vasan
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, Maryland, United States of America
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America
- Robert Gramzinski
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, Maryland, United States of America
- Nelson L. Michael
- Center for Infectious Diseases Research, Walter Reed Army Institute of Research, Silver Spring, Maryland, United States of America
- Merlin L. Robb
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, Maryland, United States of America
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America
- Morgane Rolland
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, Maryland, United States of America
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America
|
38
|
King B. Bayesian Tip-Dated Phylogenetics in Paleontology: Topological Effects and Stratigraphic Fit. Syst Biol 2020; 70:283-294. [PMID: 32692834 DOI: 10.1093/sysbio/syaa057] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 05/04/2020] [Revised: 07/14/2020] [Accepted: 07/15/2020] [Indexed: 11/14/2022]
Abstract
The incorporation of stratigraphic data into phylogenetic analysis has a long history of debate but is not currently standard practice for paleontologists. Bayesian tip-dated (or morphological clock) phylogenetic methods have returned these arguments to the spotlight, but how tip dating affects the recovery of evolutionary relationships has yet to be fully explored. Here I show, through analysis of several data sets with multiple phylogenetic methods, that topologies produced by tip dating are outliers as compared to topologies produced by parsimony and undated Bayesian methods, which retrieve broadly similar trees. Unsurprisingly, trees recovered by tip dating have better fit to stratigraphy than trees recovered by other methods under both the Gap Excess Ratio (GER) and the Stratigraphic Completeness Index (SCI). This is because trees with better stratigraphic fit are assigned a higher likelihood by the fossilized birth-death tree model. However, the degree to which the tree model favors tree topologies with high stratigraphic fit metrics is modulated by the diversification dynamics of the group under investigation. In particular, when net diversification rate is low, the tree model favors trees with a higher GER compared to when net diversification rate is high. Differences in stratigraphic fit and tree topology between tip dating and other methods are concentrated in parts of the tree with weaker character signal, as shown by successive deletion of the most incomplete taxa from two data sets. These results show that tip dating incorporates stratigraphic data in an intuitive way, with good stratigraphic fit an expectation that can be overturned by strong evidence from character data. [fossilized birth-death; fossils; missing data; morphological clock; morphology; parsimony; phylogenetics.].
Affiliation(s)
- Benedict King
- Naturalis Biodiversity Center, Postbus 9517, 2300 RA, Leiden, The Netherlands
- College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
|