1
Rannala B, Yang Z. Reading tree leaves: inferring speciation and extinction processes using phylogenies. Philos Trans R Soc Lond B Biol Sci 2025; 380:20230309. [PMID: 39976406 PMCID: PMC11867106 DOI: 10.1098/rstb.2023.0309] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/18/2024] [Revised: 09/21/2024] [Accepted: 10/14/2024] [Indexed: 02/21/2025] Open
Abstract
The birth-death process (BDP) is widely used in evolutionary biology as a model for generating phylogenetic trees of species. The generalized birth-death process (GBDP) allows rate variation over time, with speciation and extinction rates being arbitrary functions of time. Here we review the probability theory underpinning the GBDP as a model of cladogenesis and recent findings concerning its identifiability. The GBDP with arbitrary continuous rate functions has been shown to be non-identifiable from lineage-through-time data: even with species phylogenies of infinite size the parameters cannot be estimated. However, a restricted class of BDPs with piecewise-constant rates has been shown to be identifiable. We review and illustrate these results using simple examples and discuss their implications for biologists interested in inferring the past tempo and mode of evolution using reconstructed phylogenetic trees. This article is part of the theme issue '"A mathematical theory of evolution": phylogenetic models dating back 100 years'.
Affiliation(s)
- Bruce Rannala
- Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
- Ziheng Yang
- Department of Genetics, Evolution, and Environment, University College London, London WC1E 6BT, UK
2
Legried B, Terhorst J. Identifiability and inference of phylogenetic birth-death models. J Theor Biol 2023; 568:111520. [PMID: 37148965 DOI: 10.1016/j.jtbi.2023.111520] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/21/2022] [Revised: 03/28/2023] [Accepted: 04/26/2023] [Indexed: 05/08/2023]
Abstract
Recent theoretical work on phylogenetic birth-death models offers differing viewpoints on whether they can be estimated using lineage-through-time data. Louca and Pennell (2020) showed that the class of models with continuously differentiable rate functions is nonidentifiable: any such model is consistent with an infinite collection of alternative models, which are statistically indistinguishable regardless of how much data are collected. Legried and Terhorst (2022) qualified this grave result by showing that identifiability is restored if only piecewise constant rate functions are considered. Here, we contribute new theoretical results to this discussion, in both the positive and negative directions. Our main result is to prove that models based on piecewise polynomial rate functions of any order and with any (finite) number of pieces are statistically identifiable. In particular, this implies that spline-based models with an arbitrary number of knots are identifiable. The proof is simple and self-contained, relying mainly on basic algebra. We complement this positive result with a negative one, which shows that even when identifiability holds, rate function estimation is still a difficult problem. To illustrate this, we prove some rates-of-convergence results for hypothesis testing using birth-death models. These results are information-theoretic lower bounds which apply to all potential estimators.
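To make the model class in this abstract concrete, the following is a minimal sketch of how a piecewise-polynomial rate function with a finite number of pieces might be represented and evaluated. The function name `piecewise_poly_rate`, the knot position, and the coefficients are illustrative assumptions and do not come from the paper.

```python
from bisect import bisect_right


def piecewise_poly_rate(breakpoints, coeffs):
    """Build rate(t) for a piecewise-polynomial rate function.

    breakpoints: increasing interior knot times [t1, ..., tk]
    coeffs: k + 1 coefficient lists, one per piece; [c0, c1, c2] means
            c0 + c1*t + c2*t**2 on that piece.
    """
    if len(coeffs) != len(breakpoints) + 1:
        raise ValueError("need exactly one coefficient list per piece")

    def rate(t):
        # Index of the piece whose interval contains t.
        piece = bisect_right(breakpoints, t)
        value = 0.0
        # Horner's rule, starting from the highest-order coefficient.
        for c in reversed(coeffs[piece]):
            value = value * t + c
        return value

    return rate


# Hypothetical speciation rate: constant 0.5 before the knot at t = 1,
# then increasing linearly as 0.3 + 0.2 * t.
speciation = piecewise_poly_rate([1.0], [[0.5], [0.3, 0.2]])
print(speciation(0.5), speciation(2.0))
```

A piecewise-constant model is the special case in which every coefficient list has length one.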
Affiliation(s)
- Brandon Legried
- School of Mathematics, Georgia Institute of Technology, 686 Cherry Street, Atlanta, 30332, GA, USA
- Jonathan Terhorst
- Department of Statistics, University of Michigan, 1085 S. University Ave, Ann Arbor, 48109, MI, USA.
3
Louca S, McLaughlin A, MacPherson A, Joy JB, Pennell MW. Fundamental Identifiability Limits in Molecular Epidemiology. Mol Biol Evol 2021; 38:4010-4024. [PMID: 34009339 PMCID: PMC8382926 DOI: 10.1093/molbev/msab149] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 12/23/2022] Open
Abstract
Viral phylogenies provide crucial information on the spread of infectious diseases, and many studies fit mathematical models to phylogenetic data to estimate epidemiological parameters such as the effective reproduction ratio (Re) over time. Such phylodynamic inferences often complement or even substitute for conventional surveillance data, particularly when sampling is poor or delayed. It remains generally unknown, however, how robust phylodynamic epidemiological inferences are, especially when there is uncertainty regarding pathogen prevalence and sampling intensity. Here, we use recently developed mathematical techniques to fully characterize the information that can possibly be extracted from serially collected viral phylogenetic data, in the context of the commonly used birth-death-sampling model. We show that for any candidate epidemiological scenario, there exists a myriad of alternative, markedly different, and yet plausible "congruent" scenarios that cannot be distinguished using phylogenetic data alone, no matter how large the data set. In the absence of strong constraints or rate priors across the entire study period, neither maximum-likelihood fitting nor Bayesian inference can reliably reconstruct the true epidemiological dynamics from phylogenetic data alone; rather, estimators can only converge to the "congruence class" of the true dynamics. We propose concrete and feasible strategies for making more robust epidemiological inferences from viral phylogenetic data.
Affiliation(s)
- Stilianos Louca
- Department of Biology, University of Oregon, Eugene, OR, USA
- Institute of Ecology and Evolution, University of Oregon, Eugene, OR, USA
- Angela McLaughlin
- British Columbia Centre for Excellence in HIV/AIDS, Vancouver, BC, Canada
- Bioinformatics, University of British Columbia, Vancouver, BC, Canada
- Ailene MacPherson
- Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada
- Department of Zoology, University of British Columbia, Vancouver, BC, Canada
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, Canada
- Jeffrey B Joy
- British Columbia Centre for Excellence in HIV/AIDS, Vancouver, BC, Canada
- Bioinformatics, University of British Columbia, Vancouver, BC, Canada
- Department of Medicine, University of British Columbia, Vancouver, BC, Canada
- Matthew W Pennell
- Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada
- Department of Zoology, University of British Columbia, Vancouver, BC, Canada
4
Louca S, Pennell MW. Why extinction estimates from extant phylogenies are so often zero. Curr Biol 2021; 31:3168-3173.e4. [PMID: 34019824 DOI: 10.1016/j.cub.2021.04.066] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 12/31/2020] [Revised: 03/11/2021] [Accepted: 04/26/2021] [Indexed: 12/18/2022]
Abstract
Time-calibrated phylogenies of extant species ("extant timetrees") are widely used to estimate historical speciation and extinction rates by fitting stochastic birth-death models [1]. These approaches have long been controversial, as many phylogenetic studies report zero extinction in many taxa, contradicting the high extinction rates seen in the fossil record and the fact that the majority of species ever to have existed are now extinct [2-9]. To date, the causes of this discrepancy remain unresolved. Here, we provide a novel and simple explanation for these "zero-inflated" extinction estimates, based on the recent discovery that there exist many alternative "congruent" diversification scenarios that cannot be distinguished based solely on extant timetrees [10]. Due to such congruencies, estimation methods tend to converge to some scenario congruent to (i.e., statistically indistinguishable from) the true diversification scenario, but not necessarily to the true diversification scenario itself. This congruent scenario may exhibit negative extinction rates, a biologically meaningless but mathematically feasible situation, in which case estimators will tend to stick to the boundary of zero extinction. Based on this explanation, we make multiple testable predictions, which we confirm using analyses of simulated trees and 121 empirical trees. In contrast to other proposed mechanisms for erroneous extinction rate estimates [5,11-14], our proposed mechanism specifically explains the zero inflation of previous extinction rate estimates in the absence of detectable model violations, even for large trees. Not only do our results likely resolve a long-standing mystery in phylogenetics, they demonstrate that model congruencies can have severe consequences in practice.
Affiliation(s)
- Stilianos Louca
- Department of Biology, University of Oregon, 1210 University of Oregon, Eugene, OR 97403, USA; Institute of Ecology and Evolution, University of Oregon, 5289 University of Oregon, Eugene, OR 97403, USA.
- Matthew W Pennell
- Biodiversity Research Centre, University of British Columbia, 2212 Main Mall, Vancouver, BC V6T1Z4, Canada; Department of Zoology, University of British Columbia, 6270 University Boulevard, Vancouver, BC V6T1Z4, Canada.
5
Manceau M, Gupta A, Vaughan T, Stadler T. The probability distribution of the ancestral population size conditioned on the reconstructed phylogenetic tree with occurrence data. J Theor Biol 2021; 509:110400. [PMID: 32739241 PMCID: PMC7733867 DOI: 10.1016/j.jtbi.2020.110400] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/28/2019] [Revised: 05/07/2020] [Accepted: 07/03/2020] [Indexed: 01/10/2023]
Abstract
We consider a homogeneous birth-death process with three different sampling schemes. First, individuals can be sampled through time and included in a reconstructed phylogenetic tree. Second, they can be sampled through time and only recorded as a point 'occurrence' along a timeline. Third, extant individuals can be sampled and included in the reconstructed phylogenetic tree with a fixed probability. We further consider that sampled individuals can be removed or not from the process, upon sampling, with fixed probability. We derive the probability distribution of the population size at any time in the past conditional on the joint observation of a reconstructed phylogenetic tree and a record of occurrences not included in the tree. We also provide an algorithm to simulate ancestral population size trajectories given the observation of a reconstructed phylogenetic tree and occurrences. This distribution can be readily used to draw inferences about the ancestral population size in the field of epidemiology and macroevolution. In epidemiology, these results will allow data from epidemiological case count studies to be used in conjunction with molecular sequencing data (yielding reconstructed phylogenetic trees) to coherently estimate prevalence through time. In macroevolution, it will foster the joint examination of the fossil record and extant taxa to reconstruct past biodiversity.
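The forward process described in this abstract can be illustrated with a small Gillespie-style simulation. This is only a sketch under simplifying assumptions: it merges the two through-time sampling schemes into a single sampling rate `psi`, ignores sampling of extant individuals, and does not implement the paper's conditional distribution or its trajectory-simulation algorithm. All names and parameter values are hypothetical.

```python
import random


def simulate_bd_sampling(lam, mu, psi, r, t_max, seed=0, max_pop=10_000):
    """Simulate a homogeneous birth-death process with through-time sampling.

    lam, mu, psi: per-individual birth, death and sampling rates
    r: probability that a sampled individual is removed upon sampling
    Starts from one individual at time 0 and stops at t_max, at extinction,
    or when the population reaches max_pop.
    Returns (trajectory, sample_times), where trajectory is a list of
    (time, population_size) pairs recorded at every event.
    """
    total = lam + mu + psi
    if total <= 0:
        raise ValueError("at least one rate must be positive")
    rng = random.Random(seed)
    t, n = 0.0, 1
    trajectory = [(t, n)]
    sample_times = []
    while 0 < n < max_pop:
        # Waiting time to the next event among n individuals.
        t += rng.expovariate(n * total)
        if t > t_max:
            break
        u = rng.random() * total
        if u < lam:
            n += 1  # birth
        elif u < lam + mu:
            n -= 1  # death
        else:
            sample_times.append(t)  # sampling event
            if rng.random() < r:
                n -= 1  # sampled individual is removed
        trajectory.append((t, n))
    return trajectory, sample_times


trajectory, sample_times = simulate_bd_sampling(
    lam=1.0, mu=0.5, psi=0.2, r=0.5, t_max=3.0, seed=1
)
print(len(trajectory), len(sample_times))
```

Fixing `seed` makes a run reproducible, which is convenient when comparing simulated trajectories against any inferred population-size distribution.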
Affiliation(s)
- Marc Manceau
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland.
- Ankit Gupta
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Timothy Vaughan
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Tanja Stadler
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland.
6
A characterisation of the reconstructed birth-death process through time rescaling. Theor Popul Biol 2020; 134:61-76. [PMID: 32439294 DOI: 10.1016/j.tpb.2020.05.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/10/2019] [Revised: 04/15/2020] [Accepted: 05/05/2020] [Indexed: 11/23/2022]
Abstract
The dynamics of a population exhibiting exponential growth can be modelled as a birth-death process, which naturally captures the stochastic variation in population size over time. In this article, we consider a supercritical birth-death process, started at a random time in the past, and conditioned to have n sampled individuals at the present. The genealogy of individuals sampled at the present time is then described by the reversed reconstructed process (RRP), which traces the ancestry of the sample backwards from the present. We show that a simple, analytic, time rescaling of the RRP provides a straightforward way to derive its inter-event times. The same rescaling characterises other distributions underlying this process, obtained elsewhere in the literature via more cumbersome calculations. We also consider the case of incomplete sampling of the population, in which each leaf of the genealogy is retained with an independent Bernoulli trial with probability ψ, and we show that corresponding results for Bernoulli-sampled RRPs can be derived using time rescaling, for any values of the underlying parameters. A central result is the derivation of a scaling limit as ψ approaches 0, corresponding to the underlying population growing to infinity, using the time rescaling formalism. We show that in this setting, after a linear time rescaling, the event times are the order statistics of n logistic random variables with mode log(1∕ψ); moreover, we show that the inter-event times are approximately exponentially distributed.
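The scaling-limit statement at the end of this abstract can be sketched numerically: draw n logistic random variables located at log(1/ψ) and sort them to obtain event times in rescaled time units. The abstract does not give the scale parameter or the exact form of the linear rescaling, so the unit scale used here, like the function name, is an assumption made purely for illustration.

```python
import math
import random


def logistic_event_times(n, psi, scale=1.0, seed=0):
    """Order statistics of n logistic variables with mode log(1/psi).

    Sampling uses the inverse CDF of the logistic distribution:
    x = loc + scale * log(u / (1 - u)) for u uniform on (0, 1).
    The default scale of 1.0 is an assumption, not taken from the paper.
    """
    rng = random.Random(seed)
    loc = math.log(1.0 / psi)
    times = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # random() can return 0.0, which would break log()
            u = rng.random()
        times.append(loc + scale * math.log(u / (1.0 - u)))
    return sorted(times)


# Hypothetical example: 5 sampled individuals, sampling probability 0.01.
print(logistic_event_times(5, 0.01, seed=3))
```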
7
Extant timetrees are consistent with a myriad of diversification histories. Nature 2020; 580:502-505. [PMID: 32322065 DOI: 10.1038/s41586-020-2176-1] [Citation(s) in RCA: 248] [Impact Index Per Article: 49.6] [Received: 09/14/2019] [Accepted: 03/10/2020] [Indexed: 11/09/2022]
Abstract
Time-calibrated phylogenies of extant species (referred to here as 'extant timetrees') are widely used for estimating diversification dynamics [1]. However, there has been considerable debate surrounding the reliability of these inferences [2-5] and, to date, this critical question remains unresolved. Here we clarify the precise information that can be extracted from extant timetrees under the generalized birth-death model, which underlies most existing methods of estimation. We prove that, for any diversification scenario, there exists an infinite number of alternative diversification scenarios that are equally likely to have generated any given extant timetree. These 'congruent' scenarios cannot possibly be distinguished using extant timetrees alone, even in the presence of infinite data. Importantly, congruent diversification scenarios can exhibit markedly different and yet similarly plausible dynamics, which suggests that many previous studies may have over-interpreted phylogenetic evidence. We introduce identifiable and easily interpretable variables that contain all available information about past diversification dynamics, and demonstrate that these can be estimated from extant timetrees. We suggest that measuring and modelling these identifiable variables offers a more robust way to study historical diversification dynamics. Our findings also make it clear that palaeontological data will continue to be crucial for answering some macroevolutionary questions.
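The abstract does not name the identifiable variables. One of them, as defined in the paper itself, is the pulled diversification rate r_p(τ) = λ(τ) − μ(τ) + (1/λ(τ)) dλ/dτ, where τ is age (time before present). Two scenarios with the same r_p and the same present-day speciation rate (at equal sampling fraction) belong to the same congruence class. The sketch below, using hypothetical rate functions and parameter values, computes r_p for one scenario and then derives the extinction rate that makes a different speciation-rate curve congruent with it.

```python
import math


def pulled_diversification_rate(lam, dlam_dtau, mu):
    """Return r_p(tau) = lam - mu + (1 / lam) * dlam/dtau."""
    def r_p(tau):
        return lam(tau) - mu(tau) + dlam_dtau(tau) / lam(tau)
    return r_p


def congruent_extinction(r_p, lam_alt, dlam_alt_dtau):
    """Extinction rate giving the alternative speciation rate the same r_p."""
    def mu_alt(tau):
        return lam_alt(tau) - r_p(tau) + dlam_alt_dtau(tau) / lam_alt(tau)
    return mu_alt


# Hypothetical "true" scenario: speciation rate grows exponentially with
# age, extinction rate is constant.
LAM0, ALPHA, MU = 1.0, 0.1, 0.2
lam = lambda tau: LAM0 * math.exp(ALPHA * tau)
dlam = lambda tau: ALPHA * LAM0 * math.exp(ALPHA * tau)
mu = lambda tau: MU
r_p = pulled_diversification_rate(lam, dlam, mu)

# Alternative scenario: constant speciation rate equal to the present-day
# value, so both scenarios agree at tau = 0.
lam_const = lambda tau: LAM0
dlam_const = lambda tau: 0.0
mu_star = congruent_extinction(r_p, lam_const, dlam_const)

for tau in (0.0, 1.0, 5.0):
    print(tau, round(r_p(tau), 4), round(mu_star(tau), 4))
```

With these particular values the derived extinction rate turns negative at larger ages, so this alternative matches the pulled diversification rate but is not a biologically admissible scenario.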
8
Louca S, Pennell MW. A General and Efficient Algorithm for the Likelihood of Diversification and Discrete-Trait Evolutionary Models. Syst Biol 2019; 69:545-556. [DOI: 10.1093/sysbio/syz055] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 01/15/2019] [Revised: 08/14/2019] [Accepted: 08/15/2019] [Indexed: 11/13/2022] Open
Abstract
As the size of phylogenetic trees and comparative data continue to grow and more complex models are developed to investigate the processes that gave rise to them, macroevolutionary analyses are becoming increasingly limited by computational requirements. Here, we introduce a novel algorithm, based on the “flow” of the differential equations that describe likelihoods along tree edges in backward time, to reduce redundancy in calculations and efficiently compute the likelihood of various macroevolutionary models. Our algorithm applies to several diversification models, including birth–death models and models that account for state- or time-dependent rates, as well as many commonly used models of discrete-trait evolution, and provides an alternative way to describe macroevolutionary model likelihoods. As a demonstration of our algorithm’s utility, we implemented it for a popular class of state-dependent diversification models—BiSSE, MuSSE, and their extensions to hidden-states. Our implementation is available through the R package castor. We show that, for these models, our algorithm is one or more orders of magnitude faster than existing implementations when applied to large phylogenies. Our algorithm thus enables the fitting of state-dependent diversification models to modern massive phylogenies with millions of tips and may lead to potentially similar computational improvements for many other macroevolutionary models.
Affiliation(s)
- Stilianos Louca
- Department of Biology, 1210 University of Oregon, Eugene, OR 97403, USA
- Institute of Ecology and Evolution, 5289 University of Oregon, Eugene, OR 97403, USA
- Matthew W Pennell
- Biodiversity Research Centre, University of British Columbia, 2212 Main Mall, Vancouver, V6T1Z4 British Columbia, Canada
- Department of Zoology, University of British Columbia, 6270 University Blvd, Vancouver, V6T1Z4 British Columbia, Canada