1
Thompson A, Liebeskind BJ, Scully EJ, Landis MJ. Deep Learning and Likelihood Approaches for Viral Phylogeography Converge on the Same Answers Whether the Inference Model Is Right or Wrong. Syst Biol 2024; 73:183-206. [PMID: 38189575 DOI: 10.1093/sysbio/syad074] [Received: 02/15/2023] [Revised: 11/22/2023] [Accepted: 01/05/2024] [Indexed: 01/09/2024]
Abstract
Analysis of phylogenetic trees has become an essential tool in epidemiology. Likelihood-based methods fit models to phylogenies to draw inferences about the phylodynamics and history of viral transmission. However, these methods are often computationally expensive, which limits the complexity and realism of phylodynamic models and makes them ill-suited for informing policy decisions in real time during rapidly developing outbreaks. Likelihood-free methods using deep learning are pushing the boundaries of inference beyond these constraints. In this paper, we extend, compare, and contrast a recently developed deep learning method for likelihood-free inference from trees. We trained multiple deep neural networks using phylogenies from simulated outbreaks that spread among five locations and found they achieve close to the same levels of accuracy as Bayesian inference under the true simulation model. We compared robustness to model misspecification of a trained neural network to that of a Bayesian method. We found that both models had comparable performance, converging on similar biases. We also implemented a method of uncertainty quantification called conformalized quantile regression, which we demonstrate has patterns of sensitivity to model misspecification similar to those of Bayesian highest posterior density (HPD) intervals; its intervals greatly overlap with HPDs but have lower precision (they are more conservative). Finally, we trained and tested a neural network against phylogeographic data from a recent study of the SARS-CoV-2 pandemic in Europe and obtained similar estimates of region-specific epidemiological parameters and the location of the common ancestor in Europe. Along with being as accurate and robust as likelihood-based methods, our trained neural networks are on average over 3 orders of magnitude faster after training. Our results support the notion that neural networks can be trained with simulated data to accurately mimic the good and bad statistical properties of the likelihood functions of generative phylogenetic models.
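As a rough illustration of the conformalized quantile regression step named in the abstract above, the sketch below shows the generic split-conformal calibration: lower and upper quantile predictions on a held-out calibration set are scored against the true values, and the resulting correction is applied to every new interval. This is a minimal sketch under stated assumptions, not the authors' implementation; the function names `cqr_adjustment` and `cqr_interval` and the toy numbers are hypothetical.

```python
import math


def cqr_adjustment(lo_cal, hi_cal, y_cal, alpha=0.05):
    """Return the conformal correction Q for a target miscoverage alpha.

    lo_cal, hi_cal: predicted lower/upper quantiles on the calibration set
    (hypothetical outputs of any quantile regressor); y_cal: true values.
    """
    # Conformity score: positive when the true value falls outside the
    # predicted interval, negative when it lies inside.
    scores = sorted(
        max(lo - y, y - hi) for lo, hi, y in zip(lo_cal, hi_cal, y_cal)
    )
    n = len(scores)
    # Finite-sample rank: the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to guarantee the requested coverage.
        return math.inf
    return scores[k - 1]


def cqr_interval(lo, hi, q):
    """Widen (q > 0) or shrink (q < 0) a predicted interval by q."""
    return lo - q, hi + q


# Toy calibration set: every predicted interval is [0, 10].
q = cqr_adjustment([0] * 9, [10] * 9, [5, 11, 12, 13, 14, 15, 16, 17, 18],
                   alpha=0.5)
print(q, cqr_interval(0, 10, q))
```

A positive correction means the raw quantile predictions were too narrow on the calibration data, which is one way calibrated intervals can end up more conservative than the uncalibrated ones.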
Affiliation(s)
- Ammon Thompson
- Participant in an Education Program Sponsored by U.S. Department of Defense (DOD) at the National Geospatial-Intelligence Agency, Springfield, VA 22150, USA
- Erik J Scully
- National Geospatial-Intelligence Agency, Springfield, VA 22150, USA
- Michael J Landis
- Department of Biology, Washington University in St. Louis, Rebstock Hall, St. Louis, MO 63130, USA
2
Müller NF, Bouckaert RR, Wu CH, Bedford T. MASCOT-Skyline integrates population and migration dynamics to enhance phylogeographic reconstructions. bioRxiv 2024:2024.03.06.583734. [PMID: 38496513 PMCID: PMC10942421 DOI: 10.1101/2024.03.06.583734] [Indexed: 03/19/2024]
Abstract
The spread of infectious diseases is shaped by spatial and temporal aspects, such as host population structure or changes in the transmission rate or number of infected individuals over time. These spatiotemporal dynamics are imprinted in the genome of pathogens and can be recovered from those genomes using phylodynamic methods. However, phylodynamic methods typically quantify either the temporal or the spatial transmission dynamics, which leads to unclear biases, as one can potentially not be inferred without the other. Here, we address this challenge by introducing a structured coalescent skyline approach, MASCOT-Skyline, that allows us to jointly infer spatial and temporal transmission dynamics of infectious diseases using Markov chain Monte Carlo inference. To do so, we model the effective population size dynamics in different locations using a non-parametric function, allowing us to approximate a range of population size dynamics. We show, using a range of different viral outbreak datasets, potential issues with phylogeographic methods. We then use these viral datasets to motivate simulations of outbreaks that illuminate the nature of biases present in the different phylogeographic methods. We show that spatial and temporal dynamics should be modeled jointly even if one seeks to recover just one of the two. Further, we showcase conditions under which we can expect phylogeographic analyses to be biased, particularly under different subsampling approaches, and provide recommendations on when we can expect them to perform well. We implemented MASCOT-Skyline as part of the open-source software package MASCOT for the Bayesian phylodynamics platform BEAST2.
Affiliation(s)
- Nicola F. Müller
- Division of HIV, ID and Global Medicine, University of California San Francisco, San Francisco, USA
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, USA
- Remco R. Bouckaert
- Centre for Computational Evolution, The University of Auckland, New Zealand
- Chieh-Hsi Wu
- School of Mathematical Sciences, University of Southampton, UK
- Trevor Bedford
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, USA
- Howard Hughes Medical Institute, Seattle, USA
3
Soewongsono AC, Landis MJ. A Diffusion-Based Approach for Simulating Forward-in-Time State-Dependent Speciation and Extinction Dynamics. arXiv 2024:arXiv:2402.00246v1. [PMID: 38351931 PMCID: PMC10862938] [Indexed: 02/20/2024]
Abstract
We establish a general framework using a diffusion approximation to simulate forward-in-time state counts or frequencies for cladogenetic state-dependent speciation-extinction (ClaSSE) models. We apply the framework to various two- and three-region geographic-state speciation-extinction (GeoSSE) models. We show that the species range state dynamics simulated under tree-based and diffusion-based processes are comparable. We derive a method to infer rate parameters that are compatible with given observed stationary state frequencies and obtain an analytical result to compute stationary state frequencies for a given set of rate parameters. We also describe a procedure to find the time to reach the stationary frequencies of a ClaSSE model using our diffusion-based approach, which we demonstrate using a worked example for a two-region GeoSSE model. Finally, we discuss how the diffusion framework can be applied to formalize relationships between evolutionary patterns and processes under state-dependent diversification scenarios.
Affiliation(s)
- Albert C Soewongsono
- Department of Biology, Washington University in St. Louis, Rebstock Hall, St. Louis, Missouri, 63130, USA
- Michael J Landis
- Department of Biology, Washington University in St. Louis, Rebstock Hall, St. Louis, Missouri, 63130, USA
4
Vaughan TG. ReMASTER: improved phylodynamic simulation for BEAST 2.7. Bioinformatics 2024; 40:btae015. [PMID: 38195927 PMCID: PMC10796175 DOI: 10.1093/bioinformatics/btae015] [Received: 10/05/2023] [Revised: 12/30/2023] [Accepted: 01/08/2024] [Indexed: 01/11/2024]
Abstract
SUMMARY Phylodynamic models link phylogenetic trees to biologically-relevant parameters such as speciation and extinction rates (macroevolution), effective population sizes and migration rates (ecology and phylogeography), and transmission and removal/recovery rates (epidemiology) to name a few. Being able to simulate phylogenetic trees and population dynamics under these models is the basis for (i) developing and testing of phylodynamic inference algorithms, (ii) performing simulation studies which quantify the biases stemming from model-misspecification, and (iii) performing so-called model adequacy assessments by simulating samples from the posterior predictive distribution. Here I introduce ReMASTER, a package for the phylogenetic inference platform BEAST 2 that provides a simple and efficient approach to specifying and simulating the phylogenetic trees and population dynamics arising from phylodynamic models. Being a component of BEAST 2 allows ReMASTER to also form the basis of joint simulation and inference analyses. ReMASTER is a complete rewrite of an earlier package, MASTER, and boasts improved efficiency, ease of use, flexibility of model specification, and deeper integration with BEAST 2. AVAILABILITY AND IMPLEMENTATION ReMASTER can be installed directly from the BEAST 2 package manager, and its documentation is available online at https://tgvaughan.github.io/remaster. ReMASTER is free software, and is distributed under version 3 of the GNU General Public License. The Java source code for ReMASTER is available from https://github.com/tgvaughan/remaster.
Affiliation(s)
- Timothy G Vaughan
- Department of Biosystems Science and Engineering, ETH Zurich, Basel 4056, Switzerland
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
5
Weber A, Översti S, Kühnert D. Reconstructing relative transmission rates in Bayesian phylodynamics: Two-fold transmission advantage of Omicron in Berlin, Germany during December 2021. Virus Evol 2023; 9:vead070. [PMID: 38107332 PMCID: PMC10725310 DOI: 10.1093/ve/vead070] [Received: 08/17/2023] [Revised: 11/08/2023] [Accepted: 11/27/2023] [Indexed: 12/19/2023]
Abstract
Phylodynamic methods have lately played a key role in understanding the spread of infectious diseases. During the coronavirus disease (COVID-19) pandemic, large scale genomic surveillance has further increased the potential of dynamic inference from viral genomes. With the continual emergence of novel severe acute respiratory syndrome coronavirus type 2 (SARS-CoV-2) variants, explicitly allowing transmission rate differences between simultaneously circulating variants in phylodynamic inference is crucial. In this study, we present and empirically validate an extension to the BEAST2 package birth-death skyline model (BDSKY), BDSKY[Formula: see text], which introduces a scaling factor for the transmission rate between independent, jointly inferred trees. In an extensive simulation study, we show that BDSKY[Formula: see text] robustly infers the relative transmission rates under different epidemic scenarios. Using publicly available genome data of SARS-CoV-2, we apply BDSKY[Formula: see text] to quantify the transmission advantage of the Omicron over the Delta variant in Berlin, Germany. We find the overall transmission rate of Omicron to be scaled by a factor of two with pronounced variation between the individual clusters of each variant. These results quantify the transmission advantage of Omicron over the previously circulating Delta variant, in a crucial period of pre-established non-pharmaceutical interventions. By inferring variant- as well as cluster-specific transmission rate scaling factors, we show the differences in transmission dynamics for each variant. This highlights the importance of incorporating lineage-specific transmission differences in phylodynamic inference.
Affiliation(s)
- Ariane Weber
- Transmission, Infection, Diversification & Evolution Group (tide), Max Planck Institute of Geoanthropology, Kahlaische Strasse 10, Jena, Thuringia 07745, Germany
- Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, Leipzig, Saxony 04103, Germany
- Denise Kühnert
- Transmission, Infection, Diversification & Evolution Group (tide), Max Planck Institute of Geoanthropology, Kahlaische Strasse 10, Jena, Thuringia 07745, Germany
- Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, Leipzig, Saxony 04103, Germany
- Centre for Artificial Intelligence in Public Health Research, Robert Koch Institute, Ludwig-Witthöft-Straße 14, Wildau, Brandenburg 15745, Germany
6
Johnson B, Shuai Y, Schweinsberg J, Curtius K. cloneRate: fast estimation of single-cell clonal dynamics using coalescent theory. Bioinformatics 2023; 39:btad561. [PMID: 37699006 PMCID: PMC10534056 DOI: 10.1093/bioinformatics/btad561] [Received: 06/09/2023] [Revised: 08/25/2023] [Indexed: 09/14/2023]
Abstract
MOTIVATION: While evolutionary approaches to medicine show promise, measuring evolution itself is difficult due to experimental constraints and the dynamic nature of body systems. In cancer evolution, continuous observation of clonal architecture is impossible, and longitudinal samples from multiple timepoints are rare. Increasingly available DNA sequencing datasets at single-cell resolution enable the reconstruction of past evolution using mutational history, allowing for a better understanding of dynamics prior to detectable disease. There is an unmet need for an accurate, fast, and easy-to-use method to quantify clone growth dynamics from these datasets. RESULTS: We derived methods based on coalescent theory for estimating the net growth rate of clones using either reconstructed phylogenies or the number of shared mutations. We applied and validated our analytical methods for estimating the net growth rate of clones, eliminating the need for complex simulations used in previous methods. When applied to hematopoietic data, we show that our estimates may have broad applications to improve mechanistic understanding and prognostic ability. Compared to clones with a single or unknown driver mutation, clones with multiple drivers have significantly increased growth rates (median 0.94 versus 0.25 per year; P = 1.6 × 10⁻⁶). Further, stratifying patients with a myeloproliferative neoplasm (MPN) by the growth rate of their fittest clone shows that higher growth rates are associated with shorter time to MPN diagnosis (median 13.9 versus 26.4 months; P = 0.0026). AVAILABILITY AND IMPLEMENTATION: We developed a publicly available R package, cloneRate, to implement our methods (Package website: https://bdj34.github.io/cloneRate/). Source code: https://github.com/bdj34/cloneRate/.
Affiliation(s)
- Brian Johnson
- Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, United States
- Yubo Shuai
- Department of Mathematics, University of California San Diego, La Jolla, CA 92093, United States
- Jason Schweinsberg
- Department of Mathematics, University of California San Diego, La Jolla, CA 92093, United States
- Kit Curtius
- Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, United States
- Moores Cancer Center, University of California San Diego, La Jolla, CA 92093, United States
- VA San Diego Healthcare System, San Diego, CA 92161, United States
7
Lewinsohn MA, Bedford T, Müller NF, Feder AF. State-dependent evolutionary models reveal modes of solid tumour growth. Nat Ecol Evol 2023; 7:581-596. [PMID: 36894662 PMCID: PMC10089931 DOI: 10.1038/s41559-023-02000-4] [Received: 08/05/2022] [Accepted: 01/26/2023] [Indexed: 03/11/2023]
Abstract
Spatial properties of tumour growth have profound implications for cancer progression, therapeutic resistance and metastasis. Yet, how spatial position governs tumour cell division remains difficult to evaluate in clinical tumours. Here, we demonstrate that faster division on the tumour periphery leaves characteristic genetic patterns, which become evident when a phylogenetic tree is reconstructed from spatially sampled cells. Namely, rapidly dividing peripheral lineages branch more extensively and acquire more mutations than slower-dividing centre lineages. We develop a Bayesian state-dependent evolutionary phylodynamic model (SDevo) that quantifies these patterns to infer the differential division rates between peripheral and central cells. We demonstrate that this approach accurately infers spatially varying birth rates of simulated tumours across a range of growth conditions and sampling strategies. We then show that SDevo outperforms state-of-the-art, non-cancer multi-state phylodynamic methods that ignore differential sequence evolution. Finally, we apply SDevo to single-time-point, multi-region sequencing data from clinical hepatocellular carcinomas and find evidence of a three- to six-times-higher division rate on the tumour edge. With the increasing availability of high-resolution, multi-region sequencing, we anticipate that SDevo will be useful in interrogating spatial growth restrictions and could be extended to model non-spatial factors that influence tumour progression.
Affiliation(s)
- Maya A Lewinsohn
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Trevor Bedford
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
- Nicola F Müller
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Alison F Feder
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
8
Danesh G, Saulnier E, Gascuel O, Choisy M, Alizon S. TiPS: Rapidly simulating trajectories and phylogenies from compartmental models. Methods Ecol Evol 2022. [DOI: 10.1111/2041-210x.14038] [Indexed: 12/14/2022]
Affiliation(s)
- Gonché Danesh
- MIVEGEC, CNRS, IRD, Université de Montpellier, Montpellier, France
- Emma Saulnier
- MIVEGEC, CNRS, IRD, Université de Montpellier, Montpellier, France
- Marc Choisy
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
- Samuel Alizon
- MIVEGEC, CNRS, IRD, Université de Montpellier, Montpellier, France
- Center for Interdisciplinary Research in Biology (CIRB), College de France, CNRS, INSERM, Université PSL, Paris, France
9
Didelot X, Helekal D, Kendall M, Ribeca P. Distinguishing imported cases from locally acquired cases within a geographically limited genomic sample of an infectious disease. Bioinformatics 2022; 39:6849542. [PMID: 36440957 PMCID: PMC9805578 DOI: 10.1093/bioinformatics/btac761] [Received: 07/11/2022] [Revised: 11/17/2022] [Accepted: 11/24/2022] [Indexed: 11/30/2022]
Abstract
MOTIVATION The ability to distinguish imported cases from locally acquired cases has important consequences for the selection of public health control strategies. Genomic data can be useful for this, for example, using a phylogeographic analysis in which genomic data from multiple locations are compared to determine likely migration events between locations. However, these methods typically require good samples of genomes from all locations, which is rarely available. RESULTS Here, we propose an alternative approach that only uses genomic data from a location of interest. By comparing each new case with previous cases from the same location, we are able to detect imported cases, as they have a different genealogical distribution than that of locally acquired cases. We show that, when variations in the size of the local population are accounted for, our method has good sensitivity and excellent specificity for the detection of imports. We applied our method to data simulated under the structured coalescent model and demonstrate relatively good performance even when the local population has the same size as the external population. Finally, we applied our method to several recent genomic datasets from both bacterial and viral pathogens, and show that it can, in a matter of seconds or minutes, deliver important insights on the number of imports to a geographically limited sample of a pathogen population. AVAILABILITY AND IMPLEMENTATION The R package DetectImports is freely available from https://github.com/xavierdidelot/DetectImports. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- David Helekal
- Centre for Doctoral Training in Mathematics for Real-World Systems, University of Warwick, Coventry CV4 7AL, UK
- Michelle Kendall
- School of Life Sciences and Department of Statistics, University of Warwick, Coventry CV4 7AL, UK
- Paolo Ribeca
- Gastrointestinal Bacteria Reference Unit, UK Health Security Agency, London NW9 5EQ, UK
- Biomathematics and Statistics Scotland, The James Hutton Institute, Edinburgh EH9 3FD, UK
10
Shchur V, Spirin V, Sirotkin D, Burovski E, De Maio N, Corbett-Detig R. VGsim: Scalable viral genealogy simulator for global pandemic. PLoS Comput Biol 2022; 18:e1010409. [PMID: 36001646 PMCID: PMC9447924 DOI: 10.1371/journal.pcbi.1010409] [Received: 12/02/2021] [Revised: 09/06/2022] [Accepted: 07/18/2022] [Indexed: 11/24/2022]
Abstract
Accurate simulation of complex biological processes is an essential component of developing and validating new technologies and inference approaches. As an effort to help contain the COVID-19 pandemic, large numbers of SARS-CoV-2 genomes have been sequenced from most regions in the world. More than 5.5 million viral sequences are publicly available as of November 2021. Many studies estimate viral genealogies from these sequences, as these can provide valuable information about the spread of the pandemic across time and space. Additionally, such data are a rich source of information about molecular evolutionary processes including natural selection, for example allowing the identification of new variants with transmissibility and immunity evasion advantages. To our knowledge, there is no framework that is both efficient and flexible enough to simulate the pandemic to approximate world-scale scenarios and generate viral genealogies of millions of samples. Here, we introduce a new fast simulator, VGsim, which addresses the problem of simulating genealogies under epidemiological models. The simulation process is split into two phases. During the forward run, the algorithm generates a chain of population-level events reflecting the dynamics of the pandemic using a hierarchical version of the Gillespie algorithm. During the backward run, a coalescent-like approach generates a tree genealogy of samples conditioning on the population-level events chain generated during the forward run. Our software can model complex population structure, epistasis and immunity escape. We develop a fast and flexible simulation software package VGsim for modeling epidemiological processes and generating genealogies of large pathogen samples. The software takes into account host population structure, pathogen evolution, host immunity and some other epidemiological aspects. The computational efficiency of the package makes it possible to simulate genealogies of tens of millions of samples, which is important, e.g., for SARS-CoV-2 genome studies.
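The forward phase described in the abstract above builds on the Gillespie algorithm. The following is a minimal sketch of the plain (non-hierarchical) Gillespie algorithm for a single-population SIR model, only to illustrate how a chain of population-level events can be generated. It is not VGsim's code or API, and it omits population structure, mutation, and the backward coalescent phase; the function name `gillespie_sir`, its parameters, and the rates are assumptions made for illustration.

```python
import random


def gillespie_sir(beta, gamma, s0, i0, t_max, seed=0):
    """Simulate SIR events; returns a list of (time, S, I, R) tuples."""
    rng = random.Random(seed)
    n = s0 + i0                      # closed population of constant size
    t, s, i, r = 0.0, s0, i0, 0
    events = [(t, s, i, r)]
    while i > 0 and t < t_max:
        rate_infection = beta * s * i / n
        rate_recovery = gamma * i
        total = rate_infection + rate_recovery
        # Waiting time to the next event is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_max:
            break
        # Choose the event type with probability proportional to its rate.
        if rng.random() * total < rate_infection:
            s, i = s - 1, i + 1      # transmission
        else:
            i, r = i - 1, r + 1      # recovery
        events.append((t, s, i, r))
    return events


chain = gillespie_sir(beta=2.0, gamma=1.0, s0=99, i0=1, t_max=50.0, seed=1)
print(len(chain), chain[-1])
```

In a two-phase design like the one the abstract describes, a recorded event chain of this kind would be the input to a backward pass that builds the genealogy of sampled cases.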
Affiliation(s)
- Vladimir Shchur
- International laboratory of statistical and computational genomics, HSE University, Moscow, Russia
- Vadim Spirin
- International laboratory of statistical and computational genomics, HSE University, Moscow, Russia
- Dmitry Sirotkin
- International laboratory of statistical and computational genomics, HSE University, Moscow, Russia
- Nicola De Maio
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- Russell Corbett-Detig
- Department of Biomolecular Engineering and Genomics Institute, UC Santa Cruz, California, United States of America
11
Robust Phylodynamic Analysis of Genetic Sequencing Data from Structured Populations. Viruses 2022; 14:v14081648. [PMID: 36016270 PMCID: PMC9413058 DOI: 10.3390/v14081648] [Received: 06/09/2022] [Accepted: 07/22/2022] [Indexed: 02/04/2023]
Abstract
The multi-type birth–death model with sampling is a phylodynamic model which enables the quantification of past population dynamics in structured populations based on phylogenetic trees. The BEAST 2 package bdmm implements an algorithm for numerically computing the probability density of a phylogenetic tree given the population dynamic parameters under this model. In the initial release of bdmm, analyses were computationally limited to trees consisting of up to approximately 250 genetic samples. We implemented important algorithmic changes to bdmm which dramatically increased the number of genetic samples that could be analyzed and which improved the numerical robustness and efficiency of the calculations. Including more samples led to the improved precision of parameter estimates, particularly for structured models with a high number of inferred parameters. Furthermore, we report on several model extensions to bdmm, inspired by properties common to empirical datasets. We applied this improved algorithm to two partly overlapping datasets of the Influenza A virus HA sequences sampled around the world—one with 500 samples and the other with only 175—for comparison. We report and compare the global migration patterns and seasonal dynamics inferred from each dataset. In this way, we show the information that is gained by analyzing the bigger dataset, which became possible with the presented algorithmic changes to bdmm. In summary, bdmm allows for the robust, faster, and more general phylodynamic inference of larger datasets.
12
Menardo F. Understanding drivers of phylogenetic clustering and terminal branch lengths distribution in epidemics of Mycobacterium tuberculosis. eLife 2022; 11:76780. [PMID: 35762734 PMCID: PMC9239681 DOI: 10.7554/elife.76780] [Received: 01/04/2022] [Accepted: 06/15/2022] [Indexed: 11/13/2022]
Abstract
Detecting factors associated with transmission is important to understand disease epidemics, and to design effective public health measures. Clustering and terminal branch lengths (TBL) analyses are commonly applied to genomic data sets of Mycobacterium tuberculosis (MTB) to identify sub-populations with increased transmission. Here, I used a simulation-based approach to investigate what epidemiological processes influence the results of clustering and TBL analyses, and whether differences in transmission can be detected with these methods. I simulated MTB epidemics with different dynamics (latency, infectious period, transmission rate, basic reproductive number R0, sampling proportion, sampling period, and molecular clock), and found that all considered factors, except for the length of the infectious period, affect the results of clustering and TBL distributions. I show that standard interpretations of this type of analyses ignore two main caveats: (1) clustering results and TBL depend on many factors that have nothing to do with transmission, (2) clustering results and TBL do not tell anything about whether the epidemic is stable, growing, or shrinking, unless all the additional parameters that influence these metrics are known, or assumed identical between sub-populations. An important consequence is that the optimal SNP threshold for clustering depends on the epidemiological conditions, and that sub-populations with different epidemiological characteristics should not be analyzed with the same threshold. Finally, these results suggest that different clustering rates and TBL distributions, that are found consistently between different MTB lineages, are probably due to intrinsic bacterial factors, and do not indicate necessarily differences in transmission or evolutionary success.
Affiliation(s)
- Fabrizio Menardo
- Department of Plant and Microbial Biology, University of Zurich, Zurich, Switzerland
13
Shchur V, Spirin V, Sirotkin D, Burovski E, De Maio N, Corbett-Detig R. VGsim: scalable viral genealogy simulator for global pandemic. medRxiv 2021:2021.04.21.21255891. [PMID: 33948608 PMCID: PMC8095227 DOI: 10.1101/2021.04.21.21255891] [Indexed: 11/24/2022]
Abstract
Accurate simulation of complex biological processes is an essential component of developing and validating new technologies and inference approaches. As an effort to help contain the COVID-19 pandemic, large numbers of SARS-CoV-2 genomes have been sequenced from most regions in the world. More than 5.5 million viral sequences are publicly available as of November 2021. Many studies estimate viral genealogies from these sequences, as these can provide valuable information about the spread of the pandemic across time and space. Additionally, such data are a rich source of information about molecular evolutionary processes including natural selection, for example allowing the identification of new variants with transmissibility and immunity evasion advantages. To our knowledge, there is no framework that is both efficient and flexible enough to simulate the pandemic to approximate world-scale scenarios and generate viral genealogies of millions of samples. Here, we introduce a new fast simulator, VGsim, which addresses the problem of simulating genealogies under epidemiological models. The simulation process is split into two phases. During the forward run, the algorithm generates a chain of population-level events reflecting the dynamics of the pandemic using a hierarchical version of the Gillespie algorithm. During the backward run, a coalescent-like approach generates a tree genealogy of samples conditioning on the population-level events chain generated during the forward run. Our software can model complex population structure, epistasis and immunity escape. The code is freely available at https://github.com/Genomics-HSE/VGsim.
Affiliation(s)
| | | | | | | | - Nicola De Maio
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Russell Corbett-Detig
- HSE University, Russian Federation
- Department of Biomolecular Engineering and Genomics Institute, UC Santa Cruz, California 95064
|
14
|
Featherstone LA, Di Giallonardo F, Holmes EC, Vaughan TG, Duchêne S. Infectious disease phylodynamics with occurrence data. Methods Ecol Evol 2021. [DOI: 10.1111/2041-210x.13620] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/30/2022]
Affiliation(s)
- Leo A. Featherstone
- Department of Microbiology and Immunology Peter Doherty Institute for Infection and Immunity University of Melbourne Melbourne Vic. Australia
| | | | - Edward C. Holmes
- Marie Bashir Institute for Infectious Diseases and Biosecurity The University of Sydney Sydney NSW Australia
- Charles Perkins Centre School of Life and Environmental Sciences The University of Sydney Sydney NSW Australia
- School of Medical Sciences The University of Sydney Sydney NSW Australia
| | - Timothy G. Vaughan
- Department of Biosystems Science and Engineering ETH Zurich Basel Switzerland
- Swiss Institute of Bioinformatics (SIB) Lausanne Switzerland
| | - Sebastián Duchêne
- Department of Microbiology and Immunology Peter Doherty Institute for Infection and Immunity University of Melbourne Melbourne Vic. Australia
|
15
|
Müller NF, Wagner C, Frazar CD, Roychoudhury P, Lee J, Moncla LH, Pelle B, Richardson M, Ryke E, Xie H, Shrestha L, Addetia A, Rachleff VM, Lieberman NAP, Huang ML, Gautom R, Melly G, Hiatt B, Dykema P, Adler A, Brandstetter E, Han PD, Fay K, Ilcisin M, Lacombe K, Sibley TR, Truong M, Wolf CR, Boeckh M, Englund JA, Famulare M, Lutz BR, Rieder MJ, Thompson M, Duchin JS, Starita LM, Chu HY, Shendure J, Jerome KR, Lindquist S, Greninger AL, Nickerson DA, Bedford T. Viral genomes reveal patterns of the SARS-CoV-2 outbreak in Washington State. Sci Transl Med 2021; 13:eabf0202. [PMID: 33941621 PMCID: PMC8158963 DOI: 10.1126/scitranslmed.abf0202] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 09/29/2020] [Revised: 01/23/2021] [Accepted: 04/25/2021] [Indexed: 12/16/2022]
Abstract
The rapid spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has gravely affected societies around the world. Outbreaks in different parts of the globe have been shaped by repeated introductions of new viral lineages and subsequent local transmission of those lineages. Here, we sequenced 3940 SARS-CoV-2 viral genomes from Washington State (USA) to characterize how the spread of SARS-CoV-2 in Washington State in early 2020 was shaped by differences in timing of mitigation strategies across counties and by repeated introductions of viral lineages into the state. In addition, we show that the increase in frequency of a potentially more transmissible viral variant (614G) over time can potentially be explained by regional mobility differences and multiple introductions of 614G but not the other variant (614D) into the state. At an individual level, we observed evidence of higher viral loads in patients infected with the 614G variant. However, using clinical records data, we did not find any evidence that the 614G variant affects clinical severity or patient outcomes. Overall, this suggests that with regard to D614G, the behavior of individuals has been more important in shaping the course of the pandemic in Washington State than this variant of the virus.
Affiliation(s)
- Nicola F Müller
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
| | - Cassia Wagner
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Chris D Frazar
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Pavitra Roychoudhury
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, USA
| | - Jover Lee
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Louise H Moncla
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Benjamin Pelle
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Matthew Richardson
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Erica Ryke
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Hong Xie
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, USA
| | - Lasata Shrestha
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, USA
| | - Amin Addetia
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, USA
| | - Victoria M Rachleff
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, USA
| | - Nicole A P Lieberman
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, USA
| | - Meei-Li Huang
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, USA
| | - Romesh Gautom
- Washington State Department of Health, Shoreline, WA 98155, USA
| | - Geoff Melly
- Washington State Department of Health, Shoreline, WA 98155, USA
| | - Brian Hiatt
- Washington State Department of Health, Shoreline, WA 98155, USA
| | - Philip Dykema
- Washington State Department of Health, Shoreline, WA 98155, USA
| | - Amanda Adler
- Seattle Children's Research Institute, Seattle, WA 98101, USA
| | - Elisabeth Brandstetter
- Department of Medicine, Division of Allergy and Infectious Diseases, University of Washington, Seattle, WA 98195, USA
| | - Peter D Han
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Kairsten Fay
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Misja Ilcisin
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Kirsten Lacombe
- Seattle Children's Research Institute, Seattle, WA 98101, USA
| | - Thomas R Sibley
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Melissa Truong
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Caitlin R Wolf
- Department of Medicine, Division of Allergy and Infectious Diseases, University of Washington, Seattle, WA 98195, USA
| | - Michael Boeckh
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Department of Medicine, Division of Allergy and Infectious Diseases, University of Washington, Seattle, WA 98195, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, USA
| | - Janet A Englund
- Seattle Children's Research Institute, Seattle, WA 98101, USA
- Department of Pediatrics, University of Washington, Seattle, WA 98105, USA
| | | | - Barry R Lutz
- Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, USA
- Department of Bioengineering, University of Washington, Seattle, WA 98105, USA
| | - Mark J Rieder
- Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, USA
| | - Matthew Thompson
- Department of Global Health, University of Washington, Seattle, WA 98195, USA
| | - Jeffrey S Duchin
- Department of Medicine, Division of Allergy and Infectious Diseases, University of Washington, Seattle, WA 98195, USA
- Public Health - Seattle & King County, Seattle, WA 98121, USA
| | - Lea M Starita
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, USA
| | - Helen Y Chu
- Department of Medicine, Division of Allergy and Infectious Diseases, University of Washington, Seattle, WA 98195, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, USA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, Seattle, WA 98195, USA
| | - Keith R Jerome
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, USA
| | - Scott Lindquist
- Washington State Department of Health, Shoreline, WA 98155, USA
| | - Alexander L Greninger
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, USA
| | - Deborah A Nickerson
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, USA
| | - Trevor Bedford
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, USA
|
16
|
Quantifying transmission fitness costs of multi-drug resistant tuberculosis. Epidemics 2021; 36:100471. [PMID: 34256273 DOI: 10.1016/j.epidem.2021.100471] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/01/2019] [Revised: 01/14/2020] [Accepted: 05/17/2021] [Indexed: 11/22/2022] Open
Abstract
As multi-drug resistant tuberculosis (MDR-TB) continues to spread, investigating the transmission potential of different drug-resistant strains becomes an ever more pressing topic in public health. While phylogenetic and transmission tree inferences provide valuable insight into possible transmission chains, phylodynamic inference combines evolutionary and epidemiological analyses to estimate the parameters of the underlying epidemiological processes, allowing us to describe the overall dynamics of disease spread in the population. In this study, we introduce an approach to Mycobacterium tuberculosis (M. tuberculosis) phylodynamic analysis employing an existing computationally efficient model to quantify the transmission fitness costs of drug resistance with respect to drug-sensitive strains. To determine the accuracy and precision of our approach, we first perform a simulation study, mimicking the simultaneous spread of drug-sensitive and drug-resistant tuberculosis (TB) strains. We analyse the simulated transmission trees using the phylodynamic multi-type birth-death model (MTBD; Kühnert et al., 2016) within the BEAST2 framework and show that this model can estimate the parameters of the epidemic well, despite the simplifying assumptions that MTBD makes compared to the complex TB transmission dynamics used for simulation. We then apply the MTBD model to an M. tuberculosis lineage 4 dataset that primarily consists of MDR sequences. Some of the MDR strains additionally exhibit resistance to pyrazinamide - an important first-line anti-tuberculosis drug. Our results support the previously proposed hypothesis that pyrazinamide resistance confers a transmission fitness cost to the bacterium, which we quantify for the given dataset. Importantly, our sensitivity analyses show that the estimates are robust to different prior distributions on the resistance acquisition rate, but are affected by the size of the dataset - i.e. we estimate a higher fitness cost when using fewer sequences for analysis. Overall, we propose that MTBD can be used to quantify the transmission fitness cost for a wide range of pathogens where the strains can be appropriately divided into two or more categories with distinct properties.
|
17
|
Parag KV, Pybus OG, Wu CH. Are Skyline Plot-Based Demographic Estimates Overly Dependent on Smoothing Prior Assumptions? Syst Biol 2021; 71:121-138. [PMID: 33989428 PMCID: PMC8677568 DOI: 10.1093/sysbio/syab037] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/30/2020] [Revised: 05/07/2021] [Accepted: 05/08/2021] [Indexed: 11/13/2022] Open
Abstract
In Bayesian phylogenetics, the coalescent process provides an informative framework for inferring changes in the effective size of a population from a phylogeny (or tree) of sequences sampled from that population. Popular coalescent inference approaches such as the Bayesian Skyline Plot, Skyride, and Skygrid all model these population size changes with a discontinuous, piecewise-constant function but then apply a smoothing prior to ensure that their posterior population size estimates transition gradually with time. These prior distributions implicitly encode extra population size information that is not available from the observed coalescent data or tree. Here, we present a novel statistic, $\Omega$, to quantify and disaggregate the relative contributions of the coalescent data and prior assumptions to the resulting posterior estimate precision. Our statistic also measures the additional mutual information introduced by such priors. Using $\Omega$ we show that, because it is surprisingly easy to overparametrize piecewise-constant population models, common smoothing priors can lead to overconfident and potentially misleading inference, even under robust experimental designs. We propose $\Omega$ as a useful tool for detecting when effective population size estimates are overly reliant on prior assumptions and for improving quantification of the uncertainty in those estimates.[Coalescent processes; effective population size; information theory; phylodynamics; prior assumptions; skyline plots.].
Affiliation(s)
- Kris V Parag
- MRC Centre for Global Infectious Disease Analysis, Imperial College London, London W2 1PG, UK; Department of Zoology, University of Oxford, Oxford OX1 3SY, UK; Correspondence to be sent to: MRC Centre for Global Infectious Disease Analysis, Imperial College London, London W2 1PG, UK; e-mail:
| | - Oliver G Pybus
- Department of Zoology, University of Oxford, Oxford OX1 3SY, UK
| | - Chieh-Hsi Wu
- Mathematical Sciences, University of Southampton, Highfield, Southampton SO17 1BJ, UK
|
18
|
Duchene S, Lemey P, Stadler T, Ho SYW, Duchene DA, Dhanasekaran V, Baele G. Bayesian Evaluation of Temporal Signal in Measurably Evolving Populations. Mol Biol Evol 2021; 37:3363-3379. [PMID: 32895707 PMCID: PMC7454806 DOI: 10.1093/molbev/msaa163] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Indexed: 01/05/2023] Open
Abstract
Phylogenetic methods can use the sampling times of molecular sequence data to calibrate the molecular clock, enabling the estimation of evolutionary rates and timescales for rapidly evolving pathogens and data sets containing ancient DNA samples. A key aspect of such calibrations is whether a sufficient amount of molecular evolution has occurred over the sampling time window, that is, whether the data can be treated as having come from a measurably evolving population. Here, we investigate the performance of a fully Bayesian evaluation of temporal signal (BETS) in sequence data. The method involves comparing the fit to the data of two models: a model in which the data are accompanied by the actual (heterochronous) sampling times, and a model in which the samples are constrained to be contemporaneous (isochronous). We conducted simulations under a wide range of conditions to demonstrate that BETS accurately classifies data sets according to whether they contain temporal signal or not, even when there is substantial among-lineage rate variation. We explore the behavior of this classification in analyses of five empirical data sets: modern samples of A/H1N1 influenza virus, the bacterium Bordetella pertussis, coronaviruses from mammalian hosts, ancient DNA from Hepatitis B virus, and mitochondrial genomes of dog species. Our results indicate that BETS is an effective alternative to other tests of temporal signal. In particular, this method has the key advantage of allowing a coherent assessment of the entire model, including the molecular clock and tree prior which are essential aspects of Bayesian phylodynamic analyses.
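The model comparison described in this abstract can be sketched as a log Bayes factor between the heterochronous and isochronous models. This is a minimal illustration, assuming that log marginal likelihoods for both models have already been estimated elsewhere; the function names, the default cutoff, and the numeric inputs below are hypothetical and are not taken from the paper.

```python
def log_bayes_factor(log_ml_heterochronous, log_ml_isochronous):
    """Log Bayes factor in favour of the model that uses the real sampling times."""
    return log_ml_heterochronous - log_ml_isochronous


def supports_temporal_signal(log_bf, min_log_bf=0.0):
    """True if the heterochronous model is preferred.

    min_log_bf is an assumed, user-chosen cutoff; 0.0 simply asks which
    model has the higher marginal likelihood.
    """
    return log_bf > min_log_bf


# Hypothetical log marginal likelihoods, for illustration only.
log_bf = log_bayes_factor(-10250.3, -10310.8)
print(log_bf, supports_temporal_signal(log_bf))
```

In practice a stricter cutoff than zero is often preferred, so that weak support is not read as evidence of temporal signal.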
Affiliation(s)
- Sebastian Duchene
- Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, VIC, Australia
| | - Philippe Lemey
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
| | - Tanja Stadler
- Department of Biosystems Science and Engineering, ETH Zürich, Zürich, Switzerland
| | - Simon Y W Ho
- Swiss Institute of Bioinformatics, Basel, Switzerland; School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
| | - David A Duchene
- Research School of Biology, Australian National University, Canberra, ACT, Australia
| | - Vijaykrishna Dhanasekaran
- Department of Microbiology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC, Australia
| | - Guy Baele
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
|
19
|
Stadler T, Pybus OG, Stumpf MPH. Phylodynamics for cell biologists. Science 2021; 371:eaah6266. [PMID: 33446527 DOI: 10.1126/science.aah6266] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 04/12/2019] [Accepted: 08/13/2020] [Indexed: 12/12/2022]
Abstract
Multicellular organisms are composed of cells connected by ancestry and descent from progenitor cells. The dynamics of cell birth, death, and inheritance within an organism give rise to the fundamental processes of development, differentiation, and cancer. Technical advances in molecular biology now allow us to study cellular composition, ancestry, and evolution at the resolution of individual cells within an organism or tissue. Here, we take a phylogenetic and phylodynamic approach to single-cell biology. We explain how "tree thinking" is important to the interpretation of the growing body of cell-level data and how ecological null models can benefit statistical hypothesis testing. Experimental progress in cell biology should be accompanied by theoretical developments if we are to exploit fully the dynamical information in single-cell data.
Affiliation(s)
- T Stadler
- Department of Biosystems Science and Engineering, ETH Zürich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - O G Pybus
- Department of Zoology, University of Oxford, Oxford, UK.
| | - M P H Stumpf
- Melbourne Integrative Genomics, School of BioSciences and School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia.
|
20
|
The Impacts of Low Diversity Sequence Data on Phylodynamic Inference during an Emerging Epidemic. Viruses 2021; 13:v13010079. [PMID: 33430050 PMCID: PMC7826997 DOI: 10.3390/v13010079] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/11/2020] [Revised: 01/05/2021] [Accepted: 01/05/2021] [Indexed: 01/06/2023] Open
Abstract
Phylodynamic inference is a pivotal tool in understanding transmission dynamics of viral outbreaks. These analyses are strongly guided by the input of an epidemiological model as well as sequence data that must contain sufficient intersequence variability in order to be informative. These criteria, however, may not be met during the early stages of an outbreak. Here we investigate the impact of low diversity sequence data on phylodynamic inference using the birth–death and coalescent exponential models. Our simulation study shows that estimating the molecular evolutionary rate requires enough sequence diversity and is an essential first step for any phylodynamic inference. Following this, the birth–death model outperforms the coalescent exponential model in estimating epidemiological parameters when faced with low diversity sequence data, because it explicitly exploits the sampling times. In contrast, the coalescent model requires additional samples, and therefore variability in sequence data, before accurate estimates can be obtained. These findings were also supported by our empirical data analyses of Australian and New Zealand cluster outbreaks of SARS-CoV-2. Overall, the birth–death model is more robust when applied to datasets with low sequence diversity, given that sampling is specified, and this should be considered for future viral outbreak investigations.
|
21
|
Volz EM, Wiuf C, Grad YH, Frost SDW, Dennis AM, Didelot X. Identification of Hidden Population Structure in Time-Scaled Phylogenies. Syst Biol 2021; 69:884-896. [PMID: 32049340 PMCID: PMC8559910 DOI: 10.1093/sysbio/syaa009] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 08/01/2019] [Revised: 01/09/2020] [Accepted: 01/23/2020] [Indexed: 11/13/2022] Open
Abstract
Population structure influences genealogical patterns; however, data pertaining to how populations are structured are often unavailable or not directly observable. Inference of population structure is highly important in molecular epidemiology where pathogen phylogenetics is increasingly used to infer transmission patterns and detect outbreaks. Discrepancies between observed and idealized genealogies, such as those generated by the coalescent process, can be quantified, and where significant differences occur, may reveal the action of natural selection, host population structure, or other demographic and epidemiological heterogeneities. We have developed a fast non-parametric statistical test for detection of cryptic population structure in time-scaled phylogenetic trees. The test is based on contrasting estimated phylogenies with the theoretically expected phylodynamic ordering of common ancestors in two clades within a coalescent framework. These statistical tests have also motivated the development of algorithms which can be used to quickly screen a phylogenetic tree for clades which are likely to share a distinct demographic or epidemiological history. Epidemiological applications include identification of outbreaks in vulnerable host populations or rapid expansion of genotypes with a fitness advantage. To demonstrate the utility of these methods for outbreak detection, we applied the new methods to large phylogenies reconstructed from thousands of HIV-1 partial pol sequences. This revealed the presence of clades which had grown rapidly in the recent past and were significantly concentrated in young men, suggesting recent and rapid transmission in that group. Furthermore, to demonstrate the utility of these methods for the study of antimicrobial resistance, we applied the new methods to a large phylogeny reconstructed from whole genome Neisseria gonorrhoeae sequences. We find that population structure detected using these methods closely overlaps with the appearance and expansion of mutations conferring antimicrobial resistance. [Antimicrobial resistance; coalescent; HIV; population structure.].
Affiliation(s)
- Erik M Volz
- Department of Infectious Disease Epidemiology and MRC Centre for Global Infectious Disease Analysis, Imperial College London, Norfolk Place, W2 1PG London, UK
| | - Carsten Wiuf
- Department of Mathematical Sciences, University of Copenhagen, Universitetsparken 5, DK-2100 Copenhagen, Denmark
| | - Yonatan H Grad
- Department of Immunology and Infectious Diseases, TH Chan School of Public Health, Harvard University, 677 Huntington Ave, Boston, MA 02115, USA
| | - Simon D W Frost
- Department of Veterinary Medicine, University of Cambridge, Madingley Rd, Cambridge CB3 0ES, UK; The Alan Turing Institute, 96 Euston Rd, London NW1 2DB, London, UK
| | - Ann M Dennis
- Department of Medicine, University of North Carolina Chapel Hill, 321 S Columbia St, Chapel Hill, NC 27516, USA
| | - Xavier Didelot
- School of Life Sciences and Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK
|
22
|
Müller NF, Wagner C, Frazar CD, Roychoudhury P, Lee J, Moncla LH, Pelle B, Richardson M, Ryke E, Xie H, Shrestha L, Addetia A, Rachleff VM, Lieberman NAP, Huang ML, Gautom R, Melly G, Hiatt B, Dykema P, Adler A, Brandstetter E, Han PD, Fay K, Ilcisin M, Lacombe K, Sibley TR, Truong M, Wolf CR, Boeckh M, Englund JA, Famulare M, Lutz BR, Rieder MJ, Thompson M, Duchin JS, Starita LM, Chu HY, Shendure J, Jerome KR, Lindquist S, Greninger AL, Nickerson DA, Bedford T. Viral genomes reveal patterns of the SARS-CoV-2 outbreak in Washington State. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2020:2020.09.30.20204230. [PMID: 33024981 PMCID: PMC7536883 DOI: 10.1101/2020.09.30.20204230] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 01/12/2023]
Abstract
The rapid spread of SARS-CoV-2 has gravely impacted societies around the world. Outbreaks in different parts of the globe are shaped by repeated introductions of new lineages and subsequent local transmission of those lineages. Here, we sequenced 3940 SARS-CoV-2 viral genomes from Washington State to characterize how the spread of SARS-CoV-2 in Washington State (USA) was shaped by differences in timing of mitigation strategies across counties, as well as by repeated introductions of viral lineages into the state. Additionally, we show that the increase in frequency of a potentially more transmissible viral variant (614G) over time can potentially be explained by regional mobility differences and multiple introductions of 614G, but not the other variant (614D) into the state. At an individual level, we see evidence of higher viral loads in patients infected with the 614G variant. However, using clinical records data, we do not find any evidence that the 614G variant impacts clinical severity or patient outcomes. Overall, this suggests that at least to date, the behavior of individuals has been more important in shaping the course of the pandemic than changes in the virus.
Affiliation(s)
| | - Cassia Wagner
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- University of Washington, Seattle, WA, USA
| | | | - Pavitra Roychoudhury
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- University of Washington, Seattle, WA, USA
| | - Jover Lee
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | | | | | | | - Erica Ryke
- University of Washington, Seattle, WA, USA
| | - Hong Xie
- University of Washington, Seattle, WA, USA
| | | | | | - Victoria M Rachleff
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- University of Washington, Seattle, WA, USA
| | | | | | - Romesh Gautom
- Washington State Department of Health, Shoreline, WA, USA
| | - Geoff Melly
- Washington State Department of Health, Shoreline, WA, USA
| | - Brian Hiatt
- Washington State Department of Health, Shoreline, WA, USA
| | - Philip Dykema
- Washington State Department of Health, Shoreline, WA, USA
| | - Amanda Adler
- Seattle Children's Research Institute, Seattle, WA, USA
| | | | | | - Kairsten Fay
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Misja Ilcisin
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | | | | | | | | | - Michael Boeckh
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Janet A Englund
- University of Washington, Seattle, WA, USA
- Seattle Children's Research Institute, Seattle, WA, USA
| | | | - Barry R Lutz
- University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Mark J Rieder
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | | | - Jeffrey S Duchin
- University of Washington, Seattle, WA, USA
- Public Health - Seattle & King County, Seattle, WA, USA
| | - Lea M Starita
- University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Helen Y Chu
- University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Jay Shendure
- University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
| | - Keith R Jerome
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- University of Washington, Seattle, WA, USA
| | | | - Alexander L Greninger
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- University of Washington, Seattle, WA, USA
| | - Deborah A Nickerson
- University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Trevor Bedford
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
|
23
|
Müller NF, Rasmussen D, Stadler T. MASCOT: parameter and state inference under the marginal structured coalescent approximation. Bioinformatics 2019; 34:3843-3848. [PMID: 29790921 PMCID: PMC6223361 DOI: 10.1093/bioinformatics/bty406] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 09/15/2017] [Accepted: 05/16/2018] [Indexed: 11/16/2022] Open
Abstract
Motivation: The structured coalescent is widely applied to study demography within and migration between sub-populations from genetic sequence data. Current methods are either exact but too computationally inefficient to analyse large datasets with many sub-populations, or make strong approximations leading to severe biases in inference. We recently introduced an approximation based on weaker assumptions to the structured coalescent enabling the analysis of larger datasets with many different states. We showed that our approximation provides unbiased migration rate and population size estimates across a wide parameter range. Results: We extend this approach by providing a new algorithm to calculate the probability of the state of internal nodes that includes the information from the full phylogenetic tree. We show that this algorithm is able to increase the probability attributed to the true sub-population of a node. Furthermore, we use improved integration techniques, such that our method is now able to analyse larger datasets, including an H3N2 dataset with 433 sequences sampled from five different locations. Availability and implementation: The presented methods are part of the BEAST2 package MASCOT, the Marginal Approximation of the Structured COalescenT. This package can be downloaded via the BEAUti package manager. The source code is available at https://github.com/nicfel/Mascot.git. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nicola F Müller
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- David Rasmussen
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland; Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC, USA; Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
- Tanja Stadler
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland

24
Müller NF, Dudas G, Stadler T. Inferring time-dependent migration and coalescence patterns from genetic sequence and predictor data in structured populations. Virus Evol 2019; 5:vez030. [PMID: 31428459 PMCID: PMC6693038 DOI: 10.1093/ve/vez030] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 01/27/2023]
Abstract
Population dynamics can be inferred from genetic sequence data by using phylodynamic methods. These methods typically quantify the dynamics in unstructured populations or assume migration rates and effective population sizes to be constant through time in structured populations. When considering rates to vary through time in structured populations, the number of parameters to infer increases rapidly and the available data might not be sufficient to inform these. Additionally, it is often of interest to know what predicts these parameters rather than knowing the parameters themselves. Here, we introduce a method to infer the predictors for time-varying migration rates and effective population sizes by using a generalized linear model (GLM) approach under the marginal approximation of the structured coalescent. Using simulations, we show that our approach is able to reliably infer the model parameters and their predictors from phylogenetic trees. Furthermore, when simulating trees under the structured coalescent, we show that our new approach outperforms the discrete trait GLM model. We then apply our framework to a previously described Ebola virus dataset, where we infer the parameters and their predictors from genome sequences while accounting for phylogenetic uncertainty. We infer weekly cases to be the strongest predictor for effective population size and geographic distance the strongest predictor for migration. This approach is implemented as part of the BEAST2 package MASCOT, which allows us to jointly infer population dynamics, i.e. the parameters and predictors, within structured populations, the phylogenetic tree, and evolutionary parameters.
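As an illustration of the GLM idea in this abstract, the following is a minimal sketch of a log-linear rate, assuming the common form rate = scaler × exp(Σ indicator × coefficient × predictor). The function name `glm_rate`, the predictor values, and the coefficients are hypothetical and chosen only for illustration; this is not MASCOT's API or its exact parameterisation.

```python
import math

def glm_rate(predictors, coefficients, indicators, scaler=1.0):
    """Log-linear rate: scaler * exp(sum_k indicator_k * coefficient_k * predictor_k)."""
    if not (len(predictors) == len(coefficients) == len(indicators)):
        raise ValueError("predictors, coefficients and indicators must have equal length")
    linear = sum(d * b * x for x, b, d in zip(predictors, coefficients, indicators))
    return scaler * math.exp(linear)

# Two hypothetical standardised predictors for one pair of locations.
x_ij = [1.2, -0.4]
beta = [-0.8, 0.5]   # illustrative effect sizes (assumed, not estimated)
delta = [1, 0]       # 0/1 indicators: only the first predictor is included

rate = glm_rate(x_ij, beta, delta, scaler=0.1)
print(f"migration rate: {rate:.4f}")
```

With an indicator set to 0, the corresponding predictor drops out of the rate entirely, which is how such a model can express that a predictor is uninformative.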
Affiliation(s)
- Nicola F Müller
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Gytis Dudas
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA; Gothenburg Global Biodiversity Centre, Gothenburg, Sweden
- Tanja Stadler
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland

25
Bloomfield S, Vaughan T, Benschop J, Marshall J, Hayman D, Biggs P, Carter P, French N. Investigation of the validity of two Bayesian ancestral state reconstruction models for estimating Salmonella transmission during outbreaks. PLoS One 2019; 14:e0214169. [PMID: 31329588 PMCID: PMC6645465 DOI: 10.1371/journal.pone.0214169] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/02/2019] [Accepted: 07/08/2019] [Indexed: 01/24/2023]
Abstract
Ancestral state reconstruction models use genetic data to characterize a group of organisms’ common ancestor. These models have been applied to salmonellosis outbreaks to estimate the number of transmissions between different animal species that share similar geographical locations, with animal host as the state. However, as far as we are aware, no studies have validated these models for outbreak analysis. In this study, salmonellosis outbreaks were simulated using a stochastic Susceptible-Infected-Recovered model, and the host population and transmission parameters of these simulated outbreaks were estimated using Bayesian ancestral state reconstruction models (discrete trait analysis (DTA) and structured coalescent (SC)). These models were unable to accurately estimate the number of transmissions between the host populations or the amount of time spent in each host population. The DTA model was inaccurate because it assumed the number of isolates sampled from each host population was proportional to the number of individuals infected within each host population. The SC model was inaccurate possibly because it assumed that each host population's effective population size was constant over the course of the simulated outbreaks. This study highlights the need for phylodynamic models that can take into consideration factors that influence the characteristics and behavior of outbreaks, e.g. changing effective population sizes, variation in infectious periods, intra-population transmissions, and disproportionate sampling of infected individuals.
Affiliation(s)
- Samuel Bloomfield
  - Quadram Institute, Norwich Research Park, Colney Lane, Norwich, United Kingdom
- Timothy Vaughan
  - Department of Biosystems Science and Engineering, ETH Zurich, Zurich, Switzerland
- Jackie Benschop
  - Molecular Epidemiology and Public Health Laboratory, Massey University, Palmerston North, New Zealand
- Jonathan Marshall
  - Molecular Epidemiology and Public Health Laboratory, Massey University, Palmerston North, New Zealand
- David Hayman
  - Molecular Epidemiology and Public Health Laboratory, Massey University, Palmerston North, New Zealand
- Patrick Biggs
  - Molecular Epidemiology and Public Health Laboratory, Massey University, Palmerston North, New Zealand
- Philip Carter
  - Institute of Environmental Science and Research, Keneperu, New Zealand
- Nigel French
  - Molecular Epidemiology and Public Health Laboratory, Massey University, Palmerston North, New Zealand

26
Duchene S, Bouckaert R, Duchene DA, Stadler T, Drummond AJ. Phylodynamic Model Adequacy Using Posterior Predictive Simulations. Syst Biol 2019; 68:358-364. [PMID: 29945220 PMCID: PMC6368481 DOI: 10.1093/sysbio/syy048] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 02/25/2018] [Accepted: 06/15/2018] [Indexed: 11/18/2022]
Abstract
Rapidly evolving pathogens, such as viruses and bacteria, accumulate genetic change on a timescale similar to that of their epidemiological processes, such that it is possible to make inferences about their infectious spread using phylogenetic time-trees. For this purpose it is necessary to choose a phylodynamic model. However, the resulting inferences are contingent on whether the model adequately describes key features of the data. Model adequacy methods allow formal rejection of a model if it cannot generate the main features of the data. We present TreeModelAdequacy, a package for the popular BEAST2 software that allows assessing the adequacy of phylodynamic models. We illustrate its utility by analyzing phylogenetic trees from two viral outbreaks of Ebola and H1N1 influenza. The main features of the Ebola data were adequately described by the coalescent exponential-growth model, whereas the H1N1 influenza data were best described by the birth–death susceptible-infected-recovered model.
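The logic of a posterior predictive adequacy check can be sketched as follows: compare a test statistic from the empirical tree with the same statistic computed on trees simulated from the fitted model. This is a minimal sketch under stated assumptions. The function names, the 0.05 threshold, and the Gaussian stand-in for simulated tree statistics are all hypothetical, and none of this is the TreeModelAdequacy interface.

```python
import random

def posterior_predictive_p(observed_stat, simulated_stats):
    """Fraction of simulated statistics that are <= the observed one."""
    if not simulated_stats:
        raise ValueError("need at least one simulated statistic")
    below = sum(1 for s in simulated_stats if s <= observed_stat)
    return below / len(simulated_stats)

def is_adequate(p_value, alpha=0.05):
    """Two-sided check: the observed statistic falls in neither tail."""
    return alpha / 2 <= p_value <= 1 - alpha / 2

# Stand-in for a statistic (e.g. tree height) computed on trees simulated
# from posterior parameter draws; real values would come from a simulator.
rng = random.Random(1)
simulated = [rng.gauss(10.0, 2.0) for _ in range(1000)]

p_typical = posterior_predictive_p(10.5, simulated)  # observed value near the bulk
p_extreme = posterior_predictive_p(25.0, simulated)  # observed value far in the tail

print("typical:", p_typical, is_adequate(p_typical))
print("extreme:", p_extreme, is_adequate(p_extreme))
```

A model would be flagged as inadequate for a given statistic when the observed value lies in the tails of the simulated distribution, as in the second case here.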
Affiliation(s)
- Sebastian Duchene
  - Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Melbourne, Australia
- Remco Bouckaert
  - Centre for Computational Evolution, University of Auckland, Auckland, New Zealand; Max Planck Institute for the Science of Human History, Jena, Germany
- David A Duchene
  - School of Life and Environmental Sciences, University of Sydney, Sydney, Australia
- Tanja Stadler
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Alexei J Drummond
  - Centre for Computational Evolution, University of Auckland, Auckland, New Zealand

27
Bouckaert R, Vaughan TG, Barido-Sottani J, Duchêne S, Fourment M, Gavryushkina A, Heled J, Jones G, Kühnert D, De Maio N, Matschiner M, Mendes FK, Müller NF, Ogilvie HA, du Plessis L, Popinga A, Rambaut A, Rasmussen D, Siveroni I, Suchard MA, Wu CH, Xie D, Zhang C, Stadler T, Drummond AJ. BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLoS Comput Biol 2019; 15:e1006650. [PMID: 30958812 PMCID: PMC6472827 DOI: 10.1371/journal.pcbi.1006650] [Citation(s) in RCA: 1552] [Impact Index Per Article: 310.4] [Received: 11/14/2018] [Revised: 04/18/2019] [Accepted: 02/04/2019] [Indexed: 11/18/2022]
Abstract
Elaboration of Bayesian phylogenetic inference methods has continued at pace in recent years with major new advances in nearly all aspects of the joint modelling of evolutionary data. It is increasingly appreciated that some evolutionary questions can only be adequately answered by combining evidence from multiple independent sources of data, including genome sequences, sampling dates, phenotypic data, radiocarbon dates, fossil occurrences, and biogeographic range information among others. Including all relevant data into a single joint model is very challenging both conceptually and computationally. Advanced computational software packages that allow robust development of compatible (sub-)models which can be composed into a full model hierarchy have played a key role in these developments. Developing such software frameworks is increasingly a major scientific activity in its own right, and comes with specific challenges, from practical software design, development and engineering challenges to statistical and conceptual modelling challenges. BEAST 2 is one such computational software platform, and was first announced over 4 years ago. Here we describe a series of major new developments in the BEAST 2 core platform and model hierarchy that have occurred since the first release of the software, culminating in the recent 2.5 release.
Affiliation(s)
- Remco Bouckaert
  - Centre of Computational Evolution, University of Auckland, Auckland, New Zealand
  - Max Planck Institute for the Science of Human History, Jena, Germany
- Timothy G. Vaughan
  - ETH Zürich, Department of Biosystems Science and Engineering, 4058 Basel, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Joëlle Barido-Sottani
  - ETH Zürich, Department of Biosystems Science and Engineering, 4058 Basel, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Sebastián Duchêne
  - Department of Biochemistry and Molecular Biology, University of Melbourne, Melbourne, Victoria, Australia
- Mathieu Fourment
  - ithree institute, University of Technology Sydney, Sydney, Australia
- Graham Jones
  - Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, SE 405 30 Göteborg, Sweden
- Denise Kühnert
  - Max Planck Institute for the Science of Human History, Jena, Germany
- Nicola De Maio
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridgeshire, UK
- Michael Matschiner
  - Department of Environmental Sciences, University of Basel, 4051 Basel, Switzerland
- Fábio K. Mendes
  - Centre of Computational Evolution, University of Auckland, Auckland, New Zealand
- Nicola F. Müller
  - ETH Zürich, Department of Biosystems Science and Engineering, 4058 Basel, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Huw A. Ogilvie
  - Department of Computer Science, Rice University, Houston, TX 77005-1892, USA
- Louis du Plessis
  - Department of Zoology, University of Oxford, Oxford, OX1 3PS, UK
- Alex Popinga
  - Centre of Computational Evolution, University of Auckland, Auckland, New Zealand
- Andrew Rambaut
  - Institute of Evolutionary Biology, University of Edinburgh, Ashworth Laboratories, Edinburgh, EH9 3FL UK
- David Rasmussen
  - Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC 27695, USA
- Igor Siveroni
  - Department of Infectious Disease Epidemiology, Imperial College London, Norfolk Place, W2 1PG, UK
- Marc A. Suchard
  - Department of Biomathematics, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Chieh-Hsi Wu
  - Department of Statistics, University of Oxford, OX1 3LB, UK
- Dong Xie
  - Centre of Computational Evolution, University of Auckland, Auckland, New Zealand
- Chi Zhang
  - Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing, China
- Tanja Stadler
  - ETH Zürich, Department of Biosystems Science and Engineering, 4058 Basel, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Alexei J. Drummond
  - Centre of Computational Evolution, University of Auckland, Auckland, New Zealand

28
Magiorkinis G, Karamitros T, Vasylyeva TI, Williams LD, Mbisa JL, Hatzakis A, Paraskevis D, Friedman SR. An Innovative Study Design to Assess the Community Effect of Interventions to Mitigate HIV Epidemics Using Transmission-Chain Phylodynamics. Am J Epidemiol 2018; 187:2615-2622. [PMID: 30101288 DOI: 10.1093/aje/kwy160] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 02/19/2018] [Accepted: 07/24/2018] [Indexed: 11/13/2022]
Abstract
Given globalization and other social phenomena, controlling the spread of infectious diseases has become an imperative public health priority. A plethora of interventions that in theory can mitigate the spread of pathogens have been proposed and applied. Evaluating the effectiveness of such interventions is costly and in many circumstances unrealistic. Most important, the community effect (i.e., the ability of the intervention to minimize the spread of the pathogen from people who received the intervention to other community members) can rarely be evaluated. Here we propose a study design that can build and evaluate evidence in support of the community effect of an intervention. The approach exploits molecular evolutionary dynamics of pathogens in order to track new infections as having arisen from either a control or an intervention group. It enables us to evaluate whether an intervention reduces the number and length of new transmission chains in comparison with a control condition, and thus lets us estimate the relative decrease in new infections in the community due to the intervention. We provide as an example one working scenario of a way the approach can be applied with a simulation study and associated power calculations.
Affiliation(s)
- Gkikas Magiorkinis
  - Department of Hygiene, Epidemiology and Medical Statistics, Medical School, National and Kapodistrian University of Athens, Athens, Greece
- Jean L Mbisa
  - Virus Reference Department, Public Health England, London, United Kingdom
- Angelos Hatzakis
  - Department of Hygiene, Epidemiology and Medical Statistics, Medical School, National and Kapodistrian University of Athens, Athens, Greece
- Dimitrios Paraskevis
  - Department of Hygiene, Epidemiology and Medical Statistics, Medical School, National and Kapodistrian University of Athens, Athens, Greece

29
Volz EM, Didelot X. Modeling the Growth and Decline of Pathogen Effective Population Size Provides Insight into Epidemic Dynamics and Drivers of Antimicrobial Resistance. Syst Biol 2018; 67:719-728. [PMID: 29432602 PMCID: PMC6005154 DOI: 10.1093/sysbio/syy007] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Received: 10/19/2017] [Accepted: 02/04/2018] [Indexed: 12/15/2022]
Abstract
Nonparametric population genetic modeling provides a simple and flexible approach for studying demographic history and epidemic dynamics using pathogen sequence data. Existing Bayesian approaches are premised on stochastic processes with stationary increments which may provide an unrealistic prior for epidemic histories which feature extended periods of exponential growth or decline. We show that nonparametric models defined in terms of the growth rate of the effective population size can provide a more realistic prior for epidemic history. We propose a nonparametric autoregressive model on the growth rate as a prior for effective population size, which corresponds to the dynamics expected under many epidemic situations. We demonstrate the use of this model within a Bayesian phylodynamic inference framework. Our method correctly reconstructs trends of epidemic growth and decline from pathogen genealogies even when genealogical data are sparse and conventional skyline estimators erroneously predict stable population size. We also propose a regression approach for relating growth rates of pathogen effective population size and time-varying variables that may impact the replicative fitness of a pathogen. The model is applied to real data from rabies virus and Staphylococcus aureus epidemics. We find a close correspondence between the estimated growth rates of a lineage of methicillin-resistant S. aureus and population-level prescription rates of β-lactam antibiotics. The new models are implemented in an open source R package called skygrowth which is available at https://github.com/mrc-ide/skygrowth.
Affiliation(s)
- Erik M Volz
  - Department of Infectious Disease Epidemiology, Imperial College London, Norfolk Place, W2 1PG, UK
- Xavier Didelot
  - Department of Infectious Disease Epidemiology, Imperial College London, Norfolk Place, W2 1PG, UK

30
Vaughan TG. IcyTree: rapid browser-based visualization for phylogenetic trees and networks. Bioinformatics 2018; 33:2392-2394. [PMID: 28407035 PMCID: PMC5860111 DOI: 10.1093/bioinformatics/btx155] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Received: 09/28/2016] [Accepted: 03/21/2017] [Indexed: 01/25/2023]
Abstract
Summary: IcyTree is an easy-to-use application which can be used to visualize a wide variety of phylogenetic trees and networks. While numerous phylogenetic tree viewers exist already, IcyTree distinguishes itself by being a purely online tool, having a responsive user interface, supporting phylogenetic networks (ancestral recombination graphs in particular), and efficiently drawing trees that include information such as ancestral locations or trait values. IcyTree also provides intuitive panning and zooming utilities that make exploring large phylogenetic trees of many thousands of taxa feasible.
Availability and Implementation: IcyTree is a web application and can be accessed directly at http://tgvaughan.github.com/icytree. Currently supported web browsers include Mozilla Firefox and Google Chrome. IcyTree is written entirely in client-side JavaScript (no plugin required) and, once loaded, does not require network access to run. IcyTree is free software, and the source code is made available at http://github.com/tgvaughan/icytree under version 3 of the GNU General Public License.
Contact: tgvaughan@gmail.com
Affiliation(s)
- Timothy G Vaughan
  - Department of Computer Science, University of Auckland, Auckland, New Zealand

31
Abstract
Phylogeographic methods can help reveal the movement of genes between populations of organisms. This has been widely done to quantify pathogen movement between different host populations, the migration history of humans, and the geographic spread of languages or gene flow between species using the location or state of samples alongside sequence data. Phylogenies therefore offer insights into migration processes not available from classic epidemiological or occurrence data alone. Phylogeographic methods have however several known shortcomings. In particular, one of the most widely used methods treats migration the same as mutation, and therefore does not incorporate information about population demography. This may lead to severe biases in estimated migration rates for data sets where sampling is biased across populations. The structured coalescent on the other hand allows us to coherently model the migration and coalescent process, but current implementations struggle with complex data sets due to the need to infer ancestral migration histories. Thus, approximations to the structured coalescent, which integrate over all ancestral migration histories, have been developed. However, the validity and robustness of these approximations remain unclear. We present an exact numerical solution to the structured coalescent that does not require the inference of migration histories. Although this solution is computationally unfeasible for large data sets, it clarifies the assumptions of previously developed approximate methods and allows us to provide an improved approximation to the structured coalescent. We have implemented these methods in BEAST2, and we show how these methods compare under different scenarios.
Affiliation(s)
- Nicola F Müller
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- David A Rasmussen
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Tanja Stadler
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland

32
Boskova V, Stadler T, Magnus C. The influence of phylodynamic model specifications on parameter estimates of the Zika virus epidemic. Virus Evol 2018; 4:vex044. [PMID: 29403651 PMCID: PMC5789282 DOI: 10.1093/ve/vex044] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 01/07/2023]
Abstract
Each new virus introduced into the human population could potentially spread and cause a worldwide epidemic. Thus, early quantification of epidemic spread is crucial. Real-time sequencing followed by Bayesian phylodynamic analysis has proven to be extremely informative in this respect. Bayesian phylodynamic analyses require a model to be chosen and prior distributions on model parameters to be specified. We study here how choices regarding the tree prior influence quantification of epidemic spread in an emerging epidemic by focusing on estimates of the parameters clock rate, tree height, and reproductive number in the currently ongoing Zika virus epidemic in the Americas. While parameter estimates are quite robust to reasonable variations in the model settings when studying the complete data set, it is impossible to obtain unequivocal estimates when reducing the data to local Zika epidemics in Brazil and Florida, USA. Beyond the empirical insights, this study highlights the conceptual differences between the so-called birth-death and coalescent tree priors: while sequence sampling times alone can strongly inform the tree height and reproductive number under a birth-death model, the coalescent tree height prior is typically only slightly influenced by this information. Such conceptual differences together with non-trivial interactions of different priors complicate proper interpretation of empirical results. Overall, our findings indicate that phylodynamic analyses of early viral spread data must be carried out with care as data sets may not necessarily be informative enough yet to provide estimates robust to prior settings. It is necessary to do a robustness check of these data sets by scanning several models and prior distributions. Only if the posterior distributions are robust to reasonable changes of the prior distribution, the parameter estimates can be trusted. 
Such robustness tests will help make real-time phylodynamic analyses of spreading epidemics more reliable in the future.
Affiliation(s)
- Veronika Boskova
  - Department of Biosystems Science and Engineering, ETH Zürich, Mattenstrasse, 4058 Basel, Switzerland; Swiss Institute of Bioinformatics (SIB), Switzerland
- Tanja Stadler
  - Department of Biosystems Science and Engineering, ETH Zürich, Mattenstrasse, 4058 Basel, Switzerland; Swiss Institute of Bioinformatics (SIB), Switzerland
- Carsten Magnus
  - Department of Biosystems Science and Engineering, ETH Zürich, Mattenstrasse, 4058 Basel, Switzerland; Swiss Institute of Bioinformatics (SIB), Switzerland

33
McCloskey RM, Poon AFY. A model-based clustering method to detect infectious disease transmission outbreaks from sequence variation. PLoS Comput Biol 2017; 13:e1005868. [PMID: 29131825 PMCID: PMC5703573 DOI: 10.1371/journal.pcbi.1005868] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 07/18/2017] [Revised: 11/27/2017] [Accepted: 11/02/2017] [Indexed: 01/07/2023]
Abstract
Clustering infections by genetic similarity is a popular technique for identifying potential outbreaks of infectious disease, in part because sequences are now routinely collected for clinical management of many infections. A diverse number of nonparametric clustering methods have been developed for this purpose. These methods are generally intuitive, rapid to compute, and readily scale with large data sets. However, we have found that nonparametric clustering methods can be biased towards identifying clusters of diagnosis—where individuals are sampled sooner post-infection—rather than the clusters of rapid transmission that are meant to be potential foci for public health efforts. We develop a fundamentally new approach to genetic clustering based on fitting a Markov-modulated Poisson process (MMPP), which represents the evolution of transmission rates along the tree relating different infections. We evaluated this model-based method alongside five nonparametric clustering methods using both simulated and actual HIV sequence data sets. For simulated clusters of rapid transmission, the MMPP clustering method obtained higher mean sensitivity (85%) and specificity (91%) than the nonparametric methods. When we applied these clustering methods to published sequences from a study of HIV-1 genetic clusters in Seattle, USA, we found that the MMPP method categorized about half (46%) as many individuals to clusters compared to the other methods. Furthermore, the mean internal branch lengths that approximate transmission rates were significantly shorter in clusters extracted using MMPP, but not by other methods. We determined that the computing time for the MMPP method scaled linearly with the size of trees, requiring about 30 seconds for a tree of 1,000 tips and about 20 minutes for 50,000 tips on a single computer. 
This new approach to genetic clustering has significant implications for the application of pathogen sequence analysis to public health, where it is critical to robustly and accurately identify clusters for the most cost-effective deployment of outbreak management and prevention resources. Many pathogens evolve so rapidly that they accumulate genetic differences within a host before becoming transmitted to the next host. Consequently, clusters of sampled infections with nearly identical genomes may reveal outbreaks of recent or ongoing transmissions. There is rapidly growing interest in using model-free genetic clustering methods to guide public health responses to epidemics in near real-time, including HIV, Ebola virus and tuberculosis. However, we show that current methods are relatively ineffective at detecting transmission outbreaks; instead, they are predominantly influenced by how infections are sampled from the population. We describe a fundamentally new approach to genetic clustering that is based on modelling changes in transmission rates during the spread of the epidemic. We use simulated and real pathogen sequence data sets to demonstrate that this model-based approach is substantially more effective for detecting transmission outbreaks, and remains fast enough for real-time applications to large sequence databases.
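The runtimes reported in this abstract (about 30 seconds for 1,000 tips and about 20 minutes for 50,000 tips) can be checked against the claim of linear scaling with a short calculation. This is only arithmetic on the two rounded figures quoted above; the helper name `seconds_per_tip` is introduced here for illustration.

```python
def seconds_per_tip(total_seconds, n_tips):
    """Average runtime per tip."""
    return total_seconds / n_tips

small = seconds_per_tip(30, 1_000)        # about 30 s for 1,000 tips
large = seconds_per_tip(20 * 60, 50_000)  # about 20 min for 50,000 tips

# Under exactly linear scaling, the 1,000-tip rate predicts this runtime
# for a tree of 50,000 tips.
predicted_minutes = small * 50_000 / 60

print(f"per-tip cost: {small:.3f} s vs {large:.3f} s")
print(f"linear prediction for 50,000 tips: {predicted_minutes:.0f} min (reported: about 20 min)")
```

The per-tip cost is of the same order at both sizes, which is consistent with approximately linear scaling given that both reported figures are rounded.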
Affiliation(s)
- Art F. Y. Poon
  - Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
  - Department of Microbiology and Immunology, Western University, London, Ontario, Canada
  - Department of Applied Mathematics, Western University, London, Ontario, Canada

34
Dearlove BL, Xiang F, Frost SDW. Biased phylodynamic inferences from analysing clusters of viral sequences. Virus Evol 2017; 3:vex020. [PMID: 28852573 PMCID: PMC5570026 DOI: 10.1093/ve/vex020] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Indexed: 01/04/2023]
Abstract
Phylogenetic methods are being increasingly used to help understand the transmission dynamics of measurably evolving viruses, including HIV. Clusters of highly similar sequences are often observed, which appear to follow a ‘power law’ behaviour, with a small number of very large clusters. These clusters may help to identify subpopulations in an epidemic, and inform where intervention strategies should be implemented. However, clustering of samples does not necessarily imply the presence of a subpopulation with high transmission rates, as groups of closely related viruses can also occur due to non-epidemiological effects such as over-sampling. It is important to ensure that observed phylogenetic clustering reflects true heterogeneity in the transmitting population, and is not being driven by non-epidemiological effects. We qualify the effect of using a falsely identified ‘transmission cluster’ of sequences to estimate phylodynamic parameters including the effective population size and exponential growth rate under several demographic scenarios. Our simulation studies show that taking the maximum size cluster to re-estimate parameters from trees simulated under a randomly mixing, constant population size coalescent process systematically underestimates the overall effective population size. In addition, the transmission cluster wrongly resembles an exponential or logistic growth model 99% of the time. We also illustrate the consequences of false clusters in exponentially growing coalescent and birth-death trees, where again, the growth rate is skewed upwards. This has clear implications for identifying clusters in large viral databases, where a false cluster could result in wasted intervention resources.
Affiliation(s)
- Bethany L Dearlove, Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge, CB3 0ES, UK
- Fei Xiang, Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge, CB3 0ES, UK
- Simon D W Frost, Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge, CB3 0ES, UK
|
35
|
Inferring epidemiological parameters from phylogenies using regression-ABC: A comparative study. PLoS Comput Biol 2017; 13:e1005416. [PMID: 28263987 PMCID: PMC5358897 DOI: 10.1371/journal.pcbi.1005416] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 04/27/2016] [Revised: 03/20/2017] [Accepted: 02/16/2017] [Indexed: 02/06/2023]
Abstract
Inferring epidemiological parameters such as the R0 from time-scaled phylogenies is a timely challenge. Most current approaches rely on likelihood functions, which raise specific issues that range from computing these functions to finding their maxima numerically. Here, we present a new regression-based Approximate Bayesian Computation (ABC) approach, which we base on a large variety of summary statistics intended to capture the information contained in the phylogeny and its corresponding lineage-through-time plot. The regression step involves the Least Absolute Shrinkage and Selection Operator (LASSO) method, which is a robust machine learning technique. It allows us to readily deal with the large number of summary statistics, while avoiding resorting to Markov Chain Monte Carlo (MCMC) techniques. To compare our approach to existing ones, we simulated target trees under a variety of epidemiological models and settings, and inferred parameters of interest using the same priors. We found that, for large phylogenies, the accuracy of our regression-ABC is comparable to that of likelihood-based approaches involving birth-death processes implemented in BEAST2. Our approach even outperformed these when inferring the host population size with a Susceptible-Infected-Removed epidemiological model. It also clearly outperformed a recent kernel-ABC approach when assuming a Susceptible-Infected epidemiological model with two host types. Lastly, by re-analyzing data from the early stages of the recent Ebola epidemic in Sierra Leone, we showed that regression-ABC provides more realistic estimates for the duration parameters (latency and infectiousness) than the likelihood-based method. Overall, ABC based on a large variety of summary statistics and a regression method able to perform variable selection and avoid overfitting is a promising approach to analyze large phylogenies. 
Given the rapid evolution of many pathogens, analysing their genomes by means of phylogenies can inform us about how they spread. This is the focus of the field known as “phylodynamics”. Most existing methods inferring epidemiological parameters from virus phylogenies are limited by the difficulty of handling complex likelihood functions, which commonly incorporate latent variables. Here, we use an alternative method known as regression-based Approximate Bayesian Computation (ABC), which circumvents this problem by using simulations and dataset comparisons. Since phylogenies are difficult to compare to one another, we introduce many summary statistics to describe them and take advantage of current machine learning techniques able to perform variable selection. We show that the accuracy we reach is comparable to that of existing methods. This accuracy increases with phylogeny size and can even be higher than that of existing methods for some parameters. Overall, regression-based ABC opens new perspectives to infer epidemiological parameters from large phylogenies.
|
36
|
Abstract
For infectious diseases, a genetic cluster is a group of closely related infections that is usually interpreted as representing a recent outbreak of transmission. Genetic clustering methods are becoming increasingly popular for molecular epidemiology, especially in the context of HIV where there is now considerable interest in applying these methods to prioritize groups for public health resources such as pre-exposure prophylaxis. To date, genetic clustering has generally been performed with ad hoc algorithms, only some of which have since been encoded and distributed as free software. These algorithms have seldom been validated on simulated data where clusters are known, and their interpretation and similarities are not transparent to users outside of the field. Here, I provide a brief overview on the development and inter-relationships of genetic clustering methods, and an evaluation of six methods on data simulated under an epidemic model in a risk-structured population. The simulation analysis demonstrates that the majority of clustering methods are systematically biased to detect variation in sampling rates among subpopulations, not variation in transmission rates. I discuss these results in the context of previous work and the implications for public health applications of genetic clustering.
Affiliation(s)
- Art F Y Poon, Department of Pathology and Laboratory Medicine, Western University, London, Canada
|
37
|
Spiro A, Shapiro E. eSTGt: a programming and simulation environment for population dynamics. BMC Bioinformatics 2016; 17:187. [PMID: 27117841 PMCID: PMC4847376 DOI: 10.1186/s12859-016-1004-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 11/09/2015] [Accepted: 03/29/2016] [Indexed: 11/10/2022]
Abstract
Background: We have previously presented a formal language for describing population dynamics based on environment-dependent Stochastic Tree Grammars (eSTG). The language captures in broad terms the effect of the changing environment while abstracting away details on interaction among individuals. An eSTG program consists of a set of stochastic tree grammar transition rules that are context-free. Transition rule probabilities and rates, however, can depend on global parameters such as population size, generation count and elapsed time. In addition, each individual may have an internal state, which can change during transitions. Results: This paper presents eSTGt (eSTG tool), an eSTG programming and simulation environment. When executing a program, the tool generates the corresponding lineage trees as well as the internal states values, which can then be analyzed either through the tool’s GUI or using MATLAB’s command-line environment. Conclusions: The presented tool allows researchers to use existing biological knowledge in order to model the dynamics of a developmental process and analyze its behavior throughout the historical events. Simulated lineage trees can be used to validate various hypotheses in silico and to predict the behavior of dynamical systems under various conditions. Written under MATLAB environment, the tool also enables to easily integrate the output data within the user’s downstream analysis.
Affiliation(s)
- Adam Spiro, Department of Computer Science and Applied Mathematics and Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
- Ehud Shapiro, Department of Computer Science and Applied Mathematics and Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
|
38
|
Kühnert D, Stadler T, Vaughan TG, Drummond AJ. Phylodynamics with Migration: A Computational Framework to Quantify Population Structure from Genomic Data. Mol Biol Evol 2016; 33:2102-16. [PMID: 27189573 PMCID: PMC4948704 DOI: 10.1093/molbev/msw064] [Citation(s) in RCA: 80] [Impact Index Per Article: 10.0] [Indexed: 01/12/2023]
Abstract
When viruses spread, outbreaks can be spawned in previously unaffected regions. Depending on the time and mode of introduction, each regional outbreak can have its own epidemic dynamics. The migration and phylodynamic processes are often intertwined and need to be taken into account when analyzing temporally and spatially structured virus data. In this article, we present a fully probabilistic approach for the joint reconstruction of phylodynamic history in structured populations (such as geographic structure) based on a multitype birth-death process. This approach can be used to quantify the spread of a pathogen in a structured population. Changes in epidemic dynamics through time within subpopulations are incorporated through piecewise constant changes in transmission parameters. We analyze a global human influenza H3N2 virus data set from a geographically structured host population to demonstrate how seasonal dynamics can be inferred simultaneously with the phylogeny and migration process. Our results suggest that the main migration path among the northern, tropical, and southern region represented in the sample analyzed here is the one leading from the tropics to the northern region. Furthermore, the time-dependent transmission dynamics between and within two HIV risk groups, heterosexuals and injecting drug users, in the Latvian HIV epidemic are investigated. Our analyses confirm that the Latvian HIV epidemic peaking around 2001 was mainly driven by the injecting drug user risk group.
Affiliation(s)
- Denise Kühnert, Department of Environmental Systems Science, ETH Zürich, Zürich, Switzerland; Department of Computer Science, University of Auckland, Auckland, New Zealand; Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Tanja Stadler, Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Timothy G Vaughan, Department of Computer Science, University of Auckland, Auckland, New Zealand
- Alexei J Drummond, Department of Computer Science, University of Auckland, Auckland, New Zealand
|
39
|
Volz EM, Frost SDW. Sampling through time and phylodynamic inference with coalescent and birth-death models. J R Soc Interface 2015; 11:20140945. [PMID: 25401173 PMCID: PMC4223917 DOI: 10.1098/rsif.2014.0945] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Indexed: 11/18/2022]
Abstract
Many population genetic models have been developed for the purpose of inferring population size and growth rates from random samples of genetic data. We examine two popular approaches to this problem, the coalescent and the birth–death-sampling model (BDM), in the context of estimating population size and birth rates in a population growing exponentially according to the birth–death branching process. For sequences sampled at a single time, we found the coalescent and the BDM gave virtually indistinguishable results in terms of the growth rates and fraction of the population sampled, even when sampling from a small population. For sequences sampled at multiple time points, we find that the birth–death model estimators are subject to large bias if the sampling process is misspecified. Since BDMs incorporate a model of the sampling process, we show how much of the statistical power of BDMs arises from the sequence of sample times and not from the genealogical tree. This motivates the development of a new coalescent estimator, which is augmented with a model of the known sampling process and is potentially more precise than the coalescent that does not use sample time information.
Affiliation(s)
- Erik M. Volz, Department of Infectious Disease Epidemiology, Imperial College London, London, UK
- Simon D. W. Frost, Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
|
40
|
Abstract
The shapes of phylogenetic trees relating virus populations are determined by the adaptation of viruses within each host, and by the transmission of viruses among hosts. Phylodynamic inference attempts to reverse this flow of information, estimating parameters of these processes from the shape of a virus phylogeny reconstructed from a sample of genetic sequences from the epidemic. A key challenge to phylodynamic inference is quantifying the similarity between two trees in an efficient and comprehensive way. In this study, I demonstrate that a new distance measure, based on a subset tree kernel function from computational linguistics, confers a significant improvement over previous measures of tree shape for classifying trees generated under different epidemiological scenarios. Next, I incorporate this kernel-based distance measure into an approximate Bayesian computation (ABC) framework for phylodynamic inference. ABC bypasses the need for an analytical solution of model likelihood, as it only requires the ability to simulate data from the model. I validate this “kernel-ABC” method for phylodynamic inference by estimating parameters from data simulated under a simple epidemiological model. Results indicate that kernel-ABC attained greater accuracy for parameters associated with virus transmission than leading software on the same data sets. Finally, I apply the kernel-ABC framework to study a recent outbreak of a recombinant HIV subtype in China. Kernel-ABC provides a versatile framework for phylodynamic inference because it can fit a broader range of models than methods that rely on the computation of exact likelihoods.
Affiliation(s)
- Art F Y Poon, BC Centre for Excellence in HIV/AIDS, Vancouver, BC, Canada; Department of Medicine, University of British Columbia, Vancouver, BC, Canada; Faculty of Health Sciences, Simon Fraser University, Burnaby, BC, Canada
|
41
|
Inferring epidemiological dynamics with Bayesian coalescent inference: the merits of deterministic and stochastic models. Genetics 2014; 199:595-607. [PMID: 25527289 PMCID: PMC4317665 DOI: 10.1534/genetics.114.172791] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Indexed: 12/16/2022]
Abstract
Estimation of epidemiological and population parameters from molecular sequence data has become central to the understanding of infectious disease dynamics. Various models have been proposed to infer details of the dynamics that describe epidemic progression. These include inference approaches derived from Kingman’s coalescent theory. Here, we use recently described coalescent theory for epidemic dynamics to develop stochastic and deterministic coalescent susceptible–infected–removed (SIR) tree priors. We implement these in a Bayesian phylogenetic inference framework to permit joint estimation of SIR epidemic parameters and the sample genealogy. We assess the performance of the two coalescent models and also juxtapose results obtained with a recently published birth–death-sampling model for epidemic inference. Comparisons are made by analyzing sets of genealogies simulated under precisely known epidemiological parameters. Additionally, we analyze influenza A (H1N1) sequence data sampled in the Canterbury region of New Zealand and HIV-1 sequence data obtained from known United Kingdom infection clusters. We show that both coalescent SIR models are effective at estimating epidemiological parameters from data with large fundamental reproductive number R0 and large population size S0. Furthermore, we find that the stochastic variant generally outperforms its deterministic counterpart in terms of error, bias, and highest posterior density coverage, particularly for smaller R0 and S0. However, each of these inference models is shown to have undesirable properties in certain circumstances, especially for epidemic outbreaks with R0 close to one or with small effective susceptible populations.
|
42
|
Spiro A, Cardelli L, Shapiro E. Lineage grammars: describing, simulating and analyzing population dynamics. BMC Bioinformatics 2014; 15:249. [PMID: 25047682 PMCID: PMC4223406 DOI: 10.1186/1471-2105-15-249] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 01/10/2014] [Accepted: 07/07/2014] [Indexed: 11/17/2022]
Abstract
Background: Precise description of the dynamics of biological processes would enable the mathematical analysis and computational simulation of complex biological phenomena. Languages such as Chemical Reaction Networks and Process Algebras cater for the detailed description of interactions among individuals and for the simulation and analysis of ensuing behaviors of populations. However, often knowledge of such interactions is lacking or not available. Yet complete oblivion to the environment would make the description of any biological process vacuous. Here we present a language for describing population dynamics that abstracts away detailed interaction among individuals, yet captures in broad terms the effect of the changing environment, based on environment-dependent Stochastic Tree Grammars (eSTG). It is comprised of a set of stochastic tree grammar transition rules, which are context-free and as such abstract away specific interactions among individuals. Transition rule probabilities and rates, however, can depend on global parameters such as population size, generation count, and elapsed time. Results: We show that eSTGs conveniently describe population dynamics at multiple levels including cellular dynamics, tissue development and niches of organisms. Notably, we show the utilization of eSTG for cases in which the dynamics is regulated by environmental factors, which affect the fate and rate of decisions of the different species. eSTGs are lineage grammars, in the sense that execution of an eSTG program generates the corresponding lineage trees, which can be used to analyze the evolutionary and developmental history of the biological system under investigation. These lineage trees contain a representation of the entire events history of the system, including the dynamics that led to the existing as well as to the extinct individuals.
Conclusions: We conclude that our suggested formalism can be used to easily specify, simulate and analyze complex biological systems, and supports modular description of local biological dynamics that can be later used as “black boxes” in a larger scope, thus enabling a gradual and hierarchical definition and simulation of complex biological systems. The simple, yet robust formalism enables to target a broad class of stochastic dynamic behaviors, especially those that can be modeled using global environmental feedback regulation rather than direct interaction between individuals.
Affiliation(s)
- Ehud Shapiro, Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
|
43
|
Vaughan TG, Kühnert D, Popinga A, Welch D, Drummond AJ. Efficient Bayesian inference under the structured coalescent. Bioinformatics 2014; 30:2272-9. [PMID: 24753484 PMCID: PMC4207426 DOI: 10.1093/bioinformatics/btu201] [Citation(s) in RCA: 97] [Impact Index Per Article: 9.7] [Indexed: 12/28/2022]
Abstract
Motivation: Population structure significantly affects evolutionary dynamics. Such structure may be due to spatial segregation, but may also reflect any other gene-flow-limiting aspect of a model. In combination with the structured coalescent, this fact can be used to inform phylogenetic tree reconstruction, as well as to infer parameters such as migration rates and subpopulation sizes from annotated sequence data. However, conducting Bayesian inference under the structured coalescent is impeded by the difficulty of constructing Markov Chain Monte Carlo (MCMC) sampling algorithms (samplers) capable of efficiently exploring the state space. Results: In this article, we present a new MCMC sampler capable of sampling from posterior distributions over structured trees: timed phylogenetic trees in which lineages are associated with the distinct subpopulation in which they lie. The sampler includes a set of MCMC proposal functions that offer significant mixing improvements over a previously published method. Furthermore, its implementation as a BEAST 2 package ensures maximum flexibility with respect to model and prior specification. We demonstrate the usefulness of this new sampler by using it to infer migration rates and effective population sizes of H3N2 influenza between New Zealand, New York and Hong Kong from publicly available hemagglutinin (HA) gene sequences under the structured coalescent. Availability and implementation: The sampler has been implemented as a publicly available BEAST 2 package that is distributed under version 3 of the GNU General Public License at http://compevol.github.io/MultiTypeTree. Contact: tgvaughan@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Timothy G Vaughan, Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North 4442, New Zealand, Institute of Integrative Biology, Swiss Federal Institute of Technology (ETH), Zurich 8092, Switzerland and Department of Computer Science, University of Auckland, Auckland 1142, New Zealand
- Denise Kühnert, Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North 4442, New Zealand, Institute of Integrative Biology, Swiss Federal Institute of Technology (ETH), Zurich 8092, Switzerland and Department of Computer Science, University of Auckland, Auckland 1142, New Zealand
- Alex Popinga, Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North 4442, New Zealand, Institute of Integrative Biology, Swiss Federal Institute of Technology (ETH), Zurich 8092, Switzerland and Department of Computer Science, University of Auckland, Auckland 1142, New Zealand
- David Welch, Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North 4442, New Zealand, Institute of Integrative Biology, Swiss Federal Institute of Technology (ETH), Zurich 8092, Switzerland and Department of Computer Science, University of Auckland, Auckland 1142, New Zealand
- Alexei J Drummond, Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North 4442, New Zealand, Institute of Integrative Biology, Swiss Federal Institute of Technology (ETH), Zurich 8092, Switzerland and Department of Computer Science, University of Auckland, Auckland 1142, New Zealand
|
44
|
Bouckaert R, Heled J, Kühnert D, Vaughan T, Wu CH, Xie D, Suchard MA, Rambaut A, Drummond AJ. BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS Comput Biol 2014; 10:e1003537. [PMID: 24722319 PMCID: PMC3985171 DOI: 10.1371/journal.pcbi.1003537] [Citation(s) in RCA: 3701] [Impact Index Per Article: 370.1] [Received: 12/02/2013] [Accepted: 01/20/2014] [Indexed: 12/15/2022]
Abstract
We present a new open source, extensible and flexible software platform for Bayesian evolutionary analysis called BEAST 2. This software platform is a re-design of the popular BEAST 1 platform to correct structural deficiencies that became evident as the BEAST 1 software evolved. Key among those deficiencies was the lack of post-deployment extensibility. BEAST 2 now has a fully developed package management system that allows third party developers to write additional functionality that can be directly installed to the BEAST 2 analysis platform via a package manager without requiring a new software release of the platform. This package architecture is showcased with a number of recently published new models encompassing birth-death-sampling tree priors, phylodynamics and model averaging for substitution models and site partitioning. A second major improvement is the ability to read/write the entire state of the MCMC chain to/from disk allowing it to be easily shared between multiple instances of the BEAST software. This facilitates checkpointing and better support for multi-processor and high-end computing extensions. Finally, the functionality in new packages can be easily added to the user interface (BEAUti 2) by a simple XML template-based mechanism because BEAST 2 has been re-designed to provide greater integration between the analysis engine and the user interface so that, for example BEAST and BEAUti use exactly the same XML file format.
Affiliation(s)
- Remco Bouckaert, Computational Evolution Group, Department of Computer Science, University of Auckland, Auckland, New Zealand
- Joseph Heled, Computational Evolution Group, Department of Computer Science, University of Auckland, Auckland, New Zealand
- Denise Kühnert, Computational Evolution Group, Department of Computer Science, University of Auckland, Auckland, New Zealand; Department of Environmental Systems Science, ETH Zürich, Zürich, Switzerland
- Tim Vaughan, Computational Evolution Group, Department of Computer Science, University of Auckland, Auckland, New Zealand; Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand
- Chieh-Hsi Wu, Computational Evolution Group, Department of Computer Science, University of Auckland, Auckland, New Zealand
- Dong Xie, Computational Evolution Group, Department of Computer Science, University of Auckland, Auckland, New Zealand
- Marc A. Suchard, Departments of Biomathematics and Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, United States of America; Department of Biostatistics, School of Public Health, University of California, Los Angeles, Los Angeles, California, United States of America
- Andrew Rambaut, Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
- Alexei J. Drummond, Computational Evolution Group, Department of Computer Science, University of Auckland, Auckland, New Zealand; Allan Wilson Centre for Molecular Ecology and Evolution, University of Auckland, Auckland, New Zealand
|